Convert PDF to JSON
Drop a PDF, describe what you want, and get clean structured JSON. The output is API-friendly: numbers as numbers, dates as ISO strings, arrays for lists. No flat text, no positional fragments.
Why this matters
Most PDF-to-JSON converters return a tree of text positions and font info — useless to anyone who actually wants the data. ExtractFox returns the data itself: fields the way you'd model them in your own application, ready to insert into a database or pipe to an LLM.
How it works
- Step 1: Upload your PDF
Native, scanned, image-based, multi-page — all work.
- Step 2: Pick a template or describe the schema
Use a prebuilt schema (invoice, statement, contract) or describe the fields in plain English. ExtractFox infers the JSON shape and keeps field names and types consistent from one document to the next.
- Step 3: Download or POST to your API
Download as .json, or hit the REST API to wire extraction into your backend.
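Step 3 could be wired into a backend roughly as follows. This is a minimal sketch, not ExtractFox's documented API: the endpoint URL, the `template` and `file` form field names, and the bearer-token header are all assumptions made for illustration, so check the actual API reference before relying on any of them. The first function only builds the request; the second sends it.

```python
import json
import urllib.request
import uuid

# Hypothetical endpoint: a placeholder, not a documented ExtractFox URL.
API_URL = "https://api.example.com/v1/extract"


def build_extract_request(pdf_bytes, filename, template, api_key):
    """Build a multipart/form-data POST carrying a PDF and a template name."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        # Assumed form field naming the prebuilt schema to apply.
        'Content-Disposition: form-data; name="template"\r\n'
        "\r\n"
        f"{template}\r\n"
        f"--{boundary}\r\n"
        # Assumed form field carrying the PDF itself.
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=head + pdf_bytes + tail,
        method="POST",
        headers={
            # Assumed auth scheme; the real API may use something else.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def send(request):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

In a backend this would be called as `send(build_extract_request(pdf_bytes, "invoice.pdf", "invoice", api_key))`, with error handling and retries added around the network call.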
Sample output
Example: invoice extracted as JSON
| field | value |
|---|---|
| vendor | Acme Supplies Ltd. |
| invoice_number | INV-00284 |
| issue_date | 2026-04-12 |
| due_date | 2026-05-12 |
| currency | USD |
| subtotal | 232 |
| tax | 46.4 |
| total | 278.4 |

Line items:

| description | quantity | unit_price | amount |
|---|---|---|---|
| A4 paper, 80gsm, 500 sheets | 12 | 4.5 | 54 |
| Toner cartridge, black | 2 | 89 | 178 |
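As a JSON document, the sample above might look like the following. The key `line_items` for the repeating rows is an assumption: the page shows the rows but not the name of the array that holds them. The sketch parses the JSON with Python's standard library and checks that the amounts are internally consistent, reading fractional values as `Decimal` to avoid float rounding.

```python
import json
from decimal import Decimal

# Sample invoice from this page; "line_items" is an assumed key name.
SAMPLE = """
{
  "vendor": "Acme Supplies Ltd.",
  "invoice_number": "INV-00284",
  "issue_date": "2026-04-12",
  "due_date": "2026-05-12",
  "currency": "USD",
  "subtotal": 232,
  "tax": 46.4,
  "total": 278.4,
  "line_items": [
    {"description": "A4 paper, 80gsm, 500 sheets", "quantity": 12, "unit_price": 4.5, "amount": 54},
    {"description": "Toner cartridge, black", "quantity": 2, "unit_price": 89, "amount": 178}
  ]
}
"""


def load_invoice(text):
    """Parse the JSON, reading fractional numbers as Decimal instead of float."""
    return json.loads(text, parse_float=Decimal)


def totals_are_consistent(invoice):
    """Check row amounts, the subtotal, and the grand total against each other."""
    items = invoice["line_items"]
    rows_ok = all(i["quantity"] * i["unit_price"] == i["amount"] for i in items)
    subtotal_ok = sum(i["amount"] for i in items) == invoice["subtotal"]
    total_ok = invoice["subtotal"] + invoice["tax"] == invoice["total"]
    return rows_ok and subtotal_ok and total_ok
```

Because numbers arrive as JSON numbers and not as strings, a check like this needs no text cleanup before doing arithmetic.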
Frequently asked questions
How do I extract structured JSON from a PDF?
Upload the PDF here, choose a template or describe the schema you want, and download the JSON. Numbers come through as numbers, dates as ISO strings, lists as arrays.
Is the JSON schema stable across runs?
Yes. Within a given template (or a given description), field names and types are stable, so you can write code against the output and trust it across documents.
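Code written against a stable shape can still fail fast when a field is missing or has an unexpected type. A minimal sketch, assuming the invoice field names from the sample output on this page; the `EXPECTED_TYPES` table and the `check_shape` helper are illustrative and not part of ExtractFox.

```python
from datetime import date

# Field names taken from the sample invoice; the expected types are assumptions.
EXPECTED_TYPES = {
    "vendor": str,
    "invoice_number": str,
    "issue_date": str,
    "due_date": str,
    "currency": str,
    "subtotal": (int, float),
    "tax": (int, float),
    "total": (int, float),
}


def check_shape(record):
    """Return a list of problems; an empty list means the record looks as expected."""
    problems = []
    for field, expected in EXPECTED_TYPES.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(record[field], bool) or not isinstance(record[field], expected):
            problems.append(f"wrong type for {field}: {type(record[field]).__name__}")
    for field in ("issue_date", "due_date"):
        value = record.get(field)
        if isinstance(value, str):
            try:
                date.fromisoformat(value)  # accepts YYYY-MM-DD
            except ValueError:
                problems.append(f"not an ISO date: {field}")
    return problems
```

Running a check like this on every response turns a silent schema drift into an explicit error at the boundary of your application.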
Can I provide my own schema?
Yes, on the paid plan. POST a JSON Schema (or a Zod schema in TypeScript) along with the PDF, and the response will conform to it.
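For illustration, a JSON Schema for the invoice in the sample output might look like the sketch below, built as a Python dict and serialized with `json.dumps`. The schema content is an assumption based on the sample fields, and `line_items` is an assumed name for the array of rows. How the schema is attached to the request (field name, endpoint) is not specified on this page, so that part is left out.

```python
import json

# Assumed schema for the sample invoice; adjust names and types to your own model.
INVOICE_SCHEMA = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "invoice_number": {"type": "string"},
        "issue_date": {"type": "string", "format": "date"},
        "due_date": {"type": "string", "format": "date"},
        "currency": {"type": "string"},
        "subtotal": {"type": "number"},
        "tax": {"type": "number"},
        "total": {"type": "number"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "quantity": {"type": "number"},
                    "unit_price": {"type": "number"},
                    "amount": {"type": "number"},
                },
                "required": ["description", "quantity", "unit_price", "amount"],
            },
        },
    },
    "required": ["vendor", "invoice_number", "issue_date", "total"],
}

# Serialized form, ready to send alongside the PDF.
schema_json = json.dumps(INVOICE_SCHEMA, indent=2)
```

Which fields go under `required` is a modeling decision: listing only the fields your application cannot work without leaves room for documents that omit optional ones.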
How is this different from pdf.js or pdfplumber JSON output?
Those libraries return positional text and layout metadata — useful if you're building a viewer, useless if you want the data. ExtractFox returns the data itself, in the shape you'd model it in your application.
Can I get one JSON object per PDF, or one row per record?
Both. By default you get one object per PDF, with arrays inside for repeating records. To process many PDFs as a batch (one row per file), run one extraction per file and concatenate the results.
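The two layouts can be produced from the same per-PDF objects. A minimal sketch, assuming you have already collected one extracted object per file into a dict keyed by filename; the helper names, the `source_file` column, and the `line_items` array key are illustrative assumptions.

```python
def one_row_per_file(results):
    """Flatten each per-PDF object into a single row, dropping nested arrays."""
    rows = []
    for filename, obj in results.items():
        row = {"source_file": filename}
        row.update({k: v for k, v in obj.items() if not isinstance(v, list)})
        rows.append(row)
    return rows


def one_row_per_record(results, array_key="line_items"):
    """Emit one row per repeating record, tagged with the file it came from."""
    rows = []
    for filename, obj in results.items():
        for record in obj.get(array_key, []):
            rows.append({"source_file": filename, **record})
    return rows
```

The first form suits a table of invoices; the second suits a table of line items that joins back to its invoice through `source_file`.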