Convert PDF to JSON

Drop a PDF, describe what you want, and get clean structured JSON. The output is API-friendly: numbers as numbers, dates as ISO strings, arrays for lists. No flat text, no positional fragments.

Max 20 MB per file · PDF, PNG, JPG, WEBP, HEIC

Pro: drop up to 25 files at once for bulk extraction

Why this matters

Most PDF-to-JSON converters return a tree of text positions and font info — useless to anyone who actually wants the data. ExtractFox returns the data itself: fields the way you'd model them in your own application, ready to insert into a database or pipe to an LLM.

How it works

Step 1
Upload your PDF
Native, scanned, image-based, multi-page — all work.
Step 2
Pick a template or describe the schema
Use a prebuilt schema (invoice, statement, contract) or describe the fields in plain English. ExtractFox infers the JSON shape and keeps field names and types stable across runs.
Step 3
Download or POST to your API
Download as .json, or hit the REST API to wire extraction into your backend.
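
The REST API is not documented on this page, so the following is only a sketch of what a backend call might look like. The endpoint URL, the form field names (`file`, `instructions`), and the bearer-token header are all assumptions made for illustration. The function only builds the request and does not send it.

```python
import uuid
import urllib.request

# Hypothetical endpoint: the real URL is not given on this page.
EXTRACT_URL = "https://api.example.com/v1/extract"


def build_extract_request(pdf_bytes: bytes, filename: str,
                          instructions: str, api_key: str) -> urllib.request.Request:
    """Build a multipart/form-data POST carrying a PDF and a plain-English field description.

    The field names "file" and "instructions" and the Bearer auth scheme
    are assumptions, not the documented API.
    """
    boundary = uuid.uuid4().hex
    text_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="instructions"\r\n'
        "\r\n"
        f"{instructions}\r\n"
    ).encode("utf-8")
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n"
        "\r\n"
    ).encode("utf-8")
    closing = f"\r\n--{boundary}--\r\n".encode("utf-8")
    body = text_part + file_header + pdf_bytes + closing
    return urllib.request.Request(
        EXTRACT_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned request to `urllib.request.urlopen` would send it. Check the actual API reference for the real endpoint, field names, and authentication before doing so.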

Common use cases

Document data into a database — invoices, contracts, statements as rows

AI pipeline input — feed structured doc data to a downstream LLM

API integration — extract once, push to many systems

Knowledge graph construction — entities and relations from documents

Audit trails — store the structured extraction alongside the original PDF
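
As an illustration of the first use case, here is a minimal sketch that stores one extracted invoice as database rows using Python's built-in `sqlite3`. The table layout and the `store_invoice` helper are hypothetical. The input dict assumes the extraction has the same keys as the sample invoice on this page.

```python
import sqlite3

# Assumed shape of one extraction result (keys follow the sample invoice on this page).
invoice = {
    "vendor": "Acme Supplies Ltd.",
    "invoice_number": "INV-00284",
    "issue_date": "2026-04-12",
    "due_date": "2026-05-12",
    "currency": "USD",
    "subtotal": 232,
    "tax": 46.4,
    "total": 278.4,
    "line_items": [
        {"description": "A4 paper, 80gsm, 500 sheets", "quantity": 12, "unit_price": 4.5, "amount": 54},
        {"description": "Toner cartridge, black", "quantity": 2, "unit_price": 89, "amount": 178},
    ],
}


def store_invoice(conn: sqlite3.Connection, inv: dict) -> None:
    """Insert one invoice row plus one row per line item (hypothetical table layout)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS invoices ("
        "invoice_number TEXT PRIMARY KEY, vendor TEXT, issue_date TEXT, "
        "due_date TEXT, currency TEXT, subtotal REAL, tax REAL, total REAL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS line_items ("
        "invoice_number TEXT REFERENCES invoices(invoice_number), "
        "description TEXT, quantity INTEGER, unit_price REAL, amount REAL)"
    )
    conn.execute(
        "INSERT INTO invoices VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (inv["invoice_number"], inv["vendor"], inv["issue_date"], inv["due_date"],
         inv["currency"], inv["subtotal"], inv["tax"], inv["total"]),
    )
    conn.executemany(
        "INSERT INTO line_items VALUES (?, ?, ?, ?, ?)",
        [(inv["invoice_number"], item["description"], item["quantity"],
          item["unit_price"], item["amount"]) for item in inv["line_items"]],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
store_invoice(conn, invoice)
```

Because amounts are already numbers and dates are already ISO strings, no parsing step is needed before the insert.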

Sample output

Example: invoice extracted as JSON

vendor	Acme Supplies Ltd.
invoice_number	INV-00284
issue_date	2026-04-12
due_date	2026-05-12
currency	USD
subtotal	232
tax	46.4
total	278.4

line_items

description	quantity	unit_price	amount
A4 paper, 80gsm, 500 sheets	12	4.5	54
Toner cartridge, black	2	89	178
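
The table above could plausibly be serialized as the JSON below. The exact layout is an assumption: key names are taken from the table, and `line_items` is rendered as an array of objects. The hypothetical `check_invoice` helper shows what typed output allows: arithmetic and date math work directly on the parsed values.

```python
import json
from datetime import date

# Assumed JSON rendering of the sample invoice; key names come from the table.
raw = """
{
  "vendor": "Acme Supplies Ltd.",
  "invoice_number": "INV-00284",
  "issue_date": "2026-04-12",
  "due_date": "2026-05-12",
  "currency": "USD",
  "subtotal": 232,
  "tax": 46.4,
  "total": 278.4,
  "line_items": [
    {"description": "A4 paper, 80gsm, 500 sheets", "quantity": 12, "unit_price": 4.5, "amount": 54},
    {"description": "Toner cartridge, black", "quantity": 2, "unit_price": 89, "amount": 178}
  ]
}
"""

invoice = json.loads(raw)


def check_invoice(inv: dict) -> int:
    """Sanity-check types and arithmetic; return the payment term in days."""
    # Amounts are JSON numbers, so they parse to int/float without string handling.
    for key in ("subtotal", "tax", "total"):
        if not isinstance(inv[key], (int, float)):
            raise TypeError(f"{key} should be numeric")
    # Each line amount should equal quantity * unit_price (tolerance for float rounding).
    for item in inv["line_items"]:
        if abs(item["quantity"] * item["unit_price"] - item["amount"]) > 0.005:
            raise ValueError(f"line amount mismatch: {item['description']}")
    line_sum = sum(item["amount"] for item in inv["line_items"])
    if abs(line_sum - inv["subtotal"]) > 0.005:
        raise ValueError("line items do not add up to subtotal")
    if abs(inv["subtotal"] + inv["tax"] - inv["total"]) > 0.005:
        raise ValueError("subtotal + tax does not equal total")
    # ISO date strings parse directly with the standard library.
    issued = date.fromisoformat(inv["issue_date"])
    due = date.fromisoformat(inv["due_date"])
    return (due - issued).days


payment_term_days = check_invoice(invoice)  # 30 for the sample above
```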

Frequently asked questions

How do I extract structured JSON from a PDF?

Upload the PDF here, choose a template or describe the schema you want, and download the JSON. Numbers come through as numbers, dates as ISO strings, lists as arrays.

Is the JSON schema stable across runs?

Yes. Within a given template (or a given description), field names and types are stable, so you can write code against the output and trust it across documents.
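
One way to write code against a stable shape is to parse each result into typed objects, so that any drift in field names or types fails loudly at the boundary. This is a sketch, not part of ExtractFox: the `Invoice` and `LineItem` classes and `parse_invoice` are hypothetical names, and the fields assume the sample invoice shown on this page.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LineItem:
    description: str
    quantity: int
    unit_price: float
    amount: float


@dataclass(frozen=True)
class Invoice:
    vendor: str
    invoice_number: str
    issue_date: date
    due_date: date
    currency: str
    subtotal: float
    tax: float
    total: float
    line_items: tuple[LineItem, ...]


def parse_invoice(data: dict) -> Invoice:
    """Convert one extraction result into typed objects.

    Raises KeyError for a missing field, ValueError for a malformed date,
    and TypeError for an unexpected or missing line-item key.
    """
    return Invoice(
        vendor=data["vendor"],
        invoice_number=data["invoice_number"],
        issue_date=date.fromisoformat(data["issue_date"]),
        due_date=date.fromisoformat(data["due_date"]),
        currency=data["currency"],
        subtotal=data["subtotal"],
        tax=data["tax"],
        total=data["total"],
        line_items=tuple(LineItem(**item) for item in data["line_items"]),
    )
```

Calling `parse_invoice` on every result keeps the rest of the application free of dictionary lookups and string-to-date conversions.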

Can I provide my own schema?

On the paid plan, you can POST a JSON Schema (or a Zod schema in TypeScript) along with the PDF, and the response will conform to it.
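
For illustration, a JSON Schema describing the sample invoice on this page might look like the sketch below, written as a Python dict and serialized with the standard library. The choice of required fields is an assumption, and how the schema is attached to the request is not specified here, so the example stops at producing the JSON text.

```python
import json

# Illustrative schema for the sample invoice; which fields are required is an assumption.
INVOICE_SCHEMA = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "invoice_number": {"type": "string"},
        "issue_date": {"type": "string", "format": "date"},
        "due_date": {"type": "string", "format": "date"},
        "currency": {"type": "string"},
        "subtotal": {"type": "number"},
        "tax": {"type": "number"},
        "total": {"type": "number"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "quantity": {"type": "integer"},
                    "unit_price": {"type": "number"},
                    "amount": {"type": "number"},
                },
                "required": ["description", "quantity", "unit_price", "amount"],
            },
        },
    },
    "required": ["vendor", "invoice_number", "total", "line_items"],
}

# JSON text ready to send alongside the PDF.
schema_json = json.dumps(INVOICE_SCHEMA, indent=2)
```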

How is this different from pdf.js or pdfplumber JSON output?

Those libraries return positional text and layout metadata — useful if you're building a viewer, useless if you want the data. ExtractFox returns the data itself, in the shape you'd model it in your application.

Can I get one JSON object per PDF, or one row per record?

Both. By default you get one object per PDF, with repeating records held in arrays inside it. To process many PDFs as a batch (one row per file), make one call per file and concatenate the results.
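
A minimal sketch of both shapes, assuming you already have some function that returns the extracted dict for one PDF. Here `extract_one` is a stand-in parameter, not an ExtractFox API, and `extract_many`, `explode_records`, and the `source_file` key are hypothetical names.

```python
from typing import Callable, Iterable


def extract_many(paths: Iterable[str],
                 extract_one: Callable[[str], dict]) -> list[dict]:
    """One row per file: call the extractor once per PDF and tag each result with its source."""
    rows = []
    for path in paths:
        result = extract_one(path)
        rows.append({"source_file": path, **result})
    return rows


def explode_records(rows: Iterable[dict], array_field: str) -> list[dict]:
    """One row per record: repeat the parent fields for each element of array_field.

    If a record and its parent share a key, the record's value wins.
    """
    out = []
    for row in rows:
        parent = {k: v for k, v in row.items() if k != array_field}
        for record in row.get(array_field, []):
            out.append({**parent, **record})
    return out


# Stub extractor standing in for a real extraction call.
def fake_extract(path: str) -> dict:
    return {
        "invoice_number": path.removesuffix(".pdf"),
        "line_items": [
            {"description": "item A", "amount": 54},
            {"description": "item B", "amount": 178},
        ],
    }


per_file = extract_many(["INV-1.pdf", "INV-2.pdf"], fake_extract)
per_record = explode_records(per_file, "line_items")
```

With two files of two line items each, `per_file` has two entries and `per_record` has four, each carrying its parent's `invoice_number` and `source_file`.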