Convert PDF to JSON

Drop a PDF, describe what you want, and get clean structured JSON. The output is API-friendly: numbers as numbers, dates as ISO strings, arrays for lists. No flat text, no positional fragments.

Max 20 MB per file · PDF, PNG, JPG, WEBP, HEIC
Pro: drop up to 25 files at once for bulk extraction

Why this matters

Most PDF-to-JSON converters return a tree of text positions and font info — useless to anyone who actually wants the data. ExtractFox returns the data itself: fields the way you'd model them in your own application, ready to insert into a database or pipe to an LLM.

How it works

  1. Upload your PDF

    Native, scanned, image-based, and multi-page PDFs all work.

  2. Pick a template or describe the schema

    Use a prebuilt schema (invoice, statement, contract) or describe the fields in plain English. ExtractFox infers the JSON shape and keeps field names and types consistent across runs.

  3. Download or POST to your API

    Download as .json, or hit the REST API to wire extraction into your backend.

Common use cases

Document data into a database — invoices, contracts, statements as rows
AI pipeline input — feed structured doc data to a downstream LLM
API integration — extract once, push to many systems
Knowledge graph construction — entities and relations from documents
Audit trails — store the structured extraction alongside the original PDF
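
For the database use case, here is a minimal sketch in Python, assuming the extraction result has the shape of the sample invoice on this page. The table layout, column names, and the `store_invoice` helper are illustrative assumptions, not part of ExtractFox.

```python
import sqlite3

# Extraction result shaped like the sample invoice on this page.
invoice = {
    "vendor": "Acme Supplies Ltd.",
    "invoice_number": "INV-00284",
    "issue_date": "2026-04-12",
    "due_date": "2026-05-12",
    "currency": "USD",
    "subtotal": 232,
    "tax": 46.4,
    "total": 278.4,
    "line_items": [
        {"description": "A4 paper, 80gsm, 500 sheets",
         "quantity": 12, "unit_price": 4.5, "amount": 54},
        {"description": "Toner cartridge, black",
         "quantity": 2, "unit_price": 89, "amount": 178},
    ],
}


def store_invoice(conn, doc):
    """Insert one invoice and its line items (hypothetical helper)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS invoices ("
        "invoice_number TEXT PRIMARY KEY, vendor TEXT, issue_date TEXT, "
        "due_date TEXT, currency TEXT, subtotal REAL, tax REAL, total REAL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS line_items ("
        "invoice_number TEXT, description TEXT, quantity REAL, "
        "unit_price REAL, amount REAL)"
    )
    # Scalar fields become one row in the invoices table.
    conn.execute(
        "INSERT INTO invoices VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (doc["invoice_number"], doc["vendor"], doc["issue_date"],
         doc["due_date"], doc["currency"], doc["subtotal"],
         doc["tax"], doc["total"]),
    )
    # Each entry of the line_items array becomes one row in the child table.
    conn.executemany(
        "INSERT INTO line_items VALUES (?, ?, ?, ?, ?)",
        [(doc["invoice_number"], item["description"], item["quantity"],
          item["unit_price"], item["amount"]) for item in doc["line_items"]],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database for the example
store_invoice(conn, invoice)
row_count = conn.execute("SELECT COUNT(*) FROM line_items").fetchone()[0]
```

Because numbers arrive as numbers and dates as ISO strings, the values can be bound to SQL parameters without any string parsing.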

Sample output

Example: invoice extracted as JSON

vendor          Acme Supplies Ltd.
invoice_number  INV-00284
issue_date      2026-04-12
due_date        2026-05-12
currency        USD
subtotal        232
tax             46.4
total           278.4
line_items
  description                  quantity  unit_price  amount
  A4 paper, 80gsm, 500 sheets  12        4.5         54
  Toner cartridge, black       2         89          178
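
As JSON, the sample above might look like the text in the sketch below. The field names and values come from the sample. The exact serialization (key order, whitespace) is an assumption. The snippet parses it with Python's standard library to show that numbers, dates, and lists need no extra cleanup.

```python
import json
from datetime import date

# The sample invoice as a JSON document (serialization details assumed).
raw = """
{
  "vendor": "Acme Supplies Ltd.",
  "invoice_number": "INV-00284",
  "issue_date": "2026-04-12",
  "due_date": "2026-05-12",
  "currency": "USD",
  "subtotal": 232,
  "tax": 46.4,
  "total": 278.4,
  "line_items": [
    {"description": "A4 paper, 80gsm, 500 sheets",
     "quantity": 12, "unit_price": 4.5, "amount": 54},
    {"description": "Toner cartridge, black",
     "quantity": 2, "unit_price": 89, "amount": 178}
  ]
}
"""

invoice = json.loads(raw)

# ISO date strings parse directly into date objects.
issue = date.fromisoformat(invoice["issue_date"])
due = date.fromisoformat(invoice["due_date"])
days_to_pay = (due - issue).days

# Numeric fields are already numbers, so arithmetic works as-is.
line_total = sum(item["amount"] for item in invoice["line_items"])
```

Here `line_total` matches the `subtotal` field, which makes a cheap consistency check before the data goes anywhere else.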

Frequently asked questions

How do I extract structured JSON from a PDF?

Upload the PDF here, choose a template or describe the schema you want, and download the JSON. Numbers come through as numbers, dates as ISO strings, lists as arrays.

Is the JSON schema stable across runs?

Yes. Within a given template (or a given description), field names and types are stable, so you can write code against the output and trust it across documents.
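
If your code depends on that stability, a small guard on your side can catch surprises early. The sketch below is one possible check, assuming the invoice fields from the sample output. `EXPECTED` and `shape_errors` are hypothetical names, not part of any ExtractFox client.

```python
# Expected field names and Python types, taken from the sample invoice.
EXPECTED = {
    "vendor": str,
    "invoice_number": str,
    "issue_date": str,
    "due_date": str,
    "currency": str,
    "subtotal": (int, float),
    "tax": (int, float),
    "total": (int, float),
    "line_items": list,
}


def shape_errors(doc):
    """Return a list of differences between doc and the expected shape."""
    errors = []
    for name, expected_type in EXPECTED.items():
        if name not in doc:
            errors.append(f"missing field: {name}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(doc[name], bool) or not isinstance(doc[name], expected_type):
            errors.append(f"wrong type for {name}: {type(doc[name]).__name__}")
    for name in doc:
        if name not in EXPECTED:
            errors.append(f"unexpected field: {name}")
    return errors


good = {
    "vendor": "Acme Supplies Ltd.", "invoice_number": "INV-00284",
    "issue_date": "2026-04-12", "due_date": "2026-05-12", "currency": "USD",
    "subtotal": 232, "tax": 46.4, "total": 278.4, "line_items": [],
}
bad = dict(good, total="278.4")  # total as a string instead of a number

good_errors = shape_errors(good)
bad_errors = shape_errors(bad)
```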

Can I provide my own schema?

On the paid plan, you can POST a JSON Schema (or Zod schema in TypeScript) along with the PDF, and the response will conform to it.
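
As an illustration, a JSON Schema for the sample invoice could look like the sketch below, written as a Python dict and serialized with the standard library. The choice of required fields is an assumption. How the schema is attached to the request is not covered here, so the snippet stops at producing the schema text.

```python
import json

# A JSON Schema (draft 2020-12) describing the sample invoice.
# Which fields are required is an assumption made for this example.
invoice_schema = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "invoice_number": {"type": "string"},
        "issue_date": {"type": "string", "format": "date"},
        "due_date": {"type": "string", "format": "date"},
        "currency": {"type": "string"},
        "subtotal": {"type": "number"},
        "tax": {"type": "number"},
        "total": {"type": "number"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "quantity": {"type": "number"},
                    "unit_price": {"type": "number"},
                    "amount": {"type": "number"},
                },
                "required": ["description", "amount"],
            },
        },
    },
    "required": ["vendor", "invoice_number", "total", "line_items"],
}

# Serialize to the JSON text you would send alongside the PDF.
schema_text = json.dumps(invoice_schema, indent=2)
```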

How is this different from pdf.js or pdfplumber JSON output?

Those libraries return positional text and layout metadata — useful if you're building a viewer, useless if you want the data. ExtractFox returns the data itself, in the shape you'd model it in your application.

Can I get one JSON object per PDF, or one row per record?

Both. By default you get one object per PDF, with arrays inside for repeating records. To batch-process many PDFs (one row per file), make one call per PDF and concatenate the results.
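
Both shapes can be derived from the per-PDF objects once you have them. The sketch below assumes the results are already in memory as (filename, object) pairs. The helper names, the `source_file` and `line_item_count` keys, and the second invoice are invented for illustration.

```python
def rows_per_file(results):
    """One row per PDF: scalar fields plus a count of line items."""
    rows = []
    for filename, doc in results:
        row = {"source_file": filename}
        # Keep scalar fields, leave arrays out of the flat row.
        row.update({k: v for k, v in doc.items() if not isinstance(v, list)})
        row["line_item_count"] = len(doc.get("line_items", []))
        rows.append(row)
    return rows


def rows_per_record(results):
    """One row per line item, with the parent invoice number repeated."""
    return [
        {"source_file": filename, "invoice_number": doc["invoice_number"], **item}
        for filename, doc in results
        for item in doc.get("line_items", [])
    ]


results = [
    ("acme.pdf", {
        "invoice_number": "INV-00284",
        "total": 278.4,
        "line_items": [
            {"description": "A4 paper, 80gsm, 500 sheets", "amount": 54},
            {"description": "Toner cartridge, black", "amount": 178},
        ],
    }),
    # Second document is made up for the example.
    ("other.pdf", {
        "invoice_number": "INV-00285",
        "total": 20,
        "line_items": [{"description": "Stapler", "amount": 20}],
    }),
]

per_file = rows_per_file(results)
per_record = rows_per_record(results)
```

With this input, `per_file` has one entry per PDF and `per_record` has one entry per line item across all PDFs.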
