Extract data from any PDF

Drop any PDF — a contract, a form, a report, a statement — and pull the data you want as JSON, CSV, or Excel. No templates to set up, no per-document tuning.

Max 20 MB per file · PDF, PNG, JPG, WEBP, HEIC
Pro: drop up to 25 files at once for bulk extraction

Why this matters

Most PDFs were not designed to give up their data. ExtractFox uses a model that reads PDFs the way a person does — including scanned and image-based ones — so you don't need OCR plus a parser plus a regex pile.

How it works

  1. Upload the PDF

    Native PDFs, scanned PDFs, image-based PDFs, multi-page — all fine.

  2. Choose how you want to extract

    Pick a prebuilt schema for common document types, or write a free-text instruction for everything else.

  3. Get clean JSON or a spreadsheet

    Inspect the result as a table, then export to JSON, CSV, or Excel.

Common use cases

Native PDFs — invoices, contracts, reports built from text
Scanned PDFs — receipts, statements, archived paperwork
Image-based PDFs — phone scans, faxed documents
Multi-page PDFs — long statements, annual reports, contracts
Form PDFs — government forms, applications, intake forms
Mixed PDFs — text pages plus scanned pages in one file

Sample output

Example: free-text extraction from an annual report PDF

Request: "pull total revenue, net income, and EPS for each year shown"

Result:
{
  "metrics_by_year": [
    { "year": 2024, "total_revenue": 12450000000, "net_income": 1820000000, "eps": 4.82 },
    { "year": 2025, "total_revenue": 13980000000, "net_income": 2104000000, "eps": 5.49 },
    { "year": 2026, "total_revenue": 15210000000, "net_income": 2387000000, "eps": 6.12 }
  ]
}
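
If you export JSON and later want rows for a spreadsheet, the result above flattens easily. The sketch below is a minimal illustration using only Python's standard library; it assumes the JSON has exactly the shape shown in the sample, and the function name `metrics_to_csv` is hypothetical, not part of ExtractFox.

```python
import csv
import io
import json

# The sample result from above, pasted as a string (assumed shape).
RESULT_JSON = """
{
  "metrics_by_year": [
    { "year": 2024, "total_revenue": 12450000000, "net_income": 1820000000, "eps": 4.82 },
    { "year": 2025, "total_revenue": 13980000000, "net_income": 2104000000, "eps": 5.49 },
    { "year": 2026, "total_revenue": 15210000000, "net_income": 2387000000, "eps": 6.12 }
  ]
}
"""

FIELDS = ["year", "total_revenue", "net_income", "eps"]


def metrics_to_csv(payload: str) -> str:
    """Turn the 'metrics_by_year' list into CSV text, one row per year."""
    rows = json.loads(payload)["metrics_by_year"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = metrics_to_csv(RESULT_JSON)
print(csv_text)
```

Each object in the list becomes one row, so the column order is fixed by `FIELDS` rather than by the order of keys in the JSON.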

Frequently asked questions

How do I extract data from a PDF?

Upload the PDF on this page, pick a document type or describe what you want extracted, and click Extract. Download the result as JSON, CSV, or Excel.

What types of PDFs are supported?

Native (text-based) PDFs, scanned (image-based) PDFs, and form PDFs all work. Multi-page files are fine, and each file can be up to 20 MB.
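
If you prepare files in a script before uploading, a quick local check against the limits stated on this page (20 MB per file; PDF, PNG, JPG, WEBP, HEIC) can catch problems early. This is only a sketch: the function name `check_upload` is hypothetical, treating 1 MB as 1,048,576 bytes is an assumption, and so is accepting `.jpeg` alongside `.jpg`.

```python
from pathlib import Path

# Assumption: "20 MB" means 20 * 1024 * 1024 bytes.
MAX_BYTES = 20 * 1024 * 1024

# Formats listed on this page; ".jpeg" is an assumed alias for JPG.
ALLOWED_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg", ".webp", ".heic"}


def check_upload(name: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    suffix = Path(name).suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        problems.append(f"unsupported file type: {suffix or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is larger than {MAX_BYTES} bytes")
    return problems


print(check_upload("statement.pdf", 3_500_000))
print(check_upload("notes.docx", 120_000))
```

The check looks only at the file name and size, so it cannot tell whether a PDF is password-protected or corrupted.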

Can I extract specific fields rather than the whole document?

Yes. In the description box below the document tiles, type exactly what you want — for example, 'just the total and the due date' — and ExtractFox will return only those fields.

How does this compare to traditional PDF extraction libraries like pdfplumber or Tabula?

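pdfplumber and Tabula read the text layer embedded in a PDF, so they work best on text-based files with clean tables and predictable layouts, and they need a separate OCR step for scans. ExtractFox understands document structure semantically, so it works on messy real-world PDFs, including scans, mixed layouts, and documents where the data isn't in a tidy grid.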

Will extraction preserve the order of items in the original document?

Yes. Lists, tables, and ordered data come back in the order they appear in the PDF — top to bottom, left to right.

Can I extract data from password-protected PDFs?

No — remove the password before uploading. We don't store decrypted versions of your files.

How do I extract just the raw text from a PDF (not structured data)?

Use the PDF-to-text extractor. It runs on the same engine but is tuned for plain-text output (with Markdown, body-only, headings-only, and table-only modes), and it handles both digital and scanned PDFs in one pass.
