Extract data from any PDF
Drop any PDF — a contract, a form, a report, a statement — and pull the data you want as JSON, CSV, or Excel. No templates to set up, no per-document tuning.
Why this matters
Most PDFs were not designed to give up their data. ExtractFox uses a model that reads PDFs the way a person does — including scanned and image-based ones — so you don't need OCR plus a parser plus a regex pile.
How it works
- Step 1: Upload the PDF
Native PDFs, scanned PDFs, image-based PDFs, multi-page — all fine.
- Step 2: Choose how you want to extract
Prebuilt schema for common document types, or free-text instruction for everything else.
- Step 3: Get clean JSON or a spreadsheet
Inspect the result as a table, then export to JSON, CSV, or Excel.
Common use cases
Sample output
Example: free-text extraction from an annual report PDF
Request: "pull total revenue, net income, and EPS for each year shown"
Result:
{
  "metrics_by_year": [
    { "year": 2024, "total_revenue": 12450000000, "net_income": 1820000000, "eps": 4.82 },
    { "year": 2025, "total_revenue": 13980000000, "net_income": 2104000000, "eps": 5.49 },
    { "year": 2026, "total_revenue": 15210000000, "net_income": 2387000000, "eps": 6.12 }
  ]
}
Frequently asked questions
How do I extract data from a PDF?
Upload the PDF on this page, pick a document type or describe what you want extracted, and click Extract. Download the result as JSON, CSV, or Excel.
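Once the JSON is downloaded, it can be reshaped locally with a few lines of code. The sketch below is a minimal illustration in Python, assuming the result has the same shape as the sample output above; the `metrics_by_year` key and field names come from that sample and are not a documented schema, and `rows_to_csv` is a hypothetical helper name.

```python
import csv
import io
import json

# Sample result copied from the "Sample output" section. The key names are
# taken from that example only; they are an assumption, not a fixed schema.
RESULT_JSON = """
{
  "metrics_by_year": [
    {"year": 2024, "total_revenue": 12450000000, "net_income": 1820000000, "eps": 4.82},
    {"year": 2025, "total_revenue": 13980000000, "net_income": 2104000000, "eps": 5.49},
    {"year": 2026, "total_revenue": 15210000000, "net_income": 2387000000, "eps": 6.12}
  ]
}
"""


def rows_to_csv(rows):
    """Serialize a list of flat dicts to CSV text, keeping the key order of the first row."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = json.loads(RESULT_JSON)["metrics_by_year"]
csv_text = rows_to_csv(rows)
print(csv_text)
```

The same rows could be written to a file by passing an open file handle to `csv.DictWriter` instead of the in-memory buffer.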
What types of PDFs are supported?
Native (text-based) PDFs, scanned (image-based) PDFs, and form PDFs all work. Files can be up to 20 MB and can span multiple pages.
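A quick local check before uploading can catch the two easiest problems: a file over the size limit, or a file that is not a PDF at all. This is a minimal sketch, not part of ExtractFox. It assumes "20 MB" means 20 × 1024 × 1024 bytes (the page does not say which definition applies), and `check_pdf` is a hypothetical helper name.

```python
import os
import tempfile

# Assumption: the 20 MB limit is read as 20 MiB. Adjust if the service
# counts megabytes as 1,000,000 bytes.
MAX_BYTES = 20 * 1024 * 1024


def check_pdf(path):
    """Return a list of problems found; an empty list means the file looks uploadable."""
    problems = []
    size = os.path.getsize(path)
    if size > MAX_BYTES:
        problems.append(f"file is {size} bytes, over the {MAX_BYTES}-byte limit")
    # A conforming PDF starts with the bytes "%PDF-" followed by a version number.
    with open(path, "rb") as handle:
        header = handle.read(5)
    if header != b"%PDF-":
        problems.append("file does not start with the %PDF- header")
    return problems


# Demo on a tiny stand-in file that only contains a PDF header line.
with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
    tmp.write(b"%PDF-1.7\n")
demo_problems = check_pdf(tmp.name)
os.remove(tmp.name)
print(demo_problems)
```

The header check is deliberately strict; some PDF readers tolerate a few stray bytes before `%PDF-`, so a failure here is a hint to inspect the file rather than proof that it is unusable.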
Can I extract specific fields rather than the whole document?
Yes. In the description box below the document tiles, type exactly what you want — for example, 'just the total and the due date' — and ExtractFox will return only those fields.
How does this compare to traditional PDF extraction libraries like pdfplumber or Tabula?
pdfplumber and Tabula need clean tables and predictable layouts. ExtractFox understands document structure semantically, so it works on messy real-world PDFs — including scans, mixed layouts, and documents where the data isn't in a tidy grid.
Will extraction preserve the order of items in the original document?
Yes. Lists, tables, and ordered data come back in the order they appear in the PDF — top to bottom, left to right.
Can I extract data from password-protected PDFs?
No — remove the password before uploading. We don't store decrypted versions of your files.
How do I extract just the raw text from a PDF (not structured data)?
Use the PDF-to-text extractor — same engine, but tuned for plain-text output (with Markdown, body-only, headings-only, and table-only modes). Handles both digital and scanned PDFs in one pass.