Extract text from any PDF

Drop in a PDF (digital, scanned, image-based, or a mix) and get clean text back. ExtractFox uses Google Gemini to read digital and scanned PDFs end-to-end, so you don't need separate tools for embedded text and scanned-image OCR.

Max 20 MB per file · PDF, PNG, JPG, WEBP, HEIC
Pro: drop up to 25 files at once for bulk extraction

Why this matters

Most 'PDF to text' tools handle one case well: either they pull text out of digital PDFs (where it's already encoded) or they OCR scanned PDFs (and lose layout). ExtractFox handles both in a single pass — and on scanned PDFs the AI reading is dramatically more accurate than legacy OCR engines like Tesseract or older versions of Acrobat.

How it works

  1. Upload the PDF

    Native digital PDFs, scanned PDFs, image-based PDFs, and multi-page documents all go through the same tool. Up to 20 MB per file.

  2. Pick what to extract

    All text (default), or specific slices: just the body, just the headers, only one page range, or only a specific section.

  3. Copy or download

    .txt, .md (with detected headings), or structured JSON if you asked for fielded extraction.
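The limits stated on this page (20 MB per file; PDF, PNG, JPG, WEBP, HEIC) can be checked locally before uploading. The following is a minimal sketch, not an ExtractFox API: the function name `check_upload` is hypothetical, treating "20 MB" as 20 MiB is an assumption, and so is accepting `.jpeg` alongside `.jpg`.

```python
from pathlib import Path

# Limits as stated on this page. Assumption: "20 MB" is read as 20 MiB.
MAX_BYTES = 20 * 1024 * 1024
# Assumption: ".jpeg" is accepted as a spelling of JPG.
ALLOWED_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg", ".webp", ".heic"}


def check_upload(name: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    suffix = Path(name).suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        problems.append(f"unsupported file type: {suffix or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_BYTES}")
    return problems
```

Returning a list of problems rather than a boolean lets a caller report every issue with a file at once.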

Sample output

Example: text from the first 2 pages of a scanned report PDF

    Annual Report 2026
    Acme Corporation

    --- Page 1 ---

    To our shareholders,

    Fiscal year 2026 was a year of strong growth and operational improvement. Revenue grew 8.8% year-over-year to $15.21 billion, driven by continued expansion in our enterprise segment and the launch of three new product lines.

    --- Page 2 ---

    Financial highlights

    Total revenue: $15.21B (+8.8% YoY)
    Operating income: $2.87B (+13.0% YoY)
    Diluted EPS: $6.12 (+11.5% YoY)
    Cash from operations: $3.42B
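If you need the downloaded text per page, the `--- Page N ---` markers in the sample can be used as split points. This is a rough sketch that assumes each marker sits on its own line in the .txt download and always has exactly this form; `split_pages` is a hypothetical helper, not something ExtractFox provides.

```python
import re

# Assumption: page markers look exactly like "--- Page 3 ---" on their own line.
PAGE_MARKER = re.compile(r"^[ \t]*--- Page (\d+) ---[ \t]*$", re.MULTILINE)


def split_pages(text: str) -> dict[int, str]:
    """Map page number -> text between that page's marker and the next marker."""
    pages = {}
    matches = list(PAGE_MARKER.finditer(text))
    for i, match in enumerate(matches):
        # A page runs until the next marker, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        pages[int(match.group(1))] = text[match.end():end].strip()
    return pages
```

Anything before the first marker (the report title in the sample) is not assigned to a page.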

Frequently asked questions

How do I extract text from a PDF?

Upload the PDF here and pick a mode (all text, Markdown, body only, etc.). Click Extract and download as .txt, .md, or JSON.

Does it work on scanned PDFs?

Yes — and that's where it matters most. Scanned PDFs need real OCR; ExtractFox uses Google Gemini's vision model, which is dramatically more accurate than legacy OCR engines on real-world scans.

How is this different from pdf.js or pdfplumber?

pdf.js and pdfplumber pull text out of digital PDFs by reading the embedded text layer. They don't do OCR — on scanned PDFs they return nothing useful. ExtractFox handles both digital and scanned PDFs in one pass, with the same quality.
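To see the difference in practice, you can check whether a PDF has a usable embedded text layer at all. The sketch below uses pdfplumber's `pdfplumber.open` and `page.extract_text()`. The `has_text_layer` heuristic and its `min_chars` threshold of 20 are assumptions for illustration, not a rule used by pdfplumber or ExtractFox.

```python
from typing import Optional


def has_text_layer(page_texts: list[Optional[str]], min_chars: int = 20) -> bool:
    """Heuristic: treat the PDF as digital if any page yields enough embedded text.

    min_chars is an arbitrary threshold chosen for illustration.
    """
    return any(len((text or "").strip()) >= min_chars for text in page_texts)


def embedded_text_per_page(path: str) -> list[Optional[str]]:
    """Read the embedded text layer of each page (no OCR is performed)."""
    # Imported lazily so the heuristic above can be used without pdfplumber installed.
    import pdfplumber

    with pdfplumber.open(path) as pdf:
        return [page.extract_text() for page in pdf.pages]
```

For a purely scanned PDF, `embedded_text_per_page` would typically return empty or missing text for every page, and `has_text_layer` would then return `False`.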

Can I extract text from a specific page range?

Yes. Type your range in the description box — e.g. 'extract text from pages 3 to 7'. The model returns only those pages.
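ExtractFox interprets the request itself, so nothing needs to be parsed on your side. If you want to check locally that the returned pages match what you asked for, a request like 'pages 3 to 7' can be turned into page numbers as sketched here. `parse_page_range` is a hypothetical helper, and reading the range as 1-based and inclusive is an assumption.

```python
import re
from typing import Optional

# Matches phrases such as "pages 3 to 7", "page 3-7", or "pages 3 through 7".
RANGE_PATTERN = re.compile(r"pages?\s+(\d+)\s*(?:to|through|-)\s*(\d+)", re.IGNORECASE)


def parse_page_range(description: str) -> Optional[range]:
    """Return the requested pages as a range, or None if no range is mentioned.

    Assumption: page numbers are 1-based and the range is inclusive.
    """
    match = RANGE_PATTERN.search(description)
    if match is None:
        return None
    first, last = int(match.group(1)), int(match.group(2))
    if first < 1 or last < first:
        raise ValueError(f"invalid page range: {first} to {last}")
    return range(first, last + 1)
```

The result can be compared against the page numbers present in the extracted text.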

What about multi-column layouts and tables?

Multi-column reading order is detected automatically — text comes through in natural left-to-right, top-to-bottom order per column. For tables specifically, use the Tables only mode to get structured rows instead of flowing text.

Will the formatting be preserved?

Plain-text mode preserves line breaks and paragraph spacing. Markdown mode preserves headings, lists, and tables. Full visual formatting (fonts, colors, exact positions) cannot be preserved by any text-extraction tool; for that you need a converter to a format like .docx.

How is this different from the PDF data extractor?

The data extractor returns structured fields (invoice line items, contract clauses, etc.). The text extractor returns the words themselves as plain text. Pick text extraction when you want the document's content; pick data extraction when you want specific values out of it.
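The distinction can be illustrated with the financial highlights from the sample output: text extraction gives you the lines, while data extraction gives you named values. The sketch below turns `label: value` lines into a dictionary. The splitting rule and the resulting field names are assumptions for illustration and do not reflect the data extractor's actual schema.

```python
import json


def lines_to_fields(text: str) -> dict[str, str]:
    """Turn 'label: value' lines into a dict; lines without both parts are skipped."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons.
        label, sep, value = line.partition(":")
        if sep and label.strip() and value.strip():
            fields[label.strip()] = value.strip()
    return fields


# Plain-text lines taken from the sample output on this page.
highlights = """Total revenue: $15.21B (+8.8% YoY)
Operating income: $2.87B (+13.0% YoY)
Diluted EPS: $6.12 (+11.5% YoY)
Cash from operations: $3.42B"""

print(json.dumps(lines_to_fields(highlights), indent=2))
```

This only works on text that already follows a regular `label: value` pattern; for invoices or contracts, where the fields are not laid out that way, the data extractor is the better fit.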
