Extract links from a PDF

Drop in a PDF and get every link out: clickable annotations, footnote URLs, and visible plain-text URLs. You get a flat list with the page each link came from, ready to copy into a spreadsheet or feed into a link checker.

Processed in-flight — never stored on our servers.

Why this matters

Most PDF tools either dump the whole text, leaving you to regex for URLs, or read only the clickable annotations and miss the plain-text ones. ExtractFox reads the PDF the way a person does, so visible URLs and clickable links both come back, deduped, with the page number where each one appeared.

An SEO audit of a folder of whitepapers means opening each file and cmd-clicking every link, or regexing URLs out of a text dump and getting broken matches mixed in with footnote numbers and DOI fragments. Broken-link checkers wired to annotation metadata miss the URLs someone typed into body text or footnotes, which are the ones readers actually copy and paste. Compliance teams reviewing vendor security PDFs need every external domain catalogued before approving a document, which is tedious when the links span 80 pages of SOC-2 evidence.
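To make the text-dump approach concrete, here is a minimal Python sketch of regexing URLs out of per-page text. It is an illustration of the do-it-yourself route described above, not how ExtractFox works; the pattern, the function name find_plaintext_urls, and the sample strings are all assumptions made for the example.

```python
import re

# Rough pattern for http(s) URLs in running text. Real-world URLs have many
# edge cases (for example, parentheses inside a path) that this ignores.
URL_RE = re.compile(r'https?://[^\s<>")\]]+')

def find_plaintext_urls(page_texts):
    """Return (url, page_number) pairs from a list of per-page text strings."""
    found = []
    for page_number, text in enumerate(page_texts, start=1):
        for match in URL_RE.finditer(text):
            # Sentence punctuation often sticks to the end of a typed URL.
            url = match.group(0).rstrip(".,;:!?")
            found.append((url, page_number))
    return found

# Hypothetical text dump, one string per page.
pages = [
    "See https://example.org/dataset. Code lives at https://github.com/acme/research,",
    "Contact the authors (https://example.org/contact) for access.",
]

for url, page in find_plaintext_urls(pages):
    print(page, url)
```

A sketch like this only sees the text layer. A footnote marker such as [14] that is clickable but never spells out its URL would not show up, which is the gap between the two kinds of tools.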

How it works

Step 1
Upload the PDF
Digital, scanned, or image-based PDFs all work. Up to 20 MB.
Step 2
Pick what to pull
All links (default), only external URLs, only email addresses, only links inside footnotes, or links per page.
Step 3
Export
Copy the list, or download as CSV / JSON. Each row carries the URL, the link text (if any), and the page number.

Common use cases

SEO link audit — extract every URL from a folder of PDF whitepapers and run them through a broken-link checker

Security review — catalog external domains referenced in vendor SOC-2 reports and policy PDFs

Research bibliography cleanup — pull citation URLs from academic PDFs into a spreadsheet for Zotero import

Email harvesting — collect contact emails from conference proceedings and vendor brochures

Sample output

Example: links from a 4-page research paper PDF

links

url	link_text	page_number	type
https://arxiv.org/abs/2401.12345	arxiv.org/abs/2401.12345	1	plaintext
https://github.com/acme/research	github.com/acme/research	1	clickable
mailto:author@university.edu	author@university.edu	1	clickable
https://doi.org/10.1145/3456789.3456790	[14]	4	clickable
https://example.org/dataset	example.org/dataset	4	plaintext
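
Once the rows are exported, a short script can turn them into a domain inventory for a security review. The Python sketch below assumes the CSV download uses the same column names as the sample rows above (url, link_text, page_number, type); that layout and the function name domains_by_page are assumptions for illustration.

```python
import csv
import io
from urllib.parse import urlsplit

# Assumed CSV export; the column names are copied from the sample rows above.
SAMPLE_CSV = """url,link_text,page_number,type
https://arxiv.org/abs/2401.12345,arxiv.org/abs/2401.12345,1,plaintext
https://github.com/acme/research,github.com/acme/research,1,clickable
mailto:author@university.edu,author@university.edu,1,clickable
https://doi.org/10.1145/3456789.3456790,[14],4,clickable
https://example.org/dataset,example.org/dataset,4,plaintext
"""

def domains_by_page(csv_text):
    """Map each web domain to the sorted list of pages it appears on."""
    result = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        parts = urlsplit(row["url"])
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto: and other non-web schemes
        result.setdefault(parts.hostname, set()).add(int(row["page_number"]))
    return {domain: sorted(pages) for domain, pages in result.items()}

for domain, pages in sorted(domains_by_page(SAMPLE_CSV).items()):
    print(domain, pages)
```

For a real export, replace the in-memory string with the contents of the downloaded file.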

Frequently asked questions

How do I extract links from a PDF?

Upload the PDF here, pick a mode (all links, external only, emails, footnote links, or grouped by page), and click Extract. Download as CSV or JSON.

Does it find plain-text URLs, or only clickable ones?

Both. Most PDF link tools only read the clickable annotations baked into the file. ExtractFox reads the visible text too, so URLs that someone typed in but never made clickable still come through.

Will it work on a scanned PDF?

Yes. The model reads the page visually, so URLs in scanned documents are extracted the same way as URLs in digital PDFs. OCR happens automatically.

Can I get just the email addresses?

Yes — pick the Email addresses only mode. It returns every email from both mailto: links and plain-text addresses, with the surrounding context.

How is this different from the PDF text extractor?

The text extractor returns the document's full text. This one returns just the URLs (and their page numbers and link text), already deduped and structured.

What's the largest PDF this handles?

20 MB per upload. For larger PDFs, split the file first or use the API, which supports streaming larger files.

Does it deduplicate URLs that appear on multiple pages?

By default, repeats of the same URL on a single page are collapsed, but you keep one row for each page the URL appears on, so you know every location. Use the Grouped by page mode if you want a per-page inventory instead of a flat list.
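
The rule described in this answer can be sketched in a few lines of Python: treat the pair (URL, page number) as the identity of a row. The row dictionaries and the function names dedupe_per_page and group_by_page are hypothetical and only mirror the columns in the sample output; this is an illustration of the behavior, not ExtractFox's code.

```python
def dedupe_per_page(rows):
    """Keep the first row for each (url, page_number) pair, preserving order."""
    seen = set()
    kept = []
    for row in rows:
        key = (row["url"], row["page_number"])
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept

def group_by_page(rows):
    """Build a per-page inventory: page number -> list of unique URLs."""
    pages = {}
    for row in dedupe_per_page(rows):
        pages.setdefault(row["page_number"], []).append(row["url"])
    return pages

# Hypothetical raw hits: the dataset URL appears twice on page 1 and once on page 4.
rows = [
    {"url": "https://example.org/dataset", "page_number": 1},
    {"url": "https://example.org/dataset", "page_number": 1},
    {"url": "https://example.org/dataset", "page_number": 4},
    {"url": "https://doi.org/10.1145/3456789.3456790", "page_number": 4},
]

flat = dedupe_per_page(rows)      # three rows: pages 1, 4, 4
inventory = group_by_page(rows)   # two pages: 1 and 4
```

The flat list keeps the page 1 and page 4 rows for the dataset URL, so both locations survive, while the duplicate on page 1 is dropped.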

Can I filter out internal links and file:// paths?

Yes — pick the External URLs only mode. It skips mailto: links, internal document anchors (#section-3), file:// paths, and footnote markers that aren't real web URLs.
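
If you already have a mixed list of link targets and want to approximate the same filtering yourself, one option is to keep only absolute http(s) URLs that have a host name. The Python sketch below uses the standard library's urlsplit; the function name is_external_web_url and the candidate list are assumptions for illustration, and the tool's own rules may differ.

```python
from urllib.parse import urlsplit

def is_external_web_url(target):
    """True only for absolute http(s) URLs that include a host name."""
    try:
        parts = urlsplit(target)
    except ValueError:
        return False  # malformed input is treated as "not a web URL"
    return parts.scheme in ("http", "https") and bool(parts.hostname)

# Hypothetical link targets of the kinds listed in the answer above.
candidates = [
    "https://example.org/dataset",
    "mailto:author@university.edu",
    "#section-3",
    "file:///Users/me/report.pdf",
    "[14]",
]

external = [c for c in candidates if is_external_web_url(c)]
print(external)
```

Only the https:// entry passes; the mailto: link, the document anchor, the file:// path, and the bare footnote marker are all rejected because they lack an http(s) scheme.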

Related extractors

Extract text from any PDF

Free PDF-to-text converter: pull clean text from any PDF — digital, scanned, or image-based. One tool for both, no quality compromise. No signup.

Extract data from any PDF

Free PDF data extractor: pull structured data from any PDF — invoices, contracts, forms, reports, statements. Export to Excel, CSV, or JSON. No signup.

Extract data from any website

Free website data extractor: paste a URL and get structured data as Excel, CSV, or JSON — no selectors to write. No signup.

From the blog

Tutorial · April 22, 2026

Extract hyperlinks from Excel and Google Sheets (VBA, Apps Script, Python)

Copy-paste VBA, Office Scripts, Google Apps Script, and Python openpyxl to extract the real URL behind Excel and Sheets hyperlinks — including bulk export from .xlsx XML.