Extract links from a PDF
Drop in a PDF and get every link out: clickable annotations, footnote URLs, and visible plain-text URLs. The result is a flat list with the page each link came from, ready to copy into a spreadsheet or feed into a checker.
Why this matters
Most PDF tools either dump the whole text (and you regex for URLs) or only see the clickable annotations (and miss the plain-text ones). ExtractFox reads the PDF the way a person does, so visible URLs and clickable links both come back, deduped, with the page number where each one appeared.
How it works
- Step 1: Upload the PDF
Digital, scanned, or image-based PDFs all work. Up to 20 MB.
- Step 2: Pick what to pull
All links (default), only external URLs, only email addresses, only links inside footnotes, or links per page.
- Step 3: Export
Copy the list, or download as CSV / JSON. Each row carries the URL, the link text (if any), and the page number.
Sample output
Example: links from a 4-page research paper PDF
| url | link_text | page_number | type |
|---|---|---|---|
| https://arxiv.org/abs/2401.12345 | arxiv.org/abs/2401.12345 | 1 | plaintext |
| https://github.com/acme/research | github.com/acme/research | 1 | clickable |
| mailto:author@university.edu | author@university.edu | 1 | clickable |
| https://doi.org/10.1145/3456789.3456790 | [14] | 4 | clickable |
| https://example.org/dataset | example.org/dataset | 4 | plaintext |
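Once exported, the list is easy to post-process. The sketch below is a minimal example that assumes the CSV export uses the same column names as the sample table above (`url`, `link_text`, `page_number`, `type`); the actual export header may differ. The helper names `load_rows`, `group_by_page`, and `web_urls` are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Assumed CSV export: header names copied from the sample table above.
SAMPLE_CSV = """url,link_text,page_number,type
https://arxiv.org/abs/2401.12345,arxiv.org/abs/2401.12345,1,plaintext
https://github.com/acme/research,github.com/acme/research,1,clickable
mailto:author@university.edu,author@university.edu,1,clickable
https://doi.org/10.1145/3456789.3456790,[14],4,clickable
https://example.org/dataset,example.org/dataset,4,plaintext
"""


def load_rows(csv_text):
    """Parse the export into a list of dicts, with page_number as an int."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        row["page_number"] = int(row["page_number"])
    return rows


def group_by_page(rows):
    """Map each page number to the URLs found on that page, in file order."""
    pages = defaultdict(list)
    for row in rows:
        pages[row["page_number"]].append(row["url"])
    return dict(pages)


def web_urls(rows):
    """Keep only http(s) links, for example to hand to a link checker."""
    return [r["url"] for r in rows if r["url"].startswith(("http://", "https://"))]


rows = load_rows(SAMPLE_CSV)
by_page = group_by_page(rows)
checker_input = web_urls(rows)
```

Here `by_page` groups the five sample rows under pages 1 and 4, and `checker_input` drops the `mailto:` row so that only web URLs go to a checker.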
Frequently asked questions
How do I extract links from a PDF?
Upload the PDF here, pick a mode (all links, external only, emails, footnote links, or grouped by page), and click Extract. Download as CSV or JSON.
Does it find plain-text URLs, or only clickable ones?
Both. Most PDF link tools only read the clickable annotations baked into the file. ExtractFox reads the visible text too, so URLs that someone typed in but never made clickable still come through.
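To illustrate the idea of combining both sources, here is a rough sketch and not ExtractFox's actual method. It assumes you already have a page's text and its clickable link targets from some other tool, finds visible URLs with a deliberately simplified pattern, and merges the two lists without duplicates. The names `URL_RE`, `normalize`, and `merge_links` are hypothetical, and the pattern only catches URLs that start with `http://`, `https://`, or `www.`, so bare domains such as `example.org/dataset` would be missed.

```python
import re

# Simplified URL pattern for illustration; real URL matching has more edge cases.
URL_RE = re.compile(r"""(?:https?://|www\.)[^\s<>"')\]]+""", re.IGNORECASE)


def normalize(url):
    """Strip trailing punctuation and add a scheme so duplicates compare equal."""
    url = url.rstrip(".,;:")
    if url.lower().startswith("www."):
        url = "https://" + url
    return url


def merge_links(page_number, page_text, annotation_urls):
    """Combine clickable annotation targets with URLs visible in the text.

    Clickable links are added first, so they win when the same URL
    appears in both places.
    """
    seen = set()
    rows = []
    for url in annotation_urls:
        key = normalize(url)
        if key not in seen:
            seen.add(key)
            rows.append({"url": key, "page_number": page_number, "type": "clickable"})
    for match in URL_RE.finditer(page_text):
        key = normalize(match.group(0))
        if key not in seen:
            seen.add(key)
            rows.append({"url": key, "page_number": page_number, "type": "plaintext"})
    return rows


text = "Code: https://github.com/acme/research. Data at https://example.org/dataset, see [3]."
rows = merge_links(4, text, ["https://github.com/acme/research"])
```

In this example the GitHub URL appears both as a clickable link and in the text, so it is reported once as `clickable`, while the dataset URL is reported as `plaintext`.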
Will it work on a scanned PDF?
Yes. The model reads the page visually, so URLs in scanned documents are extracted the same way as URLs in digital PDFs. OCR happens automatically.
Can I get just the email addresses?
Yes. Pick the "Email addresses only" mode. It returns every email address found in mailto: links and in plain text, along with its surrounding context.
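For a sense of what collecting addresses from both sources involves, here is a minimal sketch under stated assumptions. It is not the tool's implementation and does not capture surrounding context. It assumes the link targets and page text are already available, and it uses a simplified email pattern; `EMAIL_RE`, `emails_from_mailto`, and `collect_emails` are hypothetical names.

```python
import re
from urllib.parse import unquote, urlsplit

# Simplified email pattern for illustration; it does not cover every valid address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def emails_from_mailto(url):
    """Return the addresses in a mailto: URL, ignoring ?subject=... parameters."""
    parts = urlsplit(url)
    if parts.scheme.lower() != "mailto":
        return []
    return [unquote(a.strip()) for a in parts.path.split(",") if a.strip()]


def collect_emails(link_urls, page_text):
    """Gather addresses from mailto: links and from visible text, deduplicated."""
    found = []
    for url in link_urls:
        found.extend(emails_from_mailto(url))
    found.extend(EMAIL_RE.findall(page_text))

    # Deduplicate case-insensitively, keeping first-seen order.
    seen = set()
    unique = []
    for addr in found:
        if addr.lower() not in seen:
            seen.add(addr.lower())
            unique.append(addr)
    return unique


emails = collect_emails(
    ["mailto:author@university.edu?subject=Hi", "https://example.org"],
    "Contact: Author@University.edu or lab@example.org.",
)
```

The address from the `mailto:` link and its differently cased plain-text copy collapse into one entry, and the non-mailto link is ignored.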
How is this different from the PDF text extractor?
The text extractor returns the document's full text. This one returns just the URLs (and their page numbers and link text), already deduped and structured.
What's the largest PDF this handles?
20 MB per upload. For larger PDFs, split the file first or use the API, which supports streaming larger files.