Extract citations from a PDF or text
Pull every citation out of a PDF or pasted text — full bibliographic entries from the references section, in-text markers (Smith 2023; [14]) mapped to their source, and any DOIs or URLs. Export as BibTeX, RIS, CSL JSON, or a flat spreadsheet.
Why this matters
Bibliography management tools (Zotero, Mendeley, EndNote) extract references from PDFs by pattern-matching on font and layout — they break on legal briefs, government reports, and any document that doesn't follow APA/MLA conventions. ExtractFox reads the page semantically, so it picks up citations regardless of style and reliably maps in-text markers to their full reference.
How it works
- Step 1: Upload the PDF or paste text
Research papers, legal briefs, reports, or pasted text. Multi-column layouts and footnoted citations both work.
- Step 2: Pick a mode
All citations, in-text markers only, references section only, or grouped by section — and choose an export format.
- Step 3: Export
BibTeX (.bib), RIS, CSL JSON, or CSV. Drop straight into Zotero, Mendeley, EndNote, or a spreadsheet.
Sample output
Example: two references from a research paper PDF, exported as CSL JSON
| id | type | title | author | issued | container-title | volume | DOI |
|---|---|---|---|---|---|---|---|
| vaswani2017attention | paper-conference | Attention Is All You Need | [{"family":"Vaswani","given":"Ashish"},{"family":"Shazeer","given":"Noam"},{"family":"Parmar","given":"Niki"}] | {"date-parts":[[2017]]} | Advances in Neural Information Processing Systems | 30 | — |
| devlin2019bert | paper-conference | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | [{"family":"Devlin","given":"Jacob"}] | {"date-parts":[[2019]]} | NAACL-HLT | — | 10.18653/v1/N19-1423 |
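
The rows above are a flattened view of CSL JSON items. As a minimal sketch of what the same records look like as JSON, and how they could be turned into BibTeX by hand, the Python below rebuilds the two items from the table values and renders each as an `@inproceedings` entry. The `csl_to_bibtex` helper and its field mapping are illustrative assumptions, not ExtractFox's own exporter.

```python
import json

# The two table rows written as CSL JSON items (values copied from the table;
# fields shown as missing in the table are simply omitted).
csl_items = json.loads("""
[
  {
    "id": "vaswani2017attention",
    "type": "paper-conference",
    "title": "Attention Is All You Need",
    "author": [
      {"family": "Vaswani", "given": "Ashish"},
      {"family": "Shazeer", "given": "Noam"},
      {"family": "Parmar", "given": "Niki"}
    ],
    "issued": {"date-parts": [[2017]]},
    "container-title": "Advances in Neural Information Processing Systems",
    "volume": "30"
  },
  {
    "id": "devlin2019bert",
    "type": "paper-conference",
    "title": "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding",
    "author": [{"family": "Devlin", "given": "Jacob"}],
    "issued": {"date-parts": [[2019]]},
    "container-title": "NAACL-HLT",
    "DOI": "10.18653/v1/N19-1423"
  }
]
""")


def csl_to_bibtex(item):
    """Render one CSL JSON item as a BibTeX @inproceedings entry.

    Hypothetical helper: it covers only the fields used in the sample above.
    """
    authors = " and ".join(
        f"{a['family']}, {a['given']}" for a in item.get("author", [])
    )
    fields = {
        "title": item.get("title"),
        "author": authors or None,
        "year": str(item["issued"]["date-parts"][0][0]) if "issued" in item else None,
        "booktitle": item.get("container-title"),
        "volume": item.get("volume"),
        "doi": item.get("DOI"),
    }
    # Skip fields that are missing so the entry has no empty braces.
    body = ",\n".join(f"  {key} = {{{value}}}" for key, value in fields.items() if value)
    return f"@inproceedings{{{item['id']},\n{body}\n}}"


for item in csl_items:
    print(csl_to_bibtex(item))
```

In practice the BibTeX export mode makes this conversion unnecessary; the sketch is only meant to show how the CSL JSON fields line up with BibTeX ones.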
Frequently asked questions
How do I extract citations from a PDF?
Upload the PDF, pick a mode (all citations, in-text mapped, BibTeX, CSL JSON, or legal citations), and export. The result drops straight into Zotero, Mendeley, EndNote, or any reference manager that accepts BibTeX or RIS.
How is this different from Zotero's PDF metadata extraction?
Zotero's extractor reads metadata stored in the PDF and pattern-matches the references section. It works well on standard APA/MLA papers and breaks on most legal briefs, government reports, and theses. ExtractFox reads the page semantically — it works regardless of citation style.
Will it map in-text markers to their references?
Yes — pick the In-text markers mapped mode. Each marker comes back with the sentence it appeared in, the page, and the full reference it points to.
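
To make the shape of a mapped marker concrete, here is a minimal sketch in Python. The `MappedMarker` record and the `map_numeric_markers` function are hypothetical names, the sample sentences are made up for illustration, and the regex approach only handles numeric markers like [14]. It is not how ExtractFox works internally; it only shows the marker, sentence, page, and reference fields travelling together.

```python
import re
from dataclasses import dataclass


@dataclass
class MappedMarker:
    marker: str      # the in-text marker as written, e.g. "[2]"
    sentence: str    # the sentence the marker appeared in
    page: int        # page the sentence was found on
    reference: str   # full reference the marker points to


def map_numeric_markers(pages, references):
    """Map numeric markers to references.

    pages: {page_number: page text}
    references: {reference_number: full reference string}
    """
    results = []
    for page, text in sorted(pages.items()):
        # Naive sentence split: break after ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
            for match in re.finditer(r"\[(\d+)\]", sentence):
                number = int(match.group(1))
                if number in references:
                    results.append(
                        MappedMarker(match.group(0), sentence, page, references[number])
                    )
    return results


# Illustrative input only: reference strings abbreviated from the sample output.
references = {
    1: "Vaswani, A., Shazeer, N., Parmar, N. Attention Is All You Need. 2017.",
    2: "Devlin, J. BERT: Pre-training of Deep Bidirectional Transformers "
       "for Language Understanding. 2019.",
}
pages = {
    3: "Transformers rely on self-attention [1]. "
       "Pre-training improves downstream tasks [2].",
}

for m in map_numeric_markers(pages, references):
    print(m.page, m.marker, "->", m.reference)
```

Author-year markers such as (Smith 2023) need fuzzier matching than this, which is where pattern-based approaches tend to fall short.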
Can I get just the legal citations?
Yes. The Legal citations mode pulls only case, statute, regulation, and treaty citations and returns them in Bluebook-style fields with reporter, volume, page, year, and court.
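
As a rough sketch of those fields, the Python below parses one simple case citation into volume, reporter, page, year, and court. The `LegalCitation` type, the `parse_case_citation` function, and the regex are assumptions made for illustration; real Bluebook citations come in many more forms than this pattern covers, and the actual export columns may be named differently.

```python
import re
from typing import NamedTuple, Optional


class LegalCitation(NamedTuple):
    volume: int
    reporter: str
    page: int
    year: int
    court: Optional[str]  # None when the parenthetical gives only a year


# Matches the simple form "<volume> <reporter> <page> (<court> <year>)",
# where the court part is optional.
PATTERN = re.compile(
    r"(?P<volume>\d+)\s+"
    r"(?P<reporter>[A-Z][A-Za-z0-9. ]*?)\s+"
    r"(?P<page>\d+)\s+"
    r"\((?:(?P<court>[^()]+?)\s+)?(?P<year>\d{4})\)"
)


def parse_case_citation(text):
    """Return the first simple case citation found in text, or None."""
    m = PATTERN.search(text)
    if m is None:
        return None
    return LegalCitation(
        volume=int(m["volume"]),
        reporter=m["reporter"],
        page=int(m["page"]),
        year=int(m["year"]),
        court=m["court"],
    )


print(parse_case_citation("Brown v. Board of Education, 347 U.S. 483 (1954)"))
```

For U.S. Reports citations the court is implied by the reporter, so the parenthetical carries only the year and `court` comes back as `None`.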
What about footnoted citations?
Handled the same way. Citations in footnotes are tagged with the page they appeared on; the full bibliographic version still ends up in the bibliography list.
What citation styles does it recognize?
APA, MLA, Chicago, IEEE, Vancouver, Bluebook, ACS, AMA, and most journal-specific variants. ExtractFox detects the style in the document and parses the fields accordingly.