How to extract embedded files and attachments from a PDF
PDFs can carry attached files — Excel sheets, source data, supporting docs. Acrobat shows them; most other readers don't. Here's how to get them out, on any OS.
PDFs have a feature most readers hide: embedded file attachments. A research paper can ship with the dataset attached, an annual report with the source spreadsheet, an invoice with a packing slip. The attachments are in the file but invisible until you go looking.
Acrobat / Adobe Reader
Open the PDF, click the paperclip icon in the left rail (View → Show/Hide → Navigation Panes → Attachments if it's hidden). Each attachment shows its name and size. Right-click → Save Attachment.
Preview on macOS
Preview doesn't show embedded attachments. The files are still there; you just can't see them. Either open the PDF in Acrobat, or extract them with the command-line or Python tools below.
Command line: pdfdetach
pdfdetach comes with poppler-utils (brew install poppler on macOS, apt install poppler-utils on Debian/Ubuntu):
    pdfdetach -list report.pdf              # list attachments
    pdfdetach -saveall -o out/ report.pdf   # extract every attachment to out/
This is the right tool for batch work. pdfdetach takes one PDF per call, so wrap it in a loop to process a folder of PDFs and dump every attachment.
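As a rough sketch of that batch workflow, the script below calls pdfdetach once per PDF in a folder. The function names (build_command, extract_folder) and the folder names pdfs/ and out/ are made up for illustration. The script assumes pdfdetach is on your PATH. It creates each output folder itself rather than relying on pdfdetach to do so.

```python
import subprocess
import sys
from pathlib import Path


def build_command(pdf_path: Path, out_root: Path) -> list[str]:
    """Return the pdfdetach call that saves every attachment of one PDF."""
    # One subfolder per PDF (named after the file stem) so attachments with
    # the same name in different PDFs don't overwrite each other.
    target = out_root / pdf_path.stem
    return ["pdfdetach", "-saveall", "-o", str(target), str(pdf_path)]


def extract_folder(pdf_dir: Path, out_root: Path) -> None:
    """Run pdfdetach once per PDF found directly inside pdf_dir."""
    for pdf_path in sorted(pdf_dir.glob("*.pdf")):
        cmd = build_command(pdf_path, out_root)
        # Create the target folder up front (cmd[3] is the -o argument).
        Path(cmd[3]).mkdir(parents=True, exist_ok=True)
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            print(f"{pdf_path.name}: pdfdetach failed: {result.stderr.strip()}",
                  file=sys.stderr)


if __name__ == "__main__":
    # Hypothetical folder names; adjust to your layout.
    extract_folder(Path("pdfs"), Path("out"))
```

Giving each PDF its own output folder is a design choice, not something pdfdetach requires. It keeps two reports that both attach a data.csv from clobbering each other.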
Python: pypdf or pikepdf
pikepdf is the cleaner API for attachments:
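    import pikepdf

    with pikepdf.open("report.pdf") as pdf:
        # pdf.attachments maps each attachment name to an AttachedFileSpec
        for name, spec in pdf.attachments.items():
            with open(name, "wb") as f:
                # get_file() returns the embedded file; read_bytes() gives its contents
                f.write(spec.get_file().read_bytes())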
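Attachment names come from inside the PDF, so a file you didn't create could carry a name like ../../something. The sketch below is one cautious way to handle that: it keeps only the last path component and avoids overwriting existing files. The helper names safe_target and extract_attachments and the default out folder are hypothetical. The pikepdf calls are the same ones used above, as far as pikepdf's attachment documentation describes them.

```python
from pathlib import Path


def safe_target(out_dir: Path, attachment_name: str) -> Path:
    """Map an attachment name to a path that stays inside out_dir."""
    # Names may contain directory parts ("../../x", "C:\\temp\\x");
    # keep only the final component.
    base = attachment_name.replace("\\", "/").split("/")[-1]
    if base in ("", ".", ".."):
        base = "attachment"
    target = out_dir / base
    # Avoid overwriting an existing file: data.csv, data_1.csv, data_2.csv, ...
    counter = 1
    while target.exists():
        target = out_dir / f"{Path(base).stem}_{counter}{Path(base).suffix}"
        counter += 1
    return target


def extract_attachments(pdf_path: str, out_dir: str = "out") -> list[Path]:
    """Save every attachment of pdf_path into out_dir and return the paths."""
    import pikepdf  # third-party: pip install pikepdf

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    with pikepdf.open(pdf_path) as pdf:
        for name, spec in pdf.attachments.items():
            target = safe_target(out, name)
            target.write_bytes(spec.get_file().read_bytes())
            saved.append(target)
    return saved
```

Calling extract_attachments("report.pdf") would then write the files into out/ and return their paths. The pikepdf import sits inside the function so the filename helper can be reused without pikepdf installed.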
Don't confuse attachments with these
- Embedded images: image objects drawn on the page, not separate files. See the post on extracting images from a PDF.
- Embedded fonts: there for rendering, not for extraction. Dedicated font tools can strip or copy them; pdfdetach can't.
- Attachments inside PDF forms: sometimes accessible via the Attachments panel, sometimes only via the form's submit-data interface.
When the data you want is in the PDF body, not the attachment
If the PDF doesn't have attachments and the data you need is in the visible content (tables, forms, text), the attachment route is a dead end. ExtractFox's PDF data extractor handles the visible content — pair it with pdfdetach if you also need the bundled files.