How to remove metadata from a PDF (for privacy)
Author, software, GPS, edit history — every PDF leaks more than you think. The reliable ways to strip metadata before sharing, in any tool you already have.
Every PDF you share carries metadata you probably didn't mean to send. Author name from the OS user account. Producer software version. Creation and modification timestamps. Sometimes XMP fields with custom data, GPS from the phone that scanned the page, or revision history from the editor. Before you send a PDF outside your team — to a journalist, a regulator, a public records request, a court — you almost always want to strip it.
What's actually in there
PDF metadata lives in two places. The document information dictionary holds the classic fields: Title, Author, Subject, Keywords, Creator, Producer, CreationDate, ModDate. The XMP packet (modern, XML-based) can hold the same fields plus arbitrary custom schemas — anything from camera EXIF to internal tracking IDs. A complete metadata-removal tool has to clear both.
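As a rough first look at those two locations, the sketch below scans a file's raw bytes for the `/Info` key and for an XMP packet. The function name `pdf_metadata_locations` is made up for illustration, and the check is only a heuristic: an XMP stream that has been compressed will not match, so treat a miss as inconclusive.

```shell
# Heuristic: report which of the two metadata locations show up in a PDF's raw bytes.
# pdf_metadata_locations is a hypothetical helper name, not part of any tool.
pdf_metadata_locations() {
    # -a treats the binary file as text; -q only sets the exit status
    if grep -a -q -E '/Info([^A-Za-z0-9]|$)' "$1"; then
        echo "info-dictionary: yes"
    else
        echo "info-dictionary: no"
    fi
    if grep -a -q '<x:xmpmeta' "$1"; then
        echo "xmp-packet: yes"
    else
        echo "xmp-packet: not found (may be compressed)"
    fi
}

# Usage: pdf_metadata_locations input.pdf
```

For anything beyond a quick look, exiftool's own listing is the more dependable view.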
Beyond metadata, there are other leak surfaces worth knowing about: incremental saves can leave earlier revisions of text and metadata in the file, embedded fonts can carry the original file path, and image XObjects often carry full EXIF including GPS. Stripping only the document-level metadata leaves all of those in place.
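One cheap hint that a file carries earlier revisions is the number of `%%EOF` markers in it, since each incremental save appends a new one. The sketch below counts them; `count_pdf_revisions` is a hypothetical name, and the result is a hint rather than proof. Linearized files normally contain two markers by design, and a marker inside an embedded file would also be counted.

```shell
# Count end-of-file markers in a PDF; each incremental save appends one.
# count_pdf_revisions is a hypothetical helper name.
count_pdf_revisions() {
    # grep -c counts matching lines; %%EOF normally sits on its own line.
    # "|| true" keeps the exit status at 0 when the count is zero.
    grep -a -c '%%EOF' "$1" || true
}

# Usage: count_pdf_revisions input.pdf
```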
The four reliable methods
- Use Adobe Acrobat's 'Sanitize Document' (Pro only) for a one-click, belt-and-braces removal of everything: metadata, hidden text, embedded files, comments. This is the gold standard if you have it.
- Use exiftool from the command line for batch jobs. One command per file or per folder, runs on Linux/Mac/Windows, deterministic.
- Use qpdf to re-write the PDF after stripping. qpdf on its own keeps the metadata, but the re-write physically drops objects that exiftool has only unlinked. Useful when you also want to linearize or compress the file.
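- Print to PDF as a last resort. The 'Save as PDF' or 'Microsoft Print to PDF' workflow drops most metadata and any document history, at the cost of links, bookmarks, form fields, and accessibility tags, and in some viewers the text itself comes out as a flat re-rendered page. The new file also gets fresh Producer and date fields from whatever did the printing, so inspect it afterwards.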
exiftool: the most useful one to know
exiftool is a Perl script with few dependencies beyond Perl itself, and it is packaged for most package managers. It reads and writes metadata across hundreds of file formats, PDF included.
    # Strip every metadata field from a single PDF
    exiftool -all= -overwrite_original input.pdf

    # Strip metadata from every PDF in a folder
    exiftool -all= -overwrite_original -ext pdf .

    # Inspect what's there before you strip
    exiftool input.pdf
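The -all= flag clears every metadata tag exiftool can write. -overwrite_original avoids leaving a .pdf_original backup file, which is what you want for a real privacy pass, because those backups are how metadata leaks back into a workflow. One caveat matters here: exiftool edits a PDF by appending an incremental update, so the old values are hidden rather than erased, and exiftool itself warns that its PDF edits are reversible. Re-writing the file with qpdf afterwards (for example, qpdf --linearize in.pdf out.pdf) is what actually discards them.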
qpdf: when you also want to clean up the file
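qpdf is a structural PDF tool. On its own it keeps the document information dictionary and the XMP packet, but it re-writes the file from scratch and carries over only the objects that are still referenced. Run it after exiftool, so the metadata exiftool has just unlinked is left behind:
    # Strip metadata first (work on a copy; exiftool appends an incremental update)
    exiftool -all= -overwrite_original input.pdf

    # Then re-write the file so the old metadata objects are dropped
    qpdf --linearize --object-streams=disable \
      --remove-unreferenced-resources=yes \
      input.pdf output.pdf
The combination of exiftool stripping and qpdf re-writing is the most thorough non-Acrobat approach. It clears the metadata, removes orphaned objects (which sometimes contain old metadata or embedded files), and often produces a smaller file.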
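For repeated use, the two tools can be wrapped in one small function. This is a minimal sketch, assuming exiftool and qpdf are both on PATH; the name `strip_pdf` and the temporary-copy layout are illustrative choices, not something either tool prescribes. It strips a copy first and only then re-writes it, so the source file is never touched.

```shell
# Sketch: strip metadata, then re-write so the stripped data is physically gone.
# strip_pdf is a hypothetical wrapper; it assumes exiftool and qpdf are on PATH.
strip_pdf() {
    src=$1
    dst=$2
    workdir=$(mktemp -d) || return 1
    # Work on a copy so the source file is never modified
    cp "$src" "$workdir/work.pdf" || { rm -rf "$workdir"; return 1; }
    # Step 1: clear every writable tag (stored as an incremental update)
    if ! exiftool -all= -overwrite_original "$workdir/work.pdf" >/dev/null; then
        rm -rf "$workdir"
        return 1
    fi
    # Step 2: re-write the file, leaving the now-unreferenced objects behind
    qpdf --linearize "$workdir/work.pdf" "$dst"
    rc=$?
    rm -rf "$workdir"
    return $rc
}

# Usage: strip_pdf report.pdf report-clean.pdf
```

The function passes qpdf's exit status through unchanged, so a non-zero result is worth reading rather than ignoring: qpdf uses a distinct status when it wrote the file but raised warnings.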
What none of these catch
Metadata removal is not the same thing as redaction. If your PDF contains a black rectangle drawn over text, the text is still in the file, and exiftool will not remove it. If your PDF was scanned and contains an OCR layer, the OCR'd text is also in the file. Real redaction requires either Acrobat's redaction tool or rasterizing the affected pages and re-OCR'ing them after stripping.
- Visible-but-recoverable text under black boxes: needs redaction, not metadata removal.
- Embedded fonts with the original file path: exiftool does not touch them, and a qpdf re-write only drops unreferenced objects, so fonts still in use pass through unchanged. A re-render is usually the dependable fix.
- Images inside the PDF carrying their own EXIF and GPS: strip the images separately, or rasterize and re-OCR the page.
- Comments and annotations: Acrobat's Sanitize handles these; exiftool does not touch them.
- Form-field history: same again, Acrobat or a re-render handles it.
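To see whether a file has any of the structural leftovers from the list above, a rough scan for the relevant PDF keys can help. The sketch below counts lines mentioning `/Annots` (comments and annotations), `/AcroForm` (form fields), and `/EmbeddedFiles` (attachments); the helper name `list_leak_markers` is hypothetical, and a count of zero is only meaningful on an uncompressed file.

```shell
# Count lines that mention PDF keys tied to annotations, forms, and attachments.
# list_leak_markers is a hypothetical helper name.
list_leak_markers() {
    for key in /Annots /AcroForm /EmbeddedFiles; do
        # -F matches the key literally; "|| true" keeps going when the count is 0
        n=$(grep -a -c -F -- "$key" "$1" || true)
        echo "$key $n"
    done
}

# Usage: list_leak_markers expanded.pdf
```

Keys stored inside compressed object streams will not match, so it is best run on an expanded copy, for example one produced with `qpdf --qdf --object-streams=disable input.pdf expanded.pdf`.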
Verifying the result
Always re-inspect after stripping. The fastest check is to run exiftool again and confirm that the only fields that come back describe the file itself rather than its author: file-system entries such as File Name and File Size, plus structural ones such as PDF Version, Page Count, and Linearized. If Author, Creator, Producer, or any XMP field is still there, the strip didn't work.
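Alongside the exiftool re-run, a cruder check is to search the raw bytes for strings you know were in the original metadata, such as an author name or a machine name. The sketch below does that with a hypothetical helper, `check_no_leaks`. Strings stored compressed, hex-encoded, or as UTF-16 will not match, so a clean result is a necessary check rather than proof.

```shell
# Return non-zero if any known-sensitive string still appears in the file's raw bytes.
# check_no_leaks is a hypothetical helper: pass the file, then the strings to look for.
check_no_leaks() {
    file=$1
    shift
    rc=0
    for needle in "$@"; do
        # -i ignores case, -F matches the string literally
        if grep -a -q -i -F -- "$needle" "$file"; then
            echo "LEAK: '$needle' still present in $file" >&2
            rc=1
        fi
    done
    return $rc
}

# Usage: check_no_leaks output.pdf "Jane Doe" "LAPTOP-1234"
```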
For the inverse problem — pulling metadata out of a PDF to read what's there before you publish — see the companion post on extracting PDF metadata, or just drop a file into ExtractFox.