All posts
TutorialApril 19, 20263 min read

How to extract images from a Word document

The fastest way to pull every image out of a .docx file at original resolution — using nothing more than the file extension trick that works in any unzip tool.

By Dawid Sibinski

A .docx file is secretly a ZIP archive. Inside it, every image you've inserted into the document lives in a media folder at original resolution. You don't need any special software to get them out.

The .zip rename trick

  1. Make a copy of report.docx so you don't accidentally corrupt the original.
  2. Rename the copy to report.zip.
  3. Unzip it.
  4. Open the word/media/ folder. Every embedded image is there as image1.png, image2.jpeg, etc., at original quality.

Works on Windows, macOS, and Linux without any extra tools. Same trick works on .pptx (ppt/media/) and .xlsx (xl/media/).

From inside Word

Right-click any image → Save as Picture. Right format options, sometimes downsampled depending on Word's image compression settings. Fine for one or two; tedious for many.

Word's File → Save As → Web Page also dumps every image to a folder, but applies its own compression. The .zip rename trick gives you the original bytes; this gives you a re-encoded version.

Programmatic: python-docx

from docx import Document doc = Document("report.docx") for i, rel in enumerate(doc.part.related_parts.values()): if "image" in rel.content_type: with open(f"img_{i}.{rel.content_type.split('/')[-1]}", "wb") as f: f.write(rel.blob)

Useful when you're processing hundreds of documents and want to drop the images straight into a database or DAM.

The .doc (legacy) edge case

Old .doc files (Word 97–2003 binary format) aren't ZIP archives. The .zip trick fails. Either open in Word and save-as .docx first, or use Apache Tika or LibreOffice's headless mode to convert. Once it's .docx, the standard methods work.

More on tutorial

Stop reading, start extracting

Drop a PDF or image into ExtractFox and get structured data back in seconds.

Try a free extraction →