Extract text from PowerPoint files (.pptx): 3 reliable methods

How to extract text from PowerPoint files with Outline View, python-pptx, unzip/XML parsing, and OCR for slides where text is baked into images.

To extract text from a PowerPoint file, start with Outline View or Save As Outline for a one-off export. For code, use python-pptx to read text shapes and tables. If the slide is an image or screenshot, use OCR or image-based extraction instead.

Most .pptx files are easy to extract text from, because the text lives in shapes that standard tools can read. The hard cases are slides built as images, screenshots of dashboards, or decks exported to PDF where the original .pptx is gone.
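To make "text lives in shapes" concrete: a .pptx is a ZIP archive of XML parts, and slide text sits in <a:t> elements inside ppt/slides/slideN.xml. The following is a minimal standard-library sketch of the unzip/XML route, not a complete parser. The helper name slide_texts is hypothetical, and it orders slides by file number, which usually but not always matches the order shown in PowerPoint.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespace used for paragraphs (<a:p>) and text runs (<a:t>).
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
SLIDE_RE = re.compile(r"ppt/slides/slide(\d+)\.xml$")


def slide_texts(pptx_path):
    """Return {slide_file_number: [paragraph text, ...]} read from raw slide XML."""
    result = {}
    with zipfile.ZipFile(pptx_path) as zf:
        for name in zf.namelist():
            match = SLIDE_RE.match(name)
            if not match:
                continue  # skip layouts, masters, relationships, media
            root = ET.fromstring(zf.read(name))
            paragraphs = []
            # Join the text runs of each paragraph so split runs read as one line.
            for para in root.iter(A_NS + "p"):
                text = "".join(t.text or "" for t in para.iter(A_NS + "t"))
                if text:
                    paragraphs.append(text)
            result[int(match.group(1))] = paragraphs
    # Sorted by file number; true slide order is defined in ppt/presentation.xml.
    return dict(sorted(result.items()))
```

Calling slide_texts("deck.pptx") returns every paragraph the XML contains, including text inside groups and tables, but without telling you which shape it came from.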

1. PowerPoint's outline view

View → Outline View lists the text of each slide's title and body placeholders as plain text in slide order. Select all, copy, and paste into your destination. It misses text in manually added text boxes, grouped shapes, SmartArt, tables, and images.

Faster variant: File → Save As → Outline (.rtf). You get a single .rtf file containing the same outline text, with the same gaps.

2. python-pptx for programmatic access

python-pptx is MIT-licensed and exposes the text of every shape, including shapes inside groups and table cells. The basic loop reads the text frames of top-level shapes:

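from pptx import Presentation

prs = Presentation("deck.pptx")
# Number slides from 1 so the output matches what PowerPoint shows.
for i, slide in enumerate(prs.slides, 1):
    for shape in slide.shapes:
        # Pictures, tables, and groups have no text frame of their own.
        if shape.has_text_frame:
            for para in shape.text_frame.paragraphs:
                print(i, para.text)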

That loop skips tables and groups. Add a branch for shape.has_table to walk the rows and cells of shape.table, and recurse into shape.shapes when shape.shape_type is MSO_SHAPE_TYPE.GROUP (imported from pptx.enum.shapes).

3. Slides as images

If the deck is a series of image-only slides (common for branded marketing decks and decks built from dashboard screenshots), neither method above finds any text, because there are no text shapes to read. Two options:

  • Export to PDF, then OCR with ocrmypdf or run through a PDF text extractor.
  • Export each slide as PNG (File → Export → PNG), then run them through ExtractFox's image data extractor with a prompt like "extract all visible text in reading order."
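For the PDF route, ocrmypdf can write the recognized text to a plain-text file alongside the OCR'd PDF through its --sidecar option. The sketch below wraps that command line from Python. It assumes ocrmypdf and Tesseract are installed and on the PATH, and the function names and file names are hypothetical.

```python
import subprocess
from pathlib import Path


def build_ocr_command(pdf_in, pdf_out, text_out):
    """Build an ocrmypdf command that also writes the recognized text to a file."""
    return [
        "ocrmypdf",
        "--sidecar", str(text_out),  # plain-text copy of the OCR result
        str(pdf_in),
        str(pdf_out),
    ]


def ocr_exported_deck(pdf_in, pdf_out="deck_ocr.pdf", text_out="deck.txt"):
    """Run OCR on a deck exported to PDF and return the extracted text."""
    # check=True raises CalledProcessError if ocrmypdf exits with an error.
    subprocess.run(build_ocr_command(pdf_in, pdf_out, text_out), check=True)
    return Path(text_out).read_text(encoding="utf-8")
```

OCR output on slide graphics tends to need a manual pass, especially for small text, logos, and chart labels.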

Online .pptx files

If the file is on SharePoint or Google Slides, both let you download the deck as PDF or .pptx for free. The Google Slides API also exposes presentation content directly via REST, which is useful for automated pipelines pulling from a shared Drive.
