14 posts

Tutorial

Every tutorial post on the ExtractFox blog.

Tutorial · June 28, 2026 · 7 min read

How to decode a QR code from an image or PDF

Decode QR codes from photos, screenshots, and PDFs using pyzbar, OpenCV, and AI. Covers URLs, Wi-Fi configs, vCards, payment requests, and multiple QR codes per image.

Tutorial · June 28, 2026 · 7 min read

How to extract a clean recipe from a website, video, or image

Pull ingredients and steps from recipe blogs, TikTok, YouTube, and Instagram using schema.org scraping, yt-dlp transcription, and AI extraction — without wading through life stories.

Tutorial · June 28, 2026 · 7 min read

How to extract and digitize handwritten text

Transcribe handwritten notes, letters, clinical notes, and historical documents using HTR tools, Tesseract with preprocessing, and AI vision models. Includes accuracy expectations by script type.

Tutorial · June 28, 2026 · 7 min read

How to extract data from a filled form (PDF, scan, or photo)

Pull field values from any filled form — AcroForm PDF fields, printed text, or handwriting — with pypdf, pdfminer, and AI extraction for mixed or handwritten forms.

Tutorial · June 28, 2026 · 6 min read

How to extract data from a chart or graph (image or PDF)

Reverse-engineer bar, line, pie, and scatter charts back into numbers using WebPlotDigitizer, Python, and AI extraction. Works on screenshots, report PDFs, and dashboard photos.

Tutorial · June 28, 2026 · 7 min read

How to convert a PDF to Excel

Turn PDF tables and structured data into clean Excel files. Covers Tabula, pdfplumber, pandas, and AI extraction for invoices, statements, and scanned PDFs — with real column headers and numeric values.

Tutorial · June 28, 2026 · 7 min read

How to convert a PDF to JSON

Turn PDFs into structured JSON for APIs, databases, and AI pipelines. Covers PyMuPDF for raw text, pdfplumber for tables, and AI extraction for any document type with schema output.

Tutorial · June 28, 2026 · 8 min read

How to extract data from Zillow listings

Three ways to get structured data from Zillow — the Zillow API (Bridge Interactive), Zillow's bulk listing exports, and AI extraction from listing PDFs and screenshots.

Tutorial · June 28, 2026 · 10 min read

How to extract a table from a PDF with Python

Three Python libraries for PDF table extraction (pdfplumber, tabula-py, and Camelot), with code, guidance on when to use each, and how to handle scanned PDFs where text-based extraction fails.

Tutorial · April 30, 2026 · 8 min read

Extract data from a pivot table in Excel

Extract data from an Excel PivotTable with Paste Values, GETPIVOTDATA, Show Details, Power Query, or screenshot-to-Excel extraction.

Tutorial · April 27, 2026 · 9 min read

Extract text from PowerPoint (.pptx): Outline View, python-pptx, speaker notes

Extract all text from a .pptx with Outline View, python-pptx (tables, groups, speaker notes), or OCR for image slides — copy-paste code for automation pipelines.

Tutorial · April 23, 2026 · 9 min read

Extract video metadata as JSON with FFprobe, MediaInfo, or yt-dlp

Copy-paste ffprobe, MediaInfo, and yt-dlp commands to extract duration, codecs, resolution, bitrate, chapters, and YouTube metadata as JSON — plus batch scripts for whole folders.

Tutorial · April 22, 2026 · 8 min read

Extract hyperlinks from Excel and Google Sheets (VBA, Apps Script, Python)

Copy-paste VBA, Office Scripts, Google Apps Script, and Python (openpyxl) code to extract the real URL behind Excel and Sheets hyperlinks, including bulk export from .xlsx XML.

Tutorial · April 2, 2026 · 8 min read

How to extract code from a video tutorial

Extract source code from a programming tutorial video with clean screenshots, a code-aware extractor, or an FFmpeg frame pipeline for longer recordings.