Engineering · April 22, 2026 · 5 min read

The end of templates: how AI extraction actually works

Per-supplier templates were the only way to extract structured data from PDFs for two decades. Multimodal models change the shape of the problem.

By Dawid Sibinski

If you've ever built a PDF parser, you know the routine. A new supplier sends an invoice in a layout you've never seen. You write a regex, then another. You map fields by absolute coordinates. Six months later, the supplier tweaks their header and the parser silently breaks.

This was the only way to do it. OCR gave you raw text, layout-unaware. Anything structured had to be reconstructed by hand. Every new document type was a project. Every layout change was a regression.
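The brittleness is easy to reproduce. Here is a minimal sketch of a per-supplier template: a regex written against one exact layout (the supplier name and label are made up for illustration). It works until the supplier renames a label, at which point it fails silently rather than loudly.

```python
import re

# A per-supplier "template": a regex pinned to one exact label.
# "ACME" and the label text are illustrative, not a real supplier.
ACME_TOTAL = re.compile(r"Grand Total:\s*\$([\d,]+\.\d{2})")

def parse_total(ocr_text: str):
    """Return the invoice total as a string, or None if the layout changed."""
    match = ACME_TOTAL.search(ocr_text)
    return match.group(1) if match else None

# Works against the layout the template was written for...
print(parse_total("Invoice 1042\nGrand Total: $1,234.56"))  # 1,234.56

# ...then the supplier renames the label, and the parser silently returns None.
print(parse_total("Invoice 1043\nAmount Due: $1,234.56"))   # None
```

Multiply that by every field, every supplier, and every coordinate-mapped layout, and you have the maintenance burden the rest of this post is about.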

What changed

Multimodal models read documents the way a person does. They see the page — header, body, line items, totals — not a stream of OCR tokens. Given a JSON schema, they fill it in. Given a freeform request, they figure out the structure themselves.
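In code, schema-driven extraction looks roughly like this. The model call is stubbed out (the function name, schema shape, and reply are illustrative, not ExtractFox's API); in practice you would send the page image plus the schema to a multimodal model and validate its JSON reply before trusting it.

```python
import json

# Illustrative schema: field name -> expected type.
INVOICE_SCHEMA = {
    "supplier": "string",
    "invoice_number": "string",
    "total": "number",
}

def call_model(page_image: bytes, schema: dict) -> str:
    # Stand-in for a real multimodal API call; returns a canned JSON reply
    # so the sketch is runnable without credentials.
    return json.dumps({"supplier": "ACME", "invoice_number": "1042", "total": 1234.56})

def extract(page_image: bytes, schema: dict) -> dict:
    data = json.loads(call_model(page_image, schema))
    # Never trust the model's reply blindly: check keys and types
    # against the schema before handing the record downstream.
    types = {"string": str, "number": (int, float)}
    for field, kind in schema.items():
        if field not in data or not isinstance(data[field], types[kind]):
            raise ValueError(f"model reply missing or mistyped field: {field}")
    return data

print(extract(b"<page image bytes>", INVOICE_SCHEMA))
```

The validation step matters: the schema is a contract with the model, and the model occasionally breaks contracts.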

The economic implication is bigger than the technical one. Per-supplier templates were the unit of cost. Removing them removes the reason most data-extraction projects existed.

Where templates still help

Strict validation. Audit trails. When you need to prove a specific field came from a specific region of a specific page. For most workflows — accountants closing the month, recruiters parsing CVs, real-estate teams abstracting leases — the model is enough.

What we're betting on

ExtractFox starts with the model and adds the things it can't do well: schema design, output formatting, human review when confidence drops. Every improvement to the underlying model improves every extractor we ship, without us writing more parser code.
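The "human review when confidence drops" piece can be sketched in a few lines. The field names and the 0.9 threshold here are illustrative assumptions, not ExtractFox's actual pipeline: a record is auto-approved only if every extracted field clears the bar.

```python
# Confidence-gated routing: each field carries (value, confidence).
# The 0.9 threshold is an illustrative assumption.
REVIEW_THRESHOLD = 0.9

def route(extraction: dict) -> str:
    """Send the record to a person if any field's confidence is too low."""
    lowest = min(conf for _, conf in extraction.values())
    return "auto-approve" if lowest >= REVIEW_THRESHOLD else "human-review"

print(route({"total": (1234.56, 0.98), "supplier": ("ACME", 0.95)}))  # auto-approve
print(route({"total": (1234.56, 0.98), "supplier": ("ACM?", 0.61)}))  # human-review
```

Gating on the weakest field, rather than an average, is the conservative choice: one garbled supplier name is enough to make the whole record worth a look.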


Stop reading, start extracting

Drop a PDF or image into ExtractFox and get structured data back in seconds.

Try a free extraction →