All posts
TutorialApril 14, 20264 min read

How to extract zip and postal codes from addresses (Excel, Sheets, Python)

Three reliable patterns for pulling postal codes out of free-text addresses — Excel formulas for clean US data, regex for international, libpostal when nothing else works.

By Dawid Sibinski

Pulling a US zip code out of a clean address is a one-line formula. Pulling postal codes out of international addresses where the format varies by country is a different problem altogether.

Excel: US zip codes (clean addresses)

If your addresses end in a 5-digit zip (or zip+4):

=RIGHT(A1, 5) // last 5 chars =RIGHT(A1, 10) // last 10 chars (catches zip+4) =TEXTAFTER(A1, " ", -1) // last whitespace-delimited token (Excel 365)

TEXTAFTER with -1 is the cleanest in modern Excel — it returns whatever follows the last space, which is almost always the postal code in US format.

Excel: regex for messier data

Excel 365 added REGEXEXTRACT in 2024:

=REGEXEXTRACT(A1, "\b\d{5}(?:-\d{4})?\b")

Catches "94103" or "94103-1234" anywhere in the cell. For UK postcodes:

=REGEXEXTRACT(A1, "\b[A-Z]{1,2}\d{1,2}[A-Z]?\s*\d[A-Z]{2}\b")

Google Sheets

REGEXEXTRACT has been in Sheets for years with the same syntax. For US zip:

=REGEXEXTRACT(A1, "\d{5}(-\d{4})?")

Python: regex by country

import re US_ZIP = re.compile(r"\b\d{5}(?:-\d{4})?\b") UK_PC = re.compile(r"\b[A-Z]{1,2}\d{1,2}[A-Z]?\s*\d[A-Z]{2}\b", re.I) CA_PC = re.compile(r"\b[A-Z]\d[A-Z]\s*\d[A-Z]\d\b", re.I) def extract_postal(address: str, country: str = "US"): p = {"US": US_ZIP, "UK": UK_PC, "CA": CA_PC}[country] m = p.search(address) return m.group(0) if m else None

Country-specific regex is fast and reliable when you know the country up front. Falls apart when you don't.

Multi-country: libpostal

If country isn't known and addresses are international:

from postal.parser import parse_address parse_address("221B Baker Street, London NW1 6XE, UK") # [('221b', 'house_number'), ('baker street', 'road'), # ('london', 'city'), ('nw1 6xe', 'postcode'), ('uk', 'country')]

libpostal handles 200+ countries' postal formats. Trained on OpenStreetMap. Heavyweight install (compiles C) but worth it for any serious international address work.

Common pitfalls

  • Excel strips leading zeros — "03062" becomes "3062" if the column is formatted as Number. Format as Text first.
  • 9-digit zips written without hyphens look like phone-extension prefixes. The regex \d{5}(?:-?\d{4})? handles both forms.
  • Some countries have no postal codes (Ireland used to, mostly). Your extractor should return null, not crash.

More on tutorial

Stop reading, start extracting

Drop a PDF or image into ExtractFox and get structured data back in seconds.

Try a free extraction →