The passport MRZ format, explained (and how to parse it)
TD3, two lines, 44 characters each, five check digits. The ICAO 9303 machine-readable zone, what every field means, and a Python parser that validates it.
The two cryptic-looking lines at the bottom of every passport's photo page are the Machine Readable Zone, or MRZ. They follow ICAO Doc 9303, an international standard, which means passports from Brazil, Japan, and the UK all parse the same way. Once you know the format, parsing it is a fixed-offset substring problem with five check digits to validate.
TD3: the passport-specific MRZ
Passports use the TD3 format: two lines, 44 characters each, no exceptions. (Smaller travel documents such as ID cards use TD1, three lines of 30 characters, or TD2, two lines of 36; visas have their own MRV-A and MRV-B layouts. Same idea, different lengths.) Every character is uppercase A–Z, 0–9, or the filler character '<'. No spaces, no lowercase, no punctuation.
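A cheap first gate before any field parsing is to check that the input even has the TD3 shape. A minimal sketch, assuming the OCR output has already been split into lines; `looks_like_td3` is a hypothetical helper name, and it checks shape only, not check digits:

```python
import re

# Exactly 44 characters drawn from A-Z, 0-9 and the filler '<'.
TD3_LINE = re.compile(r"[A-Z0-9<]{44}")


def looks_like_td3(lines: list[str]) -> bool:
    """Return True if `lines` has the shape of a TD3 MRZ (check digits not verified)."""
    return len(lines) == 2 and all(TD3_LINE.fullmatch(line) for line in lines)
```

Anything that fails here (a stray space, a lowercase letter, a 43-character line) is an OCR or cropping problem, not a parsing problem.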
Anatomy of line 1
Line 1 carries identity. Positions are 1-indexed:
- 1: Document type — 'P' for passport.
- 2: Document subtype — usually '<' (filler) but some states use a letter for diplomatic or service passports.
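- 3–5: Issuing state, as a three-letter code based on ISO 3166-1 alpha-3 (e.g. 'GBR', 'USA', 'IND'). ICAO adds some codes of its own, and Germany uses 'D' padded with filler, so this field can contain '<'.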
- 6–44: Holder's name — surname, then '<<', then given names, with single '<' between name parts. Filled to 44 characters with trailing '<'.
Example: P<GBRDOE<<JANE<ELIZABETH<<<<<<<<<<<<<<<<<<<< → passport, UK, surname DOE, given names JANE ELIZABETH. The name uses 19 of the 39 name positions; the remaining 20 are filler.
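In code, line 1 is a handful of slices plus a split on the first '<<'. A minimal sketch; `Line1` and `parse_td3_line1` are illustrative names, not part of any library:

```python
from typing import NamedTuple


class Line1(NamedTuple):
    document_type: str
    subtype: str
    issuing_state: str
    surname: str
    given_names: str


def parse_td3_line1(line: str) -> Line1:
    """Split TD3 line 1 into its fields by fixed offsets."""
    if len(line) != 44:
        raise ValueError(f"expected 44 characters, got {len(line)}")
    # Positions 6-44 (1-indexed) are line[5:44] with 0-indexed slicing.
    # The first '<<' separates the surname from the given names.
    surname, _, given = line[5:44].partition("<<")
    return Line1(
        document_type=line[0],
        subtype=line[1].replace("<", ""),
        issuing_state=line[2:5].replace("<", ""),
        surname=surname.replace("<", " ").strip(),
        given_names=given.replace("<", " ").strip(),
    )


# Build the sample line with ljust so the padding is always exactly 44 wide.
print(parse_td3_line1("P<GBRDOE<<JANE<ELIZABETH".ljust(44, "<")))
```

Splitting on the first '<<' keeps multi-part surnames such as VAN<DER<BERG together, since their parts are separated by a single filler.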
Anatomy of line 2
Line 2 carries the document data and check digits. This is the part that catches OCR errors:
- 1–9: Passport number, padded with '<' if shorter than 9 chars.
- 10: Check digit for the passport number.
- 11–13: Nationality (ISO 3166-1 alpha-3).
- 14–19: Date of birth — YYMMDD.
- 20: Check digit for the date of birth.
- 21: Sex — 'M', 'F', or '<' for unspecified.
- 22–27: Date of expiry — YYMMDD.
- 28: Check digit for the date of expiry.
- 29–42: Personal number / optional data, padded with '<'.
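- 43: Check digit for the personal number. May be '<' instead of 0 when the field is unused.
- 44: Composite check digit over positions 1–10, 14–20, and 22–43: passport number, DOB, expiry, and personal number, each together with its own check digit.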
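The position list above translates directly into slices. A small sketch that keeps the 1-indexed positions in a table so they can be compared against the list by eye; the field names and `slice_td3_line2` are made up for illustration, and the sample line is invented:

```python
# (name, start, end) using the 1-indexed, inclusive positions listed above.
TD3_LINE2_FIELDS = [
    ("document_number", 1, 9),
    ("document_number_check", 10, 10),
    ("nationality", 11, 13),
    ("birth_date", 14, 19),
    ("birth_date_check", 20, 20),
    ("sex", 21, 21),
    ("expiry_date", 22, 27),
    ("expiry_date_check", 28, 28),
    ("personal_number", 29, 42),
    ("personal_number_check", 43, 43),
    ("composite_check", 44, 44),
]


def slice_td3_line2(line: str) -> dict[str, str]:
    """Cut TD3 line 2 into raw field strings (fillers kept, nothing validated)."""
    if len(line) != 44:
        raise ValueError(f"expected 44 characters, got {len(line)}")
    # A 1-indexed inclusive range [start, end] is the Python slice [start - 1:end].
    return {name: line[start - 1:end] for name, start, end in TD3_LINE2_FIELDS}


sample = "1234567897GBR9004126F3203019" + "<" * 14 + "00"
print(slice_td3_line2(sample)["birth_date"])
```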
How the check digits work
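Each check digit is a single digit (0–9) computed by ICAO's weighting scheme: multiply each character's value by the weights 7, 3, 1, cycling across the input string, then take the sum mod 10. Digits keep their value, letters map to 10–35 (A=10, B=11 … Z=35), and '<' maps to 0. For the passport number 123456789 the weighted sum is 147, so the check digit is 7. The composite check digit at position 44 covers the passport number, both dates, and the personal number together with their own check digits, which gives a border system a second chance to spot a single-character OCR error. If line 2's passport-number check digit comes out wrong, a likely culprit is a 0 read as 'O' or a 1 read as 'I'.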
    def mrz_check_digit(value: str) -> int:
        weights = [7, 3, 1]
        total = 0
        for i, char in enumerate(value):
            if char == "<":
                n = 0
            elif char.isdigit():
                n = int(char)
            else:
                n = ord(char.upper()) - 55  # A=10, B=11, ...
            total += n * weights[i % 3]
        return total % 10
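With a check-digit function in hand, validating line 2 comes down to five comparisons. A minimal sketch: `check_digit` repeats the 7-3-1 scheme in compact form so the block runs on its own, `failed_checks` is a hypothetical helper name, and reading a filler in a check position as 0 is a simplifying assumption:

```python
def check_digit(value: str) -> int:
    """ICAO 7-3-1 check digit over `value`."""
    total = 0
    for i, char in enumerate(value):
        # int(char, 36) maps '0'-'9' to 0-9 and 'A'-'Z' to 10-35.
        n = 0 if char == "<" else int(char, 36)
        total += n * (7, 3, 1)[i % 3]
    return total % 10


def failed_checks(line2: str) -> list[str]:
    """Return the names of the line-2 check digits that do not validate."""
    if len(line2) != 44:
        raise ValueError(f"expected 44 characters, got {len(line2)}")
    # (name, data covered, printed check character), as 0-indexed slices.
    checks = [
        ("document_number", line2[0:9], line2[9]),
        ("birth_date", line2[13:19], line2[19]),
        ("expiry_date", line2[21:27], line2[27]),
        ("personal_number", line2[28:42], line2[42]),
        ("composite", line2[0:10] + line2[13:20] + line2[21:43], line2[43]),
    ]
    failures = []
    for name, data, printed in checks:
        # Assumption: a filler in the check position counts as 0.
        expected = "0" if printed == "<" else printed
        if str(check_digit(data)) != expected:
            failures.append(name)
    return failures


good = "1234567897GBR9004126F3203019" + "<" * 14 + "00"
print(failed_checks(good))
```

Returning the names of the failing checks, rather than a single boolean, tells you which field to re-OCR or send for review.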
Parsing in Python: mrz library
Once you have the two lines as strings, the simplest parser is the mrz library on PyPI. It validates check digits and exposes the parsed fields:
    from mrz.checker.td3 import TD3CodeChecker

    # Both lines must be exactly 44 characters, with check digits that validate.
    lines = (
        "P<GBRDOE<<JANE<ELIZABETH<<<<<<<<<<<<<<<<<<<<",
        "1234567897GBR9004126F3203019<<<<<<<<<<<<<<00",
    )
    checker = TD3CodeChecker("\n".join(lines))
    print(checker.fields())
    # Parsed fields (the exact repr depends on the library version):
    # document_type 'P', country 'GBR', surname 'DOE', name 'JANE ELIZABETH',
    # document_number '123456789', birth_date '900412', sex 'F',
    # expiry_date '320301', ...
    print(bool(checker))  # True if all check digits validate
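Roll your own only if you have to. The library already handles the check digits and the filler-padding edge cases that bite first-time parsers. Dates are a separate concern: YYMMDD carries no century (a holder born in 1990 and one born in 2090 both read 90), so check how your version of the library reports dates before relying on them.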
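If you do end up resolving the century yourself, one workable rule for dates of birth is to try 20YY first and fall back to 19YY when that lands in the future. This is a sketch under that assumption (it compares the full date, not just the year, and it still misplaces anyone aged 100 or more); both function names are made up for illustration:

```python
from datetime import date
from typing import Optional


def resolve_birth_date(yymmdd: str, today: Optional[date] = None) -> date:
    """Pick the century for a YYMMDD date of birth; raises ValueError on impossible dates."""
    today = today or date.today()
    yy, mm, dd = int(yymmdd[0:2]), int(yymmdd[2:4]), int(yymmdd[4:6])
    candidate = date(2000 + yy, mm, dd)
    # Nobody is born in the future, so a future candidate must belong to 19YY.
    if candidate > today:
        candidate = date(1900 + yy, mm, dd)
    return candidate


def resolve_expiry_date(yymmdd: str) -> date:
    """Assumption: every expiry date you will meet today falls in 20YY."""
    yy, mm, dd = int(yymmdd[0:2]), int(yymmdd[2:4]), int(yymmdd[4:6])
    return date(2000 + yy, mm, dd)


print(resolve_birth_date("900412", today=date(2025, 1, 1)))  # 1990-04-12
print(resolve_expiry_date("320301"))                         # 2032-03-01
```

Passing `today` explicitly keeps the function testable and makes the pivot visible to whoever reads the call site.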
Common mistakes when parsing
- OCR'ing the MRZ with a general-purpose OCR. The MRZ is printed in OCR-B, a font designed for machine reading, but that only helps if your OCR engine is configured to expect it. Tesseract with default settings frequently misreads '0' as 'O', '1' as 'I', and '<' as 'c'.
- Treating the YY in dates as 19YY without checking the expiry. The MRZ carries only two year digits, so the century has to be inferred. A common heuristic: birth-year ≤ current YY → 20YY, otherwise 19YY. It misplaces holders aged 100 or more, so treat the result as a best guess. Do the same for expiry but assume the future.
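- Reading the visual fields and ignoring the MRZ check digits. The MRZ exists precisely so you can validate. If OCR turns the passport number 123456789 into I23456789, the check digit will fail, and that's your signal to flag the document for review. The name on line 1 has no check digit of its own, so a misread like SMTTH for SMITH only shows up when you compare the MRZ against the printed surname.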
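- Hardcoding country names. Always store the alpha-3 code from positions 3–5 (line 1) and 11–13 (line 2). They usually match. When they don't, either the document legitimately separates issuing state from nationality (a British National (Overseas) passport is issued by GBR with nationality GBN, and refugee travel documents use XXB) or there's an OCR error.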
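For the OCR confusions in the first bullet, one pragmatic mitigation is to repair letters only in positions that must be digits, then re-run the check digits. A sketch; the confusion table is an assumption to tune against your own OCR output, and `repair_numeric_fields` is a hypothetical helper:

```python
# Letters that OCR tends to produce in place of digits. 'O' and 'I' are the
# confusions named above; 'S' and 'B' are illustrative guesses.
LETTER_TO_DIGIT = str.maketrans({"O": "0", "I": "1", "S": "5", "B": "8"})

# 0-indexed slices of TD3 line 2 that can only hold digits: the passport-number
# check digit, date of birth + check, date of expiry + check, composite check.
NUMERIC_SPANS = [(9, 10), (13, 20), (21, 28), (43, 44)]


def repair_numeric_fields(line2: str) -> str:
    """Replace digit look-alike letters, but only inside digit-only positions."""
    repaired = line2
    for start, end in NUMERIC_SPANS:
        fixed = repaired[start:end].translate(LETTER_TO_DIGIT)
        repaired = repaired[:start] + fixed + repaired[end:]
    return repaired


noisy = "1234567897GBR9OO4I26F32O3O19" + "<" * 14 + "0O"
print(repair_numeric_fields(noisy))
```

The passport number and personal number are left alone because they can legitimately contain letters. A repaired line still has to pass its check digits, and it is worth logging that a repair happened.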
When OCR + parse is the wrong tool
If you only need the MRZ from one or two passports, OCR + the mrz library is fine. If you're building anything that runs in production (KYC onboarding, hotel check-in, visa applications), you want both the MRZ and the visual fields cross-validated. The visual fields catch OCR errors in the MRZ, including the name line that no check digit covers; a check-digit-validated MRZ catches OCR errors in the visual fields. Doing this manually means writing two pipelines and a reconciler.
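The reconciler itself can start small. A rough sketch, assuming both pipelines hand back plain dicts of strings; the normalization (drop accents, uppercase, join name parts with '<') is a simplification of ICAO's transliteration rules, and both function names are hypothetical:

```python
import re
import unicodedata


def mrz_normalize(text: str) -> str:
    """Approximate MRZ spelling: ASCII only, uppercase, parts joined by '<'."""
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")
    # Any run of non-alphanumerics (spaces, hyphens, fillers) becomes one '<'.
    return "<".join(re.findall(r"[A-Z0-9]+", ascii_text.upper()))


def reconcile(visual: dict[str, str], mrz: dict[str, str]) -> list[str]:
    """Return the shared field names where the visual zone and the MRZ disagree."""
    mismatches = []
    for field in sorted(visual.keys() & mrz.keys()):
        if mrz_normalize(visual[field]) != mrz_normalize(mrz[field]):
            mismatches.append(field)
    return mismatches


visual = {"surname": "Smith", "given_names": "Jane Elizabeth"}
mrz = {"surname": "SMTTH", "given_names": "JANE<ELIZABETH"}
print(reconcile(visual, mrz))
```

A mismatch here is a reason to send the document for review, not proof of which side is wrong.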
Or you skip the OCR step entirely and use a multimodal model that reads the page the way a person does — both the printed fields and the MRZ in one pass, with a stable schema.
Further reading
- ICAO Doc 9303 part 4 — the official specification for passport MRZ. Public PDF, dense but authoritative.
- The mrz Python package documentation for TD1 (ID cards), TD2 (older ID cards, some visas), and MRVA/MRVB (visa formats).
- Border-control guidance: some border agencies publish a short note on which passport variants their systems handle.