Passport MRZ format explained: fields, check digits, and parser code
How to parse passport MRZ data under ICAO 9303: TD3 field positions, check digits, example MRZ lines, Python parser code, and when to extract the full passport instead.
A passport MRZ is the two-line Machine Readable Zone at the bottom of the photo page. In ICAO 9303 TD3 format, each line is exactly 44 characters. Line 1 stores document type, issuing state, and name. Line 2 stores passport number, nationality, birth date, sex, expiry date, optional data, and check digits.
Once you know the format, parsing passport MRZ data is a fixed-offset substring problem with five check digits to validate: four field-level digits plus one composite. The hard part is usually OCR quality, not the parser.
TD3: the passport-specific MRZ
Passports use the TD3 format: two lines, 44 characters each, no exceptions. (Smaller travel documents use other sizes: TD1 ID cards have three lines of 30 characters, and TD2 documents have two lines of 36. Same idea, different lengths.) Every character is uppercase A–Z, 0–9, or the filler character '<'. No spaces, no lowercase, no punctuation.
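The shape rules above are cheap to enforce before any field parsing. A minimal sketch, assuming the OCR step hands you a list of strings; the names `TD3_LINE` and `is_td3_shape` are illustrative and not part of any library:

```python
import re

# One TD3 line: exactly 44 characters drawn from A-Z, 0-9 and the filler '<'.
TD3_LINE = re.compile(r"[A-Z0-9<]{44}")


def is_td3_shape(lines: list[str]) -> bool:
    """Return True if `lines` has the TD3 shape (no check digits are verified)."""
    return len(lines) == 2 and all(TD3_LINE.fullmatch(line) for line in lines)


# Build line 1 with ljust so the padding is always exactly 44 characters.
line1 = "P<UTOERIKSSON<<ANNA<MARIA".ljust(44, "<")
line2 = "L898902C36UTO7408122F1204159ZE184226B<<<<<10"

print(is_td3_shape([line1, line2]))          # True
print(is_td3_shape([line1, line2.lower()]))  # False: lowercase is not allowed
```

Rejecting malformed input here keeps later slicing code from silently reading the wrong positions.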
Passport MRZ field map
| Line | Positions | Field | Example |
|---|---|---|---|
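| 1 | 1 | Document type | P |
| 1 | 2 | Document subtype | < |
| 1 | 3-5 | Issuing state | UTO |
| 1 | 6-44 | Surname and given names | ERIKSSON<<ANNA<MARIA |
| 2 | 1-9 | Passport number | L898902C3 |
| 2 | 10 | Passport-number check digit | 6 |
| 2 | 11-13 | Nationality | UTO |
| 2 | 14-19 | Date of birth | 740812 |
| 2 | 20 | Date-of-birth check digit | 2 |
| 2 | 21 | Sex | F |
| 2 | 22-27 | Date of expiry | 120415 |
| 2 | 28 | Date-of-expiry check digit | 9 |
| 2 | 29-42 | Personal number / optional data | ZE184226B<<<<< |
| 2 | 43 | Optional-data check digit | 1 |
| 2 | 44 | Composite check digit | 0 |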
Anatomy of line 1
Line 1 carries identity. Positions are 1-indexed:
- 1: Document type — 'P' for passport.
- 2: Document subtype — usually '<' (filler) but some states use a letter for diplomatic or service passports.
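- 3–5: Issuing state, as a three-letter code based on ISO 3166-1 alpha-3 (e.g. 'GBR', 'USA', 'IND'). ICAO 9303 adds a few codes of its own, such as 'D' for Germany and the specimen code 'UTO', so pad-strip and look codes up rather than assuming pure ISO.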
- 6–44: Holder's name — surname, then '<<', then given names, with single '<' between name parts. Filled to 44 characters with trailing '<'.
Example: P<UTOERIKSSON<<ANNA<MARIA<<<<<<<<<<<<<<<<<<< -> passport, issuing state UTO, surname ERIKSSON, given names ANNA MARIA.
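The line 1 layout above can be sketched as plain slicing. This is a minimal example, not a complete parser: the function name and the output keys are illustrative, and it does not try to detect names that were truncated to fit the 39-character name field.

```python
def parse_td3_line1(line1: str) -> dict[str, str]:
    """Split TD3 line 1 into fields (0-based slices of the 1-based positions)."""
    if len(line1) != 44:
        raise ValueError("TD3 line 1 must be exactly 44 characters")
    # The first '<<' separates the surname from the given names.
    surname, _, given = line1[5:44].partition("<<")
    return {
        "document_type": line1[0],
        "document_subtype": line1[1].replace("<", ""),
        "issuing_state": line1[2:5].rstrip("<"),
        # A single '<' separates name parts; trailing '<' is padding.
        "surname": surname.replace("<", " ").strip(),
        "given_names": given.rstrip("<").replace("<", " ").strip(),
    }


fields = parse_td3_line1("P<UTOERIKSSON<<ANNA<MARIA".ljust(44, "<"))
print(fields["surname"], "/", fields["given_names"])  # ERIKSSON / ANNA MARIA
```

Using `partition` on the first `<<` keeps multi-part surnames such as `VAN<DER<BERG` together.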
Anatomy of line 2
Line 2 carries the document data and check digits. This is the part that catches OCR errors:
- 1–9: Passport number, padded with '<' if shorter than 9 chars.
- 10: Check digit for the passport number.
- 11–13: Nationality (ISO 3166-1 alpha-3).
- 14–19: Date of birth — YYMMDD.
- 20: Check digit for the date of birth.
- 21: Sex — 'M', 'F', or '<' for unspecified.
- 22–27: Date of expiry — YYMMDD.
- 28: Check digit for the date of expiry.
- 29–42: Personal number / optional data, padded with '<'.
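- 43: Check digit for the personal number. When the personal number field is unused (all '<'), this position may itself be '<' instead of 0.
- 44: Composite check digit over positions 1–10, 14–20, and 22–43: passport number, DOB, expiry, and personal number, each together with its own check digit.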
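The positions listed above translate directly into slices. A minimal sketch, with an illustrative field-name mapping (`TD3_LINE2_FIELDS`) that is an assumption of this example; values are returned raw, fillers included, because the check digits are computed over the padded fields:

```python
# 0-based, end-exclusive slices for the 1-based TD3 line 2 positions.
TD3_LINE2_FIELDS = {
    "passport_number": slice(0, 9),
    "passport_number_check": slice(9, 10),
    "nationality": slice(10, 13),
    "date_of_birth": slice(13, 19),
    "date_of_birth_check": slice(19, 20),
    "sex": slice(20, 21),
    "date_of_expiry": slice(21, 27),
    "date_of_expiry_check": slice(27, 28),
    "optional_data": slice(28, 42),
    "optional_data_check": slice(42, 43),
    "composite_check": slice(43, 44),
}


def split_td3_line2(line2: str) -> dict[str, str]:
    """Cut TD3 line 2 into raw fields without stripping '<' padding."""
    if len(line2) != 44:
        raise ValueError("TD3 line 2 must be exactly 44 characters")
    return {name: line2[span] for name, span in TD3_LINE2_FIELDS.items()}


fields = split_td3_line2("L898902C36UTO7408122F1204159ZE184226B<<<<<10")
print(fields["passport_number"], fields["date_of_birth"], fields["sex"])
# L898902C3 740812 F
```

Strip the padding only when you normalize for display or storage, after validation.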
MRZ check digit calculator
The MRZ check digit is calculated with repeating weights 7, 3, and 1. Digits keep their numeric value, letters map from A=10 through Z=35, and the filler character '<' maps to 0. Multiply each character value by its weight, sum the results, and take the total modulo 10.
For example, passport number L898902C3 produces check digit 6 by the raw formula. If the MRZ line contains a different digit, assume an OCR error or a misread field boundary before trusting the parsed result.
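To make the arithmetic concrete, here is a small sketch that lists each term of the weighted sum for L898902C3. The helper name `check_digit_terms` is illustrative.

```python
def check_digit_terms(value: str) -> list[tuple[str, int, int]]:
    """Return (character, numeric value, weight) for each character of `value`."""
    weights = (7, 3, 1)
    terms = []
    for i, char in enumerate(value):
        if char == "<":
            n = 0
        elif char.isdigit():
            n = int(char)
        else:
            n = ord(char) - ord("A") + 10  # A=10 ... Z=35
        terms.append((char, n, weights[i % 3]))
    return terms


terms = check_digit_terms("L898902C3")
for char, n, w in terms:
    print(f"{char}: {n:>2} x {w} = {n * w}")

total = sum(n * w for _, n, w in terms)
print(total, total % 10)  # 316 6
```

The terms are 147, 24, 9, 56, 27, 0, 14, 36, and 3, which sum to 316, and 316 mod 10 is 6.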
How the check digits work
Each check digit is a single digit (0–9) computed by ICAO's weighting scheme: weights 7, 3, 1 cycling across the input string, sum mod 10. Letters map to 10–35 (A=10, B=11 … Z=35), and '<' maps to 0. The composite check digit at position 44 covers the passport number, date of birth, date of expiry, and optional data together with their check digits, so most single-character misreads in any of those fields also show up there. It does not cover the name, nationality, or sex. If line 2's passport-number check digit comes out wrong, a likely cause is a 0 read as 'O' or a 1 read as 'I'.
```python
def mrz_check_digit(value: str) -> int:
    weights = [7, 3, 1]
    total = 0
    for i, char in enumerate(value):
        if char == "<":
            n = 0
        elif char.isdigit():
            n = int(char)
        else:
            n = ord(char.upper()) - 55  # A=10, B=11, ...
        total += n * weights[i % 3]
    return total % 10
```
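Applied to a whole line 2, the same calculation gives one pass/fail flag per check digit. The sketch below is self-contained, so it repeats the check-digit function under the name `check_digit`; `validate_td3_line2` and its result keys are illustrative. The composite input is built from positions 1–10, 14–20, and 22–43, check digits included.

```python
def check_digit(value: str) -> int:
    """Weights 7, 3, 1 repeating; digits as-is, A=10..Z=35, '<'=0; sum mod 10."""
    total = 0
    for i, char in enumerate(value):
        if char == "<":
            n = 0
        elif char.isdigit():
            n = int(char)
        else:
            n = ord(char) - 55
        total += n * (7, 3, 1)[i % 3]
    return total % 10


def validate_td3_line2(line2: str) -> dict[str, bool]:
    """Return one validity flag per check digit on TD3 line 2."""
    if len(line2) != 44:
        raise ValueError("TD3 line 2 must be exactly 44 characters")
    # name -> (data the digit covers, digit found on the document), 0-based slices
    checks = {
        "passport_number": (line2[0:9], line2[9]),
        "date_of_birth": (line2[13:19], line2[19]),
        "date_of_expiry": (line2[21:27], line2[27]),
        "optional_data": (line2[28:42], line2[42]),
        "composite": (line2[0:10] + line2[13:20] + line2[21:43], line2[43]),
    }
    results = {}
    for name, (data, found) in checks.items():
        # '<' is accepted in place of 0 only for an unused optional data field.
        if name == "optional_data" and found == "<":
            found = "0"
        results[name] = found == str(check_digit(data))
    return results


print(validate_td3_line2("L898902C36UTO7408122F1204159ZE184226B<<<<<10"))
# All five flags are True for this specimen line.
```

Replacing the zero in the passport number with a letter O (`L8989O2C3...`) flips both the passport-number flag and the composite flag to False, while the date flags stay True.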
Parsing in Python: mrz library
Once you have the two lines as strings, the simplest parser is the mrz package on PyPI. It validates check digits and exposes the parsed fields:
```python
from mrz.checker.td3 import TD3CodeChecker

lines = (
    "P<UTOERIKSSON<<ANNA<MARIA<<<<<<<<<<<<<<<<<<<",
    "L898902C36UTO7408122F1204159ZE184226B<<<<<10",
)
checker = TD3CodeChecker("\n".join(lines))

print(checker.fields())
# Fields include document_type 'P', country 'UTO', surname 'ERIKSSON',
# name 'ANNA MARIA', document_number 'L898902C3',
# birth_date '740812', sex 'F', expiry_date '120415', ...

print(bool(checker))  # True if all check digits validate
```
Roll your own only if you have to. The library handles the filler-padding edge cases that bite first-time parsers. Before relying on any parser for the YYMMDD century roll-over (a holder born in 1925 vs 2025), confirm how it treats the two-digit year.
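If you do handle dates yourself, the two-digit year needs an explicit rule. A minimal sketch, under two stated assumptions: a birth date is never in the future, and every document in scope expires in 20YY. Both function names are illustrative, and the code does not handle dates with unknown parts written as '<'.

```python
from datetime import date


def expand_birth_date(yymmdd: str, today: date) -> date:
    """Expand YYMMDD, preferring 20YY unless that date lies in the future."""
    yy, mm, dd = int(yymmdd[:2]), int(yymmdd[2:4]), int(yymmdd[4:6])
    candidate = date(2000 + yy, mm, dd)
    # A birth date cannot be after today, so fall back to the previous century.
    return candidate if candidate <= today else date(1900 + yy, mm, dd)


def expand_expiry_date(yymmdd: str) -> date:
    """Expand YYMMDD for an expiry date, assuming the document expires in 20YY."""
    yy, mm, dd = int(yymmdd[:2]), int(yymmdd[2:4]), int(yymmdd[4:6])
    return date(2000 + yy, mm, dd)


today = date(2025, 1, 1)  # passed in explicitly so results are reproducible
print(expand_birth_date("740812", today))  # 1974-08-12
print(expand_birth_date("120415", today))  # 2012-04-15
print(expand_expiry_date("120415"))        # 2012-04-15
```

The birth-date rule misdates holders aged 100 or more, so treat the result as a best guess and keep the raw YYMMDD alongside it.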
MRZ parser output
A useful MRZ parser should return both the raw lines and normalized fields: document_type, issuing_country, surname, given_names, passport_number, nationality, date_of_birth, sex, date_of_expiry, optional_data, and validation flags for each check digit. Keep the raw MRZ in your database so you can audit OCR mistakes later.
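One way to pin that output shape down is a small dataclass. This is a sketch of a possible schema, not a standard one: the class name `MrzResult`, the `check_digits_valid` mapping, and the `is_valid` property are assumptions of this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MrzResult:
    raw_lines: tuple[str, str]  # kept verbatim for later audits
    document_type: str
    issuing_country: str
    surname: str
    given_names: str
    passport_number: str
    nationality: str
    date_of_birth: str  # raw YYMMDD
    sex: str
    date_of_expiry: str  # raw YYMMDD
    optional_data: str
    # One flag per check digit, keyed by the field it protects.
    check_digits_valid: dict[str, bool] = field(default_factory=dict)

    @property
    def is_valid(self) -> bool:
        """True only if flags are present and every one of them passed."""
        return bool(self.check_digits_valid) and all(self.check_digits_valid.values())


result = MrzResult(
    raw_lines=(
        "P<UTOERIKSSON<<ANNA<MARIA".ljust(44, "<"),
        "L898902C36UTO7408122F1204159ZE184226B<<<<<10",
    ),
    document_type="P",
    issuing_country="UTO",
    surname="ERIKSSON",
    given_names="ANNA MARIA",
    passport_number="L898902C3",
    nationality="UTO",
    date_of_birth="740812",
    sex="F",
    date_of_expiry="120415",
    optional_data="ZE184226B",
    check_digits_valid={
        "passport_number": True,
        "date_of_birth": True,
        "date_of_expiry": True,
        "optional_data": True,
        "composite": True,
    },
)
print(result.is_valid)  # True
```

Storing per-digit flags rather than a single boolean makes it possible to tell later which field the OCR got wrong.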
Common mistakes when parsing
- OCR'ing the MRZ with a general-purpose OCR. The MRZ uses OCR-B, a font designed for machine reading, but that only helps if your OCR is configured to expect it. Tesseract's default settings frequently misread '0' as 'O', '1' as 'I', and '<' as 'c'.
- Treating the YY in dates as 19YY without checking. ICAO 9303 doesn't pin a century. A common convention: if the birth YY is ≤ the current YY, read it as 20YY, otherwise 19YY. This misdates holders aged 100 or more, so treat it as a heuristic. Apply the same windowing to expiry dates but bias toward the future.
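- Reading the visual fields and ignoring the MRZ check digits. The MRZ exists precisely so you can validate. Note that the check digits cover only line 2's passport number, dates, and optional data. The name on line 1 has no check digit, so if the printed surname says SMITH but the MRZ reads SMTTH, only a comparison against the visual field catches it. Flag either kind of failure for review.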
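- Hardcoding country names. Always store the three-letter codes from positions 3–5 (line 1) and 11–13 (line 2). They usually match. When they don't, suspect an OCR error first, but some mismatches are legitimate: a GBR-issued passport, for example, can carry a nationality code such as GBN or GBD.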
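The 0/O and 1/I confusions from the first bullet can be narrowed by position, because some parts of line 2 may only contain digits. The sketch below is a heuristic, not part of ICAO 9303: the substitution table and the names `LETTER_TO_DIGIT`, `NUMERIC_SPANS`, and `repair_numeric_fields` are assumptions of this example. It leaves the passport number, nationality, and optional data untouched, since letters are legitimate there.

```python
# Letters commonly produced by OCR in place of digits. Extend with care.
LETTER_TO_DIGIT = str.maketrans({"O": "0", "I": "1"})

# 0-based slices of TD3 line 2 that never contain letters:
# passport-number check digit, DOB + check, expiry + check, final two check digits.
NUMERIC_SPANS = [slice(9, 10), slice(13, 20), slice(21, 28), slice(42, 44)]


def repair_numeric_fields(line2: str) -> str:
    """Map look-alike letters back to digits inside numeric-only positions."""
    if len(line2) != 44:
        raise ValueError("TD3 line 2 must be exactly 44 characters")
    chars = list(line2)
    for span in NUMERIC_SPANS:
        chars[span] = list("".join(chars[span]).translate(LETTER_TO_DIGIT))
    return "".join(chars)


garbled = "L898902C36UTO74O8122F12O4I59ZE184226B<<<<<10"
print(repair_numeric_fields(garbled))
# L898902C36UTO7408122F1204159ZE184226B<<<<<10
```

Re-run the check digits on the repaired line, and store the original OCR output next to it so the substitution can be audited.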
When OCR + parse is the wrong tool
If you only need the MRZ from one or two passports, OCR + the mrz library is fine. If you're building anything that runs in production — KYC onboarding, hotel check-in, visa applications — you want both the MRZ and the visual fields cross-validated. The visual fields catch OCR errors in the MRZ; the MRZ check digits catch OCR errors in the visual fields. Doing this manually means writing two pipelines and a reconciler.
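The reconciler itself can start small. A minimal sketch, assuming both pipelines already emit dictionaries with the same field names and that dates have been converted to the same format on both sides; `normalise` and `reconcile` are illustrative names, and the accent stripping is only a rough stand-in for ICAO's transliteration rules.

```python
import unicodedata


def normalise(value: str) -> str:
    """Uppercase, strip accents, and drop everything except A-Z and 0-9."""
    decomposed = unicodedata.normalize("NFKD", value.upper())
    return "".join(c for c in decomposed if c.isascii() and c.isalnum())


def reconcile(visual: dict[str, str], mrz: dict[str, str]) -> list[str]:
    """Return the sorted names of shared fields whose values disagree."""
    shared = visual.keys() & mrz.keys()
    return sorted(
        name for name in shared if normalise(visual[name]) != normalise(mrz[name])
    )


visual = {"surname": "Smith", "given_names": "Anna Maria", "passport_number": "L898902C3"}
mrz = {"surname": "SMTTH", "given_names": "ANNA<MARIA", "passport_number": "L898902C3"}
print(reconcile(visual, mrz))  # ['surname']
```

An empty list means the two readings agree on every shared field; anything else is a candidate for manual review.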
Or you skip the OCR step entirely and use a multimodal model that reads the page the way a person does — both the printed fields and the MRZ in one pass, with a stable schema.
Further reading
- ICAO Doc 9303 part 4 — the official specification for passport MRZ. Public PDF, dense but authoritative.
- The mrz Python package documentation for TD1 (ID cards), TD2 (older ID cards, some visas), and MRVA/MRVB (visa formats).
- Border-control style guides — most public-facing border agencies publish a short note on which passport variants their systems handle.
Frequently asked questions
What is a passport MRZ?
MRZ stands for Machine Readable Zone. In a TD3 passport it is the two-line section at the bottom of the photo page, each line exactly 44 characters. Line 1 contains document type, issuing state, and name. Line 2 contains passport number, nationality, birth date, sex, expiry, and check digits.
How do I read or parse a passport MRZ?
Each field occupies fixed character positions defined by ICAO 9303. In Python, extract substrings by position: surname and given names are at positions 6–44 of line 1 (separated by '<<'), passport number at positions 1–9 of line 2, date of birth at 14–19, sex at 21, expiry at 22–27. Check digits use a weighted modulo-10 algorithm.
What do the check digits in an MRZ verify?
Check digits use a modulo-10 checksum with character weights 7, 3, 1 repeating. They verify the passport number (digit at position 10 of line 2), date of birth (position 20), expiry date (position 28), optional data (position 43), and the composite of those four fields and their check digits (position 44).
What is the difference between TD1, TD2, and TD3 MRZ formats?
TD1 is the 3-line 30-character format used on ID cards. TD2 is the 2-line 36-character format used on some official travel documents. TD3 is the standard 2-line 44-character format used on passport booklets.
Can I extract MRZ data from a passport photo automatically?
Yes. ExtractFox's passport extractor reads the MRZ from a photo or scan and returns all fields — surname, given names, passport number, nationality, date of birth, sex, expiry date — as structured JSON without manual parsing.