What the Public Suffix List is, and why your URL parser needs it
"foo.co.uk" is one domain. Naive URL parsing splits it wrong. The Public Suffix List explained — what it is, where to get it, and the libraries that use it correctly.
Most URL-parsing bugs come down to the same mistake: treating the last two dotted parts of a hostname as 'domain.tld' and everything before as the subdomain. That's wrong for any country-code TLD with a multi-part suffix — co.uk, com.au, com.br, ne.jp — which is roughly 25% of the public-internet hostnames you'll encounter. The fix is the Public Suffix List.
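To make the mistake concrete, here is a minimal sketch of the naive last-two-labels split. The function name naive_split is illustrative only and does not come from any library.

```python
def naive_split(hostname: str) -> tuple[str, str]:
    """Naively treat the last two labels as the domain (this is the bug)."""
    labels = hostname.lower().rstrip(".").split(".")
    domain = ".".join(labels[-2:])
    subdomain = ".".join(labels[:-2])
    return subdomain, domain


# Works by accident for single-label suffixes such as .com ...
print(naive_split("www.example.com"))  # ('www', 'example.com')
# ... and misfires on multi-part suffixes such as co.uk.
print(naive_split("shop.bbc.co.uk"))   # ('shop.bbc', 'co.uk')
```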
What it is
The Public Suffix List (PSL) is a community-maintained list of every domain suffix under which the public can register names. It's hosted at publicsuffix.org by Mozilla and updated continuously. The list has two sections: ICANN-recognized suffixes (.com, .uk, .co.uk, etc.) and a private section for vendor-controlled suffixes where users can register subdomains (github.io, vercel.app, herokuapp.com).
Without the PSL, a URL parser cannot reliably tell where 'a domain' ends. With it, the rule is simple: find the longest suffix on the list that matches the hostname; the registrable domain is that suffix plus exactly one label to its left.
What goes wrong without it
Naive parsing of "https://shop.bbc.co.uk/cart" treats "co.uk" as the domain and "shop.bbc" as the subdomain. Cookies set on the naive 'domain' would scope to every .co.uk site — a real security bug, which is why every browser ships with the PSL embedded. The same logic applies to:
- Cookie scoping — Set-Cookie should not be allowed to set on a public suffix.
- Same-site checks for embedded content and OAuth redirect validation. (Same-origin checks compare the full scheme, host, and port, so they do not need the list.)
- Email domain matching for SSO — "@bbc.co.uk" and "@news.bbc.co.uk" share an organization; "@something-else.co.uk" does not.
- Spam and phishing detection — phishing domains often abuse subdomain confusion (paypal-login.co.uk vs login.paypal.co.uk).
- Analytics — counting unique sites by registrable domain, not by raw hostname.
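As an illustration of the cookie-scoping item above, the following is a rough sketch of the check a server-side library might run before honoring a cookie Domain attribute. The PUBLIC_SUFFIXES set is a tiny hand-picked stand-in for the real list, and is_allowed_cookie_domain is a hypothetical helper. Real browsers apply additional rules (for example, host-only cookies) that this sketch leaves out.

```python
# Hand-picked stand-in for the real list; a production check would load the full PSL.
PUBLIC_SUFFIXES = {"com", "uk", "co.uk", "com.au"}


def is_allowed_cookie_domain(request_host: str, cookie_domain: str) -> bool:
    """Return True if request_host may set a cookie scoped to cookie_domain."""
    host = request_host.lower().rstrip(".")
    domain = cookie_domain.lower().strip(".")
    if domain in PUBLIC_SUFFIXES:
        # Scoping to a public suffix would share the cookie across unrelated sites.
        return False
    # The cookie domain must be the host itself or a parent of it.
    return host == domain or host.endswith("." + domain)


print(is_allowed_cookie_domain("shop.bbc.co.uk", "bbc.co.uk"))  # True
print(is_allowed_cookie_domain("shop.bbc.co.uk", "co.uk"))      # False
```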
How to use it: the three steps
- Fetch the latest PSL — either at build time (bundled), at runtime with periodic refresh, or via a library that caches it for you. The list is small (a few hundred KB).
- Match the hostname against the list, finding the longest suffix that matches.
- The registrable domain is that suffix plus one label. Anything to the left of that is the subdomain.
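The three steps above can be sketched in plain Python. This is an illustration under stated assumptions, not a replacement for a maintained library: SAMPLE_PSL is a handful of lines in the list's file format standing in for a fetched copy, the function names are made up for this example, and internationalized (punycode) hostnames are not handled. The sketch does cover the list's wildcard (*.ck) and exception (!www.ck) rule types.

```python
SAMPLE_PSL = """\
// A few lines in the PSL file format; the real list has thousands.
com
uk
co.uk
*.ck
!www.ck
"""


def load_rules(text: str) -> set[str]:
    """Step 1: parse list text into a set of rules, skipping comments and blanks."""
    rules = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("//"):
            rules.add(line.split()[0].lower())
    return rules


def public_suffix_length(labels: list[str], rules: set[str]) -> int:
    """Step 2: return how many trailing labels form the longest matching suffix."""
    # Try the longest candidate first so the first hit is the longest match.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if "!" + candidate in rules:
            # Exception rule: the suffix is the candidate minus its leftmost label.
            return len(labels) - i - 1
        if candidate in rules:
            return len(labels) - i
        wildcard = ".".join(["*"] + labels[i + 1:])
        if wildcard in rules:
            return len(labels) - i
    # No rule matched: fall back to treating the last label as the suffix.
    return 1


def split_host(hostname: str, rules: set[str]) -> tuple[str, str, str]:
    """Step 3: return (subdomain, registrable_domain, public_suffix)."""
    labels = hostname.lower().rstrip(".").split(".")
    n = public_suffix_length(labels, rules)
    suffix = ".".join(labels[-n:])
    if len(labels) == n:
        # The hostname is itself a public suffix, so there is no registrable domain.
        return "", "", suffix
    registrable = ".".join(labels[-(n + 1):])
    subdomain = ".".join(labels[:-(n + 1)])
    return subdomain, registrable, suffix


rules = load_rules(SAMPLE_PSL)
print(split_host("shop.bbc.co.uk", rules))  # ('shop', 'bbc.co.uk', 'co.uk')
```

Checking candidates from longest to shortest means the first hit is the longest match, which is what keeps co.uk from being shadowed by the shorter uk rule.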
Library support, by language
Python: tldextract
import tldextract

ext = tldextract.extract("https://shop.bbc.co.uk/cart")
ext.subdomain          # 'shop'
ext.domain             # 'bbc'
ext.suffix             # 'co.uk'
ext.registered_domain  # 'bbc.co.uk'
tldextract caches the PSL locally and refreshes on demand. Use the include_psl_private_domains=True flag if you want to treat github.io as a public suffix (and therefore each user's *.github.io as a separate registrable domain).
JavaScript / Node: tldts
import { parse } from "tldts";

const { domain, subdomain, publicSuffix } = parse("https://shop.bbc.co.uk");
// domain: 'bbc.co.uk', subdomain: 'shop', publicSuffix: 'co.uk'
tldts ships the PSL inline and is dependency-free. For the browser, it's the right pick because it doesn't need a runtime fetch.
Go: golang.org/x/net/publicsuffix
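import "golang.org/x/net/publicsuffix"

// EffectiveTLDPlusOne takes a bare hostname, not a full URL.
etld1, err := publicsuffix.EffectiveTLDPlusOne("shop.bbc.co.uk")
if err != nil {
	// err is non-nil when the input is itself a public suffix or is malformed
}
// etld1 == "bbc.co.uk"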
The Go x/net package is what most production Go services use. It's a generated lookup table — fast and zero-allocation per call.
Rust, Ruby, others
Rust has publicsuffix on crates.io. Ruby has the public_suffix gem (a dependency of the widely used addressable gem). Java has Guava's InternetDomainName. Every reasonable language ecosystem has at least one good binding, so there is no good reason to roll your own.
ICANN vs private suffixes — the part that catches people
The PSL has two sections. ICANN suffixes are real public TLDs and country-codes (.com, .co.uk, .com.au). Private suffixes are vendor-managed (github.io, vercel.app, herokuapp.com, blogspot.com). Treat them the same for cookie isolation; treat them differently for organizational ownership.
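An example: alice.github.io and bob.github.io are different registrable domains for the purpose of cookies and same-site checks (a security boundary). But they're both on github.io, whose organizational owner is GitHub, not Alice or Bob. Defaults differ between libraries: tldextract ignores the private section unless you pass include_psl_private_domains=True, while Go's publicsuffix consults both sections and reports whether the match came from the ICANN section. Check your library's default, and set the flag according to whether your question is 'who owns this domain?' or 'what's the security boundary?'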
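To show how the two sections change the answer, here is a small sketch that reads the section markers used in the list file and exposes an include_private switch. SAMPLE_PSL is a tiny excerpt-style stand-in for the real file, the function names are hypothetical, and only plain rules are handled (no wildcard or exception rules).

```python
SAMPLE_PSL = """\
// ===BEGIN ICANN DOMAINS===
com
io
uk
co.uk
// ===END ICANN DOMAINS===
// ===BEGIN PRIVATE DOMAINS===
github.io
// ===END PRIVATE DOMAINS===
"""


def load_sections(text: str) -> tuple[set[str], set[str]]:
    """Split PSL-format text into (icann_rules, private_rules)."""
    icann, private = set(), set()
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("// ===BEGIN ICANN DOMAINS"):
            current = icann
        elif line.startswith("// ===BEGIN PRIVATE DOMAINS"):
            current = private
        elif line.startswith("// ===END"):
            current = None
        elif line and not line.startswith("//") and current is not None:
            current.add(line.lower())
    return icann, private


def registrable_domain(hostname: str, include_private: bool) -> str:
    """Longest-suffix match over plain rules; returns '' if nothing is registrable."""
    icann, private = load_sections(SAMPLE_PSL)
    rules = icann | private if include_private else icann
    labels = hostname.lower().rstrip(".").split(".")
    for i in range(len(labels)):
        if ".".join(labels[i:]) in rules:
            # i == 0 means the hostname is itself a suffix: nothing registrable.
            return ".".join(labels[i - 1:]) if i > 0 else ""
    # Suffix unknown to this tiny sample.
    return ""


# Security-boundary view: each user's site is its own registrable domain.
print(registrable_domain("alice.github.io", include_private=True))   # alice.github.io
# Ownership view: the name was registered by GitHub under .io.
print(registrable_domain("alice.github.io", include_private=False))  # github.io
```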
Keeping the list fresh
The PSL changes weekly — new TLDs, new private suffixes (every Cloudflare-managed pages domain, for example, ends up on it). For long-running services, refresh periodically: tldextract has a built-in refresh, tldts ships a new version every few days. Pin the version in your build, but pin a recent one.
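If you manage the cached copy yourself rather than relying on a library, a periodic refresh might look like the sketch below. The one-day interval, the function names, and the injectable fetch parameter are assumptions made for illustration; only the download URL comes from publicsuffix.org.

```python
import os
import time
import urllib.request
from typing import Callable, Optional

PSL_URL = "https://publicsuffix.org/list/public_suffix_list.dat"
MAX_AGE_SECONDS = 24 * 60 * 60  # assumed interval; tune to your own needs


def needs_refresh(path: str, max_age: float = MAX_AGE_SECONDS,
                  now: Optional[float] = None) -> bool:
    """Return True if the cached copy is missing or older than max_age seconds."""
    if not os.path.exists(path):
        return True
    current = time.time() if now is None else now
    return current - os.path.getmtime(path) > max_age


def fetch_with_urllib(url: str) -> bytes:
    """Download the list over HTTPS using only the standard library."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def refresh_if_stale(path: str, fetch: Callable[[str], bytes] = fetch_with_urllib,
                     max_age: float = MAX_AGE_SECONDS) -> bool:
    """Re-download the list when stale; return True if a download happened."""
    if not needs_refresh(path, max_age):
        return False
    data = fetch(PSL_URL)
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as handle:
        handle.write(data)
    # Swap the file in one step so readers never see a half-written copy.
    os.replace(tmp_path, path)
    return True
```

Passing the fetch function in as a parameter keeps the staleness logic testable without network access; a long-running service could call refresh_if_stale("psl.dat") from a scheduled job.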
When you need more than parsing
Once you have the registrable domain, the next step is usually fetching it and pulling structured data — title, OG tags, JSON-LD, contact details, prices. URL parsing gets you the right key; the value still has to be extracted from the page.