Public Suffix List: get the registrable domain from a URL
How to get the registrable domain from a URL with the Public Suffix List, including vercel.app, github.io, co.uk, and copy-paste Python, JavaScript, and Go examples.
To get the registrable domain from a URL, parse the hostname with the Public Suffix List and return the matched public suffix plus one label to its left. For shop.bbc.co.uk, the public suffix is co.uk and the registrable domain is bbc.co.uk. For app.vercel.app, vercel.app is the suffix and app.vercel.app is the registrable domain.
Most URL-parsing bugs come down to the same mistake: treating the last two dotted parts of a hostname as 'domain.tld' and everything before as the subdomain. That's wrong for country-code TLDs with multi-part suffixes, and it's wrong for private suffixes like vercel.app and github.io. The fix is the Public Suffix List.
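To make the mistake concrete, here is a minimal sketch of the "last two labels" approach. The function name naive_registrable_domain is made up for this illustration and is not part of any library.

```python
def naive_registrable_domain(hostname: str) -> str:
    """The buggy approach: keep only the last two dot-separated labels."""
    return ".".join(hostname.split(".")[-2:])


for host in ("docs.extractfox.com", "shop.bbc.co.uk", "my-project.vercel.app"):
    print(host, "->", naive_registrable_domain(host))
# docs.extractfox.com -> extractfox.com   (happens to be right)
# shop.bbc.co.uk -> co.uk                 (wrong: co.uk is a public suffix)
# my-project.vercel.app -> vercel.app     (wrong: shared by every Vercel project)
```

The first result is correct only because com is a single-label suffix. The other two return a public suffix instead of a domain that someone actually registered.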
What it is
The Public Suffix List (PSL) is a community-maintained list of every domain suffix under which the public can register names. It's hosted at publicsuffix.org by Mozilla and updated continuously. The list has two sections: ICANN-recognized suffixes (.com, .uk, .co.uk, etc.) and a private section for vendor-controlled suffixes where users can register subdomains (github.io, vercel.app, herokuapp.com).
Without the PSL, a URL parser cannot reliably tell where 'a domain' ends. With it, the rule is simple: the registrable domain is the longest matching suffix on the list, plus exactly one label to the left of it.
Registrable domain examples
| URL or hostname | Public suffix | Registrable domain | Subdomain |
|---|---|---|---|
| shop.bbc.co.uk | co.uk | bbc.co.uk | shop |
| docs.extractfox.com | com | extractfox.com | docs |
| my-project.vercel.app | vercel.app | my-project.vercel.app | |
| alice.github.io | github.io | alice.github.io | |
| a.b.example.com.au | com.au | example.com.au | a.b |
The important detail: vercel.app and github.io are listed in the PSL's private section. If your library ignores that section, it returns vercel.app as the registrable domain for every Vercel project, which is wrong for cookie boundaries and tenant isolation.
How to get the registrable domain from a URL
First parse the URL with a real URL parser, not string splitting. Then lowercase and IDNA-normalize the hostname, match the longest suffix from the Public Suffix List, and return that suffix plus the label immediately before it. If there is no label before the suffix, the hostname itself is a public suffix and is not registrable by an end user.
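The steps in this section can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not a replacement for a PSL library: SAMPLE_SUFFIXES is a hand-picked subset of the real list, wildcard and exception rules are ignored, hostnames whose suffix is not in the sample set simply return None, and Python's built-in idna codec implements the older IDNA 2003 rules.

```python
from urllib.parse import urlsplit

# Assumption: a tiny hand-picked subset of the PSL, for illustration only.
SAMPLE_SUFFIXES = {
    "com", "uk", "co.uk", "au", "com.au",
    "app", "vercel.app", "io", "github.io",
}


def registrable_domain(url, suffixes=SAMPLE_SUFFIXES):
    """Return the public suffix plus one label, or None if there is none."""
    host = urlsplit(url).hostname  # urlsplit lowercases the hostname
    if not host:
        return None
    try:
        # Normalize internationalized names to their ASCII form.
        host = host.rstrip(".").encode("idna").decode("ascii")
    except UnicodeError:
        return None
    labels = host.split(".")
    # Try the longest candidate suffix first, so the first hit is the longest match.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in suffixes:
            if i == 0:
                return None  # the hostname is itself a public suffix
            return ".".join(labels[i - 1:])
    return None  # suffix not in the sample set


print(registrable_domain("https://shop.bbc.co.uk/cart"))       # bbc.co.uk
print(registrable_domain("https://a.b.example.com.au/about"))  # example.com.au
print(registrable_domain("https://co.uk/"))                    # None
```

In production, the suffix set would come from the full list through one of the libraries covered below.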
What goes wrong without it
Naive parsing of "https://shop.bbc.co.uk/cart" treats "co.uk" as the domain and "shop.bbc" as the subdomain. Cookies set on the naive 'domain' would scope to every .co.uk site — a real security bug, which is why every browser ships with the PSL embedded. The same logic applies to:
- Cookie scoping — Set-Cookie should not be allowed to set on a public suffix.
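- Same-site checks for embedded content and OAuth redirect validation (same-origin compares scheme, host, and port exactly, so it is the same-site comparison that depends on the registrable domain).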
- Email domain matching for SSO — "@bbc.co.uk" and "@news.bbc.co.uk" share an organization; "@something-else.co.uk" does not.
- Spam and phishing detection — phishing domains often abuse subdomain confusion (paypal-login.co.uk vs login.paypal.co.uk).
- Analytics — counting unique sites by registrable domain, not by raw hostname.
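As an illustration of the cookie-scoping case, the sketch below rejects a cookie Domain attribute that is a public suffix or that does not cover the requesting host. The function cookie_domain_allowed and the SAMPLE_SUFFIXES set are hypothetical, and real user agents follow a more detailed procedure than this simplified check.

```python
# Assumption: a tiny hand-picked subset of the PSL, for illustration only.
SAMPLE_SUFFIXES = {"com", "uk", "co.uk", "app", "vercel.app"}


def cookie_domain_allowed(request_host, cookie_domain, suffixes=SAMPLE_SUFFIXES):
    """Return True if a cookie with this Domain attribute may be accepted."""
    request_host = request_host.lower().rstrip(".")
    cookie_domain = cookie_domain.lower().strip(".")  # a leading dot is ignored
    if cookie_domain in suffixes:
        return False  # would be shared with every site under that suffix
    # The cookie domain must equal the host or be a parent domain of it.
    return request_host == cookie_domain or request_host.endswith("." + cookie_domain)


print(cookie_domain_allowed("shop.bbc.co.uk", "bbc.co.uk"))           # True
print(cookie_domain_allowed("shop.bbc.co.uk", "co.uk"))               # False
print(cookie_domain_allowed("my-project.vercel.app", ".vercel.app"))  # False
```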
How to use it: the three steps
- Fetch the latest PSL — either at build time (bundled), at runtime with periodic refresh, or via a library that caches it for you. The list is small (a few hundred KB).
- Match the hostname against the list, finding the longest suffix that matches.
- The registrable domain is that suffix plus one label. Anything to the left of that is the subdomain.
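If you are curious what a library does with the list internally, the three steps can be sketched as follows. The list file has one rule per line, // comments, *. wildcard rules, ! exception rules, and marker comments around the ICANN and private sections. SAMPLE_PSL_TEXT is a short hand-written excerpt in that format (an assumption made so the example runs offline), and the helper names are invented for this sketch.

```python
# Assumption: a short excerpt in the PSL file format, not the full list.
SAMPLE_PSL_TEXT = """\
// ===BEGIN ICANN DOMAINS===
com
uk
co.uk
*.ck
!www.ck
// ===END ICANN DOMAINS===
// ===BEGIN PRIVATE DOMAINS===
github.io
vercel.app
// ===END PRIVATE DOMAINS===
"""


def parse_rules(text):
    """Step 1: map each rule to its section, 'icann' or 'private'."""
    rules, section = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("// ===BEGIN ICANN"):
            section = "icann"
        elif line.startswith("// ===BEGIN PRIVATE"):
            section = "private"
        elif line and not line.startswith("//"):
            rules[line.split()[0]] = section  # a rule ends at the first whitespace
    return rules


def suffix_length(labels, rules):
    """Step 2: count the trailing labels that form the public suffix."""
    best = 1  # fallback rule "*": an unlisted TLD counts as a one-label suffix
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if "!" + candidate in rules:
            return len(labels) - i - 1  # exception: drop the rule's leftmost label
        wildcard = "*." + ".".join(labels[i + 1:])
        if candidate in rules or (i + 1 < len(labels) and wildcard in rules):
            best = max(best, len(labels) - i)
    return best


def registrable_domain(host, rules):
    """Step 3: the public suffix plus one label, or None if nothing is left."""
    labels = host.lower().rstrip(".").split(".")
    n = suffix_length(labels, rules)
    if len(labels) <= n:
        return None
    return ".".join(labels[-(n + 1):])


RULES = parse_rules(SAMPLE_PSL_TEXT)
print(registrable_domain("shop.bbc.co.uk", RULES))  # bbc.co.uk
print(registrable_domain("foo.bar.ck", RULES))      # foo.bar.ck (wildcard *.ck)
print(registrable_domain("a.www.ck", RULES))        # www.ck (exception !www.ck)
```

The wildcard and exception cases are the main reason a hand-rolled set lookup falls short of a maintained library.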
Library support, by language
Python: tldextract
    import tldextract

    ext = tldextract.extract("https://shop.bbc.co.uk/cart")
    ext.subdomain          # 'shop'
    ext.domain             # 'bbc'
    ext.suffix             # 'co.uk'
    ext.registered_domain  # 'bbc.co.uk'
tldextract caches the PSL locally and refreshes on demand. By default it ignores the private section; pass include_psl_private_domains=True if you want to treat github.io as a public suffix (and therefore each user's *.github.io as a separate registrable domain).
JavaScript / Node: tldts
    import { parse } from "tldts";

    const parsed = parse("https://my-project.vercel.app/dashboard", {
      allowPrivateDomains: true,
    });

    console.log(parsed.domain);       // 'my-project.vercel.app'
    console.log(parsed.publicSuffix); // 'vercel.app'
    console.log(parsed.subdomain);    // ''
tldts ships the PSL inline and is dependency-free. For browser code, it's the right pick because it doesn't need a runtime fetch. Use allowPrivateDomains when you want vercel.app, github.io, pages.dev, and similar hosting domains to behave as suffixes.
Go: golang.org/x/net/publicsuffix
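    import "golang.org/x/net/publicsuffix"

    // The error is discarded here for brevity; check it in real code, since
    // the call fails when the input is itself a public suffix.
    etld1, _ := publicsuffix.EffectiveTLDPlusOne("shop.bbc.co.uk")
    // etld1 == "bbc.co.uk"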
The Go x/net package is what most production Go services use. It's a generated lookup table — fast and zero-allocation per call.
Rust, Ruby, others
Rust has the publicsuffix crate on crates.io. Ruby has the public_suffix gem (a dependency of the widely used addressable gem). Java has Guava's InternetDomainName. Every major language ecosystem has at least one good binding, so there is no good reason to roll your own.
ICANN vs private suffixes — the part that catches people
The PSL has two sections. ICANN suffixes are real public TLDs and country-codes (.com, .co.uk, .com.au). Private suffixes are vendor-managed (github.io, vercel.app, herokuapp.com, blogspot.com). Treat them the same for cookie isolation; treat them differently for organizational ownership.
An example: alice.github.io and bob.github.io are different registrable domains for the purpose of cookies and same-site checks (a security boundary). But they're both on github.io, so the organizational owner is GitHub, not Alice or Bob. Library defaults differ: tldextract and tldts ignore private suffixes unless you set include_psl_private_domains=True or allowPrivateDomains: true, while Go's publicsuffix package always applies them. Include private suffixes when the question is 'what's the security boundary?' and leave them out when it is 'who owns this domain?'
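The two questions can be shown side by side with a small sketch. The include_private parameter here is hypothetical and only mirrors the idea behind the library flags, and both suffix sets are hand-picked samples rather than the real list.

```python
# Assumption: hand-picked samples of each PSL section, for illustration only.
ICANN_SUFFIXES = {"com", "io", "app", "uk", "co.uk"}
PRIVATE_SUFFIXES = {"github.io", "vercel.app"}


def registrable_domain(host, include_private):
    """Longest-suffix match, optionally honoring the private section."""
    suffixes = ICANN_SUFFIXES | PRIVATE_SUFFIXES if include_private else ICANN_SUFFIXES
    labels = host.lower().split(".")
    for i in range(len(labels)):
        if ".".join(labels[i:]) in suffixes:
            return ".".join(labels[i - 1:]) if i > 0 else None
    return None


for host in ("alice.github.io", "bob.github.io"):
    boundary = registrable_domain(host, include_private=True)
    owner = registrable_domain(host, include_private=False)
    print(host, "| security boundary:", boundary, "| owner:", owner)
# alice.github.io | security boundary: alice.github.io | owner: github.io
# bob.github.io | security boundary: bob.github.io | owner: github.io
```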
Keeping the list fresh
The PSL changes weekly — new TLDs, new private suffixes (every Cloudflare-managed pages domain, for example, ends up on it). For long-running services, refresh periodically: tldextract has a built-in refresh, tldts ships a new version every few days. Pin the version in your build, but pin a recent one.
When you need more than parsing
Once you have the registrable domain, the next step is usually fetching it and pulling structured data — title, OG tags, JSON-LD, contact details, prices. URL parsing gets you the right key; the value still has to be extracted from the page.