Public Suffix List: get the registrable domain from a URL

How to get the registrable domain from a URL with the Public Suffix List, including vercel.app, github.io, co.uk, and copy-paste Python, JavaScript, and Go examples.

To get the registrable domain from a URL, parse the hostname with the Public Suffix List and return the matched public suffix plus one label to its left. For shop.bbc.co.uk, the public suffix is co.uk and the registrable domain is bbc.co.uk. For app.vercel.app, vercel.app is the suffix and app.vercel.app is the registrable domain.

Most URL-parsing bugs come down to the same mistake: treating the last two dotted parts of a hostname as 'domain.tld' and everything before as the subdomain. That's wrong for country-code TLDs with multi-part suffixes, and it's wrong for private suffixes like vercel.app and github.io. The fix is the Public Suffix List.

What it is

The Public Suffix List (PSL) is a community-maintained list of every domain suffix under which the public can register names. It's hosted at publicsuffix.org by Mozilla and updated continuously. The list has two sections: ICANN-recognized suffixes (.com, .uk, .co.uk, etc.) and a private section for vendor-controlled suffixes where users can register subdomains (github.io, vercel.app, herokuapp.com).

Without the PSL, a URL parser cannot reliably tell where 'a domain' ends. With it, the rule is simple: the registrable domain is the suffix on the list, plus exactly one label to the left of it.

Registrable domain examples

| URL or hostname | Public suffix | Registrable domain | Subdomain |
|---|---|---|---|
| shop.bbc.co.uk | co.uk | bbc.co.uk | shop |
| docs.extractfox.com | com | extractfox.com | docs |
| my-project.vercel.app | vercel.app | my-project.vercel.app | (none) |
| alice.github.io | github.io | alice.github.io | (none) |
| a.b.example.com.au | com.au | example.com.au | a.b |

The important detail: vercel.app and github.io come from the private section of the list. If your library ignores private suffixes, it falls back to the ICANN suffix (app or io) and returns vercel.app as the registrable domain for every Vercel project, which is wrong for cookie boundaries and tenant isolation.

How to get the registrable domain from a URL

First parse the URL with a real URL parser, not string splitting. Then lowercase and IDNA-normalize the hostname, match the longest suffix from the Public Suffix List, and return that suffix plus the label immediately before it. If there is no label before the suffix, the hostname itself is a public suffix and is not registrable by an end user.
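
The steps above can be sketched in Python with only the standard library. This is an illustration under stated assumptions, not a replacement for a library: SUFFIXES is a hand-written stand-in for the real list, registrable_domain is a hypothetical helper name, and the list's wildcard and exception rules are left out.

```python
from urllib.parse import urlsplit

# Tiny stand-in for the real Public Suffix List (assumption: illustration only).
SUFFIXES = {"com", "uk", "co.uk", "app", "vercel.app"}


def registrable_domain(url):
    """Return the longest matching suffix plus one label, or None."""
    host = urlsplit(url).hostname  # real URL parser; hostname comes back lowercased
    if not host:
        return None
    # IDNA-encode so Unicode hostnames compare against ASCII rules.
    host = host.rstrip(".").encode("idna").decode("ascii")
    labels = host.split(".")
    # Walk from the longest candidate suffix to the shortest.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in SUFFIXES:
            if i == 0:
                return None  # the hostname itself is a public suffix
            return ".".join(labels[i - 1:])
    return None  # no rule matched in this tiny sample


print(registrable_domain("https://shop.bbc.co.uk/cart"))     # bbc.co.uk
print(registrable_domain("https://my-project.vercel.app/"))  # my-project.vercel.app
print(registrable_domain("https://co.uk/"))                  # None
```

Checking candidates longest-first means co.uk wins over uk, and vercel.app wins over app, without any extra bookkeeping.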

What goes wrong without it

Naive parsing of "https://shop.bbc.co.uk/cart" treats "co.uk" as the domain and "shop.bbc" as the subdomain. Cookies set on the naive 'domain' would scope to every .co.uk site — a real security bug, which is why every browser ships with the PSL embedded. The same logic applies to:

  • Cookie scoping — Set-Cookie should not be allowed to set on a public suffix.
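  • Same-site checks for embedded content, and OAuth redirect validation. (Same-origin compares scheme, host, and port exactly; it is the same-site comparison that depends on the registrable domain.)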
  • Email domain matching for SSO — "@bbc.co.uk" and "@news.bbc.co.uk" share an organization; "@something-else.co.uk" does not.
  • Spam and phishing detection — phishing domains often abuse subdomain confusion (paypal-login.co.uk vs login.paypal.co.uk).
  • Analytics — counting unique sites by registrable domain, not by raw hostname.
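
To make the cookie-scoping case concrete, here is a minimal sketch of the check a cookie store might run before accepting a Domain attribute. The SUFFIXES set is a hand-written sample rather than the real list, and is_public_suffix and cookie_domain_allowed are hypothetical names; browsers apply further rules on top of this.

```python
# Hand-written sample of suffix rules (assumption: not the real list).
SUFFIXES = {"com", "uk", "co.uk", "io", "github.io"}


def is_public_suffix(domain):
    """True if the domain is exactly one of the sample suffix rules."""
    return domain in SUFFIXES


def cookie_domain_allowed(request_host, cookie_domain):
    """Decide whether a Set-Cookie Domain attribute is acceptable for a host."""
    host = request_host.lower().rstrip(".")
    domain = cookie_domain.lower().strip(".")
    if is_public_suffix(domain):
        return False  # would share the cookie with unrelated registrants
    # The host must equal the cookie domain or be a subdomain of it.
    return host == domain or host.endswith("." + domain)


print(cookie_domain_allowed("shop.bbc.co.uk", "bbc.co.uk"))    # True
print(cookie_domain_allowed("shop.bbc.co.uk", "co.uk"))        # False
print(cookie_domain_allowed("alice.github.io", "github.io"))   # False
print(cookie_domain_allowed("shop.bbc.co.uk", "example.com"))  # False
```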

How to use it: the three steps

  1. Fetch the latest PSL — either at build time (bundled), at runtime with periodic refresh, or via a library that caches it for you. The list is small (a few hundred KB).
  2. Match the hostname against the list, finding the longest suffix that matches.
  3. The registrable domain is that suffix plus one label. Anything to the left of that is the subdomain.
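
For a feel of what a library does under the hood, the three steps can be sketched against the list's text format: one rule per line, // comments, section markers, *. wildcard rules, and ! exception rules. SAMPLE is a short hand-written excerpt in that format, and parse_psl, public_suffix, and registrable_domain are hypothetical helper names; a real service would load the full file instead.

```python
SAMPLE = """\
// ===BEGIN ICANN DOMAINS===
com
uk
co.uk
*.ck
!www.ck
// ===END ICANN DOMAINS===
// ===BEGIN PRIVATE DOMAINS===
github.io
// ===END PRIVATE DOMAINS===
"""


def parse_psl(text):
    """Step 1: split list text into (icann_rules, private_rules) sets."""
    icann, private = set(), set()
    current = icann
    for line in text.splitlines():
        line = line.strip()
        if "===BEGIN PRIVATE DOMAINS===" in line:
            current = private
        elif "===BEGIN ICANN DOMAINS===" in line:
            current = icann
        if not line or line.startswith("//"):
            continue
        current.add(line.split()[0])  # a rule ends at the first whitespace
    return icann, private


def public_suffix(labels, rules):
    """Step 2: find the longest matching rule; exception rules win outright."""
    best = labels[-1:]  # implicit default rule "*": the last label is a suffix
    for i in range(len(labels)):
        candidate = labels[i:]
        name = ".".join(candidate)
        if "!" + name in rules:
            return candidate[1:]  # exception: drop the leftmost label
        wildcard = ".".join(["*"] + candidate[1:])
        if (name in rules or wildcard in rules) and len(candidate) > len(best):
            best = candidate
    return best


def registrable_domain(host, rules):
    """Step 3: the suffix plus exactly one label, or None for a bare suffix."""
    labels = host.lower().rstrip(".").split(".")
    suffix = public_suffix(labels, rules)
    if len(labels) == len(suffix):
        return None
    return ".".join(labels[-(len(suffix) + 1):])


icann, private = parse_psl(SAMPLE)
rules = icann | private
print(registrable_domain("shop.bbc.co.uk", rules))  # bbc.co.uk
print(registrable_domain("www.ck", rules))          # www.ck
print(registrable_domain("foo.ck", rules))          # None
```

The .ck rules show why plain set lookup is not quite enough: *.ck makes foo.ck a public suffix, while !www.ck carves www.ck back out as a registrable name.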

Library support, by language

Python: tldextract

    import tldextract

    ext = tldextract.extract("https://shop.bbc.co.uk/cart")
    ext.subdomain          # 'shop'
    ext.domain             # 'bbc'
    ext.suffix             # 'co.uk'
    ext.registered_domain  # 'bbc.co.uk'

tldextract caches the PSL locally and refreshes on demand. Use the include_psl_private_domains=True flag if you want to treat github.io as a public suffix (and therefore each user's *.github.io as a separate registrable domain).

JavaScript / Node: tldts

    import { parse } from "tldts";

    const parsed = parse("https://my-project.vercel.app/dashboard", {
      allowPrivateDomains: true,
    });

    console.log(parsed.domain);       // 'my-project.vercel.app'
    console.log(parsed.publicSuffix); // 'vercel.app'
    console.log(parsed.subdomain);    // ''

tldts ships the PSL inline and is dependency-free. For browser code, it's the right pick because it doesn't need a runtime fetch. Use allowPrivateDomains when you want vercel.app, github.io, pages.dev, and similar hosting domains to behave as suffixes.

Go: golang.org/x/net/publicsuffix

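    import "golang.org/x/net/publicsuffix"

    // Pass a hostname, not a full URL.
    etld1, err := publicsuffix.EffectiveTLDPlusOne("shop.bbc.co.uk")
    if err != nil {
        return err // e.g. the input is itself a public suffix
    }
    // etld1 == "bbc.co.uk"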

The Go x/net package is what most production Go services use. It's a generated lookup table — fast and zero-allocation per call.

Rust, Ruby, others

Rust has publicsuffix on crates.io. Ruby has the public_suffix gem (a dependency of the widely used Addressable URI library). Java has Guava's InternetDomainName. Every reasonable language ecosystem has at least one good binding, so there is no good reason to roll your own.

ICANN vs private suffixes — the part that catches people

The PSL has two sections. ICANN suffixes are real public TLDs and country-codes (.com, .co.uk, .com.au). Private suffixes are vendor-managed (github.io, vercel.app, herokuapp.com, blogspot.com). Treat them the same for cookie isolation; treat them differently for organizational ownership.

An example: alice.github.io and bob.github.io are different registrable domains for the purpose of cookies and same-site checks (a security boundary). But they're both on github.io, whose organizational owner is GitHub, not Alice or Bob. Library defaults differ: tldextract and tldts ignore private suffixes unless you opt in, while Go's x/net/publicsuffix consults both sections. Check the default and set the flag for your use case: private suffixes on for 'what's the security boundary?', off for 'who owns this domain?'
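
The two questions can be answered side by side by matching against different rule sets. A minimal sketch, assuming hand-written ICANN and PRIVATE sample sets and hypothetical helper names (etld_plus_one, describe):

```python
# Hand-written samples of the two sections (assumption: not the real list).
ICANN = {"com", "io", "app", "uk", "co.uk"}
PRIVATE = {"github.io", "vercel.app", "herokuapp.com"}


def etld_plus_one(host, rules):
    """Longest matching suffix in rules plus one label, or None."""
    labels = host.lower().split(".")
    for i in range(len(labels)):
        if ".".join(labels[i:]) in rules:
            return ".".join(labels[i - 1:]) if i > 0 else None
    return None


def describe(host):
    return {
        # Both sections: where cookies and tenants must be kept apart.
        "security_boundary": etld_plus_one(host, ICANN | PRIVATE),
        # ICANN section only: the name someone registered with a registrar.
        "organizational_owner": etld_plus_one(host, ICANN),
    }


print(describe("alice.github.io"))
# {'security_boundary': 'alice.github.io', 'organizational_owner': 'github.io'}
print(describe("shop.bbc.co.uk"))
# {'security_boundary': 'bbc.co.uk', 'organizational_owner': 'bbc.co.uk'}
```

For hostnames under an ICANN-only suffix the two answers agree; they diverge only where a private suffix is involved.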

Keeping the list fresh

The PSL changes weekly, with new TLDs and new private suffixes (hosting domains such as Cloudflare's pages.dev end up on it). For long-running services, refresh periodically: tldextract has a built-in refresh, and tldts ships a new version every few days. Libraries that bundle the list, like tldts and Go's x/net, only pick up changes when you upgrade the dependency. Pin the version in your build, but pin a recent one.
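
If you manage the list file yourself instead of relying on a library, a periodic refresh can be as small as the sketch below. The one-week MAX_AGE_SECONDS, the helper names, and the injected fetch callable are all assumptions; in production, fetch would download the list from publicsuffix.org.

```python
import os
import tempfile
import time

MAX_AGE_SECONDS = 7 * 24 * 3600  # assumption: refresh about once a week


def is_stale(path, max_age=MAX_AGE_SECONDS, now=None):
    """True if the cached copy is missing or older than max_age seconds."""
    if not os.path.exists(path):
        return True
    now = time.time() if now is None else now
    return now - os.path.getmtime(path) > max_age


def load_list(path, fetch, max_age=MAX_AGE_SECONDS):
    """Return the list text, calling fetch() only when the cache is stale."""
    if is_stale(path, max_age):
        try:
            text = fetch()
        except OSError:
            if not os.path.exists(path):
                raise  # nothing cached to fall back on
        else:
            # Write to a temp file, then rename, so readers never see a partial list.
            tmp = path + ".tmp"
            with open(tmp, "w", encoding="utf-8") as f:
                f.write(text)
            os.replace(tmp, path)
    with open(path, encoding="utf-8") as f:
        return f.read()


cache = os.path.join(tempfile.mkdtemp(), "public_suffix_list.dat")
print(load_list(cache, fetch=lambda: "com\nco.uk\n").splitlines())  # ['com', 'co.uk']
```

If a refresh fails but an older copy exists, the sketch keeps serving the old copy: a slightly outdated list is usually better than no list.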

When you need more than parsing

Once you have the registrable domain, the next step is usually fetching it and pulling structured data — title, OG tags, JSON-LD, contact details, prices. URL parsing gets you the right key; the value still has to be extracted from the page.
