Engineering | March 12, 2026 | 7 min read

How to extract LinkedIn profile data using Python

LinkedIn is the most legally fraught scraping target on the internet. Here's what's allowed, what's gray, and the safest patterns for engineers who need profile or company data programmatically.

By Dawid Sibinski

Before any code: LinkedIn is arguably the most aggressively defended scraping target on the public web. Its legal team has repeatedly taken scraper operators to court and won. The best-known example is the long-running hiQ Labs litigation, which hiQ itself started and initially won on appeal, but which ended in a 2022 settlement with a judgment against hiQ and the company out of business. If you're doing this commercially, you need a lawyer, not a blog post. With that out of the way, here are the technical options.

1. Official LinkedIn APIs

LinkedIn offers a few API products, all gated:

  • Sign In with LinkedIn — basic profile fields for the authenticated user only.
  • Marketing Developer Platform — campaign and analytics data, not profile scraping.
  • Talent Solutions / Recruiter — full profile data for paying enterprise customers under specific contracts.
  • Sales Navigator — lead data, also paid, also restricted.

If you qualify for any of these, use them. They're the only fully ToS-compliant route.
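
As a rough illustration of the first option, here is a minimal sketch of the two pieces a Sign In with LinkedIn integration needs: the authorization URL the user visits, and the request for the authenticated user's own profile. The endpoint URLs and the scope string are assumptions based on LinkedIn's public OAuth 2.0 / OpenID Connect documentation, and the client ID and redirect URI are placeholders; check all of them against the current developer docs. The step that exchanges the authorization code for an access token is left out.

```python
import secrets
import urllib.request
from urllib.parse import urlencode

# Assumed endpoints (verify against LinkedIn's current developer documentation).
AUTHORIZATION_URL = "https://www.linkedin.com/oauth/v2/authorization"
USERINFO_URL = "https://api.linkedin.com/v2/userinfo"


def build_authorization_url(client_id: str, redirect_uri: str, state: str) -> str:
    """Return the URL the user opens to grant access to their own profile."""
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "state": state,  # random value to check on the callback (CSRF protection)
        "scope": "openid profile email",  # assumed scope names
    }
    return f"{AUTHORIZATION_URL}?{urlencode(params)}"


def build_userinfo_request(access_token: str) -> urllib.request.Request:
    """Prepare, but do not send, the request for the signed-in user's profile."""
    return urllib.request.Request(
        USERINFO_URL,
        headers={"Authorization": f"Bearer {access_token}"},
    )


if __name__ == "__main__":
    # Placeholder credentials; a real app gets these from the developer portal.
    state = secrets.token_urlsafe(16)
    print(build_authorization_url("YOUR_CLIENT_ID", "https://example.com/callback", state))
```

Note what this does not give you: the token only unlocks data for the person who signed in, which is exactly why this route stays within the ToS.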

2. The unofficial linkedin-api package

There's a popular Python package on PyPI called linkedin-api that wraps LinkedIn's internal Voyager API using a logged-in session cookie. Functionally it works. Practically:

  • It violates LinkedIn's Terms of Service.
  • Your account will eventually be flagged or banned.
  • Voyager endpoints change without notice, so the library breaks regularly.
  • Don't use it from your real account.

If you're going to test it, use a throwaway account and accept that it will not last.

3. Headless browser with Selenium or Playwright

Same ToS issues as above, plus you have to handle login, two-factor, and infinite scroll. LinkedIn's bot detection on its login flow is sophisticated. Don't expect this to be a weekend project.

4. Third-party data providers

Companies like Phantombuster and Bright Data sell LinkedIn data through their own infrastructure (Proxycurl did too, until it shut down in 2025 after LinkedIn sued it). They carry most of the legal and operational risk; you pay per record. For most engineering teams that need LinkedIn data, this is the right tradeoff.

5. Manual export + extraction

If you have a small set of profiles you've legitimately accessed (with permission), you can save the page as PDF or take a screenshot and run it through ExtractFox's LinkedIn extractor or image data extractor. This works well for recruiter shortlists or sales prospect research where the volume is small and human-driven.

Fields you can typically get

  • Name, headline, location
  • Current and past positions with dates
  • Education
  • Skills and endorsements
  • About / summary text
  • Connection count and mutual connections
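
Whichever route you pick, it helps to normalize the fields above into one record type early. The following is a minimal sketch of such a schema using dataclasses; the class and field names are hypothetical choices for illustration, not a format that LinkedIn or any provider defines. Most fields are optional because different sources return different subsets.

```python
from dataclasses import dataclass, field, asdict
from typing import Optional


@dataclass
class Position:
    title: str
    company: str
    start: Optional[str] = None  # e.g. "2021-03"; None when the source omits it
    end: Optional[str] = None    # None is treated here as "still held"


@dataclass
class ProfileRecord:
    name: str
    headline: Optional[str] = None
    location: Optional[str] = None
    positions: list[Position] = field(default_factory=list)
    education: list[str] = field(default_factory=list)
    skills: dict[str, int] = field(default_factory=dict)  # skill -> endorsement count
    about: Optional[str] = None
    connection_count: Optional[int] = None
    mutual_connections: Optional[int] = None

    def current_positions(self) -> list[Position]:
        """Positions with no end date."""
        return [p for p in self.positions if p.end is None]


if __name__ == "__main__":
    record = ProfileRecord(
        name="Example Person",
        positions=[Position("Engineer", "Example Corp", start="2021-03")],
    )
    # asdict() recursively converts nested dataclasses, which makes JSON export easy.
    print(asdict(record))
```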

Practical recommendation for engineers

Default to the official APIs if you qualify. If you don't, buy from a provider — the all-in cost is lower than building and maintaining a scraper, and you offload the legal exposure. Build your own scraping infrastructure only if LinkedIn data is the literal core of your product and you have legal counsel on retainer.
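
To make the build-versus-buy comparison concrete, here is a small back-of-the-envelope sketch. Every input is a hypothetical parameter you would replace with your own quotes and estimates, and the model deliberately ignores legal exposure, which is hard to price and is the main argument for buying.

```python
def provider_cost(records_per_month: int, price_per_record: float) -> float:
    """Monthly spend when buying records from a data provider."""
    return records_per_month * price_per_record


def in_house_cost(engineer_hours_per_month: float, hourly_rate: float,
                  infra_per_month: float) -> float:
    """Monthly spend on maintaining your own scraper (labor plus proxies, accounts, hosting)."""
    return engineer_hours_per_month * hourly_rate + infra_per_month


def break_even_records(price_per_record: float, engineer_hours_per_month: float,
                       hourly_rate: float, infra_per_month: float) -> float:
    """Monthly volume above which the in-house scraper is cheaper on paper."""
    return in_house_cost(engineer_hours_per_month, hourly_rate, infra_per_month) / price_per_record


if __name__ == "__main__":
    # Placeholder inputs, not real market prices.
    volume = break_even_records(
        price_per_record=0.25,
        engineer_hours_per_month=40,
        hourly_rate=100,
        infra_per_month=500,
    )
    print(f"In-house only wins above {volume:,.0f} records per month")
```

If your expected volume sits well below the break-even figure for your own numbers, the provider route wins before legal risk is even considered.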

