Tutorial | March 26, 2026 | 6 min read

How to extract data from Amazon product pages

Title, price, ASIN, ratings, variations: Amazon makes its pages hard to scrape and its APIs easy to misuse. Here are the legitimate options, the gray ones, and the screenshot-based fallback.

By Dawid Sibinski

Amazon is one of the most-scraped sites on the internet, and Amazon knows it. Its anti-bot defenses are aggressive, its ToS prohibit automated collection, and it rate-limits even legitimate API consumers. Here's how to actually get product data without lighting your IP block on fire.

Option 1: Product Advertising API (PA-API 5)

The official path. PA-API 5 gives you product metadata in exchange for being an Amazon Associate (affiliate). Sign up, get keys, and install the SDK for your language. The catch: you have to keep generating qualifying sales within a set window to retain access, and requests are rate-limited.
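
Because the rate limits are enforced on Amazon's side, it helps to pace requests on the client before they are sent. The following is a minimal sketch, not part of any Amazon SDK: the class name MinIntervalThrottle and the requests_per_second parameter are hypothetical, and the actual quota for your account has to come from Amazon's documentation or your Associates dashboard.

```python
import time


class MinIntervalThrottle:
    """Keep successive calls at least 1 / requests_per_second seconds apart."""

    def __init__(self, requests_per_second, clock=time.monotonic, sleep=time.sleep):
        if requests_per_second <= 0:
            raise ValueError("requests_per_second must be positive")
        self.min_interval = 1.0 / requests_per_second
        self._clock = clock    # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._next_allowed = None  # None until the first request has been paced

    def wait(self):
        """Sleep if the previous call was too recent; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        # Earliest moment the next request may go out.
        self._next_allowed = now + self.min_interval
        return delay
```

In use, you would create one throttle per set of credentials and call throttle.wait() immediately before each API request. This only spaces requests out; it does not handle a daily quota or retry on throttling errors, which a real client would also need.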

Fields available: ASIN, title, price, primary image, variations, basic ratings. Notably absent: full review text, sales rank history, the BSR badge contents.

Option 2: Third-party data providers

Keepa, Rainforest API, ScraperAPI, and Bright Data resell Amazon data with proxy fleets and uptime guarantees. They've absorbed the operational pain of staying ahead of Amazon's defenses. Pricing scales with volume; for serious projects this is usually cheaper than building it yourself.

Option 3: Roll your own scraper

Possible, but painful. You'll need residential proxies, header rotation, browser fingerprint randomization, CAPTCHA solving, and constant maintenance as Amazon shifts its DOM and blocking heuristics. Legally, you're operating in the same gray zone that hiQ Labs v. LinkedIn opened up: public pages aren't clearly off-limits under the CFAA, but you're definitely violating Amazon's ToS.

Option 4: Screenshot extraction

If you only need a few hundred products, taking screenshots and running them through a multimodal extractor sidesteps the entire scraping problem. ExtractFox's image data extractor handles product page screenshots — drop them in, get title, price, ASIN, rating, review count, and shipping info as a row.

This is the right approach for one-off competitive research, not for an ongoing pipeline. For ongoing data, pay for a provider.
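
Whichever extractor produces the row, the values typically come back as display strings rather than numbers. Here is a small normalization sketch, assuming US-style formatting ("$1,299.99", "4.6 out of 5 stars", "12,345 ratings"). The function names are hypothetical, and the input formats are assumptions about what a screenshot shows, not a description of ExtractFox's output.

```python
import re
from decimal import Decimal


def parse_price(text):
    """'$1,299.99' -> Decimal('1299.99'). Assumes '.' is the decimal separator."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"no price found in {text!r}")
    return Decimal(match.group().replace(",", ""))


def parse_rating(text):
    """'4.6 out of 5 stars' -> 4.6. Takes the first number in the string."""
    match = re.search(r"\d(?:\.\d)?", text)
    if match is None:
        raise ValueError(f"no rating found in {text!r}")
    return float(match.group())


def parse_count(text):
    """'12,345 ratings' -> 12345. Strips thousands separators."""
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        raise ValueError(f"no count found in {text!r}")
    return int(match.group().replace(",", ""))
```

Prices are kept as Decimal rather than float so that later discount arithmetic does not pick up binary rounding errors. Marketplaces that use a comma as the decimal separator would need a different price parser.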

Fields you'll typically want

  • ASIN and parent ASIN (variation grouping)
  • Title, brand, and bullet points
  • Price, list price, and discount
  • Star rating and review count
  • Best Sellers Rank (BSR) and category
  • Buy Box winner and Prime eligibility
  • Variation matrix (size, color)
  • Image URLs
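
The list above maps naturally onto a single record type. The sketch below is one possible layout, not a schema from Amazon or any provider: every field name is hypothetical, and the ASIN check assumes the common 10-character uppercase alphanumeric form.

```python
import re
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Optional

# Assumption: ASINs are 10 uppercase alphanumeric characters.
ASIN_PATTERN = re.compile(r"^[A-Z0-9]{10}$")


@dataclass
class ProductRecord:
    asin: str
    title: str
    price: Decimal
    parent_asin: Optional[str] = None      # groups variations together
    brand: Optional[str] = None
    bullet_points: list = field(default_factory=list)
    list_price: Optional[Decimal] = None
    star_rating: Optional[float] = None
    review_count: Optional[int] = None
    bsr: Optional[int] = None              # Best Sellers Rank
    bsr_category: Optional[str] = None
    buy_box_seller: Optional[str] = None
    prime_eligible: Optional[bool] = None
    variations: dict = field(default_factory=dict)  # e.g. {"size": [...], "color": [...]}
    image_urls: list = field(default_factory=list)

    def __post_init__(self):
        if not ASIN_PATTERN.match(self.asin):
            raise ValueError(f"not a valid ASIN: {self.asin!r}")

    @property
    def discount_percent(self):
        """Discount relative to list price, or None when no list price is known."""
        if self.list_price is None or self.list_price <= 0:
            return None
        ratio = (self.list_price - self.price) / self.list_price
        return (ratio * 100).quantize(Decimal("0.1"))
```

Deriving the discount from price and list price, instead of storing it as a third field, avoids the three values drifting out of sync when a price is refreshed.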

What not to do

Don't hammer Amazon from a single IP. Don't ignore robots.txt. Don't republish full review text or product copy verbatim — it's copyrighted. Don't sign up for PA-API to get "free" data without intending to drive affiliate sales; you'll lose access fast.
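
Checking robots.txt can be automated with Python's standard library. This is a minimal sketch that parses rules from a string; the helper name is_allowed and the sample rules are made up for illustration and are not Amazon's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Against a live site you would instead call parser.set_url() with the site's robots.txt address and then parser.read() to download it before calling can_fetch(). Passing robots.txt only tells you what the site asks crawlers to avoid; it does not override the ToS.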
