How to extract data from Amazon product pages
Title, price, ASIN, ratings, variations — Amazon makes it hard to scrape and easy to misuse APIs. Here are the legitimate options, the gray ones, and the screenshot-based fallback.
Amazon is among the most heavily scraped sites on the internet, and Amazon knows it. Its anti-bot defenses are aggressive, its ToS prohibit automated collection, and it rate-limits even legitimate API consumers hard. Here's how to actually get product data without lighting your IP block on fire.
Option 1: Product Advertising API (PA-API 5)
The official path. PA-API 5 gives you product metadata in exchange for being an Amazon Associate (affiliate). Sign up, get keys, and install the SDK for your language. The catch: you need a minimum number of qualified sales within a rolling window to keep your access, and rate limits are real. New accounts start at roughly one request per second, and the quota grows only with the referral revenue you drive.
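If you want to see what the SDK does under the hood, or can't use it, here is a minimal sketch of a signed GetItems request built with the standard library only. It assumes the US marketplace (`webservices.amazon.com`, region `us-east-1`), AWS Signature Version 4 signing as PA-API 5 uses, and resource names taken from the PA-API 5 reference as best recalled; verify them against the current docs before relying on this. `build_getitems_request` is a hypothetical helper name, and the keys and partner tag are your own.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone
from typing import Dict, Optional, Sequence, Tuple

# Assumed US-marketplace values; other marketplaces use different hosts/regions.
HOST = "webservices.amazon.com"
REGION = "us-east-1"
SERVICE = "ProductAdvertisingAPI"
PATH = "/paapi5/getitems"
TARGET = "com.amazon.paapi5.v1.ProductAdvertisingAPIv1.GetItems"


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def build_getitems_request(access_key: str, secret_key: str, partner_tag: str,
                           asins: Sequence[str], now: Optional[datetime] = None
                           ) -> Tuple[str, Dict[str, str], str]:
    """Return (url, headers, body) for a SigV4-signed GetItems POST."""
    if not 1 <= len(asins) <= 10:
        raise ValueError("GetItems takes between 1 and 10 ASINs per request")
    now = now or datetime.now(timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")

    body = json.dumps({
        "ItemIds": list(asins),
        "PartnerTag": partner_tag,
        "PartnerType": "Associates",
        "Marketplace": "www.amazon.com",
        # Assumed resource names; check them against the current PA-API 5 reference.
        "Resources": ["ItemInfo.Title", "ItemInfo.ByLineInfo",
                      "Images.Primary.Medium", "Offers.Listings.Price"],
    })

    headers = {
        "content-encoding": "amz-1.0",
        "content-type": "application/json; charset=utf-8",
        "host": HOST,
        "x-amz-date": amz_date,
        "x-amz-target": TARGET,
    }
    # Canonical request: method, path, empty query string, headers, signed list, body hash.
    signed_headers = ";".join(sorted(headers))
    canonical_headers = "".join(f"{k}:{headers[k]}\n" for k in sorted(headers))
    payload_hash = hashlib.sha256(body.encode("utf-8")).hexdigest()
    canonical_request = "\n".join(
        ["POST", PATH, "", canonical_headers, signed_headers, payload_hash])

    scope = f"{date_stamp}/{REGION}/{SERVICE}/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest()])

    # Signing key chain: secret -> date -> region -> service -> "aws4_request".
    key = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    for part in (REGION, SERVICE, "aws4_request"):
        key = _hmac(key, part)
    signature = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()

    headers["authorization"] = (
        f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
        f"SignedHeaders={signed_headers}, Signature={signature}")
    return f"https://{HOST}{PATH}", headers, body
```

To send it, pass the three values to `urllib.request.Request(url, data=body.encode("utf-8"), headers=headers, method="POST")`. The signature covers the exact body bytes, so don't re-serialize the JSON after signing.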
Fields available include ASIN, title, price, primary image, variations, and basic ratings. Notably absent: full review text, sales-rank history, and the contents of the BSR badge.
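Once a GetItems response comes back, you'll usually want each item as a flat row. The sketch below assumes the nested response shape described in the PA-API 5 docs (`ItemsResult.Items[]`, with values under keys such as `ItemInfo.Title.DisplayValue` and `Offers.Listings[0].Price.Amount`); check it against a real response. Any field the response omits comes back as `None` instead of raising. `flatten_item` and `flatten_response` are hypothetical helper names.

```python
from typing import Any, Dict, List


def _dig(obj: Any, *path: Any) -> Any:
    """Follow dict keys / list indexes; return None as soon as a step is missing."""
    for step in path:
        try:
            obj = obj[step]
        except (KeyError, IndexError, TypeError):
            return None
    return obj


def flatten_item(item: Dict[str, Any]) -> Dict[str, Any]:
    """One PA-API 5 item -> one flat row (assumed response shape)."""
    return {
        "asin": item.get("ASIN"),
        "parent_asin": item.get("ParentASIN"),
        "title": _dig(item, "ItemInfo", "Title", "DisplayValue"),
        "brand": _dig(item, "ItemInfo", "ByLineInfo", "Brand", "DisplayValue"),
        "price": _dig(item, "Offers", "Listings", 0, "Price", "Amount"),
        "currency": _dig(item, "Offers", "Listings", 0, "Price", "Currency"),
        "image_url": _dig(item, "Images", "Primary", "Medium", "URL"),
    }


def flatten_response(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    items = _dig(response, "ItemsResult", "Items") or []
    return [flatten_item(item) for item in items]
```

Treating every field as optional matters here: which fields appear depends on the `Resources` you requested and on the item itself.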
Option 2: Third-party data providers
Keepa, Rainforest API, ScraperAPI, and Bright Data sell Amazon data (or managed scraping of it), backed by proxy fleets and uptime guarantees. They've absorbed the operational pain of staying ahead of Amazon's defenses, and Keepa in particular tracks the price and sales-rank history that PA-API doesn't expose. Pricing scales with volume; for serious projects this is usually cheaper than building it yourself.
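Because provider pricing scales with volume, the cheapest request is the one you don't make twice. A minimal sketch, assuming a hypothetical `fetch` function that wraps whichever provider you use and bills per call: cache results per ASIN for a fixed time-to-live, so repeat lookups inside that window cost nothing. The six-hour default TTL is an arbitrary placeholder; pick it based on how fresh your prices need to be.

```python
import time
from typing import Callable, Dict, Tuple


class CachedFetcher:
    """Wrap a pay-per-call fetch function with a per-ASIN time-to-live cache."""

    def __init__(self, fetch: Callable[[str], dict], ttl_s: float = 6 * 3600,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch          # hypothetical: your provider client call
        self._ttl = ttl_s
        self._clock = clock
        self._cache: Dict[str, Tuple[float, dict]] = {}
        self.billed_calls = 0        # requests that actually reached the provider

    def get(self, asin: str) -> dict:
        now = self._clock()
        cached = self._cache.get(asin)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]         # fresh enough: serve from memory
        data = self._fetch(asin)
        self.billed_calls += 1
        self._cache[asin] = (now, data)
        return data
```

An in-process dict is the simplest choice; with several workers you'd move the same idea into a shared store.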
Option 3: Roll your own scraper
Possible, but painful. You'll need residential proxies, header rotation, browser-fingerprint randomization, CAPTCHA solving, and constant maintenance as Amazon changes its DOM and blocking heuristics. Legally, you're operating in the gray zone that hiQ Labs v. LinkedIn opened up: public pages aren't clearly off-limits under the CFAA, but you're definitely violating Amazon's ToS, and breach of contract is a real exposure (hiQ itself was ultimately found to have breached LinkedIn's user agreement).
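If you do roll your own, parsing is the step that breaks most often. Here is a minimal standard-library sketch that pulls the title, the review-count text, and the ASIN out of saved product-page HTML. The element ids it targets (`productTitle`, `acrCustomerReviewText`, and a hidden `input` with id `ASIN`) are assumptions based on markup commonly seen on Amazon product pages; Amazon changes its markup often, so check them against a page you've saved. Fetching, proxies, and CAPTCHA handling are deliberately left out.

```python
from html.parser import HTMLParser

# Assumed element ids (commonly seen on Amazon product pages; verify before use).
TARGET_IDS = {"productTitle": "title", "acrCustomerReviewText": "review_count_text"}
# Void elements never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ProductPageParser(HTMLParser):
    """Collect text from a few id-tagged elements plus the hidden ASIN input."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None  # field currently being captured, if any
        self._depth = 0       # open-element depth inside the captured element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self._current is not None:
            if tag not in VOID_TAGS:
                self._depth += 1
            return
        if tag == "input" and attrs.get("id") == "ASIN":
            self.fields["asin"] = attrs.get("value")
            return
        field = TARGET_IDS.get(attrs.get("id"))
        if field and field not in self.fields and tag not in VOID_TAGS:
            self._current, self._depth = field, 1
            self.fields[field] = ""

    def handle_endtag(self, tag):
        if self._current is None or tag in VOID_TAGS:
            return
        self._depth -= 1
        if self._depth == 0:
            # Collapse runs of whitespace left over from the page's formatting.
            self.fields[self._current] = " ".join(self.fields[self._current].split())
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.fields[self._current] += data


def parse_product_html(html: str) -> dict:
    parser = ProductPageParser()
    parser.feed(html)
    parser.close()
    return parser.fields
```

Matching on ids rather than CSS classes is a deliberate choice: ids are unique per page, while price and rating classes tend to repeat across carousels and sponsored blocks.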
Option 4: Screenshot extraction
If you only need a few hundred products, taking screenshots and running them through a multimodal extractor sidesteps the entire scraping problem. ExtractFox's image data extractor handles product page screenshots: drop them in and get title, price, ASIN, rating, review count, and shipping info back as a row.
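Multimodal extraction can misread characters, so it's worth validating each row before it lands in a spreadsheet. A minimal sketch, assuming the extractor hands back strings under hypothetical keys `asin`, `price`, `rating`, and `review_count` (adjust to whatever your tool actually emits) and that prices use a dot as the decimal separator. It checks that the ASIN is 10 uppercase alphanumeric characters and turns display strings into numbers, collecting problems instead of raising.

```python
import re
from decimal import Decimal, InvalidOperation

ASIN_RE = re.compile(r"[A-Z0-9]{10}")


def normalize_row(row: dict) -> dict:
    """Turn extracted display strings into typed values; collect problems in 'issues'."""
    out = dict(row)
    issues = []

    asin = (row.get("asin") or "").strip().upper()
    if not ASIN_RE.fullmatch(asin):
        issues.append(f"bad asin: {row.get('asin')!r}")
    out["asin"] = asin or None

    # Assumes dot-decimal prices such as "$1,299.99"; strip everything else.
    price_text = row.get("price") or ""
    digits = re.sub(r"[^\d.]", "", price_text)
    try:
        out["price"] = Decimal(digits) if digits else None
    except InvalidOperation:
        out["price"] = None
        issues.append(f"bad price: {price_text!r}")

    # "4.5 out of 5 stars" -> 4.5 (first number in the string)
    m = re.search(r"\d+(?:\.\d+)?", row.get("rating") or "")
    out["rating"] = float(m.group()) if m else None
    if out["rating"] is not None and not 0 <= out["rating"] <= 5:
        issues.append(f"rating out of range: {out['rating']}")

    # "1,234 ratings" -> 1234
    count_digits = re.sub(r"\D", "", row.get("review_count") or "")
    out["review_count"] = int(count_digits) if count_digits else None

    out["issues"] = issues
    return out
```

Rows with a non-empty `issues` list are the ones worth a second look against the original screenshot.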
This is the right approach for one-off competitive research, not for an ongoing pipeline. For ongoing data, pay for a provider.
Fields you'll typically want
- ASIN and parent ASIN (variation grouping)
- Title, brand, and bullet points
- Price, list price, and discount
- Star rating and review count
- Best Sellers Rank (BSR) and category
- Buy Box winner and Prime eligibility
- Variation matrix (size, color)
- Image URLs
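Pinning those fields down as a typed record keeps rows from different sources (PA-API, a provider, screenshots) comparable. A minimal sketch using a Python dataclass; the field names are illustrative, not any API's. Discount is derived from price and list price rather than stored, and grouping child ASINs by `parent_asin` recovers the variation matrix.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Dict, List, Optional


@dataclass
class ProductRecord:
    asin: str
    parent_asin: Optional[str] = None       # shared by all variations of one product
    title: Optional[str] = None
    brand: Optional[str] = None
    bullet_points: List[str] = field(default_factory=list)
    price: Optional[Decimal] = None
    list_price: Optional[Decimal] = None
    star_rating: Optional[float] = None
    review_count: Optional[int] = None
    bsr: Optional[int] = None
    bsr_category: Optional[str] = None
    buy_box_seller: Optional[str] = None
    prime_eligible: Optional[bool] = None
    variation: Dict[str, str] = field(default_factory=dict)  # e.g. {"size": "M", "color": "Blue"}
    image_urls: List[str] = field(default_factory=list)

    @property
    def discount_pct(self) -> Optional[Decimal]:
        """Percent off list price, or None if either price is missing (or list price is 0)."""
        if self.price is None or not self.list_price:
            return None
        pct = (self.list_price - self.price) / self.list_price * 100
        return pct.quantize(Decimal("0.1"))


def variation_matrix(records: List[ProductRecord]) -> Dict[str, Dict[str, Dict[str, str]]]:
    """Map each parent ASIN to {child ASIN: variation attributes}."""
    matrix: Dict[str, Dict[str, Dict[str, str]]] = defaultdict(dict)
    for r in records:
        matrix[r.parent_asin or r.asin][r.asin] = r.variation
    return dict(matrix)
```

Using `Decimal` for prices avoids binary floating-point rounding when you later sum or compare them.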
What not to do
Don't hammer Amazon from a single IP. Don't ignore robots.txt. Don't republish full review text or product copy verbatim; it's copyrighted. Don't sign up for PA-API to get "free" data without intending to drive affiliate sales; you'll lose access fast.
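The first two rules are easy to enforce in code. A minimal sketch using the standard library's `urllib.robotparser`: it checks each URL against robots.txt rules and waits out a minimum per-host delay (or the site's `Crawl-delay`, if that is larger) between requests. The class name `PoliteGate` and the 5-second default are hypothetical; in practice you'd fetch the real robots.txt rather than pass lines in by hand.

```python
import time
import urllib.robotparser
from typing import Callable, Dict, Iterable
from urllib.parse import urlparse


class PoliteGate:
    """Apply robots.txt rules and a minimum delay between requests to the same host."""

    def __init__(self, robots_lines: Iterable[str], user_agent: str,
                 min_delay_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self._rp = urllib.robotparser.RobotFileParser()
        self._rp.parse(robots_lines)
        self._ua = user_agent
        # Honor the site's Crawl-delay when it is stricter than our own floor.
        self._delay = max(min_delay_s, self._rp.crawl_delay(user_agent) or 0)
        self._clock, self._sleep = clock, sleep
        self._last: Dict[str, float] = {}

    def allowed(self, url: str) -> bool:
        return self._rp.can_fetch(self._ua, url)

    def wait_turn(self, url: str) -> None:
        """Block until at least the delay has passed since the last request to this host."""
        host = urlparse(url).netloc
        last = self._last.get(host)
        if last is not None:
            remaining = self._delay - (self._clock() - last)
            if remaining > 0:
                self._sleep(remaining)
        self._last[host] = self._clock()
```

Call `allowed(url)` first and skip anything it rejects, then `wait_turn(url)` right before each request. Injecting `clock` and `sleep` keeps the throttle testable without real waiting.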