Amazon is one of the most heavily protected marketplaces in the world. Its anti-bot system blocks roughly 90% of automated attempts to collect data on prices, stock levels, and product rankings. For sellers and marketers, this is a critical problem: without up-to-date competitor data, it is impossible to adjust pricing strategies and remain profitable.
In this guide, we will explore the technical mechanisms of Amazon's protection, demonstrate proven methods to bypass the anti-bot measures, and set up a price monitoring system that works reliably for months without blocks.
Why Amazon Blocks Scraping: Protection Mechanisms
Amazon loses millions of dollars due to scraping: competitors copy product data, prices, reviews, and unscrupulous sellers use automation to inflate rankings. Therefore, the company invests huge amounts in anti-bot systems that operate on multiple levels simultaneously.
Main components of Amazon's protection:
- AWS WAF (Web Application Firewall) — analyzes incoming traffic and blocks suspicious IP addresses at the network level. Monitors request frequency, geography, and IP reputation.
- CloudFront CDN — a distributed content delivery network with its own bot filtering algorithms. Checks request headers, cookies, and TLS fingerprints of the browser.
- Bot Management System — uses machine learning to analyze user behavior. Tracks mouse movements, scroll speed, and click patterns.
- CAPTCHA and challenge pages — displayed during suspicious activity. Require solving a puzzle or entering a CAPTCHA to continue.
- Rate limiting — strict limits on the number of requests from one IP: usually 10-20 requests per minute for non-logged-in users.
All these systems work together and exchange data. If even one of them suspects a bot, the IP gets blacklisted for 24-48 hours, and sometimes permanently.
Important: Amazon shows different prices for different regions and types of users. Blocking means not only loss of access but also receiving outdated data, which is critical for monitoring competitors.
How Amazon Detects Bots: 7 Key Signals
Amazon's anti-bot system analyzes dozens of parameters for each request. Here are the key signals it uses to recognize automation:
1. IP Reputation
Amazon maintains a database of IP addresses from data centers, VPN services, and public proxies. Requests from such addresses receive heightened scrutiny or are blocked outright. The system also tracks activity history: if an IP sends too many requests to product pages — it raises suspicion.
What is checked: affiliation with known data centers (AWS, Google Cloud, DigitalOcean), presence in public proxy databases, number of requests in the last hour, geography (requests from unexpected countries).
2. User-Agent and HTTP Headers
Many scrapers use standard User-Agent libraries: python-requests/2.28.0 or do not send this header at all. Amazon instantly recognizes such requests.
Suspicious signs: absence of Accept-Language, Accept-Encoding headers; mismatch between User-Agent and other headers (e.g., Chrome User-Agent but headers like Firefox); absence of Referer when navigating between pages; old browser versions.
3. TLS/SSL Fingerprinting
When establishing an HTTPS connection, the browser sends a set of encryption parameters (cipher suites, extensions, TLS version). This set is unique to each browser. Libraries like requests or curl have fingerprints that differ from real browsers — Amazon can see this.
4. JavaScript and Canvas Fingerprinting
Amazon loads JavaScript code that collects information about the browser: screen resolution, installed fonts, supported WebGL features, Canvas parameters. Simple HTTP clients do not execute JavaScript and reveal themselves immediately.
5. Cookies and Sessions
Amazon sets numerous cookies on the first visit: session-id, ubid-main, x-main, and others. Absence of these cookies or incorrect values is a sign of a bot. The system also tracks session lifetime: a real user does not make 100 requests in 30 seconds.
6. Behavior Patterns
A real person opens the homepage, searches for a product, navigates through categories, reads descriptions, and goes back. A bot immediately requests specific product URLs in a perfect sequence without delays.
Suspicious patterns: requests only to product pages without visiting the homepage; perfect URL sequence (product1, product2, product3...); absence of requests to static resources (images, CSS, JS); identical intervals between requests.
7. Request Frequency
Even with perfect browser emulation, too high a request frequency will reveal a bot. Amazon tracks the number of requests from an IP per minute, hour, and day. Exceeding limits (usually 10-20 requests/minute for guests) leads to blocking.
Choosing Proxies to Bypass Anti-Bot: Residential vs Data Centers
Choosing the right proxy type accounts for roughly 70% of the success in bypassing Amazon's protection. Let's examine the three main types and how well each suits scraping the marketplace.
| Proxy Type | Amazon Trust Level | Speed | Application |
|---|---|---|---|
| Residential | Very High (real IPs of home users) | Medium (50-150 ms) | Main scraping, high volumes |
| Mobile | Maximum (IP of mobile operators) | Low (200-500 ms) | Bypassing strict blocks, accounts |
| Data Centers | Low (Amazon knows these IPs) | Very High (10-30 ms) | Testing, one-off tasks |
Residential Proxies — The Optimal Choice
For stable scraping of Amazon, residential proxies are recommended — they use the IP addresses of real home users, which Amazon cannot block en masse without risking blocking actual buyers.
Advantages of Residential Proxies for Amazon:
- IPs belong to Internet service providers (Comcast, AT&T, Verizon in the USA), not data centers
- Low blocking percentage: less than 2% with proper rotation settings
- Ability to choose geography: USA, UK, Germany, and other countries for local pricing
- Support for sticky sessions: one IP can be used for 10-30 minutes to simulate a real user
Important parameters when choosing residential proxies:
- IP pool size: at least 1 million addresses for effective rotation
- Geography: choose a country where Amazon operates (USA, UK, Germany, Japan, etc.)
- Rotation type: support for sticky sessions with a lifespan of 10-30 minutes
- Protocol: HTTP/HTTPS and SOCKS5 for compatibility with various tools
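Many residential providers expose sticky sessions through parameters embedded in the proxy username. The sketch below shows the general idea only: the host, credentials, and the `-session-...-ttl-...` suffix format are hypothetical, so check your provider's documentation for the real syntax.

```python
# Build a proxy URL that pins a sticky session via the username, a
# convention many residential providers use. The exact suffix format
# ("-session-<id>-ttl-<minutes>") is hypothetical -- consult your
# provider's docs for the actual syntax.
import uuid

def sticky_proxy_url(host: str, port: int, user: str, password: str,
                     ttl_minutes: int = 15) -> str:
    session_id = uuid.uuid4().hex[:8]  # unique id keeps the same exit IP
    username = f"{user}-session-{session_id}-ttl-{ttl_minutes}"
    return f"http://{username}:{password}@{host}:{port}"

url = sticky_proxy_url("proxy.example.com", 8080, "user", "pass")
```

Requesting a new `session_id` after 10-15 minutes then gives you a fresh exit IP without changing anything else in the scraper.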
When to Use Mobile Proxies
Mobile proxies use IPs from mobile operators (4G/5G). Amazon almost never blocks such addresses, as thousands of real users can be behind one IP due to CGNAT technology.
When to choose mobile proxies:
- Working with Amazon seller accounts (Seller Central) — IP stability is critical for them
- Bypassing strict blocks after banning residential IPs
- Scraping with authentication (e.g., prices for Prime subscribers)
- Small data volumes (up to 1000 products a day) — mobile proxies are more expensive
The downside of mobile proxies is their high cost and lower speed due to the nature of mobile networks. They are inefficient for mass scraping of thousands of products.
Why Data Centers Are Not Suitable
Data center proxies use IPs from AWS servers, Google Cloud, DigitalOcean. Amazon instantly recognizes such addresses — they are in the ASN (autonomous system) databases of data centers.
Problems with using data centers: blocking after 5-10 requests; constant CAPTCHAs; showing outdated prices or empty pages; permanent IP ban after several attempts.
The only case where data centers can be used is testing the scraper on a small number of products (10-20) before launching on residential proxies.
IP Rotation Strategy: Frequency and Geography
Even with residential proxies, incorrect IP rotation will lead to blocks. Amazon tracks the behavior of each address and bans those that make too many requests or behave suspiciously.
Optimal Rotation Frequency
There are two approaches to rotation: after each request (rotating proxies) and with a fixed lifespan (sticky sessions). For Amazon, the second option is more effective.
Recommended sticky sessions strategy:
- IP lifespan: 10-15 minutes — the optimal balance between simulating a real user and the risk of blocking
- Number of requests per IP: no more than 15-20 requests during the session lifespan
- Delay between requests: 3-7 seconds (random, not fixed!)
- Behavior simulation: first request — homepage or category, then — product pages
Example scenario for one IP: open the main Amazon.com → wait 5 sec → open Electronics category → wait 4 sec → open product 1 → wait 6 sec → open product 2 → ... → after 15 requests, change IP.
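The session rules above (a 10-15 minute lifespan and 15-20 requests per IP, whichever limit is hit first) can be sketched as a small bookkeeping class; the class name and the exact limits are illustrative, and the optional `now` argument exists only to make the logic testable without a real clock.

```python
# Track one sticky session and decide when to rotate the IP:
# after MAX_REQUESTS requests or MAX_AGE_SEC seconds, whichever first.
import time

class StickySession:
    MAX_REQUESTS = 15          # requests allowed per session
    MAX_AGE_SEC = 15 * 60      # session lifespan in seconds

    def __init__(self, now=None):
        self.started = now if now is not None else time.time()
        self.requests = 0

    def record_request(self):
        self.requests += 1

    def should_rotate(self, now=None) -> bool:
        now = now if now is not None else time.time()
        return (self.requests >= self.MAX_REQUESTS
                or now - self.started >= self.MAX_AGE_SEC)
```

In the scraping loop you would call `record_request()` after every fetch and switch to a new proxy as soon as `should_rotate()` returns True.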
Tip for high loads:
If you need to scrape thousands of products per hour, use a pool of 50-100 concurrent sessions with different IPs. Each session makes 10-15 requests with delays, then changes IP. This gives 500-1500 requests per hour without blocks.
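One way to sketch this pooling is to split the URL list into per-session batches of 10-15 requests each; every batch would then run concurrently under its own proxy. The URLs below are placeholders, and the batch size is the per-session request limit from this guide.

```python
# Split a product-URL list into per-session batches so that no single
# sticky session exceeds the 10-15 requests it should make before
# rotating. Each batch would be handed to its own worker + proxy.
def partition(urls, per_session=12):
    return [urls[i:i + per_session] for i in range(0, len(urls), per_session)]

batches = partition([f"https://www.amazon.com/dp/ASIN{i}" for i in range(100)])
# 100 placeholder URLs -> 9 batches: 8 full batches of 12, plus one of 4
```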
Geographical Distribution
Amazon shows different prices, assortments, and delivery conditions depending on the user's location. For accurate monitoring, proxies from the same country as the target marketplace should be used.
Correspondence of marketplaces and proxy geolocation:
- Amazon.com (USA): use proxies from the USA, preferably from different states for diversity
- Amazon.co.uk (UK): proxies from the UK
- Amazon.de (Germany): proxies from Germany
- Amazon.co.jp (Japan): proxies from Japan
Important: do not use proxies from other countries for scraping a specific marketplace. For example, requests to Amazon.com from an IP in India or Russia look suspicious and often receive CAPTCHAs.
Avoid Reusing IPs
Even if an IP is not blocked, do not reuse it within 2-3 hours. Amazon remembers the activity history of each address. If the same IP appears every 15 minutes throughout the day — this is a clear sign of automation.
Rotation rule: the minimum pool for stable operation is 500-1000 unique IPs. This ensures enough diversity so that each address is used no more than 1-2 times a day.
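This reuse rule can be sketched as a small cool-down tracker over the IP pool; the class and the injectable `now` parameter are illustrative conveniences so the logic can be tested without waiting hours.

```python
# Track when each proxy IP was last handed out and refuse to reuse it
# until the 2-3 hour cool-down recommended above has elapsed.
import time

COOLDOWN_SEC = 3 * 3600  # 3 hours between uses of the same IP

class IpPool:
    def __init__(self, ips):
        self.last_used = {ip: 0.0 for ip in ips}

    def acquire(self, now=None):
        now = now if now is not None else time.time()
        for ip, last in self.last_used.items():
            if now - last >= COOLDOWN_SEC:
                self.last_used[ip] = now
                return ip
        return None  # every IP is still cooling down -- pause scraping
```

When `acquire()` returns None, the right move is to slow down or grow the pool, not to force-reuse a recently seen address.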
Emulating a Real Browser: Headers and Fingerprints
Even with residential proxies and proper rotation, a scraper will be blocked if it does not emulate a real browser. Amazon checks dozens of parameters of HTTP requests and the JavaScript environment.
Correct HTTP Headers
Simple HTTP clients (requests, curl, wget) send a minimal set of headers, which instantly reveals a bot. You need to copy headers from a real browser.
Mandatory headers for Amazon:
```
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8
Accept-Language: en-US,en;q=0.9
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
Upgrade-Insecure-Requests: 1
Sec-Fetch-Dest: document
Sec-Fetch-Mode: navigate
Sec-Fetch-Site: none
Sec-Fetch-User: ?1
Cache-Control: max-age=0
```
Critical points:
- User-Agent: use the latest version of Chrome or Firefox (check every 2-3 months). Old browser versions are suspicious.
- Accept-Language: must match the geography of the proxy (en-US for the USA, en-GB for the UK, de-DE for Germany)
- Sec-Fetch-* headers: appeared in modern browsers, their absence is a sign of an old client
- Referer: when navigating between pages, always send the Referer of the previous page
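Collecting the rules above into code, a browser-like header set for a US-geolocated session might look like the sketch below. The Chrome version string should be refreshed every 2-3 months, and Accept-Language must be adjusted to the proxy's country; the Sec-Fetch-Site logic shown is a simplification of real browser behavior.

```python
# Build browser-like headers for a US residential session. The
# User-Agent should track a current Chrome release; Accept-Language
# must match the proxy's geography (en-US here).
def browser_headers(referer=None):
    headers = {
        "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                       "AppleWebKit/537.36 (KHTML, like Gecko) "
                       "Chrome/120.0.0.0 Safari/537.36"),
        "Accept": ("text/html,application/xhtml+xml,application/xml;q=0.9,"
                   "image/avif,image/webp,*/*;q=0.8"),
        "Accept-Language": "en-US,en;q=0.9",
        "Accept-Encoding": "gzip, deflate, br",
        "Connection": "keep-alive",
        "Upgrade-Insecure-Requests": "1",
        "Sec-Fetch-Dest": "document",
        # Simplified: "none" for a direct visit, "same-origin" when
        # navigating from another Amazon page.
        "Sec-Fetch-Site": "none" if referer is None else "same-origin",
        "Sec-Fetch-Mode": "navigate",
        "Sec-Fetch-User": "?1",
        "Cache-Control": "max-age=0",
    }
    if referer:
        headers["Referer"] = referer
    return headers
```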
TLS Fingerprinting and Bypass
Amazon analyzes the parameters of the TLS connection: protocol version, cipher suite, extensions. Standard libraries (OpenSSL in Python requests) have fingerprints that differ from browsers.
Solution: use tools that emulate browser TLS:
- curl-impersonate: a version of curl that copies TLS fingerprints of Chrome and Firefox
- tls-client (Python): a library supporting browser fingerprinting
- Playwright/Puppeteer: real browsers in headless mode — ideal emulation, but slower
JavaScript and Cookies
Amazon executes JavaScript code when loading the page, which sets cookies and collects information about the browser. Without executing this code, you will not receive complete data and will quickly get blocked.
Mandatory actions:
- Use tools that support JavaScript: Selenium, Playwright, Puppeteer
- Save all cookies between requests within the same session
- Wait for the full page load (DOMContentLoaded event) before extracting data
- Simulate user actions: scrolling, random pauses
Amazon sets critical cookies: session-id, ubid-main, x-main. Without them, you will receive a CAPTCHA or an empty page.
Request Limits and Delays Between Them
Even perfect browser emulation will not save you from a ban if you make too many requests. Amazon strictly limits the frequency of requests from one IP.
Documented Amazon Limits
There is no official data on limits, but based on community testing, approximate values are known:
| User Type | Request Limit/Minute | Request Limit/Hour |
|---|---|---|
| Non-logged-in user | 10-15 | 200-300 |
| Logged-in buyer | 20-30 | 500-800 |
| Amazon API (official) | Unlimited | Depends on the plan |
Exceeding limits leads to CAPTCHAs, temporary blocking (1-24 hours), or permanent IP bans for systematic violations.
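A minimal way to stay under these per-minute limits is a sliding-window counter per IP. The sketch below takes an explicit timestamp argument rather than reading the system clock, purely so the logic is easy to test; in a real scraper you would pass `time.time()`.

```python
# Sliding-window rate limiter: allows at most max_per_minute requests
# in any rolling 60-second window, keeping one IP under the ~10-15
# requests/minute limit for non-logged-in users.
from collections import deque

class RateLimiter:
    def __init__(self, max_per_minute=12):
        self.max = max_per_minute
        self.timestamps = deque()

    def allow(self, now: float) -> bool:
        while self.timestamps and now - self.timestamps[0] >= 60:
            self.timestamps.popleft()  # forget requests older than 60 s
        if len(self.timestamps) < self.max:
            self.timestamps.append(now)
            return True
        return False  # over the limit -- wait or rotate the IP
```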
Optimal Delays Between Requests
Fixed intervals (e.g., exactly 5 seconds) reveal a bot. A real person takes breaks of varying lengths: reads product descriptions, compares prices, gets distracted.
Recommended delay strategy:
- Base delay: 3-7 seconds (random value from the range)
- First request in the session: 5-10 seconds (simulating loading the homepage)
- After an error or CAPTCHA: 30-60 seconds before retrying
- Between IP changes: 2-3 seconds for "reconnection"
Example of implementing a random delay in Python: time.sleep(random.uniform(3, 7)), so each pause is unique.
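The delay rules above can be collected into one small helper. The function name is illustrative; returning the value instead of sleeping directly keeps the logic testable, and the scraping loop would pass the result to time.sleep().

```python
# Randomized delays per the recommendations above: 3-7 s between
# normal requests, 30-60 s after an error or CAPTCHA.
import random

def next_delay(after_error: bool = False) -> float:
    if after_error:
        return random.uniform(30, 60)  # back off hard after a block signal
    return random.uniform(3, 7)        # normal human-like pause

# In the loop: time.sleep(next_delay(after_error=got_captcha))
```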
Load Distribution Over Time
Do not start scraping thousands of products simultaneously at 00:00. Amazon tracks spikes in activity. Spread the task over several hours or the entire day.
Example: if you need to scrape 5000 products, break the job into 10 batches of 500, launching each batch at an interval of 1-2 hours. This looks like organic activity from different users.
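A sketch of this batching: the function below splits the work into batches and assigns each a random start offset 1-2 hours after the previous one, returning (start-offset-in-seconds, batch) pairs that a scheduler could act on. The function name and signature are illustrative.

```python
# Spread a large scraping job over several hours: split items into
# n_batches and launch each batch 1-2 hours after the previous one.
import random

def schedule_batches(items, n_batches=10, min_gap_h=1, max_gap_h=2):
    size = -(-len(items) // n_batches)  # ceiling division -> batch size
    batches = [items[i:i + size] for i in range(0, len(items), size)]
    offset, plan = 0.0, []
    for batch in batches:
        plan.append((offset, batch))    # (start offset in seconds, batch)
        offset += random.uniform(min_gap_h, max_gap_h) * 3600
    return plan

plan = schedule_batches(list(range(5000)))  # 10 batches of 500 items
```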
Ready-Made Tools for Amazon Scraping
Writing a scraper from scratch is difficult and time-consuming. There are ready-made solutions that already implement bypassing anti-bot measures, proxy rotation, and browser emulation.
1. Bright Data Web Scraper IDE
A cloud tool with ready-made templates for Amazon. No programming required — you configure data selectors through a visual interface. Built-in proxies and CAPTCHA bypass.
Pros: works out of the box, automatic IP rotation, JavaScript support. Cons: expensive ($500+ per month), dependence on an external service.
2. Octoparse
A desktop application for Windows with a visual parser builder. There is a cloud version for running tasks 24/7. Supports integration with proxies.
Proxy setup in Octoparse: Settings → Proxy Settings → add a list of proxies in the format IP:PORT:USER:PASS → enable rotation.
Pros: no coding required, user-friendly interface, free plan available. Cons: limitations on the number of pages in the free version, difficulties with CAPTCHAs.
3. ScrapingBee API
An API service for scraping with automatic protection bypass. You send a URL and receive HTML. Built-in proxy rotation and JavaScript execution.
Example usage:
```
curl "https://app.scrapingbee.com/api/v1/?api_key=YOUR_KEY&url=https://www.amazon.com/dp/B08N5WRWNW&render_js=true&premium_proxy=true&country_code=us"
```
Pros: easy integration, no need for your own proxies. Cons: paid (from $49/month), limits on the number of requests.
4. Playwright + Own Proxies (for Developers)
If you know how to program, the best option is to use Playwright (or Puppeteer) with residential proxies. Full control over the process and minimal cost.
Example of proxy setup in Playwright (Python):
```python
from playwright.sync_api import sync_playwright
import random
import time

proxy_list = [
    {"server": "http://proxy1.example.com:8080", "username": "user", "password": "pass"},
    {"server": "http://proxy2.example.com:8080", "username": "user", "password": "pass"},
]

with sync_playwright() as p:
    proxy = random.choice(proxy_list)
    browser = p.chromium.launch(proxy=proxy, headless=True)
    context = browser.new_context(
        user_agent=("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                    "AppleWebKit/537.36 (KHTML, like Gecko) "
                    "Chrome/120.0.0.0 Safari/537.36"),
        locale="en-US",
        timezone_id="America/New_York",
    )
    page = context.new_page()

    # First request - homepage
    page.goto("https://www.amazon.com")
    time.sleep(random.uniform(3, 5))

    # Product request
    page.goto("https://www.amazon.com/dp/B08N5WRWNW")
    page.wait_for_load_state("networkidle")

    # Extracting data
    title = page.locator("#productTitle").inner_text()
    price = page.locator(".a-price-whole").first.inner_text()
    print(f"Title: {title}, Price: ${price}")

    browser.close()
```
Pros: full control, cheaper than cloud services, scalable. Cons: requires programming skills, need to handle CAPTCHAs yourself.
Recommendations for Tool Selection
| Your Situation | Recommended Tool |
|---|---|
| I don't know how to program, need 100-500 products a day | Octoparse + residential proxies |
| Need to quickly test an idea, have a budget | ScrapingBee API |
| I know how to program, need thousands of products | Playwright/Puppeteer + residential proxies |
| Large budget, need maximum reliability | Bright Data Web Scraper |
What to Do When Blocked: Diagnosis and Solutions
Even when following all the rules, blocks can still occur. It is important to understand the cause and quickly fix the problem.
Types of Blocks and Their Signs
1. CAPTCHA (status code 503 or redirect to /errors/validateCaptcha):
- Cause: suspicious activity from the IP, but not a complete block
- Solution: change IP, increase delays between requests, add user action simulation
- Automation: use CAPTCHA solving services (2Captcha, Anti-Captcha) — but this slows down scraping
2. IP Block (code 403 or empty page):
- Cause: IP has been blacklisted due to exceeding limits or using data centers
- Solution: immediately change IP, check the type of proxy (possibly using data centers instead of residential)
- Duration: usually 24-48 hours, sometimes permanently
3. "To discuss automated access to Amazon data please contact api-services-support@amazon.com":
- Cause: Amazon has clearly identified automation and suggests using the official API
- Solution: improve browser emulation, check TLS fingerprint, reduce request frequency by half
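The three block types above can be distinguished programmatically from the response. The sketch below is a heuristic based on the signals described in this section, not an official Amazon contract; the thresholds and return labels are illustrative.

```python
# Rough classification of an Amazon block from the HTTP status, the
# final URL after redirects, and the response body.
def classify_block(status: int, url: str, body: str = "") -> str:
    if "/errors/validateCaptcha" in url or status == 503:
        return "captcha"     # soft block: rotate IP, increase delays
    if status == 403 or not body.strip():
        return "ip_block"    # IP blacklisted: switch proxy immediately
    if "api-services-support@amazon.com" in body:
        return "hard_block"  # automation identified outright
    return "ok"
```

A retry loop could branch on the returned label: wait and rotate on "captcha", drop the IP on "ip_block", and pause the whole job on "hard_block".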
Troubleshooting Checklist
If you are receiving blocks, check in order:
- Proxy type: ensure you are using residential, not data center proxies. You can check this on whoer.net
- Geography: IP should be from the same country as the marketplace (USA for .com, UK for .co.uk)
- User-Agent: current version of Chrome/Firefox (no older than 3-4 months)
- Cookies: are they preserved between requests within the session?
- JavaScript: is it executed (if using Playwright/Puppeteer — it should be executed)
- Request frequency: no more than 10-15 per minute from one IP
- Delays: random, not fixed
- IP rotation: each address is used no more than once every 2-3 hours
Emergency Measures for Mass Blocks
If most requests are being blocked (over 30%):
- Stop scraping for 2-3 hours — give Amazon a chance to "forget" about your activity
- Change your proxy provider — the IP pool may already be compromised
- Reduce the load by 3-5 times — instead of 100 requests per hour, make 20-30
- Switch to mobile proxies — they are practically not blocked, although more expensive
- Add more human simulation: random transitions between categories, searching for products through the search bar instead of direct URLs
Attention: If your IP is permanently banned (the block lasts more than 72 hours), do not attempt to use it again. Amazon rarely lifts permanent bans. Switch to a new proxy pool.
Conclusion
Bypassing Amazon's anti-bot measures is a complex task that requires a combination of the right proxies, accurate browser emulation, and reasonable request limits. Key points for successful scraping: use residential proxies from the same country as the marketplace; rotate IPs every 10-15 minutes with a limit of 15-20 requests per session; fully emulate a modern browser with correct headers and JavaScript execution; random delays of 3-7 seconds between requests.
By following these rules, the success rate of requests reaches 95-98%, and blocks become rare. The main thing is not to rush and to simulate the behavior of a real user, rather than trying to scrape thousands of products in minutes.
For stable operation with Amazon, we recommend using residential proxies...