If you are engaged in monitoring competitor prices, scraping product stock, or automatically posting ads on marketplaces, you have likely encountered blocks. The APIs of Wildberries, Ozon, Yandex.Market, and Avito are actively protected against automation: they limit the number of requests, ban IP addresses, and require CAPTCHA. In this guide, we will analyze why blocks occur and how to configure your scraper to work stably for months without bans.
Why Marketplaces Block Frequent API Requests
Marketplaces spend heavily on infrastructure: servers, databases, CDNs. When you make thousands of requests per minute to scrape prices, you create additional load on their systems. However, the main reason for blocks is not technical but business-oriented.
Main Reasons for Blocks:
- Protection of Competitive Data. Wildberries and Ozon do not want competitors to easily access information about prices, stock, and popular products. This data is a trade secret.
- Reducing Server Load. One scraper can generate as many requests as 10,000 regular buyers. This increases hosting costs.
- Combating Manipulation and Spam. Automated systems are used to inflate views, reviews, and mass posting of ads on Avito.
- Monetization of API. Some marketplaces offer official paid APIs with limits. By blocking free scraping, they encourage the purchase of access.
For example, if you monitor prices for 5,000 products every hour, that's 120,000 requests per day. From a single IP address, this looks suspicious, and the marketplace's protection system will quickly block your access.
What Protection Methods Are Used by Wildberries, Ozon, and Avito
Modern marketplaces use multi-layered protection against scraping. Understanding these mechanisms will help you properly configure your bypass methods.
| Protection Method | How It Works | How to Bypass |
|---|---|---|
| Rate Limiting | Request limits from one IP: 100-500 per hour | Delays between requests + IP rotation |
| IP Blacklist | Blocking known data center proxies | Using residential proxies |
| User-Agent Check | Blocking requests without a browser User-Agent | Setting realistic headers |
| JavaScript Checks | Requiring JS code execution to obtain data | Using headless browsers |
| Captcha | Forced verification during suspicious activity | Reducing request frequency, CAPTCHA solving services |
| TLS Fingerprinting | Identifying automation based on TLS parameters | Using libraries with the correct fingerprint |
| Behavioral Analysis | Analyzing patterns: click speed, mouse movements | Randomizing delays, mimicking human behavior |
Wildberries employs aggressive protection: a limit of about 200-300 requests per hour from one IP, User-Agent checks, and JavaScript challenges. If you exceed the limit, you will receive HTTP 429 (Too Many Requests) or 403 (Forbidden).
Ozon is more lenient towards scraping through the API but actively bans data center IPs. They use services to determine the type of IP (DataCenter vs Residential), so regular proxies often do not work.
Avito protects its API from mass ad postings and contact scraping. Here, geographical relevance is crucial: if you post an ad in Kazan, the IP must be from Kazan; otherwise, moderation will block the publication.
Rate Limiting: How to Properly Configure Delays Between Requests
Rate limiting is an artificial restriction on the speed of requests to make your activity appear like that of a regular user. The main rule: it's better to be slow but steady than fast and banned.
Recommended Settings for Popular Marketplaces:
Wildberries:
- Delay between requests: 2-5 seconds (randomized)
- Maximum 150-200 requests per hour from one IP
- Pause 10-15 minutes after every 100 requests
- Rotate IP after 200 requests
Ozon:
- Delay between requests: 1-3 seconds
- Maximum 300-400 requests per hour from one IP
- Using residential proxies is mandatory
- Rotate IP after 300 requests
Avito:
- Delay between requests: 3-7 seconds
- Maximum 50-100 requests per hour (strict limits)
- IP must correspond to the city of the ad
- One IP = one account (do not mix)
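Avito's city-matching rule can be enforced with a small lookup helper. This is a minimal sketch assuming you maintain per-city proxy pools; the pool structure, hostnames, and credentials below are hypothetical placeholders, not a real provider's API:

```python
import random

# Hypothetical per-city pools of residential proxies (placeholder hosts)
CITY_PROXIES = {
    'moscow': ['http://user:pass@msk-proxy1.example.com:8000'],
    'kazan': ['http://user:pass@kzn-proxy1.example.com:8000'],
}

def proxy_for_city(city):
    """Pick a proxy whose exit IP matches the city of the ad,
    so the publication passes Avito's geographic check."""
    pool = CITY_PROXIES.get(city.lower())
    if not pool:
        raise KeyError(f'No proxies configured for city: {city}')
    chosen = random.choice(pool)
    return {'http': chosen, 'https': chosen}
```

Combined with the one-IP-one-account rule, in practice you would also pin each account to a fixed entry in its city pool rather than picking randomly on every request.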
How to Implement Randomized Delays: Do not use fixed intervals like "exactly 3 seconds": this looks like a bot. Add randomness, for example from 2 to 5 seconds. Most scrapers support this through settings.
For example, in Python with the requests library, it looks like this:
```python
import time
import random
import requests

def make_request(url, proxies):
    response = requests.get(url, proxies=proxies)
    # Random delay from 2 to 5 seconds
    delay = random.uniform(2.0, 5.0)
    time.sleep(delay)
    return response

# Example usage
proxy = {
    'http': 'http://username:password@proxy.example.com:8000',
    'https': 'http://username:password@proxy.example.com:8000'
}

for product_id in product_list:  # product_list: your WB article numbers
    url = f'https://card.wb.ru/cards/detail?nm={product_id}'
    response = make_request(url, proxy)
    # Process data...
```
Important Note: After every 100-200 requests, take a long pause (10-20 minutes) or change the IP. This mimics the behavior of a person who browses products and then gets distracted by other tasks.
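The batch-pause rule from the note above can be folded into a small helper. The batch size and pause window below are the illustrative values from this section, not hard limits, and the injectable `sleep` argument exists only to make the logic testable:

```python
import random
import time

def pause_if_needed(request_count, batch_size=150,
                    pause_range=(600, 1200), sleep=time.sleep):
    """After every `batch_size` requests, sleep for 10-20 minutes
    (600-1200 seconds) to mimic a user who got distracted.
    Returns True when a long pause was taken."""
    if request_count > 0 and request_count % batch_size == 0:
        sleep(random.uniform(*pause_range))
        return True
    return False
```

Call it with your running request counter after each response; when it returns True, you can also rotate the IP at the same time.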
Proxy Rotation for Load Distribution
Even with the right delays, one IP cannot handle long-term load. The solution is proxy rotation: distributing requests among multiple IP addresses. This is the foundation of stable scraping on marketplaces.
Types of Proxies for Marketplace Scraping:
| Proxy Type | Advantages | Disadvantages | For Which Tasks |
|---|---|---|---|
| Data Center | Fast, cheap, stable | Easily identified, often on ban lists | Yandex.Market, small marketplaces |
| Residential | Real IPs of home users, low risk of bans | More expensive, slower than data centers | Wildberries, Ozon, Avito |
| Mobile | IPs of mobile operators, maximum anonymity | Most expensive, variable speed | Bypassing strict Avito blocks |
For scraping Wildberries and Ozon, we recommend residential proxies: they carry the IPs of real home users, so marketplaces cannot distinguish them from regular buyers. Data center proxies perform poorly here, since Ozon and Wildberries maintain blacklists of such IPs.
Proxy Rotation Strategies:
- Rotation after N requests. Change IP after every 100-300 requests. This is the optimal balance between efficiency and safety.
- Time-based rotation. Change IP every 30-60 minutes. Suitable for long scraping sessions.
- Sticky sessions. Use one IP for all requests to one product/category, then change. This reduces suspicion.
- Geographical relevance. Mandatory for Avito: scrape Moscow ads through Moscow IPs, Kazan ads through Kazan IPs.
Most residential proxy providers offer automatic rotation: you get one endpoint, and the IP changes automatically at a specified frequency or after each request. This simplifies scraper configuration.
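The time-based strategy above can be sketched as a tiny rotator that swaps the active proxy once a configurable interval elapses. The 45-minute default is an illustrative midpoint of the 30-60 minute range, and the injectable `clock` exists only for testability:

```python
import itertools
import time

class TimedRotator:
    """Serve one proxy at a time, switching to the next pool entry
    after `interval` seconds (the time-based rotation strategy)."""
    def __init__(self, proxy_list, interval=45 * 60, clock=time.monotonic):
        self._proxies = itertools.cycle(proxy_list)
        self._interval = interval
        self._clock = clock
        self._current = next(self._proxies)
        self._switched_at = clock()

    def current(self):
        # Advance to the next proxy once the interval has elapsed
        if self._clock() - self._switched_at >= self._interval:
            self._current = next(self._proxies)
            self._switched_at = self._clock()
        return {'http': self._current, 'https': self._current}
```

`itertools.cycle` loops back to the first proxy when the pool is exhausted, so a small pool keeps rotating indefinitely.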
Example of Setting Up a Proxy Pool in Python:
```python
import requests
import random

# List of proxies (can be loaded from a file)
proxy_list = [
    'http://user:pass@proxy1.example.com:8000',
    'http://user:pass@proxy2.example.com:8000',
    'http://user:pass@proxy3.example.com:8000',
    # ... additional 50-100 proxies
]

def get_random_proxy():
    proxy = random.choice(proxy_list)
    return {
        'http': proxy,
        'https': proxy
    }

# Usage
for product_id in product_list:
    url = f'https://card.wb.ru/cards/detail?nm={product_id}'
    proxy = get_random_proxy()  # Random proxy for each request
    response = requests.get(url, proxies=proxy)
    # Process...
```
Configuring Headers and Fingerprint to Mimic a Browser
Marketplaces analyze not only IPs and request frequency but also HTTP headers. If your scraper sends requests with the default headers of the library (for example, python-requests/2.28.0), it is instantly identified as a bot.
Mandatory Headers to Mimic a Browser:
```python
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Language': 'ru-RU,ru;q=0.9,en-US;q=0.8,en;q=0.7',
    'Accept-Encoding': 'gzip, deflate, br',
    'DNT': '1',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1',
    'Sec-Fetch-Dest': 'document',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'none',
    'Cache-Control': 'max-age=0',
    'Referer': 'https://www.google.com/'
}
```
Important Points:
- User-Agent must match a real browser. Use up-to-date versions of Chrome, Firefox, Safari. Change User-Agent every 100-200 requests.
- Accept-Language must match the geography of the proxy: ru-RU for Russian IPs, uk-UA for Ukrainian ones.
- Referer indicates where the user came from. Use Google/Yandex for the first request and internal marketplace pages for subsequent ones.
- Sec-Fetch-* headers add realism. Modern browsers send them automatically.
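These rules can be combined into one header-building helper. A minimal sketch, assuming a small illustrative User-Agent pool that you refresh periodically (the strings here are examples, not a maintained list):

```python
import random

# Illustrative pool of desktop User-Agents; keep browser versions
# current and switch the choice every 100-200 requests
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
]

def build_headers(referer='https://www.google.com/',
                  accept_language='ru-RU,ru;q=0.9,en-US;q=0.8,en;q=0.7'):
    """Browser-like headers: random User-Agent, Accept-Language tied to
    the proxy geography, Referer set by the caller per navigation step."""
    return {
        'User-Agent': random.choice(USER_AGENTS),
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language': accept_language,
        'Accept-Encoding': 'gzip, deflate, br',
        'Connection': 'keep-alive',
        'Referer': referer,
    }
```

Pass the result as `headers=build_headers(...)` to each request, switching the `referer` to an internal marketplace page after the first navigation.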
TLS Fingerprinting: Advanced protection systems (Ozon, Wildberries) analyze the parameters of the TLS connection: the order of cipher suites, extensions, protocol version. Standard Python/Node.js libraries have a different fingerprint than browsers.
The solution is to use specialized libraries:
- curl-impersonate (with the curl_cffi Python bindings) – mimics the TLS fingerprint of Chrome/Firefox
- tls-client (Go, with Python bindings) – customizable TLS fingerprint
- Playwright / Puppeteer – headless browsers with a real browser TLS stack
For most marketplace scraping tasks, correct HTTP headers and residential proxies are sufficient. TLS fingerprinting is critical only when working with the most secure APIs.
API vs Web Scraping: Which is Safer for Scraping
Marketplaces have two ways to obtain data: official API and scraping HTML pages (web scraping). Which one to choose for stable operation?
| Parameter | Official API | Web Scraping |
|---|---|---|
| Legality | ✅ Allowed, documentation available | ⚠️ Gray area, may violate ToS |
| Stability | ✅ Stable data structure | ❌ Breaks with website redesign |
| Limits | ⚠️ Strict official limits | ⚠️ Unofficial, but protection exists |
| Data Access | ⚠️ Not all data is available | ✅ All public data |
| Speed | ✅ Fast JSON responses | ❌ Slower due to HTML parsing |
| Cost | ⚠️ Often paid | ✅ Free (only proxy costs) |
Recommendations for Choosing:
- Use the official API if: You need small volumes of data (up to 10,000 products per day), you are willing to pay for access, and legality and stability are important.
- Use web scraping if: You need large volumes of data, the official API does not provide the required information (e.g., competitor prices), and the budget is limited.
Hybrid Approach: Many professional scrapers combine both methods. For example, they obtain a list of products through the API (quickly and legally) and scrape detailed information about prices and stock from HTML pages (more data).
Internal Marketplace APIs: Besides the official API, marketplaces use internal APIs to run their own websites. For instance, Wildberries loads product data through https://card.wb.ru/cards/detail. These endpoints are undocumented but work faster than HTML scraping. The downside: they can change without warning.
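A sketch of consuming such an undocumented endpoint. The response layout assumed here (`data.products[*].salePriceU`, a price in kopecks) matches one historically observed version of the WB card response; it is not a documented contract and can change without notice:

```python
def extract_prices(payload):
    """Pull (product id, price in rubles) pairs out of a
    card.wb.ru-style JSON payload. All field names are assumptions
    based on observed responses, not documented guarantees."""
    products = payload.get('data', {}).get('products', [])
    # salePriceU is assumed to be an integer price in kopecks
    return [(p['id'], p.get('salePriceU', 0) / 100) for p in products]
```

Wrapping access behind one function like this localizes the breakage to a single place when the marketplace silently renames a field.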
Setting Up Popular Scrapers and Tools
Most sellers and marketers use ready-made tools for scraping marketplaces. Let's look at how to properly configure proxies and limits in popular solutions.
Setting Up Scrapy (Python Framework)
Scrapy is a popular framework for web scraping. To work with marketplaces, add the following to settings.py:
```python
# Delays between requests
DOWNLOAD_DELAY = 3  # 3 seconds
RANDOMIZE_DOWNLOAD_DELAY = True  # Waits from 0.5*DELAY to 1.5*DELAY

# Limits on concurrent requests
CONCURRENT_REQUESTS = 8
CONCURRENT_REQUESTS_PER_DOMAIN = 2

# Proxy settings (via the scrapy-rotating-proxies middleware)
ROTATING_PROXY_LIST = [
    'http://user:pass@proxy1.example.com:8000',
    'http://user:pass@proxy2.example.com:8000',
    # ... list of proxies
]

# User-Agent rotation (consumed by a UA-rotation middleware;
# not a built-in Scrapy setting)
USER_AGENT_LIST = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0.0.0',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Safari/537.36',
    # ... list of User-Agents
]

# Retry attempts on errors
RETRY_TIMES = 3
RETRY_HTTP_CODES = [429, 500, 502, 503, 504]
```
Setting Up Octoparse (Visual No-Code Parser)
Octoparse is a popular tool for scraping without programming. Proxy and limit setup:
- Open Task Settings → Advanced Options
- In the "Network" section, enable "Use Proxy Server"
- Add the proxy list in the format IP:PORT:USER:PASS
- Enable "Rotate IP for each request" for automatic rotation
- In the "Speed" section, choose "Slow" or "Custom" with a delay of 3-5 seconds
- Enable "Random delay" to mimic human behavior
Setting Up Selenium (Browser Automation)
Selenium controls a real browser, so it bypasses many protections. Here's an example setup with a proxy:
```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import time
import random

# Setting up Chrome with a proxy
chrome_options = Options()
# Note: Chrome ignores user:pass credentials in --proxy-server;
# for authenticated proxies use selenium-wire or a proxy-auth extension
chrome_options.add_argument('--proxy-server=http://proxy.example.com:8000')
chrome_options.add_argument('--disable-blink-features=AutomationControlled')
chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
chrome_options.add_experimental_option('useAutomationExtension', False)

driver = webdriver.Chrome(options=chrome_options)

# Hiding the WebDriver flag
driver.execute_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined})")

# Scraping with delays
urls = ['https://www.wildberries.ru/catalog/...']  # your product URLs
for url in urls:
    driver.get(url)
    # Random delay of 3-7 seconds
    time.sleep(random.uniform(3, 7))
    # Scroll to mimic reading
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight/2);")
    time.sleep(random.uniform(1, 3))
    # Data extraction
    # ...
```
Ready-Made Marketplace Scraping Services
If you do not want to set up a scraper yourself, use specialized services:
- Mpstats.io – analytics for Wildberries and Ozon, automatic price and sales monitoring
- SellerFox – competitor monitoring on marketplaces, stock tracking
- Moneyplace – scraping Avito, automatic ad posting
- Parsehub – visual scraper for any websites, including marketplaces
These services come with pre-configured proxies, limits, and protection bypasses; you only need to specify what to scrape. The downside is a monthly subscription starting from 2,000 ₽.
Monitoring Blocks and Automatic Response
Even with the right settings, blocks are possible: marketplaces update protection, proxies get on ban lists, limits change. It is important to track issues and respond automatically.
Signs of Blocking to Monitor:
- HTTP 429 (Too Many Requests) – request limit exceeded; pause or change IP
- HTTP 403 (Forbidden) – IP blocked; rotate the proxy immediately
- HTTP 503 (Service Unavailable) – temporary overload or DDoS protection
- Captcha in the response – automation detected; reduce activity
- Empty responses or a redirect to the homepage – soft block
- Sharp increase in response time – possible rate limiting on the server side
Automatic Response to Blocks (Example in Python):
```python
import requests
import time
from datetime import datetime

class SmartParser:
    def __init__(self, proxy_list):
        self.proxy_list = proxy_list
        self.current_proxy_index = 0
        self.request_count = 0
        self.blocked_proxies = set()

    def get_next_proxy(self):
        # Guard against an exhausted pool: every proxy is banned
        if len(self.blocked_proxies) >= len(self.proxy_list):
            raise RuntimeError('All proxies in the pool are blocked')
        # Skip blocked proxies
        while self.current_proxy_index in self.blocked_proxies:
            self.current_proxy_index = (self.current_proxy_index + 1) % len(self.proxy_list)
        proxy = self.proxy_list[self.current_proxy_index]
        return {'http': proxy, 'https': proxy}

    def rotate_proxy(self):
        self.current_proxy_index = (self.current_proxy_index + 1) % len(self.proxy_list)
        self.request_count = 0

    def make_request(self, url):
        max_retries = 3
        for attempt in range(max_retries):
            try:
                proxy = self.get_next_proxy()
                response = requests.get(url, proxies=proxy, timeout=10)

                # Check for blocking
                if response.status_code == 429:
                    print(f"[{datetime.now()}] Rate limit! Pausing for 60 seconds...")
                    time.sleep(60)
                    self.rotate_proxy()
                    continue
                elif response.status_code == 403:
                    print(f"[{datetime.now()}] IP blocked! Rotating proxy...")
                    self.blocked_proxies.add(self.current_proxy_index)
                    self.rotate_proxy()
                    continue
                elif response.status_code == 503:
                    print(f"[{datetime.now()}] Server overloaded. Pausing for 120 seconds...")
                    time.sleep(120)
                    continue

                # Successful request
                self.request_count += 1

                # Rotate after 200 requests
                if self.request_count >= 200:
                    self.rotate_proxy()
                    time.sleep(10)  # Pause after rotation

                return response
            except requests.exceptions.Timeout:
                print(f"[{datetime.now()}] Timeout. Attempt {attempt + 1}/{max_retries}")
                time.sleep(5)

        return None  # All attempts exhausted
```
Logging and Alerts: Set up notifications for critical events. For example, send a message to Telegram when:
- More than 30% of proxies from the pool are blocked
- The percentage of successful requests drops below 80%
- The scraper has not received data for more than 30 minutes
- Captcha detected in responses
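The alert conditions above can be checked by a small predicate, with delivery through the Telegram Bot API `sendMessage` method. The token and chat id below are placeholders, and the 30%/80% thresholds are the ones from this checklist:

```python
import requests

BOT_TOKEN = '123456:ABC-DEF'    # placeholder: your bot token
CHAT_ID = '-1001234567890'      # placeholder: your chat/channel id

def should_alert(total_requests, successful, blocked_proxies, pool_size):
    """True when >30% of the proxy pool is blocked or the success
    rate has dropped below 80%."""
    if pool_size and blocked_proxies / pool_size > 0.30:
        return True
    if total_requests and successful / total_requests < 0.80:
        return True
    return False

def send_alert(text, token=BOT_TOKEN, chat_id=CHAT_ID):
    """Deliver a monitoring alert via the Telegram Bot API."""
    url = f'https://api.telegram.org/bot{token}/sendMessage'
    return requests.post(url, data={'chat_id': chat_id, 'text': text}, timeout=10)
```

Call `should_alert(...)` on a schedule (e.g. every few minutes) and fire `send_alert(...)` only on a False-to-True transition, so you are not flooded with duplicate messages.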
Metrics to Monitor:
- Success rate – percentage of successful requests (should be above 90%)
- Average response time – a rising average may signal server-side throttling
- Requests per hour – request volume per hour per proxy
- Proxy health – percentage of working proxies in the pool
- Block rate – frequency of blocks (should be below 5%)
Use dashboards for visualizing metrics: Grafana, Datadog, or simple Google Sheets with automatic updates via API.
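A minimal in-process counterpart to such dashboards, tracking the success, block, and latency metrics listed above (the 403/429 classification mirrors the blocking signs from this section):

```python
class ScraperMetrics:
    """Rolling counters for scraper health metrics."""
    def __init__(self):
        self.total = 0
        self.success = 0
        self.blocked = 0
        self.response_times = []

    def record(self, status_code, elapsed_seconds):
        """Record one completed request."""
        self.total += 1
        self.response_times.append(elapsed_seconds)
        if status_code == 200:
            self.success += 1
        elif status_code in (403, 429):  # block indicators
            self.blocked += 1

    def success_rate(self):
        return self.success / self.total if self.total else 0.0

    def block_rate(self):
        return self.blocked / self.total if self.total else 0.0

    def avg_response_time(self):
        times = self.response_times
        return sum(times) / len(times) if times else 0.0
```

Feed each `(response.status_code, response.elapsed.total_seconds())` pair into `record()` and export the three rates to your dashboard of choice.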
Conclusion
Blocks when scraping marketplaces are not an obstacle but a task that can be solved with the right tool configuration. Key points for stable operation without bans:
- Use residential proxies for Wildberries, Ozon, and Avito; data center proxies do not work here
- Configure randomized delays of 2-5 seconds between requests
- Rotate IP after every 150-300 requests or every 30-60 minutes
- Use realistic HTTP headers with up-to-date User-Agent
- Monitor blocks and respond automatically
- For Avito, geographical relevance of IP to the city of the ad is mandatory
A properly configured scraper with quality proxies can operate for months without a single block, collecting tens of thousands of products daily. The key is not to chase speed but to mimic the behavior of an ordinary user.
If you plan to regularly scrape Wildberries, Ozon, or Avito, we recommend using residential proxies with automatic rotation β they provide maximum stability and minimal risk of blocks. For tasks requiring mobile IPs (e.g., bypassing strict Avito blocks), mobile proxies with IPs from Russian operators will suffice.