Success on marketplaces depends directly on how quickly you respond to trends. While you manually browse the catalogs of Wildberries and Ozon, competitors have already automated data collection through proxies and are receiving real-time information about bestsellers. However, marketplaces actively block parsing: without the correct proxy setup, you risk losing access to the platform or obtaining incomplete data.
In this guide, we will discuss how to set up an automatic data collection system for trending products, which type of proxy to choose for different marketplaces, and how to avoid common mistakes that lead to blocks.
Why Marketplaces Block Parsing and How Proxies Solve the Problem
Marketplaces spend millions on protection against automated data collection. The reason is simple: parsing creates a load on servers and allows competitors to obtain commercial information. Wildberries, Ozon, and other platforms use a multi-layered protection system that monitors suspicious activity.
The anti-parsing system analyzes several parameters simultaneously. If 100 requests arrive from one IP address in a minute, that is a clear sign of a bot: an ordinary buyer views 5-10 product cards in that time. The system also monitors the browser's User-Agent, click frequency, mouse movement, and even time spent on the page.
Proxies solve a key problem — they distribute requests among different IP addresses. Instead of sending 1000 requests from your real IP, the system makes 10-20 requests from each of 50-100 different addresses. For the marketplace, this looks like the activity of ordinary users from different cities.
Important: Using proxies does not guarantee complete protection from blocks. You also need to set up proper IP rotation, maintain intervals between requests, and mimic real user behavior. We will discuss this in detail in the setup section.
Which Type of Proxy to Choose for Data Collection
Three types of proxies are suitable for parsing marketplaces, each with its own advantages and limitations. The choice depends on the volume of data, budget, and speed requirements for information collection.
| Proxy Type | Speed | Trust Level with Platforms | Price | Recommendation |
|---|---|---|---|---|
| Data Center Proxies | High (100+ Mbps) | Low (easily detected) | From $1-3/IP | Mass parsing with high rotation |
| Residential Proxies | Medium (20-50 Mbps) | High (real user IPs) | From $5-15/GB of traffic | Parsing protected marketplaces (Wildberries, Ozon) |
| Mobile Proxies | Medium (10-30 Mbps) | Maximum (mobile operators) | From $50-100/IP | Parsing with maximum protection, mobile versions of websites |
Data Center Proxies: When Speed is More Important than Anonymity
If you need to quickly collect a large volume of data from less protected platforms (for example, AliExpress or Yandex.Market), data center proxies are the optimal choice. They operate on servers of hosting providers, ensuring high page loading speeds.
The main drawback is that marketplaces can easily identify data center IPs and may block them when suspicious activity is detected. The solution is to use a large pool (50-100 addresses or more) and rotate quickly: change IPs after every 10-15 requests.
Residential Proxies: The Golden Mean for Most Tasks
Residential proxies use IP addresses from real internet service providers assigned to ordinary users. For Wildberries or Ozon, such traffic appears completely legitimate — as if a buyer from Moscow, St. Petersburg, or Kazan is browsing the products.
This type of proxy is suitable for regular trend monitoring, when you collect data daily or several times a day. The cost is calculated based on traffic — for parsing 10,000 product cards, you will need about 5-10 GB depending on the volume of images and descriptions.
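The traffic estimate above can be turned into a rough budget. The sketch below only scales the article's own figures (5-10 GB per 10,000 cards, and the "from $5-15/GB" price from the comparison table); the function name `estimate_traffic_cost` is hypothetical, and real consumption depends on page weight and provider pricing.

```python
def estimate_traffic_cost(cards, gb_per_10k=(5, 10), usd_per_gb=(5, 15)):
    """Return (min_gb, max_gb, min_usd, max_usd) for a number of product cards.

    Defaults are the approximate figures quoted in this guide, not measurements.
    """
    scale = cards / 10_000
    min_gb = gb_per_10k[0] * scale
    max_gb = gb_per_10k[1] * scale
    return min_gb, max_gb, min_gb * usd_per_gb[0], max_gb * usd_per_gb[1]


if __name__ == "__main__":
    low_gb, high_gb, low_usd, high_usd = estimate_traffic_cost(10_000)
    print(f"{low_gb:.0f}-{high_gb:.0f} GB, roughly ${low_usd:.0f}-${high_usd:.0f}")
```

The spread between the cheapest and most expensive case is wide, so it is worth measuring actual traffic on a small batch before committing to a plan.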
Mobile Proxies: Maximum Protection for Critical Tasks
Mobile proxies use IP addresses from cellular operators (MTS, Beeline, MegaFon). Marketplaces rarely block such addresses because thousands of real users can be behind one IP — operators use CGNAT technology (shared IP for multiple subscribers).
Mobile proxies make sense for parsing particularly protected sections of marketplaces, or when you have already faced blocks with other proxy types. They are also essential for collecting data from the mobile applications of Wildberries and Ozon, where protection is even stricter.
Features of Parsing Different Marketplaces: Wildberries, Ozon, AliExpress
Each marketplace uses its own protection system against parsing. Understanding these features will help you set up proxies as effectively as possible and avoid blocks.
Wildberries: Strict Protection and Geographical Binding
Wildberries uses one of the most advanced protection systems among Russian marketplaces. The platform analyzes not only the frequency of requests but also behavioral factors: time on the page, scrolling, clicks on elements. For successful parsing, it is necessary to mimic the actions of a real user.
An important feature is the geographical binding of prices and product availability. Wildberries shows different assortments for Moscow, regions, and remote areas. If you are collecting trend data for sales across Russia, use proxies from different regions: Moscow, St. Petersburg, Yekaterinburg, Novosibirsk, Krasnodar.
Practical Advice: For parsing Wildberries, use residential proxies with rotation every 50-100 requests. Be sure to add random delays of 2-5 seconds between requests and change the browser's User-Agent. This will minimize the likelihood of blocking.
Ozon: API for Partners and Protection of the Public Catalog
Ozon provides an official API for sellers, but it does not give access to competitor data. For trend analysis, you still have to parse the public catalog. Ozon's protection is less aggressive than Wildberries, but the platform actively uses CAPTCHA during suspicious activity.
Ozon loads content dynamically via JavaScript. Simple HTTP requests will not return the full page; you need a browser automation tool that executes JavaScript, such as Selenium or Puppeteer driving a headless browser. A real browser also pulls scripts, styles, and images through the proxy, so expect higher traffic consumption: up to 15-20 GB for 10,000 product cards.
AliExpress: Mass Parsing with Regional Limitations
AliExpress shows different prices and delivery conditions depending on the user's country. For Russian sellers, it is critically important to use proxies with Russian IPs — otherwise, you will receive data for another region, distorting trend analysis.
AliExpress's protection is relatively lenient towards parsing — the platform is interested in traffic. You can use data center proxies with moderate rotation (every 100-200 requests). The main thing is not to exceed a speed of 5-10 requests per second from one IP.
Tools for Automating Trend Data Collection
There are two approaches to parsing marketplaces: ready-made services and self-setup parsers. Ready-made solutions are more expensive but save time. A custom parser requires technical knowledge but provides full control over the process.
Ready-Made Services for Marketplace Parsing
For those who do not want to deal with technical details, there are ready-made platforms. They are already configured for specific marketplaces, have a built-in proxy system, and automatic IP rotation.
- Mpstats — specializes in Wildberries and Ozon, collects data on sales, stock, and positions in search results. Cost from 3000 rubles per month.
- SellerFox — analytics for Wildberries with trend and niche tracking. Suitable for finding products with growing demand.
- Moneyplace — competitor monitoring on Ozon and Wildberries, tracking price and rating changes.
- ParseHub — a universal parser for any websites, including marketplaces. Requires setup but works with any platforms.
The main downside of ready-made services is that you pay not only for data but also for their proxy infrastructure. For large volumes of parsing, this can cost tens of thousands of rubles monthly.
Self-Setup Parser: Tools and Libraries
If you have basic technical skills (or a developer on your team), you can set up your own parsing system. This is cheaper when scaling and gives you full control over the process.
Popular tools for parsing:
- Selenium (Python): browser automation with JavaScript support and straightforward proxy integration. Suitable for Wildberries and Ozon.
- Puppeteer (Node.js): a library that controls headless Chrome or Chromium; often lighter and faster than Selenium for Chrome-only work.
- Scrapy (Python): a crawling framework suited to simple sites without JavaScript. Fast, but it does not execute JavaScript on its own, so dynamic content needs an additional rendering layer.
- Playwright (Python/Node.js): a modern alternative to Selenium that drives Chromium, Firefox, and WebKit and has built-in proxy settings.
For parsing marketplaces, we recommend Selenium or Playwright — they correctly handle JavaScript and allow you to mimic real user actions (scrolling, clicks, delays).
Step-by-Step Proxy Setup for Product Parsers
Proper proxy setup is a key factor for success. Even the best residential proxies won't save you from blocking if the rotation is incorrectly configured or if you exceed request limits. We will break down the setup process using popular tools as an example.
Step 1: Obtain Proxy Data and Check Functionality
After purchasing proxies, you receive a list in the format: IP:PORT:LOGIN:PASSWORD. Before setting up the parser, be sure to check the functionality of each proxy.
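Most HTTP clients expect a proxy as a URL rather than as the colon-separated line a provider hands out. The following is a minimal sketch of that conversion; the helper names `parse_proxy_line` and `load_proxies` are hypothetical, and it assumes IPv4 addresses and an HTTP proxy scheme (adjust the scheme if your provider sells SOCKS proxies).

```python
from urllib.parse import quote


def parse_proxy_line(line):
    """Convert 'IP:PORT:LOGIN:PASSWORD' into 'http://LOGIN:PASSWORD@IP:PORT'."""
    # maxsplit=3 keeps any ':' characters inside the password intact.
    parts = line.strip().split(":", 3)
    if len(parts) != 4:
        raise ValueError(f"expected IP:PORT:LOGIN:PASSWORD, got {line!r}")
    host, port, login, password = parts
    if not port.isdigit():
        raise ValueError(f"port is not numeric in {line!r}")
    # Percent-encode credentials so characters like '@' do not break the URL.
    login, password = quote(login, safe=""), quote(password, safe="")
    return f"http://{login}:{password}@{host}:{port}"


def load_proxies(lines):
    """Parse an iterable of lines, skipping blank lines and '#' comments."""
    return [
        parse_proxy_line(line)
        for line in lines
        if line.strip() and not line.lstrip().startswith("#")
    ]
```

Passing an open file object to `load_proxies` works as well, since a file iterates line by line.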
The simplest way to check is to open a browser, configure the proxy in the network settings, and visit an IP check website (for example, 2ip.ru or whoer.net). Ensure that the proxy IP is displayed, not your real address. Also, check the loading speed — if pages take longer than 5 seconds to open, the proxy is of poor quality.
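With more than a handful of proxies, the same check is easier to script. The sketch below uses only the standard library and applies the two criteria from the paragraph above: the visible IP must differ from your own, and the response must arrive within 5 seconds. The echo endpoint `api.ipify.org` is an illustrative choice rather than one named in this guide, and the function names are hypothetical.

```python
import json
import time
import urllib.request

# Assumption: any service that returns the caller's IP as JSON would do here.
IP_ECHO_URL = "https://api.ipify.org?format=json"


def fetch_ip_via_proxy(proxy_url, timeout=10):
    """Ask the echo service which IP it sees when the request goes through the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(handler)
    with opener.open(IP_ECHO_URL, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["ip"]


def check_proxy(proxy_url, real_ip, max_seconds=5.0, fetch=fetch_ip_via_proxy):
    """Return (ok, seen_ip, elapsed_seconds).

    ok is False when the request fails, when the real IP leaks through,
    or when the response is slower than max_seconds.
    """
    started = time.monotonic()
    try:
        seen_ip = fetch(proxy_url)
    except Exception:
        return False, None, time.monotonic() - started
    elapsed = time.monotonic() - started
    ok = seen_ip != real_ip and elapsed <= max_seconds
    return ok, seen_ip, elapsed
```

The `fetch` parameter exists so the check can be exercised without network access, for example in unit tests.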
Step 2: Configure Proxy in the Parser (Using Selenium as an Example)
If you are using Selenium for parsing, the proxy setup looks as follows. You create a list of proxies in a separate file, then the parser randomly selects a proxy from the list for each session.
The basic logic is: the parser launches the browser with the configured proxy, makes 50-100 requests (views product cards), then closes the session and starts a new one with a different proxy. This mimics the behavior of different users and reduces the risk of blocking.
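The session logic described above might look roughly like this with Selenium and Chrome. This is a sketch under assumptions, not a complete parser: the function names are hypothetical, delays between requests are omitted, and because Chrome's `--proxy-server` flag does not accept a login and password, it assumes the provider authorizes you by IP allowlist and that proxies are given as `host:port` strings.

```python
import random


def make_chrome_driver(proxy_host_port):
    """Start Chrome routed through one proxy (requires the selenium package)."""
    # Imported lazily so the session loop below can be tested without Selenium.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    # Assumption: the proxy needs no credentials (IP allowlist authorization).
    options.add_argument(f"--proxy-server=http://{proxy_host_port}")
    return webdriver.Chrome(options=options)


def scrape_in_sessions(urls, proxies, per_session=50, make_driver=make_chrome_driver):
    """Visit urls in batches, with a fresh browser and a random proxy per batch."""
    pages = {}
    for start in range(0, len(urls), per_session):
        driver = make_driver(random.choice(proxies))
        try:
            for url in urls[start:start + per_session]:
                driver.get(url)
                pages[url] = driver.page_source
        finally:
            # Always close the session, even if a page load fails.
            driver.quit()
    return pages
```

Starting a new browser per batch is slower than reusing one, but it also resets cookies and local storage, which keeps each batch looking like a separate visitor.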
Step 3: Configure IP Rotation
Proxy rotation is the automatic change of the IP address at certain intervals. There are two approaches: time-based rotation (every 5-10 minutes) and request-based rotation (every 50-100 requests).
For parsing marketplaces, we recommend request-based rotation — it is more predictable. If you are parsing Wildberries, change the IP every 50 requests. For less protected platforms (AliExpress), you can increase it to 200-300 requests per IP.
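Request-based rotation needs little more than a counter. A minimal sketch, with the hypothetical class name `ProxyRotator` and limits taken from the paragraph above (200 is the lower end of the AliExpress range):

```python
import itertools


class ProxyRotator:
    """Hand out the same proxy for `limit` requests, then move to the next one."""

    def __init__(self, proxies, limit=50):
        if not proxies:
            raise ValueError("proxy list is empty")
        self._cycle = itertools.cycle(proxies)
        self._limit = limit
        self._used = 0
        self._current = next(self._cycle)

    def get(self):
        """Return the proxy for the next request, rotating once the limit is hit."""
        if self._used >= self._limit:
            self._current = next(self._cycle)
            self._used = 0
        self._used += 1
        return self._current


# Requests per IP before rotating, as suggested in the text.
ROTATION_LIMITS = {"wildberries": 50, "aliexpress": 200}
```

Call `rotator.get()` once per request; the pool is reused in a cycle, so per-IP hourly and daily limits still need to be respected separately.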
Important: Some proxy providers offer automatic rotation on their side — you receive one endpoint (address:port), and the IP changes automatically with each request or on a timer. This simplifies setup but gives less control over the process.
Step 4: Configure Delays Between Requests
Even with proxy rotation, you cannot send requests in a continuous stream. A real user spends time viewing a product card, reading reviews, and comparing prices. Your parser should mimic this behavior.
Optimal delays for different marketplaces:
- Wildberries: 2-5 seconds between requests, random variation ±1 second
- Ozon: 3-7 seconds (due to CAPTCHA during fast requests)
- AliExpress: 1-3 seconds (more lenient protection)
Use random delays rather than fixed ones: a request arriving exactly every 3 seconds is itself a sign of a bot. Draw each delay at random, for example from 2 to 5 seconds with a uniform distribution.
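In Python, a uniformly distributed delay is a single call to `random.uniform`. The sketch below wraps the ranges listed above in a lookup table; the names `DELAY_RANGES`, `next_delay`, and `polite_sleep` are hypothetical.

```python
import random
import time

# Delay ranges in seconds, taken from the list above.
DELAY_RANGES = {
    "wildberries": (2.0, 5.0),
    "ozon": (3.0, 7.0),
    "aliexpress": (1.0, 3.0),
}


def next_delay(marketplace, rng=random):
    """Draw a delay uniformly from the marketplace's range."""
    low, high = DELAY_RANGES[marketplace]
    return rng.uniform(low, high)


def polite_sleep(marketplace):
    """Pause for a random interval before sending the next request."""
    time.sleep(next_delay(marketplace))
```

Calling `polite_sleep("ozon")` between page loads is enough for a single-threaded parser; concurrent workers would each need their own pacing.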
IP Rotation and Request Limits: How to Avoid Bans
Even with proper proxy setup, you can still get blocked if you do not consider the peculiarities of anti-parsing systems. Marketplaces analyze not only the frequency of requests but also behavioral patterns.
Request Limits for Different Types of Proxies
Each type of proxy has its own safe usage limits. Exceeding these limits sharply increases the likelihood of blocking.
| Proxy Type | Requests per IP per Hour | Requests per IP per Day | Recommended Rotation |
|---|---|---|---|
| Data Centers | 50-100 | 300-500 | Every 10-20 requests |
| Residential | 100-200 | 1000-2000 | Every 50-100 requests |
| Mobile | 200-300 | 2000-3000 | Every 100-200 requests |
These figures are approximate. Actual limits depend on the specific marketplace and the time of day. During peak hours (evening, weekends), you can increase activity since there are more real users on the platform.
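The daily limits in the table give a lower bound on pool size for a given volume. The sketch below is simple division over the table's approximate figures; `min_pool_size` is a hypothetical helper, and the result is a floor rather than a recommendation.

```python
import math

# Per-IP daily limits from the table above: (lower bound, upper bound).
DAILY_LIMITS = {
    "datacenter": (300, 500),
    "residential": (1000, 2000),
    "mobile": (2000, 3000),
}


def min_pool_size(requests_per_day, proxy_type, conservative=True):
    """Smallest number of IPs that keeps each one under its daily limit."""
    low, high = DAILY_LIMITS[proxy_type]
    per_ip = low if conservative else high
    return math.ceil(requests_per_day / per_ip)


if __name__ == "__main__":
    # 10,000 requests per day on residential proxies, conservative limit.
    print(min_pool_size(10_000, "residential"))
```

In practice a larger pool leaves headroom for failed proxies, retries, and the hourly limits, which this calculation ignores.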
Rotation Strategies for Different Parsing Volumes
The rotation strategy depends on how much data you need to collect. For monitoring the top 100 products in a category, a simple scheme is sufficient. For parsing the entire catalog (tens of thousands of items), a more complex system is needed.
Small Volume (up to 1000 products per day): Use 5-10 residential proxies with rotation every 100 requests. This is enough for monitoring trends in 2-3 categories.
Medium Volume (1000-10000 products per day): A pool of 20-50 residential proxies, rotation every 50 requests. Add random pauses of 1-2 hours between parsing sessions.
Large Volume (10000+ products per day): A combination of residential (for critical requests) and data center proxies (for mass collection). Use 100+ proxies with aggressive rotation and load distribution over time.
What Data to Collect for Trend Analysis
Parsing for the sake of parsing makes no sense. It is important to collect the right metrics that will help identify trending products before the niche becomes saturated with competitors.
Key Metrics for Identifying Trends
For each product card, collect the following data:
- Product Name and Article Number — for identification and tracking dynamics
- Price (current and discounted) — trends often start with a sharp drop in prices
- Number of Reviews — an increase in reviews over a week indicates rising sales
- Average Rating — products with a rating of 4.5+ become trending faster
- Number of Orders (if available) — a direct indicator of demand
- Stock Levels — a sharp decrease in stock = increased demand
- Position in Search Results for Key Queries — products in the top 10 receive 80% of clicks
- Date of Product Appearance — new products with rapid sales growth = potential trend
Collect this data daily and save it in a database (PostgreSQL, MySQL) or Google Sheets for simple projects. Analyzing dynamics over 7-14 days will reveal products with growing demand.
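As an illustration of daily snapshots, here is a minimal sketch using SQLite from Python's standard library as a stand-in for PostgreSQL or MySQL. The table name, column names, and helper functions are all hypothetical; the columns cover a subset of the metrics listed above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS snapshots (
    article  TEXT NOT NULL,
    day      TEXT NOT NULL,   -- ISO date, e.g. '2024-01-15'
    name     TEXT,
    price    REAL,
    reviews  INTEGER,
    rating   REAL,
    stock    INTEGER,
    position INTEGER,
    PRIMARY KEY (article, day)
)
"""

METRICS = ("name", "price", "reviews", "rating", "stock", "position")


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_snapshot(conn, article, day, **metrics):
    """Insert or overwrite one product's metrics for one day."""
    values = [metrics.get(column) for column in METRICS]
    conn.execute(
        "INSERT OR REPLACE INTO snapshots VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        [article, day, *values],
    )
    conn.commit()


def history(conn, article, metric="reviews"):
    """Return [(day, value), ...] for one metric, ordered by day."""
    if metric not in METRICS:
        raise ValueError(f"unknown metric {metric!r}")
    query = f"SELECT day, {metric} FROM snapshots WHERE article = ? ORDER BY day"
    return conn.execute(query, (article,)).fetchall()
```

One row per product per day keeps the 7-14 day comparisons simple: each trend check becomes a query over two dates for the same article.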
How to Identify Trends at an Early Stage
Successful sellers profit from trends precisely because they enter the niche before competitors. When a trend is already being discussed in Telegram channels, it is too late to profit from it — margins drop due to competition.
Signs of an Emerging Trend:
- A 50-100% increase in the number of reviews over a week with a small base (10-50 reviews)
- The appearance of 5-10 new sellers in the niche over the past 2 weeks
- A sharp decrease in stock levels for category leaders (from 1000+ to 100-200 units)
- An increase in positions in search results: the product rose from 50th to 10th position in a week
- Mentions of the product on social media (TikTok, Instagram) — an indirect sign
Set up automatic notifications (Telegram bot, email) when such signals are detected. This will give you a 1-2 week head start over the main mass of competitors.
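Several of the signs above reduce to threshold checks on two snapshots taken a week apart. The sketch below encodes three of them with the thresholds from the list as defaults; the function names and the snapshot dictionary keys (`reviews`, `stock`, `position`) are hypothetical, and the thresholds should be tuned per category.

```python
def review_growth_signal(reviews_week_ago, reviews_now, base=(10, 50), min_growth=0.5):
    """True when a small review base grew by at least min_growth (0.5 = 50%)."""
    if reviews_week_ago <= 0 or not base[0] <= reviews_week_ago <= base[1]:
        return False
    return (reviews_now - reviews_week_ago) / reviews_week_ago >= min_growth


def stock_drop_signal(stock_before, stock_now, high=1000, low=200):
    """True when stock fell from `high` or more units to `low` or fewer."""
    return stock_before >= high and stock_now <= low


def rank_rise_signal(position_week_ago, position_now, from_at_least=50, to_at_most=10):
    """True when the product climbed from position 50 or lower into the top 10."""
    return position_week_ago >= from_at_least and position_now <= to_at_most


def detect_signals(week_ago, now):
    """Compare two snapshots and return the names of the signals that fired."""
    signals = []
    if review_growth_signal(week_ago["reviews"], now["reviews"]):
        signals.append("review_growth")
    if stock_drop_signal(week_ago["stock"], now["stock"]):
        signals.append("stock_drop")
    if rank_rise_signal(week_ago["position"], now["position"]):
        signals.append("rank_rise")
    return signals
```

A non-empty result from `detect_signals` is a natural trigger for a notification; requiring two or more signals at once would cut down on false alarms.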
Common Mistakes in Parsing and How to Avoid Them
Most blocks during parsing occur due to the same mistakes. Let's discuss the most common problems and their solutions.
Mistake 1: Using One IP for All Requests
Beginners often buy 1-2 proxies and try to parse the entire catalog through them. The result is predictable — blocking within an hour. Marketplaces easily identify bots due to abnormal activity from one IP.
Solution: Use at least 10-20 proxies even for small projects. Distribute the load evenly — no more than 100-200 requests per IP per hour.
Mistake 2: Parsing at Night
Many launch parsers at night to get fresh data by morning. The problem is that at night (from 2 to 6 AM Moscow time), marketplaces have minimal traffic. Your activity becomes more noticeable against the backdrop of low overall load.
Solution: Run parsing during peak hours, from 6 PM to 11 PM, when the number of real users on the platform is at its highest. Your requests will blend into the overall traffic flow.
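A scheduler can enforce this window with a small guard. The sketch assumes the 6 PM to 11 PM window refers to Moscow time, which is a fixed UTC+3 offset; `in_peak_hours` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Moscow time is a fixed UTC+3 offset with no daylight saving time.
MOSCOW = timezone(timedelta(hours=3))


def in_peak_hours(now=None, start_hour=18, end_hour=23):
    """True if the Moscow-time hour falls in [start_hour, end_hour)."""
    now = now or datetime.now(timezone.utc)
    return start_hour <= now.astimezone(MOSCOW).hour < end_hour
```

A parsing job can call `in_peak_hours()` before each session and postpone the session when it returns `False`.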
Mistake 3: Ignoring User-Agent and Other Headers
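HTTP libraries and automation tools often reveal themselves in their default headers: the Python requests library sends a User-Agent such as "python-requests/2.28.1", and headless Chrome can identify itself with "HeadlessChrome" in its User-Agent string. This is a direct indication of a bot, and marketplaces can block such requests automatically.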
Solution: Use realistic User-Agents of modern browsers. Change the User-Agent with each proxy rotation. Also, add headers like Accept-Language, Referer, and others typical for real browsers.
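A header set along these lines can be generated per session. The User-Agent strings below are illustrative examples of the usual browser format and will go stale as browsers update; `build_headers` is a hypothetical helper, and the Russian-first `Accept-Language` value is an assumption that fits Russian proxy IPs.

```python
import random

# Illustrative User-Agent strings; refresh them to match current browser releases.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.1 Safari/605.1.15",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) "
    "Gecko/20100101 Firefox/121.0",
]


def build_headers(referer, rng=random):
    """Return browser-like headers with a randomly chosen User-Agent."""
    return {
        "User-Agent": rng.choice(USER_AGENTS),
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "ru-RU,ru;q=0.9,en-US;q=0.8,en;q=0.7",
        "Referer": referer,
    }
```

Generate the headers once per proxy session rather than per request, so that one "visitor" does not change browsers between pages.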
Mistake 4: Parsing Only the First Page of Results
Many limit themselves to collecting data on the top 50 products in a category. This is a mistake — trends often emerge on pages 3-5 of the results, where competition is lower and products are just starting to gain popularity.
Solution: Parse at least the first 5-10 pages of results (200-500 products in a category). Track products that quickly rise from page 5 to 1-2 — these are the emerging trends.
Mistake 5: Lack of CAPTCHA and Blocking Handling
Even with proper proxy setup, CAPTCHA or temporary blocks may still appear. If the parser cannot handle such situations, it will simply crash with an error, and you will lose data.
Solution: Add error handling to the parser. On receiving a CAPTCHA, switch to another proxy and repeat the request after 5-10 minutes. Save intermediate results to avoid losing data during a failure.
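One possible shape for this error handling is sketched below. It assumes a user-supplied `fetch(url, proxy)` function that raises the hypothetical `CaptchaDetected` exception when it recognizes a CAPTCHA page; how to recognize one is site-specific and not shown. Results are appended to a JSON Lines file as they arrive.

```python
import json
import time


class CaptchaDetected(Exception):
    """Raised by the fetch function when the response is a CAPTCHA page."""


def fetch_with_retry(url, proxies, fetch, max_attempts=3, wait_seconds=300, sleep=time.sleep):
    """Try the URL through successive proxies, pausing after each CAPTCHA."""
    for attempt in range(max_attempts):
        proxy = proxies[attempt % len(proxies)]  # switch proxy on every attempt
        try:
            return fetch(url, proxy)
        except CaptchaDetected:
            if attempt < max_attempts - 1:
                sleep(wait_seconds)  # 300 s = 5 minutes, the lower bound from the text
    return None


def scrape_with_checkpoints(urls, proxies, fetch, out_path, **retry_options):
    """Write each result to disk as soon as it is fetched; return the failed URLs."""
    failed = []
    with open(out_path, "a", encoding="utf-8") as out:
        for url in urls:
            result = fetch_with_retry(url, proxies, fetch, **retry_options)
            if result is None:
                failed.append(url)
                continue
            out.write(json.dumps({"url": url, "data": result}, ensure_ascii=False) + "\n")
            out.flush()  # keep finished results on disk if the process dies later
    return failed
```

The returned list of failed URLs can be fed into a later run, so a burst of CAPTCHAs delays those products instead of losing them.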
Conclusion
Collecting data on trending products through proxies is not just a technical process but a competitive advantage for marketplace sellers. While some manually monitor competitors, you receive structured data on tens of thousands of products daily and identify trends at an early stage.
Key points to remember: choose the proxy type based on the marketplace's protection level (residential for Wildberries and Ozon, data center proxies for less protected platforms); set up IP rotation that respects request limits; add random delays between requests and mimic real user behavior; and collect data during peak hours, when your activity is less noticeable against overall traffic.
Start small — set up parsing for 1-2 product categories using 10-20 proxies. Refine the process, ensure there are no blocks, and gradually scale the system. Automating data collection pays off in the first month due to faster entry into trending niches.
If you plan to regularly collect data from Wildberries, Ozon, or other protected marketplaces, we recommend using residential proxies — they provide a high level of trust from platforms and minimal risk of blocking. For mass parsing of less protected sites, data center proxies with proper rotation setup will suffice.