
Proxies for Scraping AliExpress: How to Collect Product Data Without Getting Blocked

We discuss how to set up secure parsing of AliExpress catalogs through proxies: what types of IPs to use, how to avoid blocks, and how to automate data collection on products and prices.

📅 January 23, 2026

AliExpress actively fights automated data collection — parsers run into captchas, temporary IP bans, and forced authorization prompts. Whether you are monitoring competitors' prices, hunting for trending dropshipping products, or building a database for a marketplace, without properly configured proxies the work turns into a constant fight against blocks.

In this guide, we will discuss how to choose a proxy for parsing AliExpress, set up IP address rotation, bypass anti-bot systems, and automate data collection on products, prices, and reviews without the risk of getting banned.

Why AliExpress Blocks Parsing and How It Works

AliExpress uses a multi-layered protection system against automated data collection. The platform loses money when competitors mass-copy catalogs, and servers become overloaded with bots. Therefore, the protection is constantly being improved and becoming more aggressive.

Main methods of detecting parsers:

  • Request frequency from one IP — if 50+ requests come from one address in a minute, the system automatically shows a captcha or temporarily blocks the IP for 30-60 minutes.
  • Behavior analysis — bots open pages too quickly (0.5-1 second), do not move the mouse, do not scroll, and do not click on interface elements.
  • Absence of cookies — normal users accumulate cookies when visiting the site, parsers often work with a clean session.
  • Suspicious User-Agent — old versions of browsers, server libraries (Python-requests, curl), absence of mobile devices in the statistics.
  • Browser fingerprint — AliExpress collects a fingerprint: screen resolution, time zone, installed fonts, WebGL, Canvas. Identical fingerprints from different IPs are a sign of a bot.

When the system detects suspicious activity, it applies escalating sanctions: first a captcha, then a temporary IP ban for 1-2 hours, and for repeat violations a ban for a day or permanently.

Important: AliExpress uses Cloudflare and its own anti-bot system. They analyze not only the IP but also the TLS fingerprint (protocol version, cipher order) — even with proxies, you can get banned if you use outdated HTTP clients.

What Types of Proxies Are Suitable for Parsing AliExpress

The choice of proxy type depends on the volume of parsing, budget, and data quality requirements. Let's discuss each type with real usage scenarios.

| Proxy Type | Speed | Risk of Blocking | When to Use |
|---|---|---|---|
| Datacenter | High (50-150 ms) | High | Fast parsing of public data with frequent IP rotation |
| Residential | Medium (200-500 ms) | Low | Long-term parsing, data collection with authorization |
| Mobile | Medium (300-700 ms) | Very low | Parsing the mobile version, bypassing strict blocks |

Datacenter Proxies for Fast Parsing

Suitable when you need to quickly collect a large volume of data: prices for 10,000+ products, category characteristics, seller lists. The response speed of 50-150 ms allows making 5-10 requests per second from one IP.

Usage Scenario: You have a dropshipping store on Shopify and need to update prices for 5,000 products from AliExpress daily. You purchase a pool of 50-100 datacenter IPs with rotation every 10-15 requests. In 2-3 hours, you collect all the data, and the cost of proxies is $50-100 per month.

Cons: AliExpress knows the ranges of datacenter IPs and treats them suspiciously. Aggressive rotation is needed (changing IP every 5-10 requests) and behavior emulation (random delays of 2-5 seconds between requests).
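The rotation-plus-delay pattern described above can be sketched in a few lines of Python. The proxy URLs here are placeholders — substitute your provider's endpoints — and the helper simply maps a request number to a proxy so that each IP handles a fixed batch before rotating.

```python
import random
import time

import requests

# Hypothetical datacenter pool -- replace with your provider's endpoints
PROXY_POOL = [
    "http://user:pass@dc1.example.com:8000",
    "http://user:pass@dc2.example.com:8000",
    "http://user:pass@dc3.example.com:8000",
]

ROTATE_EVERY = 8  # within the 5-10 requests-per-IP guideline above


def proxy_for_request(i, pool=PROXY_POOL, rotate_every=ROTATE_EVERY):
    """Pick the proxy for request number i (0-based), cycling through the pool."""
    return pool[(i // rotate_every) % len(pool)]


def scrape(urls):
    """Fetch each URL through the rotating pool with randomized 2-5 s pauses."""
    for i, url in enumerate(urls):
        proxy = proxy_for_request(i)
        yield requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=15)
        time.sleep(random.uniform(2, 5))  # randomized delay between requests
```

With three proxies and rotation every 8 requests, the pool cycles roughly every 24 requests — slow enough to look deliberate, fast enough that no single IP exceeds the safe budget.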

Residential Proxies for Stable Parsing

Residential proxies use the IP addresses of real home users — internet providers assign these addresses to individual subscribers. AliExpress cannot distinguish a request through such a proxy from one made by a regular buyer, which cuts the risk of blocking by a factor of 5-10 compared to datacenter IPs.

Usage Scenario: You monitor competitors' prices for your store on Ozon. You need to check 200-300 products daily, comparing prices on AliExpress and with Russian suppliers. You use 10-20 residential IPs with rotation every 50-100 requests. Parsing takes 30-40 minutes, and there are no blocks for months.

Pros: You can work from one IP longer (100-200 requests instead of 10-20), fewer captchas, the ability to authorize and work with the seller's personal account.

Mobile Proxies for Bypassing Strict Blocks

Mobile IPs (3G/4G/5G operators) have maximum trust — AliExpress cannot block entire subnets of mobile operators, as this would block millions of real buyers. One mobile IP can be used by hundreds of devices (NAT), so even aggressive parsing looks like activity from different users.

Usage Scenario: You have already been banned using residential IPs in a certain region, and you urgently need to collect data for a report to a client. You take 2-3 mobile proxies and parse through the mobile version of the site (m.aliexpress.com). Even with aggressive parsing (1 request per second), there are no blocks.

Cons: 2-3 times more expensive than residential proxies, lower speed (300-700 ms delay), IP may change when reconnecting to the operator.

IP Rotation Settings: Change Frequency and Timeouts

Proper IP rotation is key to long-term parsing without blocks. Rotating too often wastes the proxy pool and can itself look suspicious, while rotating too rarely leads to bans.

Recommended Rotation Frequency by Proxy Type

| Proxy Type | Requests per IP | Delay Between Requests | Session Lifetime |
|---|---|---|---|
| Datacenter | 5-15 | 2-5 seconds | 1-3 minutes |
| Residential | 50-150 | 3-8 seconds | 10-30 minutes |
| Mobile | 100-300 | 1-3 seconds | 30-60 minutes |

Rotation Strategies for Different Tasks

1. Fast Catalog Parsing (10,000+ products in an hour)

  • Use a pool of 100-200 datacenter IPs
  • Rotate every 5-10 requests
  • Parallel threads: 10-20 simultaneous requests from different IPs
  • Delay between requests: 1-2 seconds (simulating a fast user)
  • If you receive a captcha on an IP — exclude it from the pool for 2-3 hours

2. Daily Price Monitoring (500-1000 products)

  • Use 10-20 residential IPs
  • Rotate every 50-100 requests
  • Sequential requests with a delay of 3-5 seconds
  • Save cookies between requests from one IP
  • Simulate behavior: occasionally open the homepage, categories

3. Parsing with Authorization (Seller's Personal Account)

  • One residential or mobile IP per account
  • No rotation during the session (30-60 minutes)
  • Delay of 5-10 seconds between requests
  • Full browser emulation: saving cookies, localStorage, fingerprint

Tip: Add randomness to delays. Instead of fixed 3 seconds, use a range of 2-5 seconds. This makes the request pattern less predictable for anti-bot systems.
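The three strategies above can be encoded as a small config so the same scraper switches profiles without code changes. The profile names and midpoint values here are illustrative — tune them to your proxy quality.

```python
import random

# The three rotation strategies above as a config dict. Values are drawn
# from the ranges in the text; `requests_per_ip=None` means no mid-session
# rotation (the authorized-account case).
ROTATION_PROFILES = {
    "fast_catalog":  {"proxy_type": "datacenter",  "requests_per_ip": 8,    "delay": (1, 2)},
    "daily_monitor": {"proxy_type": "residential", "requests_per_ip": 75,   "delay": (3, 5)},
    "with_auth":     {"proxy_type": "residential", "requests_per_ip": None, "delay": (5, 10)},
}


def next_delay(profile):
    """Randomized pause for the given profile, per the tip above."""
    low, high = ROTATION_PROFILES[profile]["delay"]
    return random.uniform(low, high)
```

Drawing each delay from a range instead of sleeping a fixed interval is exactly the randomness the tip recommends.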

Bypassing Anti-Bot Systems: User-Agent, Cookies, and Fingerprint

Changing IP only solves part of the problem. AliExpress analyzes dozens of request parameters and behavior to distinguish a bot from a human. Let's discuss what needs to be configured in addition to proxies.

User-Agent and HTTP Headers

The User-Agent informs the server which browser and operating system are making the request. Parsers often use default values from libraries (Python-requests/2.28.0), which are instantly detectable.

Correct User-Agent Configuration:

  • Use up-to-date versions of popular browsers: Chrome 120+, Firefox 121+, Safari 17+
  • Change User-Agent when rotating IP — one IP should not show different browsers
  • Add mobile User-Agents in a ratio of 40-50% (half of AliExpress traffic is from mobile devices)
  • Copy the full set of headers from a real browser: Accept, Accept-Language, Accept-Encoding, Connection, Upgrade-Insecure-Requests

Example of Correct Headers for Desktop:

User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Language: en-US,en;q=0.9
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
Upgrade-Insecure-Requests: 1
Sec-Fetch-Dest: document
Sec-Fetch-Mode: navigate
Sec-Fetch-Site: none
Sec-Fetch-User: ?1

Example for Mobile Device:

User-Agent: Mozilla/5.0 (iPhone; CPU iPhone OS 17_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2 Mobile/15E148 Safari/604.1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.9
Accept-Encoding: gzip, deflate, br
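For use with the requests library, the desktop header set above translates into a plain dict. The User-Agent string is one current example — keep it in sync with real Chrome releases in your own code, and build an analogous dict for the mobile variant.

```python
# Desktop Chrome headers from the example above, as a dict for requests.
DESKTOP_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,"
              "image/webp,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Connection": "keep-alive",
    "Upgrade-Insecure-Requests": "1",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "none",
    "Sec-Fetch-User": "?1",
}

# Usage: requests.get(url, headers=DESKTOP_HEADERS, proxies=...)
```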

Working with Cookies and Sessions

AliExpress sets cookies on the first visit: session ID, language and currency settings, tracking tokens. Parsers without cookies look suspicious — a normal user accumulates them while navigating the site.

Correct Handling of Cookies:

  • Before parsing, open the homepage and save all cookies
  • Use these cookies for all subsequent requests from the same IP
  • When changing IP — start a new session with new cookies
  • Save cookies between parser runs — this simulates a returning user
  • Update cookies every 1-2 hours (open the homepage again)
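A minimal sketch of this warm-up flow with requests: a `Session` object keeps its cookie jar across calls, so visiting the homepage first and reusing the same session reproduces the "returning user" pattern. The proxy URL is a placeholder.

```python
import requests

# Placeholder proxy -- substitute your provider's gateway
PROXIES = {
    "http": "http://user:pass@proxy.example.com:8000",
    "https": "http://user:pass@proxy.example.com:8000",
}


def new_warmed_session():
    """Open the homepage first so AliExpress sets its session cookies;
    every later request through the same Session reuses that cookie jar."""
    session = requests.Session()
    session.get("https://www.aliexpress.com/", proxies=PROXIES, timeout=15)
    return session

# Usage:
# session = new_warmed_session()
# resp = session.get(product_url, proxies=PROXIES, timeout=15)
```

To persist cookies between parser runs, serialize `session.cookies` to disk and load it back on startup.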

Browser Fingerprint and TLS Fingerprint

Modern anti-bot systems collect a digital fingerprint of the browser — a combination of dozens of parameters that uniquely identifies the device. Even from different IPs, the same fingerprint reveals a bot.

What is included in the browser fingerprint:

  • Screen resolution and color depth
  • Time zone and system language
  • List of installed fonts
  • WebGL and Canvas fingerprint (unique way of rendering graphics)
  • Audio context (AudioContext fingerprint)
  • List of browser plugins
  • Support for WebRTC, Battery API, and other modern APIs

Simple HTTP libraries (requests, axios, curl) do not have these parameters — they operate at the protocol level without rendering. For serious parsing, tools with a full browser are needed.

Solutions for Browser Emulation:

  • Selenium + undetected-chromedriver — runs real Chrome with modifications to bypass detection
  • Puppeteer + puppeteer-extra-plugin-stealth — Node.js library with plugins for masking automation
  • Playwright — a modern alternative to Selenium with better performance
  • Anti-detect browsers — Dolphin Anty, AdsPower, Multilogin (for working through the interface)

Important: The TLS fingerprint (SSL connection fingerprint) is also analyzed. Old versions of Python and Node.js use outdated cipher suites that reveal a bot. Use up-to-date versions of libraries or curl_cffi to simulate modern browsers.
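As a sketch of the curl_cffi approach: its `impersonate` option replays the TLS and HTTP/2 fingerprint of a real browser, which is what plain requests cannot do. The product URL is the same placeholder used elsewhere in this article.

```python
URL = "https://www.aliexpress.com/item/1234567890.html"


def fetch_like_chrome(url=URL):
    """Fetch a page with a Chrome-like TLS fingerprint via curl_cffi."""
    # Imported lazily so the snippet reads without the package installed
    # (pip install curl_cffi).
    from curl_cffi import requests as cffi_requests
    # "chrome" targets the newest Chrome profile the installed version
    # supports; pinned profiles such as "chrome120" are also accepted.
    return cffi_requests.get(url, impersonate="chrome")
```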

Ready-Made Tools for Parsing AliExpress

Writing a parser from scratch makes sense only for specific tasks. For standard data collection (products, prices, reviews), there are ready-made solutions that save weeks of development.

Commercial Services with API

1. ScraperAPI and similar services (scrape.do, scrapingbee.com)

Cloud services that handle all work with proxies and bypassing protection. You send them the URL of an AliExpress product, and they return HTML or JSON with the data.

  • Pros: no need for your own proxies, automatic captcha bypass, ready-made parsers for popular sites
  • Cons: expensive for large volumes (from $50 for 100K requests), dependency on a third-party service
  • When to Use: one-time tasks, prototyping, small volumes (up to 10K products per month)

2. Bright Data (formerly luminati.io)

The largest proxy provider with its own parsing tools. They provide not only proxies but also ready-made datasets from AliExpress (updated product databases).

  • Pros: huge pool of IPs (72+ million residential), infrastructure for enterprise clients
  • Cons: very expensive (from $500 per month), complex pricing
  • When to Use: large businesses with a budget, constant parsing of large volumes

Open-Source Solutions

1. Scrapy + scrapy-rotating-proxies

A popular framework for parsing in Python. Supports asynchronous requests, automatic proxy rotation, and export to CSV/JSON/database.

Example of proxy configuration in Scrapy:

# settings.py
ROTATING_PROXY_LIST = [
    'http://user:pass@proxy1.example.com:8000',
    'http://user:pass@proxy2.example.com:8000',
    'http://user:pass@proxy3.example.com:8000',
]

DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': 90,
    'scrapy_rotating_proxies.middlewares.RotatingProxyMiddleware': 610,
    'scrapy_rotating_proxies.middlewares.BanDetectionMiddleware': 620,
}

# Settings for bypassing bans
ROTATING_PROXY_PAGE_RETRY_TIMES = 5
ROTATING_PROXY_BACKOFF_BASE = 300  # ban time for proxy in seconds

2. Puppeteer + puppeteer-extra-plugin-stealth

For sites with aggressive protection (like AliExpress), a full browser is needed. Puppeteer controls Chrome through the DevTools Protocol, and the stealth plugin masks signs of automation.

// parser.js
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

puppeteer.use(StealthPlugin());

(async () => {
  const browser = await puppeteer.launch({
    args: [
      '--proxy-server=http://proxy.example.com:8000',
      '--no-sandbox',
      '--disable-setuid-sandbox'
    ]
  });

  const page = await browser.newPage();
  
  // Proxy authorization
  await page.authenticate({
    username: 'user',
    password: 'pass'
  });

  // Set realistic viewport
  await page.setViewport({
    width: 1920,
    height: 1080,
    deviceScaleFactor: 1
  });

  // Parsing product
  await page.goto('https://www.aliexpress.com/item/1234567890.html', {
    waitUntil: 'networkidle2'
  });

  const productData = await page.evaluate(() => {
    return {
      title: document.querySelector('.product-title-text')?.innerText,
      price: document.querySelector('.product-price-value')?.innerText,
      rating: document.querySelector('.overview-rating-average')?.innerText
    };
  });

  console.log(productData);
  await browser.close();
})();

Desktop Applications for Non-Technical Users

1. Octoparse

A visual parser without code — you click on page elements, and the program remembers the structure and collects data. Built-in proxy support and task scheduler.

  • Pros: no programming needed, works with dynamic content, cloud version for background work
  • Cons: limitations in the free version (10K rows per month), sometimes struggles with complex protection
  • Price: from $75 per month for the Standard plan

2. ParseHub

An alternative to Octoparse with a simpler interface. Works well with AliExpress thanks to built-in templates for popular sites.

  • Pros: free plan for 200 pages, easy proxy setup
  • Cons: slow performance in the free version, lacks advanced features (API, webhooks)

Geo-Targeting: How to Parse Prices for Different Countries

AliExpress shows different prices, assortments, and delivery conditions depending on the user's country. If you are working with international dropshipping or comparing prices for different markets, you need proxies from specific regions.

How AliExpress Determines the User's Country

The platform uses several data sources:

  • IP Address — the main method, determines the country by IP geolocation
  • Cookies — saves the selected country in aep_usuc_f (can be spoofed)
  • Accept-Language Header — browser language, but not a determining factor
  • Currency and country in the URL — the ?currency=USD parameter or country subdomains (ru.aliexpress.com)

For reliable price parsing for a specific country, you need to use proxies from that region. Spoofing only cookies does not always work — AliExpress prioritizes IP geolocation.

Popular Regions for Parsing and Their Features

| Country | Price Features | Why Parse |
|---|---|---|
| USA | Prices in USD, often lower than in Europe | Dropshipping in the USA, comparison with Amazon |
| Russia | Prices in RUB, including duties and VAT | Comparison with Wildberries, Ozon |
| Germany | Prices in EUR, fast delivery from EU warehouses | Dropshipping in Europe, eBay.de |
| Brazil | High prices due to duties, but high demand | Local e-commerce (Mercado Livre) |

Setting Up Geo-Targeting via Proxies

Most providers of residential and mobile proxies allow you to choose the country (and even city) through connection parameters or API.

Example of Choosing a Country via Proxy Username:

import requests

# Country selection via the proxy username (the exact format varies
# by provider): username-country-<code>
proxy_us = "http://username-country-us:password@gate.example.com:8000"
proxy_de = "http://username-country-de:password@gate.example.com:8000"
proxy_br = "http://username-country-br:password@gate.example.com:8000"

# Parsing the price for the USA
response_us = requests.get(
    "https://www.aliexpress.com/item/1234567890.html",
    proxies={"http": proxy_us, "https": proxy_us}
)

# Parsing the price for Germany
response_de = requests.get(
    "https://www.aliexpress.com/item/1234567890.html",
    proxies={"http": proxy_de, "https": proxy_de}
)

Additionally, adjust headers for the region:

  • Accept-Language: en-US for the USA, de-DE for Germany, pt-BR for Brazil
  • Use the appropriate subdomain: ru.aliexpress.com for Russia, de.aliexpress.com for Germany
  • Check the currency in the response — if you see the wrong currency, it means geo-targeting did not work
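These three knobs — proxy geo, Accept-Language, and subdomain — can be bundled into one per-region config. The gateway hostname and username format are hypothetical, modeled on the provider example above; for Brazil the main domain is kept since the article lists no dedicated subdomain for it.

```python
# Hypothetical per-region settings combining proxy geo, language header,
# and subdomain, per the checklist above.
REGIONS = {
    "us": {"proxy": "http://username-country-us:password@gate.example.com:8000",
           "accept_language": "en-US,en;q=0.9",
           "base_url": "https://www.aliexpress.com"},
    "de": {"proxy": "http://username-country-de:password@gate.example.com:8000",
           "accept_language": "de-DE,de;q=0.9",
           "base_url": "https://de.aliexpress.com"},
    "br": {"proxy": "http://username-country-br:password@gate.example.com:8000",
           "accept_language": "pt-BR,pt;q=0.9",
           "base_url": "https://www.aliexpress.com"},
}


def request_kwargs(region):
    """Build the proxies/headers kwargs for requests.get for a region code."""
    cfg = REGIONS[region]
    return {
        "proxies": {"http": cfg["proxy"], "https": cfg["proxy"]},
        "headers": {"Accept-Language": cfg["accept_language"]},
    }
```

After each response, verify the currency in the page — if it does not match the region, the proxy's geolocation took priority over your settings.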

Common Mistakes When Parsing and How to Avoid Them

Even with the right proxies and settings, you can get blocked due to errors in the parsing logic. Let's discuss common problems and solutions.

Error 1: Too Aggressive Parsing

Problem: The parser makes 100 requests per minute from one IP, trying to collect data faster. AliExpress detects this as a DDoS attack and blocks the IP.

Solution: Add delays and limit the number of requests. For residential proxies, a safe speed is 10-20 requests per minute from one IP (1 request every 3-6 seconds). It's better to parse longer than to lose proxies.
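One way to enforce this limit is a small rate limiter that blocks until a randomized gap has passed since the previous request from the same IP. This is a sketch, not a library API; the `now`/`sleep` parameters exist only to make the logic testable.

```python
import random
import time


class RateLimiter:
    """Enforce a randomized minimum gap between requests from one IP."""

    def __init__(self, min_gap=3.0, max_gap=6.0):
        self.min_gap = min_gap
        self.max_gap = max_gap
        self._last = None  # monotonic timestamp of the previous request

    def wait(self, now=None, sleep=time.sleep):
        """Block until a random gap in [min_gap, max_gap] has elapsed.

        Returns the number of seconds actually slept."""
        now = time.monotonic() if now is None else now
        waited = 0.0
        if self._last is not None:
            gap = random.uniform(self.min_gap, self.max_gap)
            waited = max(0.0, self._last + gap - now)
            if waited:
                sleep(waited)
        self._last = now + waited
        return waited

# Usage: call limiter.wait() immediately before each requests.get(...)
```

With `min_gap=3, max_gap=6` this caps one IP at roughly 10-20 requests per minute, matching the safe rate above.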

Error 2: Ignoring Captchas and Errors

Problem: The parser receives a page with a captcha but continues to parse it as regular content. As a result — thousands of empty records in the database.

Solution: Check the server response before parsing. If the HTML contains the words "captcha", "Access Denied", or response code 403/429 — stop using this IP for 1-2 hours.

import time
import requests

def is_blocked(html):
    blocked_keywords = ['captcha', 'access denied', 'too many requests']
    return any(keyword in html.lower() for keyword in blocked_keywords)

blocked_proxies = {}  # proxy URL -> timestamp until which it sits out

for proxy in proxy_pool:  # proxy_pool: your list of proxy URLs
    response = requests.get(url, proxies={"http": proxy, "https": proxy})
    if response.status_code in (403, 429) or is_blocked(response.text):
        print(f"Proxy {proxy} is blocked, switching...")
        blocked_proxies[proxy] = time.time() + 7200  # exclude for 2 hours
        continue
    break  # clean response received

Error 3: Parsing Outdated Data

Problem: AliExpress caches pages through CDN (Cloudflare). The parser receives data that is 2-3 hours old instead of current prices.

Solution: Add a random parameter to the URL to bypass the cache, or use the header Cache-Control: no-cache.

import time

# Add a timestamp to the URL to bypass the cache
url = f"https://www.aliexpress.com/item/1234567890.html?_t={int(time.time())}"

# Or send no-cache headers instead
headers = {
    'Cache-Control': 'no-cache',
    'Pragma': 'no-cache'
}

Error 4: Incorrect Handling of Dynamic Content

Problem: Prices and product characteristics on AliExpress are loaded via JavaScript after the page loads. A simple HTTP request receives an empty HTML template without data.

Solution: Use a headless browser (Selenium, Puppeteer, Playwright) that executes JavaScript and waits for the full content to load. Or find an API endpoint that returns data in JSON — it is often accessible through DevTools in Network.

Error 5: Lack of Logging and Monitoring

Problem: The parser runs for a week, collects data, but no one checks the quality. It turns out that 30% of records are empty due to changes in the site's structure.

Solution: Log all important events — successful requests, errors, proxy blocks, changes in data structure. Set up alerts when the number of errors rises above 10%.
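A minimal sketch of that alerting rule: count successes and failures, and flag the run once the error rate crosses the 10% threshold (with a small minimum sample so a single early failure does not trip the alarm — that minimum is an assumption, not from the article).

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("aliexpress-parser")


class ErrorRateMonitor:
    """Track request outcomes and flag when the error rate exceeds a threshold."""

    def __init__(self, threshold=0.10, min_samples=20):
        self.threshold = threshold
        self.min_samples = min_samples  # assumed minimum before alerting
        self.ok = 0
        self.failed = 0

    def record(self, success):
        if success:
            self.ok += 1
        else:
            self.failed += 1

    @property
    def error_rate(self):
        total = self.ok + self.failed
        return self.failed / total if total else 0.0

    def should_alert(self):
        total = self.ok + self.failed
        return total >= self.min_samples and self.error_rate > self.threshold

# Usage, after each parsed page:
#   monitor.record(success=not is_blocked(response.text))
#   if monitor.should_alert():
#       log.warning("Error rate %.1f%% -- check selectors and proxies",
#                   monitor.error_rate * 100)
```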

Checklist Before Launching the Parser:
✅ Delays between requests are set (3-8 seconds for residential proxies)
✅ IP rotation works (no more than 50-100 requests per IP)
✅ User-Agent is current and changes with IP
✅ Cookies are saved and reused
✅ There is a check for captchas and blocks
✅ Logging and monitoring are set up
✅ Test run on 100 products was successful

Conclusion

Parsing AliExpress requires a comprehensive approach: the right proxies are only part of the solution. Proper IP rotation, emulation of a real browser, working with cookies and fingerprints, as well as constant monitoring of data quality are necessary. Too aggressive parsing will lead to blocks even with expensive proxies, while proper configuration will allow data collection for months without issues.

For most tasks (monitoring competitors' prices, collecting catalogs for dropshipping, analyzing trends), the optimal choice is residential proxies with rotation every 50-100 requests. They provide a balance between speed and trust level from AliExpress. If the budget is limited and high speed is needed — start with datacenter proxies, but be prepared for more frequent blocks and the need for aggressive rotation.

Remember: the quality of proxies is more important than their quantity. 10 quality residential IPs with the right configuration will yield better results than 100 cheap datacenter proxies with a high block rate. Invest time in setting up browser emulation, logging, and monitoring — it will pay off with stable parser operation without constant issues with captchas and bans.