You have set up a scraper and launched data collection, and within minutes you receive a CAPTCHA page or an empty response. Most likely, the site is protected by DataDome. This is one of the most aggressive anti-bot systems on the market, and regular data center proxies won't help here. In this article, we will examine how exactly DataDome detects bots and which types of proxies yield results.
What is DataDome and where is it used
DataDome is a commercial SaaS bot protection platform used by large online stores, news portals, marketplaces, and booking services worldwide. The company was founded in 2015 and currently protects thousands of sites that together handle billions of requests per day.
Among DataDome's clients are platforms such as Reddit, Foot Locker, Rakuten, AngelList, and many other large resources. If you are engaged in competitor price monitoring, scraping product cards, collecting data from foreign marketplaces, or aggregating news, there is a high probability that you have already encountered this system.
Characteristic signs that a site is protected by DataDome:
- A CAPTCHA page appears after several consecutive requests
- The server response contains the header x-datadome-cid
- Redirect to the domain geo.captcha-delivery.com
- HTTP response 403 or 429 for frequent requests from a single IP
- JavaScript challenge on the first visit (the "browser check" page)
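The signs listed above can be checked programmatically. The following is a minimal sketch, not an official detection method: the function name `datadome_signs` and the sample values are made up for illustration, and it only looks at the header, redirect domain, and status codes mentioned in the list.

```python
from typing import Mapping

CAPTCHA_DOMAIN = "geo.captcha-delivery.com"


def datadome_signs(status: int, headers: Mapping[str, str], body: str = "") -> list:
    """Return the DataDome indicators found in a single HTTP response."""
    # Header names are case-insensitive, so normalise them before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    signs = []
    if "x-datadome-cid" in lowered:
        signs.append("x-datadome-cid header")
    # The CAPTCHA domain may show up in a redirect or inside the page body.
    if CAPTCHA_DOMAIN in lowered.get("location", "") or CAPTCHA_DOMAIN in body:
        signs.append("captcha-delivery redirect")
    if status in (403, 429):
        signs.append(f"HTTP {status}")
    return signs


# Made-up sample response, for illustration only.
sample = datadome_signs(
    403,
    {"X-DataDome-CID": "abc123"},
    "<script src='https://geo.captcha-delivery.com/'></script>",
)
print(sample)
```

A scraper can call a check like this on every response and stop or rotate as soon as the list is non-empty, instead of continuing to send requests into a block.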
DataDome operates in real time: each incoming request is analyzed in milliseconds. The system decides whether to allow the user, show a CAPTCHA, or block them, even before the server delivers the main content of the page. This is why bypassing it is more difficult than simple IP blocks.
How DataDome identifies bots: protection mechanisms
To understand which proxies work, it is necessary to figure out what exactly DataDome analyzes. The system uses a multi-layered approach: no single factor is the sole criterion for blocking. The decision is made based on a combination of signals.
1. IP address reputation
The first thing DataDome checks is the reputation of the IP address against external and internal databases. The system instantly determines whether the IP belongs to a data center (AWS, Google Cloud, Hetzner, DigitalOcean), a VPN provider, or is a real residential/mobile address. IPs from data centers automatically receive a high "suspicion score" even before behavior analysis.
2. Behavioral analysis
DataDome tracks behavior patterns: request speed, sequence of page visits, time between clicks, mouse movement (if JavaScript is present). A real user takes breaks, follows logical paths, and sometimes goes back. A bot typically makes requests at constant intervals, to strictly defined URLs, with no "random" deviations.
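To see why constant intervals stand out, it helps to measure how much the gaps between requests vary. The sketch below is only an illustration of the idea: DataDome's real features are not public, and the function name `interval_variation` and the sample timings are assumptions.

```python
import random
import statistics


def interval_variation(timestamps):
    """Coefficient of variation (stdev / mean) of the gaps between requests."""
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return statistics.pstdev(gaps) / statistics.mean(gaps)


# A bot that fires a request exactly every 2 seconds.
fixed = [2.0 * i for i in range(20)]

# A visitor who pauses a random 3 to 15 seconds between pages.
rng = random.Random(42)
human = [0.0]
for _ in range(19):
    human.append(human[-1] + rng.uniform(3, 15))

print(interval_variation(fixed))  # 0.0, perfectly regular
print(interval_variation(human))  # noticeably above zero
```

A value near zero means the timing is machine-like; any classifier that sees request timestamps can pick this up with very little effort.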
3. JavaScript fingerprint
If the request is made through a browser (or a headless browser like Puppeteer/Playwright), DataDome runs a JavaScript script that collects the "fingerprint" of the environment: browser version, installed fonts, screen resolution, WebGL support, canvas fingerprint, presence of plugins. Headless browsers without additional masking are easily identified by their characteristic parameters.
4. HTTP headers
The request headers are analyzed: User-Agent, Accept-Language, Accept-Encoding, Referer, sec-ch-ua, and others. A mismatch between the declared User-Agent and the actual request parameters is a strong bot signal.
5. Real-time machine learning
All collected signals are processed by an ML model trained on a vast dataset of real users and bots. The model is constantly updated: what worked a month ago may not work today. This is why static solutions quickly become obsolete.
Why data center proxies fail against DataDome
This is the most common question from those who are just starting to work with protected sites. Data center proxies are cheap, fast, and have high uptime. It seems like the perfect choice for scraping. But against DataDome, they are practically useless.
The reason is simple: DataDome maintains and uses ASN (autonomous system) databases of all major hosting providers. When a request comes from an IP address belonging to, for example, an Amazon Web Services or OVH subnet, the system immediately assigns it a "suspicious" status. Even if your scraper perfectly mimics human behavior, an IP from a data center already puts you at risk.
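The lookup described above can be pictured as a prefix table that maps networks to a type. This is a toy sketch under loud assumptions: the ranges are RFC 5737 documentation prefixes standing in for real provider networks, and the scores in `BASE_SUSPICION` are invented, not DataDome's values.

```python
import ipaddress

# Hypothetical reputation table: documentation ranges (RFC 5737) stand in
# for real hosting-provider, ISP, and mobile-operator prefixes.
NETWORK_TYPES = {
    ipaddress.ip_network("192.0.2.0/24"): "datacenter",
    ipaddress.ip_network("198.51.100.0/24"): "residential",
    ipaddress.ip_network("203.0.113.0/24"): "mobile",
}

# Invented starting scores, purely to illustrate the ordering.
BASE_SUSPICION = {"datacenter": 0.9, "residential": 0.2, "mobile": 0.1, "unknown": 0.5}


def classify_ip(address: str) -> str:
    """Return the network type for an address, or 'unknown'."""
    ip = ipaddress.ip_address(address)
    for network, kind in NETWORK_TYPES.items():
        if ip in network:
            return kind
    return "unknown"


def starting_score(address: str) -> float:
    """Suspicion assigned before any behavior has been observed."""
    return BASE_SUSPICION[classify_ip(address)]


print(classify_ip("192.0.2.10"), starting_score("192.0.2.10"))
print(classify_ip("203.0.113.5"), starting_score("203.0.113.5"))
```

The point is that this check costs one table lookup and happens before headers or behavior are even considered, which is why no amount of careful scraping logic compensates for a data center address.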
⚠️ Important to understand
Data center proxies are great for tasks where protection is weak or absent: scraping open data, working with APIs without anti-bot systems, speed testing. But on sites protected by DataDome, they get blocked in 90%+ of cases within the first few dozen requests.
Another problem is "burned" IPs. If thousands of users before you have used the same IP address for bot activity (and this is normal in pools of cheap data centers), DataDome already has a negative history for that address. Even the first request from such an IP may get blocked.
Residential proxies: the main tool for bypassing DataDome
Residential proxies are IP addresses that belong to real home internet users. They are issued by internet service providers (Rostelecom, Comcast, Deutsche Telekom, etc.) and from DataDome's perspective, they look like ordinary people sitting at home behind a computer.
This is why residential proxies are the primary working tool for scraping sites with DataDome. They pass the initial reputation check, giving you a "credit of trust" for further work.
What to consider when choosing residential proxies for DataDome
| Parameter | What is important | Why this is critical |
|---|---|---|
| Rotation type | Per-request rotation or sticky sessions of 5-30 minutes | DataDome tracks IP history; changing IPs too often is also suspicious |
| Geolocation | IP from the country of the target site | Requests from another country are an additional signal of suspicion |
| Pool size | Millions of IPs, not thousands | A small pool burns out quickly because DataDome remembers active addresses |
| Sticky sessions | Ability to hold one IP for 10-30 minutes | For multi-page scraping, one session must look like one user |
| Speed | At least 5-10 Mbps per connection | Slow proxies increase request time, affecting timing |
An important point: residential proxies do not guarantee 100% bypass of DataDome by themselves. They solve the IP reputation problem, but if your scraper makes 100 requests per minute from one address or sends incorrect headers, DataDome will still block you. The IP is just one layer of protection.
Mobile proxies: when maximum trust is needed
Mobile proxies are IP addresses from mobile operators (4G/5G networks). They have a unique property: one mobile operator's IP address can be used by thousands of real users simultaneously through NAT. DataDome knows this and therefore treats mobile IPs with maximum trust.
Blocking a mobile IP means potentially blocking thousands of the operator's real customers, which no site wants to do. This is why mobile proxies provide the highest percentage of successful requests to sites with DataDome.
When to choose mobile proxies over residential ones:
- The site is very aggressively protected: residential proxies result in blocks even at low request frequencies
- You are scraping the mobile version of the site: a mobile IP plus a mobile User-Agent look organic
- You are working with mobile apps: when scraping a mobile API, a mobile IP is consistent with the request
- Long-term sessions: mobile proxies maintain sessions well without changing IP
The downside of mobile proxies is that they are more expensive than residential ones and usually have a smaller pool of IPs. For large-scale scraping with thousands of requests per hour, this can become a limitation. In such cases, the optimal strategy is to use mobile proxies for "reconnaissance" and complex pages, and residential ones for mass data collection.
Rotation and delay strategy: how not to get caught even with good proxies
Even with residential or mobile proxies, you can get blocked if you do not build your request strategy correctly. DataDome analyzes behavior at the session level, and anomalous patterns raise suspicion regardless of the quality of the IP.
Rules for safe scraping on DataDome-protected sites
✅ Safe scraping checklist
- Delays between requests: from 3 to 15 seconds (random, not fixed)
- No more than 20-30 requests from one IP per session
- Sticky session: keep one IP for one "user path"
- Start with the homepage, then move to target URLs
- Imitate real navigation: homepage → category → product
- Use proxy geolocation that matches the site language
- Change IP after each session or after a block
- Do not launch parallel requests from one IP
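Several items from the checklist (random delays, a capped number of requests, a homepage-first path) can be combined into a per-session plan. This is a minimal sketch under assumptions: the helper names `jittered_delay` and `build_session_plan` and the example.com URLs are hypothetical, and the actual fetching is left out.

```python
import random


def jittered_delay(rng, low=3.0, high=15.0):
    """Random pause in seconds, drawn uniformly so no two gaps repeat."""
    return rng.uniform(low, high)


def build_session_plan(homepage, category, product_urls, rng, max_requests=20):
    """Order URLs as homepage -> category -> products and attach a delay to each."""
    path = [homepage, category] + list(product_urls)
    path = path[:max_requests]  # stay under the per-IP request cap
    return [(url, jittered_delay(rng)) for url in path]


rng = random.Random(7)
plan = build_session_plan(
    "https://example.com/",
    "https://example.com/shoes",
    [f"https://example.com/p/{i}" for i in range(50)],
    rng,
)
for url, delay in plan[:3]:
    print(f"{url} then wait {delay:.1f}s")
```

In a real scraper, each entry would be fetched sequentially through one sticky-session proxy, sleeping for the attached delay before the next request.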
Rotation: when to change IP
There is no universal answer here; it all depends on the specific site. But the general logic is this: DataDome remembers the activity of an IP in a sliding window (usually 10-60 minutes). If a suspiciously high number of requests comes from one address during that time, the IP receives a temporary ban.
The optimal strategy is to rotate IPs not by timer, but by the number of requests. For example: 15-25 requests → change IP → pause 30-60 seconds → new session. This approach imitates the behavior of different users, each of whom visited several pages and left.
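Rotation by request count can be sketched as a small helper that hands out the current proxy and tells the caller when to pause. The class name `CountBasedRotator` and the proxy labels are hypothetical; real proxy services expose rotation in their own ways, so treat this as an outline of the logic only.

```python
import itertools
import random


class CountBasedRotator:
    """Switch proxy after a random budget of 15-25 requests, with a pause."""

    def __init__(self, proxies, rng, min_requests=15, max_requests=25):
        self._pool = itertools.cycle(proxies)
        self._rng = rng
        self._min, self._max = min_requests, max_requests
        self.rotations = 0
        self._start_session()

    def _start_session(self):
        self.current = next(self._pool)
        # A random budget keeps session lengths from forming a pattern.
        self._budget = self._rng.randint(self._min, self._max)

    def next_proxy(self):
        """Return (proxy, pause_seconds) to use for the next request."""
        pause = 0.0
        if self._budget == 0:
            self._start_session()
            self.rotations += 1
            pause = self._rng.uniform(30, 60)  # rest before the new "user"
        self._budget -= 1
        return self.current, pause


rotator = CountBasedRotator(["proxy-a", "proxy-b", "proxy-c"], random.Random(1))
sequence = [rotator.next_proxy() for _ in range(100)]
print(rotator.rotations, "rotations in 100 requests")
```

The caller sleeps for the returned pause (in addition to the normal per-request delay) and should also start a fresh cookie jar whenever the proxy changes, so the new session does not inherit the old identity.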
Headers and fingerprint: what else DataDome checks besides IP
Good proxies are a necessary but not sufficient condition for bypassing DataDome. The system analyzes the entire request as a whole. If the IP is residential, but the headers indicate a bot, blocking will still occur.
Critically important headers
Here's what DataDome checks in HTTP headers and what to pay attention to:
| Header | What is checked | Typical mistake |
|---|---|---|
| User-Agent | Current browser version | Outdated UA or UA from Python libraries |
| Accept-Language | Language matches the proxy geo | Proxy from the USA, but language is ru-RU |
| sec-ch-ua | Matches User-Agent | Missing header when Chrome is declared |
| Referer | Logical chain of transitions | Direct request to a deep page without Referer |
| Accept-Encoding | Standard browser set | Absence or non-standard set |
| Cookie | Saving DataDome session cookies | Ignoring Set-Cookie from DataDome |
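The typical mistakes in the table can be caught with a self-check before any request is sent. The sketch below is an assumption-laden illustration: `build_chrome_headers` and `header_problems` are hypothetical helpers, version 124 is only an example, and the brand list in sec-ch-ua differs between Chrome releases, so real values should be copied from an actual browser.

```python
import re
from typing import Optional


def build_chrome_headers(major: int, language: str, referer: Optional[str] = None) -> dict:
    """Build a header set whose parts agree with each other."""
    headers = {
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            f"(KHTML, like Gecko) Chrome/{major}.0.0.0 Safari/537.36"
        ),
        "Accept-Language": f"{language},{language.split('-')[0]};q=0.9",
        "Accept-Encoding": "gzip, deflate, br",
        # Illustrative brand list; copy the exact string from a real browser.
        "sec-ch-ua": f'"Chromium";v="{major}", "Google Chrome";v="{major}", "Not-A.Brand";v="99"',
    }
    if referer:
        headers["Referer"] = referer
    return headers


def header_problems(headers: dict, proxy_language: str) -> list:
    """List the mismatches from the table that this header set contains."""
    problems = []
    ua = headers.get("User-Agent", "")
    if not ua or "python-requests" in ua.lower():
        problems.append("User-Agent missing or from an HTTP library")
    match = re.search(r"Chrome/(\d+)", ua)
    if match and f'v="{match.group(1)}"' not in headers.get("sec-ch-ua", ""):
        problems.append("sec-ch-ua missing or does not match User-Agent")
    if not headers.get("Accept-Language", "").startswith(proxy_language):
        problems.append("Accept-Language does not match proxy geo")
    if "Accept-Encoding" not in headers:
        problems.append("Accept-Encoding missing")
    return problems


good = build_chrome_headers(124, "en-US", referer="https://example.com/")
bad = {"User-Agent": "python-requests/2.31.0", "Accept-Language": "ru-RU"}
print(header_problems(good, "en-US"))
print(header_problems(bad, "en-US"))
```

Generating all headers from one source of truth (browser version plus proxy language) is the design choice that matters here: it makes mismatches between User-Agent, sec-ch-ua, and Accept-Language impossible by construction.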
Special attention should be paid to DataDome cookies. Upon the first request, the system sets its cookie (usually called datadome). If your scraper does not save and send this cookie in subsequent requests, DataDome perceives each request as the first visit of a new user, which is itself suspicious at a high frequency.
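The save-and-resend cycle can be shown with the standard library alone. This is a minimal sketch: the `CookieStore` class and the sample Set-Cookie value are made up, and in practice a session object such as `requests.Session` handles this automatically.

```python
from http.cookies import SimpleCookie


class CookieStore:
    """Keep cookies from Set-Cookie headers and replay them on later requests."""

    def __init__(self):
        self._cookies = {}

    def update_from_response(self, set_cookie_headers):
        """Parse each Set-Cookie header and remember name/value pairs."""
        for raw in set_cookie_headers:
            parsed = SimpleCookie()
            parsed.load(raw)
            for name, morsel in parsed.items():
                self._cookies[name] = morsel.value

    def cookie_header(self):
        """Value for the Cookie header of the next request in this session."""
        return "; ".join(f"{name}={value}" for name, value in self._cookies.items())


store = CookieStore()
# Made-up header value, for illustration only.
store.update_from_response(["datadome=abc123; Max-Age=31536000; Path=/; Secure"])
print(store.cookie_header())
```

One store should belong to exactly one session and one IP; when the proxy rotates, start with an empty store so the new address does not present the previous session's cookie.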
TLS fingerprint
DataDome's advanced protection also analyzes the TLS fingerprint, that is, the characteristics of the SSL/TLS handshake. Different HTTP libraries (requests, curl, axios) have characteristic sets of cipher suites and TLS extensions that differ from those of browsers. If you use the standard Python requests library, its TLS fingerprint is easily identifiable. The solution is to use libraries that simulate browser TLS (for example, curl-impersonate or specialized solutions).
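To make the idea of a "characteristic set of cipher suites" concrete, the snippet below prints what Python's default `ssl` context is configured to offer. It is only an inspection aid under the assumption that the list depends on the local Python and OpenSSL build; it does not compute any fingerprint and does not change what is sent.

```python
import ssl

# The default client context of the standard library.
context = ssl.create_default_context()

# Each entry is a dict describing one enabled cipher suite.
ciphers = [cipher["name"] for cipher in context.get_ciphers()]

print(len(ciphers), "cipher suites enabled, starting with:")
for name in ciphers[:5]:
    print(" ", name)
```

A browser offers a different list in a different order, along with different TLS extensions, and that difference is visible to the server before any HTTP header is read.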
Tools for working with DataDome-protected sites
Choosing the right tool for scraping is just as important as choosing proxies. Different tasks require different approaches. Let's consider the main options in terms of compatibility with DataDome.
Browser automation (Puppeteer, Playwright)
Headless browsers should theoretically work well with DataDome, as they execute JavaScript and form a "real" fingerprint. In practice, standard Puppeteer or Playwright are easily identified by their characteristic parameters: navigator.webdriver = true, absence of plugins, non-standard WebGL values. Additional masking is needed to hide these signals, for example with plugins like puppeteer-extra-plugin-stealth.
Anti-detect browsers
For tasks that require full interaction with the site rather than just data extraction, anti-detect browsers are the optimal choice. Dolphin Anty, AdsPower, GoLogin, and Multilogin create complete browser profiles with realistic fingerprints. In conjunction with residential or mobile proxies, they provide the highest success rate against DataDome.
The connection scheme in an anti-detect browser is standard: create a profile → specify the proxy type (HTTP/SOCKS5), host, port, username, and password from the proxy service in the settings → launch the profile. Each profile operates in an isolated environment with a unique fingerprint.
Specialized scraping services
There are ready-made services (ScrapingBee, Apify, Bright Data Scraping Browser) that handle all the work of bypassing protections: you simply provide the URL and receive HTML. They use their own pools of residential proxies and automatically solve CAPTCHAs. The downside is the high cost for large volumes and less control over the process.
Comparison of approaches
| Tool | Effectiveness against DataDome | Setup complexity | Scalability |
|---|---|---|---|
| HTTP parser + residential proxies | Average | Low | High |
| Puppeteer/Playwright + stealth + proxies | High | Medium | Medium |
| Anti-detect browser + mobile proxies | Very high | Low | Low |
| Ready-made scraping services | High | Very low | High (expensive) |
| Data center proxies (any tool) | Very low | - | - |
Practical scenario: price monitoring on a protected site
Suppose you are monitoring competitor prices on a foreign marketplace protected by DataDome. You need to collect data on 5000 products every 6 hours. Here's the optimal scheme:
- Tool: Playwright with the stealth plugin (the browser executes DataDome's JS challenge itself)
- Proxies: Residential with rotation, geolocation set to the country of the target site
- Session: Sticky for 15 minutes, 20 requests per IP
- Headers: Current Chrome User-Agent, correct Accept-Language
- Cookies: Saving and transmitting DataDome cookies between requests of one session
- Delays: Random from 4 to 12 seconds between requests
- Session start: Always start from the homepage, then move to products
With this setup, the success rate of requests is typically 85-95%, which is quite sufficient for regular monitoring. Retry the remaining 5-15% through another IP.
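It is worth checking that this scheme fits into the 6-hour window. The estimate below is a rough lower bound under stated assumptions: one request per session goes to the homepage, the mean delay is 8 seconds (the midpoint of 4-12), 15% of requests are retried, and response times and pauses between sessions are ignored. The function name `plan_run` is hypothetical.

```python
import math


def plan_run(products=5000, requests_per_session=20, mean_delay_s=8.0,
             window_hours=6.0, retry_rate=0.15):
    """Estimate sessions and parallel workers needed for one monitoring run."""
    # Assumption: the first request of each session is the homepage.
    products_per_session = requests_per_session - 1
    sessions = math.ceil(products / products_per_session)
    base_requests = products + sessions
    # Allow for the share of requests that must be repeated.
    total_requests = math.ceil(base_requests * (1 + retry_rate))
    busy_seconds = total_requests * mean_delay_s
    workers = math.ceil(busy_seconds / (window_hours * 3600))
    return {
        "sessions": sessions,
        "total_requests": total_requests,
        "busy_hours": round(busy_seconds / 3600, 1),
        "workers": workers,
    }


plan = plan_run()
print(plan)
```

With these inputs the run needs 264 sessions and about 13.5 hours of sequential waiting, so at least 3 sessions must run in parallel, each on its own IP, to finish within 6 hours.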
Conclusion and recommendations
DataDome is a serious protection system, but not insurmountable. The key to successful work with sites under its protection is a comprehensive approach: the right type of proxy, correct headers, realistic behavior, and a well-thought-out rotation strategy.
The main conclusions of the article:
- Data center proxies do not work against DataDome: they are blocked at the IP reputation level
- Residential proxies are the basic tool for most scraping tasks
- Mobile proxies provide maximum trust and are suitable for aggressively protected sites
- Good proxies are only part of the solution: headers, cookies, and behavior are equally important
- Anti-detect browsers in conjunction with quality proxies yield the best results
- The rotation and delay strategy is critically important: even with residential proxies, you can get banned with aggressive scraping
If you are engaged in price monitoring, scraping product cards, or collecting data from sites protected by DataDome, we recommend starting with residential proxies: they provide the optimal balance between the quality of bypassing protection and cost. For tasks requiring the highest level of trust from anti-bot systems, consider mobile proxies, especially if you are working with mobile versions of sites or mobile application APIs.