
Proxies for Scraping Aviasales, Booking, and Skyscanner: How to Collect Prices Without Getting Blocked

A complete guide to choosing and setting up proxies for price monitoring on travel aggregators: what types of proxies to use, how to avoid blocks, and how to collect data from Aviasales, Booking, Skyscanner.

📅 March 9, 2026

Travel aggregators like Aviasales, Booking, and Skyscanner actively defend against automated data collection: they block IPs after 10-20 requests, show captchas, and distort prices for bots. If you monitor flight or hotel prices for your service, affiliate program, or market analysis, then without properly configured proxies you will be banned within minutes of launching your scraper.

In this guide, we will discuss what proxies are needed for stable scraping of travel sites, how to set up IP rotation, bypass anti-bot systems like Cloudflare and Akamai, and what mistakes lead to blocks even when using proxies.

Why Travel Aggregators Block Scraping and How They Do It

Travel aggregators incur real losses from scraping: every request to their API costs money (they pay airlines and hotels for data access), and competitors use the collected prices to lure customers away. Therefore, Aviasales, Booking, Skyscanner, and Kayak invest millions in anti-bot protection.

Main Methods of Detecting Scraping

1. Analyzing Request Frequency from One IP. A regular user makes 3-5 search queries per session, while a scraper makes hundreds per minute. If more than 15-20 requests come from your IP per minute, the system flags it as suspicious. After 50-100 requests, a block is imposed for 24 hours or permanently.

2. Browser Fingerprinting. Travel sites collect dozens of parameters: screen resolution, time zone, installed fonts, WebGL fingerprint, canvas fingerprint, audio context. If these parameters do not match the declared geolocation of the IP (for example, an IP from Moscow but a time zone of UTC+8), that is a sign of a proxy or VPN.

3. IP Reputation Check. Sites use databases of known proxy providers, data centers, and VPN servers. If your IP is listed in such databases (for example, MaxMind GeoIP2, IPQualityScore, SEON), requests are blocked or a captcha is shown. Booking and Skyscanner are particularly strict with IPs from Amazon AWS, Google Cloud, and DigitalOcean ranges.

4. Behavioral Analysis. Anti-bot systems track mouse movements, scrolling speed, pauses between clicks. Selenium and Puppeteer leave traces without additional patches: the navigator.webdriver property, absence of plugins, atypical window sizes. Even with proxies, such traffic is easily recognizable.

5. TLS Fingerprinting. Modern anti-bot systems (Cloudflare, Akamai) analyze TLS handshake parameters: cipher suite order, extensions, protocol version. The TLS fingerprint of Python requests and other standard libraries differs from that of browsers, which instantly reveals a bot.

Real Case: One of our clients scraped prices on Booking through 100 datacenter proxies (DigitalOcean). After 2 hours of operation, all IPs were permanently blocked: Booking detected the datacenter range and added it to its blacklist. Switching to residential proxies solved the problem: after a month of operation, zero blocks.

Which Types of Proxies Are Suitable for Price Monitoring: Comparison

Three types of proxies are used for scraping travel aggregators: residential, mobile, and datacenter proxies. Each type has its pros, cons, and use cases. The choice depends on the volume of scraping, budget, and anonymity requirements.

| Proxy Type | Website Trust Level | Speed | Cost (relative) | Best For |
|---|---|---|---|---|
| Residential | Very high (home-user IPs) | Medium (300-800 ms) | $$$ (billed by traffic) | Booking, Expedia, Airbnb: sites with strict protection |
| Mobile | Maximum (mobile-operator IPs) | Low (500-1500 ms) | $$$$ (most expensive) | Mobile versions, API requests, bypassing Cloudflare |
| Datacenter | Low (easily detected) | Very high (50-150 ms) | $ (cheapest) | Aviasales API, less protected aggregators, testing |

Features of Choosing for Specific Travel Sites

Aviasales and Skyscanner are relatively lenient towards scraping via their APIs (if you have affiliate access). For web scraping, residential proxies with rotation every 5-10 requests are sufficient. Datacenter proxies work but require a large pool of IPs (at least 500 addresses) and slow rotation (no more than one request every 30 seconds per IP).

Booking.com and Expedia use Cloudflare Enterprise with strict rules. Datacenter proxies are blocked 90% of the time even with slow scraping. You need residential or mobile proxies plus emulation of a real browser (Selenium Stealth, Puppeteer Extra with plugins). Rotate IPs after every 3-5 requests.

Airbnb is one of the most protected sites. It requires residential proxies whose geolocation matches the search query (if you are searching for hotels in Paris, the IP must be French). Cookies, referer, and browser headers are mandatory. Mobile proxies yield the best results for scraping through the mobile API.

Kayak and Momondo have a medium level of protection. Residential proxies are the optimal choice. Datacenter proxies can be used, but with mandatory rotation and delays of at least 10-15 seconds between requests.

Residential vs Datacenter Proxies: What to Choose for Travel Sites

The main difference between residential proxies and datacenter proxies is the source of the IP address. Residential proxies use IPs from real home internet providers (Rostelecom, MTS, Comcast, Verizon), while datacenter proxies use IPs from hosting companies' servers (AWS, Google Cloud, OVH). Travel sites trust residential IPs because they are used by regular users.

When Residential Proxies Are Mandatory

1. Scraping Sites with Cloudflare/Akamai. Booking, Expedia, and Airbnb use these systems, which automatically block 95% of datacenter IPs. Residential proxies pass the check because their IPs are not listed in proxy provider databases.

2. Collecting Prices Tied to Geolocation. Travel sites show different prices to users from different countries and cities (due to taxes, currency exchange rates, local promotions). If you need prices for a specific region (for example, prices for residents of Germany), residential proxies with German IPs are the only reliable option.

3. Long-Term Scraping Without Blocks. If you are monitoring prices 24/7 for months, residential proxies pay off: you do not waste time replacing blocked IPs and setting up new proxies.

When Datacenter Proxies Can Be Used

1. Scraping via Official APIs. If you have affiliate access to the Aviasales or Skyscanner API, the proxy type is not critical; APIs are less sensitive to the source of the IP. Datacenter proxies will give you high speed at low cost.

2. Testing and Developing the Scraper. During coding and debugging, use datacenter proxies: they are cheaper, faster, and it is not a big deal if a few IPs get banned.

3. Scraping Less Protected Aggregators. Some regional travel sites or bus ticket aggregators do not use advanced anti-bot protection. For them, datacenter proxies with a large pool of IPs and slow rotation are quite suitable.

Tip: Combine proxy types. Use residential proxies for critical requests (first search, obtaining tokens, bypassing captchas), and datacenter proxies for bulk API requests or less protected endpoints. This will reduce costs by 40-60% while maintaining stability.
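The tiered approach from the tip above can be sketched as a small router. The endpoints below are placeholders, and the set of "critical" request types is an assumption to adapt to your own workload:

```python
import random

# Placeholder endpoints -- replace with your provider's actual gateways.
RESIDENTIAL = "http://user:pass@residential.example.com:8000"
DATACENTER_POOL = [
    "http://user:pass@dc1.example.com:8000",
    "http://user:pass@dc2.example.com:8000",
]

# Request types that need high-trust IPs on travel sites (assumed set).
CRITICAL = {"search", "token", "captcha"}

def pick_proxy(request_type: str) -> str:
    """Route critical requests through residential IPs and
    bulk/API requests through cheaper datacenter IPs."""
    if request_type in CRITICAL:
        return RESIDENTIAL
    return random.choice(DATACENTER_POOL)
```

Routing even 60-70% of traffic through the datacenter tier is where the cost savings come from, since residential traffic is billed by the gigabyte.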

IP Rotation Strategy: How Often to Change Proxies When Scraping

Proper IP rotation is key to long-term scraping without blocks. If you change IP too often, you will quickly exhaust the pool of addresses and incur high traffic costs. If too rarely, you will accumulate suspicious activity on one IP and get banned.

Types of Proxy Rotation

1. Rotation by Requests (Rotating Proxies). The IP changes automatically after each request or after a specified number of requests. Most residential proxy providers offer this mode: you connect to one endpoint (for example, gate.proxycove.com:8000), and the IP changes on the provider's side.

Pros: Easy to set up, no need to manually manage the IP pool, minimal risk of blocking one IP.
Cons: Cannot control sessions (if you need to save cookies or tokens), each request = new IP = new traffic costs.

2. Sticky Sessions (Session Proxies). The IP is tied to your session for a certain period (usually 10-30 minutes). You make several requests from one IP, then it automatically changes. This is configured through proxy parameters (for example, by appending a session ID such as -session-12345 to the username).

Pros: Can save cookies and tokens within the session, less traffic consumption (one IP = several requests).
Cons: If the IP gets banned during the session, all subsequent requests in that session will be blocked.

3. Manual Rotation from the Pool. You receive a list of IP addresses (for example, 1000) and manage the rotation in the scraper's code: select a random IP from the list, make N requests, switch to the next one. Typical for datacenter proxies.

Pros: Full control over rotation, can exclude blocked IPs from the pool.
Cons: Need to write rotation logic in code, manage the state of IPs (which have been used, which are blocked).
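The rotation logic described above can be sketched as a minimal pool class, assuming you track usage counts and bans yourself:

```python
import random

class ProxyPool:
    """Minimal manual-rotation pool: hand out a random live IP and
    retire it after max_requests uses or on a reported ban."""

    def __init__(self, proxies, max_requests=10):
        self.usage = {p: 0 for p in proxies}
        self.banned = set()
        self.max_requests = max_requests

    def get(self):
        # Only IPs that are not banned and still under their budget.
        live = [p for p, n in self.usage.items()
                if p not in self.banned and n < self.max_requests]
        if not live:
            raise RuntimeError("Proxy pool exhausted")
        proxy = random.choice(live)
        self.usage[proxy] += 1
        return proxy

    def report_ban(self, proxy):
        # Exclude a blocked IP from future rotation.
        self.banned.add(proxy)
```

In a real scraper you would call report_ban() when a request returns 403/429 or a captcha page, and periodically refill the pool from your provider.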

Recommended Rotation Frequency for Travel Sites

| Site | Proxy Type | Rotation Frequency | Max Requests from 1 IP |
|---|---|---|---|
| Booking.com | Residential | After 3-5 requests | 5-7 |
| Expedia | Residential | After 5-8 requests | 8-10 |
| Airbnb | Residential/Mobile | After 2-4 requests | 3-5 |
| Aviasales | Residential/Datacenter | After 10-15 requests | 15-20 |
| Skyscanner | Residential/Datacenter | After 8-12 requests | 12-15 |
| Kayak | Residential | After 5-10 requests | 10-12 |

Important: These are average values. Actual limits depend on the time of day (anti-bot systems are stricter at night), type of requests (searching for flights = more load on the API than viewing a hotel), quality of browser emulation. Start with conservative values (fewer requests per IP), then gradually increase while monitoring the block percentage.
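One way to apply these limits in code is a simple lookup table. The numbers below mirror the conservative end of the table above; the domain keys are illustrative:

```python
# Conservative per-site rotation budgets (start low, then raise
# gradually while monitoring your block rate).
ROTATION_LIMITS = {
    "booking.com": 3,
    "expedia.com": 5,
    "airbnb.com": 2,
    "aviasales.com": 10,
    "skyscanner.net": 8,
    "kayak.com": 5,
}

def should_rotate(site: str, requests_on_ip: int) -> bool:
    """True once the current IP has hit the site's request budget.
    Unknown sites default to the strictest budget."""
    return requests_on_ip >= ROTATION_LIMITS.get(site, 3)
```

Call should_rotate() after every response and switch IPs as soon as it returns True.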

Geo-Targeting Proxies: Why the Country and City of the IP Address Matter

Travel sites show different prices depending on the user's geolocation. This is not a bug, but a business model: airlines and hotels set different rates for different markets. For example, a ticket from Moscow to New York may cost $600 for a user from Russia and $750 for a user from the USA (due to taxes, competition, purchasing power).

How Sites Determine Geolocation

1. By IP Address. The primary method. Sites use GeoIP databases (MaxMind, IP2Location) that map IPs to cities, regions, and countries. Accuracy is 70-90% at the city level and 95-99% at the country level.

2. By Browser Language and Time Zone. If the IP shows Germany, but the browser language is Russian and the time zone is UTC+3 (Moscow), that is a sign of a proxy. The site may show a captcha or block the request.

3. By Currency and Account Settings. If you are logged into a Booking account, the site remembers your country at registration. Changing the IP to another country will raise suspicions: Booking may ask you to verify your identity or block your account.

How to Properly Choose the Geolocation of Proxies

For collecting prices for a specific market: Use IPs from the country whose prices you are interested in. If you are monitoring prices for the Russian market, take Russian residential proxies. For the European market, use proxies from EU countries (Germany, France, Poland). For the USA, use American proxies.

For bypassing geo-blocks: Some travel sites or special offers are only available from certain countries. For example, domestic flights in the USA are often cheaper when booked from an American IP. Use proxies from the required country + set the browser language and time zone for that country.

For scraping global data: If you need prices for all markets (for example, for analytics), use a pool of proxies from different countries. Rotate geolocation along with the IP: a request from a German IP returns German prices, and a request from a French IP returns French prices.
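Many residential providers encode geo-targeting in the proxy login. The "-country-XX" suffix below is a common but provider-specific convention and is an assumption here; check your provider's documentation for the real format:

```python
# NOTE: the "-country-XX" username suffix is a hypothetical example of
# a provider-specific geo-targeting convention, not a universal API.
def geo_proxy_url(user, password, country,
                  host="gate.proxycove.com", port=8000):
    return f"http://{user}-country-{country.lower()}:{password}@{host}:{port}"

# One proxy URL per market you want to collect prices for.
markets = ["de", "fr", "us"]
proxy_by_market = {c: geo_proxy_url("your_username", "your_password", c)
                   for c in markets}
```

Pair each market's proxy URL with matching Accept-Language and time-zone settings so the request looks consistent end to end.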

Mistake: Using an IP from one country while searching for hotels/tickets in another country with a mismatched currency. For example: an IP from Russia, a search for hotels in Thailand, and prices in euros. This looks suspicious. Either use the destination country's IP, or your real country's IP with its own currency.

Setting Up Proxies for Popular Parsers and Scripts

Let's consider setting up proxies for the most popular tools for scraping travel sites. The examples are provided for residential proxies with rotation, but they are also suitable for other types.

Python + requests / httpx

The simplest option for scraping APIs or simple pages without JavaScript. Suitable for Aviasales API, Skyscanner API, simple endpoints without Cloudflare.

import requests

# Proxy data (replace with your own)
proxy_host = "gate.proxycove.com"
proxy_port = "8000"
proxy_user = "your_username"
proxy_pass = "your_password"

proxies = {
    "http": f"http://{proxy_user}:{proxy_pass}@{proxy_host}:{proxy_port}",
    "https": f"http://{proxy_user}:{proxy_pass}@{proxy_host}:{proxy_port}"
}

# Browser headers (mandatory!)
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Referer": "https://www.google.com/"
}

# Request through proxy
response = requests.get(
    "https://www.aviasales.com/search",
    proxies=proxies,
    headers=headers,
    timeout=30
)

print(response.status_code)
print(response.text[:500])  # First 500 characters of the response

Important: For residential proxies with rotation, each new request will automatically receive a new IP. If you need a sticky session (one IP for several requests), add the session ID to the username: your_username-session-12345.
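The session-ID convention shown above can be wrapped in a small helper; the 5-digit random ID is arbitrary:

```python
import random
import string

def sticky_proxy_url(user, password,
                     host="gate.proxycove.com", port=8000):
    """Build a proxy URL with a random session ID appended to the
    username, matching the your_username-session-12345 format above."""
    session_id = "".join(random.choices(string.digits, k=5))
    return f"http://{user}-session-{session_id}:{password}@{host}:{port}"
```

Generate one URL per logical session and reuse it for all requests in that session so cookies and the IP stay paired.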

Selenium (for JavaScript sites)

Booking, Expedia, Airbnb actively use JavaScript for rendering content and anti-bot checks. Selenium emulates a real browser but requires additional settings to bypass detection.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

# Chrome settings
chrome_options = Options()

# Proxy
proxy_host = "gate.proxycove.com"
proxy_port = "8000"
proxy_user = "your_username"
proxy_pass = "your_password"

# Proxy format for Chrome
chrome_options.add_argument(f'--proxy-server=http://{proxy_host}:{proxy_port}')

# Hide automation
chrome_options.add_argument('--disable-blink-features=AutomationControlled')
chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
chrome_options.add_experimental_option('useAutomationExtension', False)

# User-Agent
chrome_options.add_argument('user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36')

driver = webdriver.Chrome(options=chrome_options)

# Remove webdriver property
driver.execute_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined})")

# Proxy authorization (if required)
# For Chrome, you need to create an extension with authorization, see selenium-wire or simply use Puppeteer

driver.get("https://www.booking.com/")
print(driver.title)
driver.quit()

Problem: Chrome does not support proxy authorization through login:password directly. Solutions: use the selenium-wire library (adds proxy with authorization), create a Chrome extension for authorization, or use Puppeteer (Node.js).
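With the selenium-wire option, the authenticated proxy URL goes into a seleniumwire_options dict. The sketch below shows the configuration shape; the driver launch is commented out because it requires Chrome and the selenium-wire package installed:

```python
# selenium-wire accepts a login:password proxy URL directly, which
# plain Selenium + Chrome cannot do.
proxy_url = "http://your_username:your_password@gate.proxycove.com:8000"

seleniumwire_options = {
    "proxy": {
        "http": proxy_url,
        "https": proxy_url,
        "no_proxy": "localhost,127.0.0.1",  # skip the proxy for local hosts
    }
}

# Requires: pip install selenium-wire
# from seleniumwire import webdriver
# driver = webdriver.Chrome(seleniumwire_options=seleniumwire_options)
# driver.get("https://www.booking.com/")
```

Combine this with the anti-detection Chrome options from the Selenium example above; selenium-wire only solves the authorization problem, not fingerprinting.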

Puppeteer (Node.js): The Best Choice for Complex Sites

Puppeteer emulates a browser better than Selenium and is easily configured with proxy authorization. Recommended for Booking, Airbnb, Expedia.

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: true,
    args: [
      '--proxy-server=http://gate.proxycove.com:8000',
      '--disable-blink-features=AutomationControlled',
      '--no-sandbox'
    ]
  });

  const page = await browser.newPage();

  // Proxy authorization
  await page.authenticate({
    username: 'your_username',
    password: 'your_password'
  });

  // Hide webdriver
  await page.evaluateOnNewDocument(() => {
    Object.defineProperty(navigator, 'webdriver', {
      get: () => undefined
    });
  });

  // User-Agent
  await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36');

  await page.goto('https://www.booking.com/', { waitUntil: 'networkidle2' });
  
  const title = await page.title();
  console.log('Title:', title);

  await browser.close();
})();

For even better protection against detection, use the puppeteer-extra-plugin-stealth plugin, which automatically hides all signs of automation.

Ready Solutions: Scrapy, Crawlee

Scrapy (Python) is a framework for large-scale scraping. It supports proxies through middleware. Example setup in settings.py:

# settings.py
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 1,
}

# spider.py
import scrapy

class TravelSpider(scrapy.Spider):
    name = "travel"

    def start_requests(self):
        proxy = "http://your_username:your_password@gate.proxycove.com:8000"
        yield scrapy.Request(
            url="https://www.aviasales.com/",
            meta={'proxy': proxy},  # per-request proxy for the middleware
            callback=self.parse,
        )

    def parse(self, response):
        self.logger.info("Status: %s", response.status)

Crawlee (Node.js) is a modern framework with built-in proxy rotation, anti-bot bypass, and automatic retries. An excellent fit for travel sites.

Bypassing Anti-Bot Systems Cloudflare, PerimeterX, Akamai

Even with quality residential proxies, you may encounter blocks if you do not bypass anti-bot systems correctly. Booking uses Cloudflare, Airbnb uses PerimeterX, and some sites use Akamai Bot Manager. These systems analyze not only the IP but also behavior, browser fingerprint, and TLS handshake.

Cloudflare: Main Bypass Methods

1. Use Browser Automation. Cloudflare checks JavaScript challenges that are executed in the browser. Simple HTTP requests (requests, curl) will not pass the check. You need Puppeteer, Playwright, or Selenium with the right settings.

2. Hide Signs of Automation. Install puppeteer-extra-plugin-stealth (Node.js) or undetected-chromedriver (Python). These libraries patch the browser, removing properties like navigator.webdriver, window.chrome, and changing permissions API.

3. Correct TLS Fingerprint. Cloudflare analyzes the TLS handshake. Use libraries that emulate browser TLS: curl-impersonate (emulates Chrome/Firefox TLS), tls-client (Go), hrequests (Python).

4. Solve Captchas Automatically. If Cloudflare shows a captcha (Turnstile), use captcha-solving services: 2Captcha, Anti-Captcha, CapSolver. They integrate via API and cost $1-3 per 1000 solutions.

PerimeterX (Airbnb, Some Travel Sites)

PerimeterX is one of the most complex anti-bot systems. It analyzes user behavior (mouse movements, clicks, scrolling), creates a device fingerprint, checks cookies and localStorage.

Bypass Methods:

1. Emulate User Behavior. Add random pauses between actions (2-5 seconds), move the mouse, scroll the page. In Puppeteer, use the ghost-cursor library for realistic mouse movements.

2. Save Cookies and LocalStorage. PerimeterX generates tokens stored in cookies (_px3, _pxhd) and localStorage. If you change the IP but keep the cookies, that is suspicious. Either change the IP and clear cookies together, or use sticky sessions (one IP = one session with cookies).

3. Use Mobile Proxies. PerimeterX is stricter with datacenter IPs. Mobile proxies yield better results for bypassing PerimeterX.

Akamai Bot Manager

Akamai analyzes sensor data (accelerometer, gyroscope on mobile), WebGL fingerprint, audio context, device performance. Bypassing requires advanced browser emulation.

Recommendations: Use real browsers (not headless), mobile proxies, random delays, and emulated touch events. For complex cases, use browser farms (BrowserStack, LambdaTest) or anti-detect browsers (AdsPower, Multilogin).

Common Mistakes When Scraping Travel Sites via Proxies

Even experienced developers make mistakes that lead to blocks. Here are the most common problems and their solutions.

Mistake 1: Using the Same User-Agent for All Requests

If all your requests come with the same User-Agent (for example, the standard Python requests: python-requests/2.28.0), this instantly reveals a bot. Even if you change IP, the site sees the same UA and links the requests.

Solution: Use a list of real browser User-Agents (Chrome, Firefox, Safari) and rotate them. The fake-useragent library (Python) automatically generates random UAs.
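A minimal version of this rotation is a list of real UA strings drawn at random per request. The list below is a small static sample; in production, keep it fresh or generate it with fake-useragent:

```python
import random

# Small static pool of real desktop browser User-Agent strings.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) "
    "Gecko/20100101 Firefox/121.0",
]

def random_headers():
    """Fresh headers per request so the UA rotates along with the IP."""
    return {
        "User-Agent": random.choice(USER_AGENTS),
        "Accept-Language": "en-US,en;q=0.9",
    }
```

Pass random_headers() to each request instead of a fixed headers dict.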

Mistake 2: Too High Request Speed

The scraper makes 100 requests per second, which is physically impossible for a human. Even with different IPs, anti-bot systems detect abnormal activity by patterns (all requests arriving exactly every 0.01 seconds).

Solution: Add random delays between requests: time.sleep(random.uniform(2, 5)). For travel sites, the optimal is: 2-5 seconds between requests from one IP, 0.5-2 seconds between requests from different IPs.
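A tiny helper makes the jittered pause reusable and returns the chosen delay so you can log it:

```python
import random
import time

def polite_sleep(min_s=2.0, max_s=5.0):
    """Random pause so requests don't arrive on a fixed clock tick."""
    delay = random.uniform(min_s, max_s)
    time.sleep(delay)
    return delay
```

Call polite_sleep() between requests from the same IP; shorter bounds (0.5-2 s) are enough when the IP changes every request.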

Mistake 3: Ignoring Cookies and Sessions

Travel sites use cookies to track sessions, store anti-bot tokens, and personalize prices. If you make each request without cookies (like a new user), this is suspicious.

Solution: Use requests.Session() (Python) or save cookies between requests in Puppeteer. For sticky sessions (one IP = several requests), it is essential to save cookies.
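In Python, requests.Session() handles this automatically: cookies set by any response are replayed on later requests in the same session. The cookie name below is hypothetical, used only to show the persistence:

```python
import requests

# One Session per sticky-IP session: cookies set by the site are
# stored in the cookie jar and sent back on every later request.
session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
})

# Simulate a cookie the site would set ("example_token" is hypothetical).
session.cookies.set("example_token", "abc123", domain="example.com")
```

Create a fresh Session whenever you rotate to a new sticky IP so the cookie history and the IP stay consistent.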

Mistake 4: Mismatch Between IP Geolocation and Browser Parameters

IP from Germany, but browser language is Russian, time zone is UTC+3, currency is rubles. Anti-bot systems see this mismatch and block the request.

Solution: Synchronize browser parameters with the proxy geolocation. If you are using a German IP, set the German language (Accept-Language: de-DE), the Europe/Berlin time zone, and EUR as the currency.
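One way to keep these parameters in sync is a per-country profile table. The language, time zone, and currency values below are standard, but the set of countries is illustrative; extend it to match your proxy pool:

```python
# Browser parameters matched to proxy geolocation.
GEO_PROFILES = {
    "de": {"Accept-Language": "de-DE,de;q=0.9",
           "timezone": "Europe/Berlin", "currency": "EUR"},
    "fr": {"Accept-Language": "fr-FR,fr;q=0.9",
           "timezone": "Europe/Paris", "currency": "EUR"},
    "us": {"Accept-Language": "en-US,en;q=0.9",
           "timezone": "America/New_York", "currency": "USD"},
}

def headers_for(country: str) -> dict:
    """HTTP headers consistent with the proxy IP's country."""
    profile = GEO_PROFILES[country]
    return {"Accept-Language": profile["Accept-Language"]}
```

Use the same profile to set the browser's time zone and preferred currency when driving Selenium or Puppeteer.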

Mistake 5: Using Free or Low-Quality Proxies

Free proxies and cheap public proxies are already blocked on all major travel sites. Their IPs are listed in blacklists and have a bad reputation (used for spam, DDoS).

Solution: Use quality residential or mobile proxies from trusted providers. Check the reputation of IPs through IPQualityScore, Scamalytics before use.

Checklist Before Launching the Scraper:
✅ Proxies: residential or mobile, with the required geolocation
✅ User-Agent: real browser, rotated
✅ Cookies: saved within the session
✅ Delays: 2-5 seconds between requests
✅ IP rotation: after 3-10 requests (depends on the site)
