
How to Configure Timeout and Retry Logic for Proxies: Safeguarding Against Data Loss in Web Scraping

Learn how to properly configure timeout and retry logic for proxies to prevent data loss during parsing, automation, and working with advertising accounts.

📅 February 27, 2026

When working with proxies for scraping marketplaces, automating social media, or collecting data, the most common problems are hanging requests and lost data. A proxy server may not respond in time, the connection may drop, and your script may hang for several minutes. As a result, you lose time, data, and money.

In this guide, I will show you how to properly configure timeout and retry logic for working with proxies. You will learn what timeout values to use for different tasks, how to automatically reconnect on errors, and how to avoid losing any requests. This article is suitable for both those who write code in Python and those who use ready-made scraping tools.

Why timeout is critically important when working with proxies

Imagine a situation: you launched a price scraper for Wildberries for 10,000 products. The script works through a proxy to avoid getting banned. Everything is going well, but on the 523rd request the proxy server stops responding: it may be overloaded or temporarily unavailable. Without a configured timeout, your script will wait indefinitely for a response (or until the system timeout of 2-5 minutes expires). As a result, the scraping stops, you lose time, and by the time you notice the problem, several hours may have passed.

A timeout is the maximum time to wait for a response from the server. If the server does not respond within this time, the request is aborted, and you can either retry with another proxy or log the error. This is especially important when working with proxies because:

  • Proxy servers can be unstable, especially public or cheap ones. Even quality residential proxies sometimes lose connection because the real user has disconnected from the internet.
  • The target site may block the IP. If the proxy gets banned, it will not respond at all or will respond very slowly (returning a captcha or redirect).
  • Network delays are unpredictable, especially when using proxies from other countries. The request may pass through several intermediate nodes.
  • Mass operations require stability. If you are scraping 100,000 pages or managing 50 Instagram accounts, even 1% of hanging requests means 1,000 lost operations.

Without properly configured timeouts, your script will waste time waiting for unavailable proxies instead of switching to working ones. This directly affects the speed of operation and the stability of the result.

Types of timeouts: connect, read, and total timeout

There are three main types of timeouts that need to be understood and configured separately. Many novice developers and users of scrapers only configure one general timeout, which leads to problems.

1. Connect timeout

This is the time allocated for establishing a connection with the proxy server. If the connection is not established within this time, the request is aborted. The connect timeout covers the initial TCP handshake between your client and the proxy.

When it triggers: The proxy server is unavailable, overloaded, or the IP is blocked by a firewall.

Recommended values:

  • For fast data center proxies: 3-5 seconds
  • For residential proxies: 5-10 seconds
  • For mobile proxies: 10-15 seconds (mobile internet is slower)
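These connect timeouts can be sanity-checked before a scraping run. Here is a minimal sketch (using Python's standard socket module; the function name proxy_connects is my own) that tests only the connection phase to a proxy:

```python
import socket

def proxy_connects(host: str, port: int, timeout: float = 5.0) -> bool:
    """Check whether a TCP connection to a proxy can be established
    within `timeout` seconds. This tests only the connect phase,
    not authentication or actual request forwarding."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers timeout, connection refused, unreachable host
        return False
```

A check like this is useful as a quick pre-flight filter over a proxy list: anything that fails to connect within the budget is dropped before the real scraping starts.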

2. Read timeout

This is the time to wait for a response from the target server after the connection with the proxy has already been established. If the server does not start sending data within this time, the request is aborted. The read timeout protects against situations where the server accepted the request but "hung" and never returns a response.

When it triggers: The target site processes the request slowly, is overloaded, or intentionally throttles suspicious requests.

Recommended values:

  • For scraping simple pages (HTML): 10-15 seconds
  • For scraping with JavaScript rendering: 30-60 seconds
  • For API requests: 5-10 seconds
  • For downloading large files: 120+ seconds
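To keep these recommendations in one place, you can collect them into a small lookup table of (connect, read) pairs in the format that requests accepts. The profile names and exact values below are illustrative, picked from the ranges above:

```python
# Illustrative (connect, read) timeout profiles based on the ranges
# above -- tune them against your own error logs.
TIMEOUT_PROFILES = {
    "datacenter_html": (5, 15),    # fast data center proxies, plain HTML
    "residential_html": (10, 15),  # residential proxies, plain HTML
    "mobile_social": (15, 30),     # mobile proxies, social media automation
    "api": (5, 10),                # API endpoints
    "js_rendering": (10, 60),      # pages that need JavaScript rendering
    "file_download": (10, 180),    # large file downloads
}

def timeout_for(profile: str) -> tuple:
    """Return a (connect, read) tuple usable as requests' timeout= argument."""
    return TIMEOUT_PROFILES[profile]
```

Centralizing the values this way makes it easy to adjust them later from log analysis without hunting through the code.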

3. Total timeout

This is the maximum time for the entire request to be completed from start to finish, including connecting, sending the request, receiving, and reading the response. Total timeout is a "circuit breaker" that ensures that no request runs longer than the specified time.

When to use: When it is important for each request to fit within strict time limits (for example, when scraping in real-time for arbitrage).

Formula: Total timeout = Connect timeout + Read timeout + 20-30% buffer

Important: Not all libraries and tools support separate configuration of connect and read timeouts. For example, the requests library in Python allows you to specify both values as a tuple: timeout=(5, 15), where 5 is connect, and 15 is read.
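The requests library itself has no total-timeout parameter, so if you need a hard wall-clock limit, one common workaround is to run the request in a worker thread and abandon it at the deadline. A minimal sketch (run_with_deadline is my own name; note that the abandoned thread keeps running until its own connect/read timeouts fire, so always set those too):

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as DeadlineExceeded

def run_with_deadline(func, total_timeout, *args, **kwargs):
    """Enforce a wall-clock deadline on any blocking call.

    The worker thread is abandoned at the deadline, not killed, so the
    wrapped call should still carry its own connect/read timeouts."""
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(func, *args, **kwargs)
        return future.result(timeout=total_timeout)  # raises DeadlineExceeded
    finally:
        pool.shutdown(wait=False)  # do not block waiting for the abandoned call

# Hypothetical usage with the formula above (5s connect + 15s read + buffer = ~25s):
# response = run_with_deadline(requests.get, 25, url, proxies=proxies, timeout=(5, 15))
```

The caller catches DeadlineExceeded the same way it would catch an ordinary timeout error and moves on to the next proxy.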

Optimal timeout values for different tasks

The correct timeout values depend on your task, the type of proxy, and the target site. Too short timeouts will lead to a large number of false errors (the proxy is working, but you are discarding it). Too long will waste time waiting for dead proxies.

| Task | Connect timeout | Read timeout | Comment |
| --- | --- | --- | --- |
| Scraping Wildberries, Ozon | 5-7 sec | 15-20 sec | Marketplaces may be slow to serve pages with many products |
| Scraping Avito, Yandex.Market | 5-7 sec | 10-15 sec | Usually fast sites, but they may block suspicious IPs |
| Instagram, TikTok automation | 7-10 sec | 20-30 sec | Use mobile proxies: slower but more stable |
| Working with Facebook Ads API | 5 sec | 10-15 sec | APIs are usually fast but may slow down during rate limiting |
| Scraping via Selenium/Puppeteer | 10 sec | 60-120 sec | JavaScript rendering takes time, especially on slow proxies |
| Mass proxy checking | 3-5 sec | 5-7 sec | Quick availability check; slow proxies are discarded |

Tip: Start with conservative (longer) timeouts and gradually reduce them while analyzing error logs. If you see many timeout errors on working proxies, increase the values. If the script is dragging because of slow proxies, decrease them.

Retry logic: how to properly configure retries

Timeout addresses the issue of hanging requests, but it does not solve the problem of data loss. If the proxy does not respond, you simply get an error and lose that request. This is why retry logic is critically important.

Retry logic is the automatic repetition of a request upon an error. The main principles of proper configuration are:

1. Determine which errors require a retry

Not all errors need to be retried. For example:

  • Retry for: Timeout, Connection refused, Proxy error, 502/503/504 (temporary server errors), Rate limiting (429)
  • Do NOT retry: 404 (page not found), 403 (access forbidden permanently), 401 (invalid authorization), data validation errors
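These two lists collapse naturally into a small predicate you can call before deciding whether to retry. A sketch (the function and set names are my own; I treat other unlisted 5xx codes as temporary):

```python
# Status codes worth retrying: temporary server-side failures.
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}
# Permanent client-side errors: retrying only wastes attempts.
PERMANENT_STATUSES = {401, 403, 404}

def should_retry(status_code: int) -> bool:
    """Decide whether an HTTP status code warrants a retry."""
    if status_code in RETRYABLE_STATUSES:
        return True
    if status_code in PERMANENT_STATUSES:
        return False
    # Treat any other 5xx as temporary; everything else is final.
    return 500 <= status_code < 600
```

Keeping the decision in one function means your scraper, your logger, and your monitoring all agree on what counts as a retryable failure.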

2. Configure the number of attempts

The optimal number of retries depends on the criticality of the data:

  • For non-critical tasks (scraping for analytics): 2-3 attempts
  • For important tasks (monitoring competitors' prices): 3-5 attempts
  • For critical tasks (working with advertising accounts): 5-10 attempts

3. Use exponential backoff

Do not retry the request immediately; this may make the problem worse (for example, if the server is overloaded). Use increasing delays between attempts:

  • 1st attempt: immediately
  • 2nd attempt: after 1-2 seconds
  • 3rd attempt: after 4-5 seconds
  • 4th attempt: after 10-15 seconds

Formula: delay = base_delay * (2 ^ attempt_number). For example: 1 sec, 2 sec, 4 sec, 8 sec, 16 sec.
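The formula translates into a few lines of Python. The sketch below also adds a small random jitter and a cap on the maximum delay; both are common additions beyond the bare formula above (jitter keeps parallel workers from retrying in lockstep):

```python
import random

def backoff_delay(attempt: int, base_delay: float = 1.0,
                  max_delay: float = 60.0) -> float:
    """Exponential backoff: base_delay * 2**attempt, capped at max_delay,
    plus up to 10% random jitter so parallel workers desynchronize."""
    delay = min(base_delay * (2 ** attempt), max_delay)
    return delay + random.uniform(0, delay * 0.1)
```

With the defaults, attempts 0 through 4 wait roughly 1, 2, 4, 8, and 16 seconds, matching the sequence above.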

4. Rotate proxies on retries

The most important rule: when retrying, use a DIFFERENT proxy from your pool. If one proxy failed to execute the request, the likelihood that it will succeed on retry is low. However, another proxy is likely to succeed.

This is especially important when working with residential proxies, where you have a pool of hundreds or thousands of IP addresses. For each retry, take a new random IP from the pool.
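Beyond picking a random IP each time, it can help to temporarily bench a proxy that just failed so it is not selected again right away. A hypothetical sketch (the ProxyPool class and the cooldown value are my own):

```python
import random
import time

class ProxyPool:
    """Rotating proxy pool that benches failing proxies for a cooldown.
    A hypothetical sketch; tune the cooldown to your pool size."""

    def __init__(self, proxies, cooldown=60.0):
        self.proxies = list(proxies)
        self.cooldown = cooldown
        self.benched = {}  # proxy -> timestamp when it becomes usable again

    def get(self):
        """Return a random proxy that is not currently benched."""
        now = time.monotonic()
        available = [p for p in self.proxies
                     if self.benched.get(p, 0) <= now]
        # If everything is benched, fall back to the full pool.
        return random.choice(available or self.proxies)

    def report_failure(self, proxy):
        """Bench a failing proxy for the cooldown period."""
        self.benched[proxy] = time.monotonic() + self.cooldown
```

On each retry you call get() for a fresh IP and report_failure() for the one that just timed out, so slow or banned proxies drop out of rotation on their own.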

Examples of configuring timeout and retry in Python

Let's consider practical examples of implementing timeout and retry logic in Python using popular libraries.

Example 1: Basic setup with requests

The requests library is the most popular for HTTP requests in Python. Here's how to set up timeout and simple retry:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Setting up retry logic
retry_strategy = Retry(
    total=5,  # Maximum 5 attempts
    backoff_factor=1,  # Exponential backoff between retries (delay roughly doubles each time)
    status_forcelist=[429, 500, 502, 503, 504],  # Error codes for retry
    allowed_methods=["HEAD", "GET", "POST", "PUT", "DELETE"]
)

adapter = HTTPAdapter(max_retries=retry_strategy)
session = requests.Session()
session.mount("http://", adapter)
session.mount("https://", adapter)

# Setting up proxy
proxies = {
    'http': 'http://username:password@proxy.example.com:8080',
    'https': 'http://username:password@proxy.example.com:8080'
}

# Executing request with timeout
try:
    response = session.get(
        'https://www.wildberries.ru/catalog/electronics',
        proxies=proxies,
        timeout=(5, 15)  # connect timeout 5 sec, read timeout 15 sec
    )
    print(f"Success! Status: {response.status_code}")
    print(f"Response size: {len(response.content)} bytes")
except requests.exceptions.Timeout:
    print("Error: timeout exceeded")
except requests.exceptions.ProxyError:
    print("Error: proxy issue")
except requests.exceptions.RequestException as e:
    print(f"Request error: {e}")

In this example, we set up automatic retry at the session level. On errors 429, 500, 502, 503, 504, the library will automatically retry the request up to 5 times with exponential backoff.

Example 2: Proxy rotation on retry

A more advanced example with proxy rotation from the pool on each attempt:

import requests
import random
import time

# Proxy pool (replace with your actual proxies)
PROXY_POOL = [
    'http://user:pass@proxy1.example.com:8080',
    'http://user:pass@proxy2.example.com:8080',
    'http://user:pass@proxy3.example.com:8080',
    'http://user:pass@proxy4.example.com:8080',
]

def make_request_with_retry(url, max_retries=5, base_delay=1):
    """
    Executes a request with retry and proxy rotation
    """
    for attempt in range(max_retries):
        # Choose a random proxy from the pool
        proxy = random.choice(PROXY_POOL)
        proxies = {'http': proxy, 'https': proxy}
        
        try:
            response = requests.get(
                url,
                proxies=proxies,
                timeout=(5, 15),
                headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}
            )
            
            # Check status code
            if response.status_code == 200:
                return response
            elif response.status_code in [429, 500, 502, 503, 504]:
                # Temporary error - retry
                print(f"Attempt {attempt + 1}: code {response.status_code}, retrying...")
            else:
                # Permanent error - stop
                print(f"Error {response.status_code}, stopping attempts")
                return None
                
        except (requests.exceptions.Timeout, 
                requests.exceptions.ProxyError,
                requests.exceptions.ConnectionError) as e:
            print(f"Attempt {attempt + 1}: error {type(e).__name__}, retrying...")
        
        # If this is not the last attempt - wait with exponential backoff
        if attempt < max_retries - 1:
            delay = base_delay * (2 ** attempt)
            print(f"Waiting {delay} seconds before the next attempt...")
            time.sleep(delay)
    
    print("All attempts exhausted")
    return None

# Usage
result = make_request_with_retry('https://www.ozon.ru/category/smartfony-15502/')
if result:
    print(f"Success! Received {len(result.content)} bytes of data")
else:
    print("Failed to execute request")

This code selects a new random proxy from the pool on each attempt, significantly increasing the likelihood of successfully executing the request.

Example 3: Using the tenacity library

For more flexible management of retry logic, you can use the specialized tenacity library:

from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type
import requests

@retry(
    stop=stop_after_attempt(5),  # Maximum 5 attempts
    wait=wait_exponential(multiplier=1, min=1, max=30),  # Exponential delay 1-30 sec
    retry=retry_if_exception_type((requests.exceptions.Timeout, 
                                   requests.exceptions.ProxyError,
                                   requests.exceptions.ConnectionError))
)
def fetch_with_proxy(url, proxy):
    """
    Function with automatic retry via decorator
    """
    proxies = {'http': proxy, 'https': proxy}
    response = requests.get(url, proxies=proxies, timeout=(5, 15))
    response.raise_for_status()  # Will raise an exception on HTTP error
    return response

# Usage
try:
    result = fetch_with_proxy(
        'https://www.avito.ru/rossiya/telefony',
        'http://user:pass@proxy.example.com:8080'
    )
    print(f"Success! Status: {result.status_code}")
except Exception as e:
    print(f"Failed to execute request after all attempts: {e}")

The tenacity library provides very flexible retry configuration options through decorators. Installation: pip install tenacity

Ready-made solutions for no-code scraping

If you are not a programmer or want to save time on development, there are ready-made scraping tools with built-in support for timeout and retry logic. You don't need to write code; just configure the parameters in the graphical interface.

Octoparse

A popular visual scraper for Windows and Mac. Timeout and retry configuration:

  • Open task settings → Advanced Options
  • Page Load Timeout: set to 20-30 seconds
  • Ajax Timeout: 10-15 seconds for dynamic content
  • Retry Times: 3-5 attempts on error
  • In the proxy settings, you can upload a list and enable automatic rotation

ParseHub

A cloud scraper with a free plan. Configuration:

  • Settings → Advanced → Page Load Delay: 5-10 seconds
  • Request Timeout: 30 seconds
  • Retry Failed Requests: enable, 3 attempts
  • Supports proxies through project settings

Apify

A platform for automating web tasks with ready-made actors (scripts) for scraping popular sites. Many actors for scraping marketplaces (Wildberries, Ozon) already have built-in optimal timeout and retry settings. You only need to:

  • Select a ready-made actor for the desired site
  • Specify the proxy (supports integration with proxy providers)
  • Run the task; everything else is configured automatically

Anti-detect browsers for automation

If you work with social media or advertising platforms through Dolphin Anty, AdsPower, or Multilogin, timeout is configured in the browser profile:

  • Dolphin Anty: Profile settings → Proxy → Timeout: 10-15 seconds
  • AdsPower: Proxy Settings → Connection Timeout: 10 seconds, Read Timeout: 20 seconds
  • Multilogin: Browser Profile → Network → Proxy Timeout: 15 seconds

When automating through these browsers (for example, with Selenium scripts), the proxy timeout is inherited from the profile settings, but you can also set additional timeouts at the script level.

Common mistakes when configuring timeouts

Even experienced developers and scraping specialists make typical mistakes when working with timeouts and retries. Here are the most common:

Mistake 1: No timeout at all

Many libraries do not set a timeout by default or set a very large value (several minutes). If you haven't explicitly specified a timeout, your script may hang for a long time.

Solution: Always explicitly specify a timeout in each request. It's better to get an error after 15 seconds than to wait 5 minutes.

Mistake 2: Using the same proxy for all retries

If the proxy did not respond the first time, the likelihood of success on retrying with the same proxy is very low. Many forget to rotate proxies between attempts.

Solution: Use a new proxy from the pool for each retry. This is critical for a high success rate.

Mistake 3: Too short timeouts for slow proxies

Mobile and some residential proxies may be slower than data center ones. If you set a timeout of 5 seconds for a mobile proxy, you will get many false errors on perfectly working IPs.

Solution: Consider the type of proxy. For mobile proxies, use a timeout of at least 10-15 seconds.

Mistake 4: Infinite retries without limits

Some implement retries in a while True loop without limiting the number of attempts. If the problem is on the target site (for example, it is completely down), the script will keep trying indefinitely.

Solution: Always limit the number of retries (3-10 attempts maximum) and log failed requests for subsequent analysis.

Mistake 5: Ignoring the type of error

Not all errors should be retried. If you receive a 404 (page not found), retrying is pointless: the page simply does not exist. A 503 (service temporarily unavailable), however, is worth retrying after a few seconds.

Solution: Analyze the type of error and only retry temporary issues (timeout, connection error, 429, 500, 502, 503, 504).

Mistake 6: Lack of logging

Without logs, you won't understand why requests are failing: is the problem with the proxy, the timeouts, or the target site?

Solution: Log every error along with the proxy used, the timeout values, the number of attempts made, and the specific error that occurred. This will help you optimize the settings.
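With Python's standard logging module, this takes only a few lines. The logger name and log fields below are my own choices:

```python
import logging

logger = logging.getLogger("proxy_scraper")
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)

def log_retry_failure(proxy, url, attempt, timeout, error):
    """Record everything needed to diagnose a failed attempt later:
    the proxy, the target URL, the attempt number, the timeout used,
    and the specific error."""
    logger.warning(
        "request failed: proxy=%s url=%s attempt=%d timeout=%s error=%s",
        proxy, url, attempt, timeout, error,
    )
```

Calling this from the except branch of your retry loop gives you a log you can grep per proxy to see which IPs cause most of the timeouts.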

Proxy selection tip: If you frequently encounter timeout errors even with correct settings, the problem may lie in the quality of the proxies. Cheap public or shared proxies are often overloaded and respond slowly. For stable operation, we recommend using quality residential proxies with guaranteed uptime.

Conclusion

Properly configuring timeout and retry logic is not just a technical detail but a critically important factor for stability and efficiency when working with proxies. Without timeouts, your scripts will hang on dead proxies, wasting time. Without retry logic, you will lose data on temporary errors. And without proxy rotation on retries, you will get a low success rate even with a quality pool of IPs.

Key takeaways from this guide:

  • Always explicitly set timeouts: connect timeout 5-10 seconds, read timeout 10-30 seconds depending on the task
  • Use retry logic with a limit of 3-5 attempts and exponential backoff
  • Rotate proxies on each retry; this is key to a high success rate
  • Retry only temporary errors (timeout, 429, 500, 502, 503, 504), do not waste attempts on permanent ones (404, 403)
  • Log all errors for analysis and optimization of settings
  • Consider the type of proxy: mobile proxies are slower than data center ones, so increase timeouts accordingly

If you are working with scraping marketplaces (Wildberries, Ozon, Avito), automating social media, or advertising platforms, the stability of proxies directly affects your results. Use quality proxies with high uptime and properly configure timeouts and retries; this will save you hours of time and thousands of lost requests.

For tasks requiring maximum stability and minimal timeout errors, we recommend trying data center proxies: they provide high response speed and a stable connection, which is especially important for mass scraping and automation.
