Setting Up Proxies in Python Requests and Aiohttp: A Complete Guide with Code Examples

A step-by-step guide to integrating proxies in Python requests and aiohttp with code examples for synchronous and asynchronous requests, IP rotation, and error handling.

📅 February 13, 2026

When developing parsers, automating data collection, or testing web services, it is often necessary to use proxy servers from Python. The requests and aiohttp libraries provide flexible mechanisms for working with proxies, but their configuration has important nuances. In this guide, we will explore synchronous and asynchronous approaches, provide examples for HTTP and SOCKS5 proxies, and discuss IP rotation and error handling.

Basic proxy setup in requests

The requests library is the standard for HTTP requests in Python. Proxy setup is done through the proxies parameter, which accepts a dictionary with protocols and proxy server addresses.

A simple example with an HTTP proxy:

import requests

# Setting up the proxy
proxies = {
    'http': 'http://123.45.67.89:8080',
    'https': 'http://123.45.67.89:8080'
}

# Making a request through the proxy
response = requests.get('https://httpbin.org/ip', proxies=proxies)
print(response.json())  # {'origin': '123.45.67.89'}

Note: For HTTPS requests, the http:// protocol is also specified in the proxy value (not https://). This is because the connection to the proxy server is established over HTTP, and then a tunnel for HTTPS traffic is created using the CONNECT method.
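The keys of the proxies dictionary are matched against the scheme of the target URL, so you can route plain HTTP and HTTPS traffic through different proxy servers. A small sketch using requests' internal select_proxy helper to show the matching rule (the proxy addresses are placeholders; no network access is needed):

```python
from requests.utils import select_proxy

# Different upstream proxies for plain HTTP and for HTTPS traffic
proxies = {
    'http': 'http://10.0.0.1:8080',
    'https': 'http://10.0.0.2:8080',
}

# requests matches the dictionary key against the scheme of the target URL
print(select_proxy('http://example.com/', proxies))   # http://10.0.0.1:8080
print(select_proxy('https://example.com/', proxies))  # http://10.0.0.2:8080
```

More specific keys such as 'http://example.com' or a catch-all 'all' are also supported and take part in the same lookup.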

Using environment variables:

The requests library automatically reads proxies from the environment variables HTTP_PROXY and HTTPS_PROXY:

import os
import requests

# Setting through environment variables
os.environ['HTTP_PROXY'] = 'http://123.45.67.89:8080'
os.environ['HTTPS_PROXY'] = 'http://123.45.67.89:8080'

# Proxy will be applied automatically
response = requests.get('https://httpbin.org/ip')
print(response.json())

This approach is convenient for containerization (Docker) or when proxies are configured at the system level. However, for flexibility, it is recommended to explicitly pass the proxies parameter.
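Two related knobs are worth knowing when relying on environment variables: the NO_PROXY variable excludes specific hosts from proxying, and a Session with trust_env=False ignores the environment entirely. A sketch using requests' get_environ_proxies helper to inspect the effect without making any requests (the addresses are placeholders):

```python
import os

import requests
from requests.utils import get_environ_proxies

os.environ['HTTP_PROXY'] = 'http://123.45.67.89:8080'
# Hosts listed in NO_PROXY bypass the environment proxies entirely
os.environ['NO_PROXY'] = 'localhost,127.0.0.1,internal.example.com'

# get_environ_proxies shows which proxies would apply to a given URL
print(get_environ_proxies('http://internal.example.com/'))  # {} - bypassed
print(get_environ_proxies('http://external.example.com/'))

# A Session with trust_env=False ignores proxy environment variables completely
session = requests.Session()
session.trust_env = False
```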

Authentication and SOCKS5 in requests

Most commercial proxy services require authentication with a username and password. In requests, this is implemented through a URL format with credentials.

HTTP proxy with authentication:

import requests

# Format: http://username:password@host:port
proxies = {
    'http': 'http://user123:pass456@proxy.example.com:8080',
    'https': 'http://user123:pass456@proxy.example.com:8080'
}

response = requests.get('https://httpbin.org/ip', proxies=proxies)
print(response.json())

Setting up SOCKS5 proxy:

To work with SOCKS5, an additional library requests[socks] or PySocks is required. Installation:

pip install requests[socks]

Example of using SOCKS5:

import requests

# SOCKS5 without authentication
proxies = {
    'http': 'socks5://123.45.67.89:1080',
    'https': 'socks5://123.45.67.89:1080'
}

# SOCKS5 with authentication
proxies_auth = {
    'http': 'socks5://user:pass@123.45.67.89:1080',
    'https': 'socks5://user:pass@123.45.67.89:1080'
}

response = requests.get('https://httpbin.org/ip', proxies=proxies_auth)
print(response.json())

SOCKS5 proxies are particularly useful when working with residential proxies, as this protocol provides more reliable traffic tunneling and supports UDP (necessary for some applications).
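One SOCKS-specific nuance in requests: the URL scheme controls where DNS resolution happens. With socks5:// the hostname is resolved on your machine; with socks5h:// the hostname is handed to the proxy, which resolves it remotely. A config-only sketch (the endpoints are placeholders):

```python
# socks5://  - the hostname is resolved locally; only the IP goes to the proxy
# socks5h:// - the hostname is sent to the proxy, which performs DNS remotely
proxies_local_dns = {
    'http': 'socks5://user:pass@123.45.67.89:1080',
    'https': 'socks5://user:pass@123.45.67.89:1080',
}
proxies_remote_dns = {
    'http': 'socks5h://user:pass@123.45.67.89:1080',
    'https': 'socks5h://user:pass@123.45.67.89:1080',
}
# requests.get(url, proxies=proxies_remote_dns) keeps DNS queries off your machine
```

Remote resolution is usually preferable for anonymity, since local DNS lookups can reveal which sites you are visiting even though the traffic itself goes through the proxy.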

Proxy rotation in requests

When parsing large volumes of data, using a single IP address can lead to blocks. Proxy rotation is the cyclic change of IPs to distribute load and bypass rate limits.

Simple rotation through a list:

import requests
import itertools

# List of proxy servers
proxy_list = [
    'http://user:pass@proxy1.example.com:8080',
    'http://user:pass@proxy2.example.com:8080',
    'http://user:pass@proxy3.example.com:8080',
]

# Creating an infinite iterator
proxy_pool = itertools.cycle(proxy_list)

# Making requests with rotation
for i in range(10):
    proxy = next(proxy_pool)
    proxies = {'http': proxy, 'https': proxy}
    
    try:
        response = requests.get('https://httpbin.org/ip', proxies=proxies, timeout=5)
        print(f"Request {i+1}: IP = {response.json()['origin']}")
    except Exception as e:
        print(f"Error with proxy {proxy}: {e}")

Rotation with sessions to maintain cookies:

import requests
from itertools import cycle

class ProxyRotator:
    def __init__(self, proxy_list):
        self.proxy_pool = cycle(proxy_list)
        self.session = requests.Session()
    
    def get(self, url, **kwargs):
        proxy = next(self.proxy_pool)
        self.session.proxies = {'http': proxy, 'https': proxy}
        return self.session.get(url, **kwargs)

# Usage
proxy_list = [
    'http://user:pass@proxy1.example.com:8080',
    'http://user:pass@proxy2.example.com:8080',
]

rotator = ProxyRotator(proxy_list)

for i in range(5):
    response = rotator.get('https://httpbin.org/ip', timeout=5)
    print(f"Request {i+1}: {response.json()['origin']}")

Random rotation for unpredictability:

import requests
import random

proxy_list = [
    'http://user:pass@proxy1.example.com:8080',
    'http://user:pass@proxy2.example.com:8080',
    'http://user:pass@proxy3.example.com:8080',
]

def get_random_proxy():
    proxy = random.choice(proxy_list)
    return {'http': proxy, 'https': proxy}

# Each request with a random proxy
for i in range(5):
    response = requests.get('https://httpbin.org/ip', proxies=get_random_proxy(), timeout=5)
    print(f"Request {i+1}: {response.json()['origin']}")

Random rotation is more effective when working with sites that track request patterns. Sequentially changing IPs may look suspicious, while random selection mimics the behavior of different users.
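One weakness of plain random.choice is that it can pick the same proxy twice in a row. A small variation on the example above excludes the previously used proxy from the draw (a sketch with placeholder addresses; the actual request is left as a comment):

```python
import random

proxy_list = [
    'http://user:pass@proxy1.example.com:8080',
    'http://user:pass@proxy2.example.com:8080',
    'http://user:pass@proxy3.example.com:8080',
]

def get_random_proxy(previous=None):
    """Pick a random proxy, never repeating the previous one."""
    candidates = [p for p in proxy_list if p != previous] or proxy_list
    return random.choice(candidates)

last = None
for i in range(5):
    last = get_random_proxy(last)
    proxies = {'http': last, 'https': last}
    # requests.get(url, proxies=proxies, timeout=5) would go here
    print(f"Request {i+1}: {last}")
```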

Proxy setup in aiohttp

The aiohttp library is designed for asynchronous HTTP requests and is critical for high-load parsers. Proxy setup differs from requests — the proxy parameter (singular) is used.

Basic example with an HTTP proxy:

import aiohttp
import asyncio

async def fetch_with_proxy():
    proxy = 'http://123.45.67.89:8080'
    
    async with aiohttp.ClientSession() as session:
        async with session.get('https://httpbin.org/ip', proxy=proxy) as response:
            data = await response.json()
            print(data)

# Running
asyncio.run(fetch_with_proxy())

Proxy with authentication:

In aiohttp, authentication is passed through the aiohttp.BasicAuth object or directly in the URL:

import aiohttp
import asyncio

async def fetch_with_auth_proxy():
    # Option 1: Credentials in URL
    proxy = 'http://user123:pass456@proxy.example.com:8080'
    
    async with aiohttp.ClientSession() as session:
        async with session.get('https://httpbin.org/ip', proxy=proxy) as response:
            print(await response.json())

# Option 2: Through BasicAuth (for some proxies)
async def fetch_with_basic_auth():
    proxy = 'http://proxy.example.com:8080'
    proxy_auth = aiohttp.BasicAuth('user123', 'pass456')
    
    async with aiohttp.ClientSession() as session:
        async with session.get('https://httpbin.org/ip', 
                                proxy=proxy, 
                                proxy_auth=proxy_auth) as response:
            print(await response.json())

asyncio.run(fetch_with_auth_proxy())
asyncio.run(fetch_with_basic_auth())

SOCKS5 in aiohttp:

For SOCKS5, the aiohttp-socks library is required:

pip install aiohttp-socks

Example:

import asyncio
import aiohttp
from aiohttp_socks import ProxyConnector

async def fetch_with_socks5():
    connector = ProxyConnector.from_url('socks5://user:pass@123.45.67.89:1080')
    
    async with aiohttp.ClientSession(connector=connector) as session:
        async with session.get('https://httpbin.org/ip') as response:
            print(await response.json())

asyncio.run(fetch_with_socks5())

When working with mobile proxies for scraping social networks or marketplaces, it is recommended to use aiohttp — the asynchronous nature allows processing hundreds of requests in parallel without blocking the execution thread.

Asynchronous rotation and proxy pool

For high-load parsers, efficient proxy rotation with failure handling and automatic replacement of non-working IPs is critical. Let's look at advanced patterns for aiohttp.

Class for managing a proxy pool:

import aiohttp
import asyncio
from itertools import cycle
from typing import List, Optional

class ProxyPool:
    def __init__(self, proxy_list: List[str]):
        self.proxy_list = proxy_list
        self.proxy_cycle = cycle(proxy_list)
        self.failed_proxies = set()
    
    def get_next_proxy(self) -> Optional[str]:
        """Get the next working proxy"""
        for _ in range(len(self.proxy_list)):
            proxy = next(self.proxy_cycle)
            if proxy not in self.failed_proxies:
                return proxy
        return None  # All proxies are unavailable
    
    def mark_failed(self, proxy: str):
        """Mark proxy as non-working"""
        self.failed_proxies.add(proxy)
        print(f"Proxy {proxy} marked as unavailable")
    
    async def fetch(self, session: aiohttp.ClientSession, url: str, **kwargs):
        """Perform a request with automatic proxy change on error"""
        max_retries = 3
        
        for attempt in range(max_retries):
            proxy = self.get_next_proxy()
            if not proxy:
                raise Exception("All proxies are unavailable")
            
            try:
                async with session.get(url, proxy=proxy, timeout=aiohttp.ClientTimeout(total=10), **kwargs) as response:
                    return await response.json()
            except (aiohttp.ClientError, asyncio.TimeoutError) as e:
                print(f"Error with proxy {proxy}: {e}")
                self.mark_failed(proxy)
                continue
        
        raise Exception(f"Failed to perform the request after {max_retries} attempts")

# Usage
async def main():
    proxy_list = [
        'http://user:pass@proxy1.example.com:8080',
        'http://user:pass@proxy2.example.com:8080',
        'http://user:pass@proxy3.example.com:8080',
    ]
    
    pool = ProxyPool(proxy_list)
    
    async with aiohttp.ClientSession() as session:
        # Performing 10 requests with automatic rotation
        tasks = [pool.fetch(session, 'https://httpbin.org/ip') for _ in range(10)]
        results = await asyncio.gather(*tasks, return_exceptions=True)
        
        for i, result in enumerate(results):
            if isinstance(result, Exception):
                print(f"Request {i+1} failed with error: {result}")
            else:
                print(f"Request {i+1}: IP = {result.get('origin')}")

asyncio.run(main())

Parallel processing with concurrency limit:

import aiohttp
import asyncio
from itertools import cycle

async def fetch_url(session, url, proxy, semaphore):
    async with semaphore:  # Limit concurrent requests
        try:
            async with session.get(url, proxy=proxy, timeout=aiohttp.ClientTimeout(total=10)) as response:
                data = await response.json()
                return {'url': url, 'ip': data.get('origin'), 'status': response.status}
        except Exception as e:
            return {'url': url, 'error': str(e)}

async def main():
    urls = ['https://httpbin.org/ip'] * 50  # 50 requests
    proxy_list = [
        'http://user:pass@proxy1.example.com:8080',
        'http://user:pass@proxy2.example.com:8080',
    ]
    proxy_cycle = cycle(proxy_list)
    
    # Limit: no more than 10 concurrent requests
    semaphore = asyncio.Semaphore(10)
    
    async with aiohttp.ClientSession() as session:
        tasks = [
            fetch_url(session, url, next(proxy_cycle), semaphore)
            for url in urls
        ]
        results = await asyncio.gather(*tasks)
        
        # Analyzing results
        successful = [r for r in results if 'ip' in r]
        failed = [r for r in results if 'error' in r]
        
        print(f"Successful requests: {len(successful)}")
        print(f"Failed requests: {len(failed)}")

asyncio.run(main())

Using asyncio.Semaphore is critical when working with proxies — too many concurrent connections through one IP can lead to blocking by the target site or proxy provider.
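Since the risk is per IP rather than global, one option is to combine a global semaphore with a per-proxy semaphore, so no single proxy carries too many simultaneous connections. A sketch with placeholder proxies, where a short sleep stands in for the actual aiohttp request:

```python
import asyncio

proxy_list = ['http://proxy1.example.com:8080', 'http://proxy2.example.com:8080']

async def fetch_limited(proxy, request_id, global_limit, per_proxy_limits, results):
    # The global semaphore caps total concurrency; the per-proxy one
    # caps how many requests share a single IP at the same time
    async with global_limit, per_proxy_limits[proxy]:
        await asyncio.sleep(0.01)  # stands in for session.get(url, proxy=proxy)
        results.append(proxy)

async def main():
    global_limit = asyncio.Semaphore(10)
    per_proxy_limits = {p: asyncio.Semaphore(3) for p in proxy_list}
    results = []
    tasks = [
        fetch_limited(proxy_list[i % len(proxy_list)], i,
                      global_limit, per_proxy_limits, results)
        for i in range(20)
    ]
    await asyncio.gather(*tasks)
    return results

results = asyncio.run(main())
print(f"Completed {len(results)} simulated requests")
```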

Error and timeout handling

Working with proxies is associated with a higher number of errors: timeouts, connection drops, proxy server failures. Proper error handling is key to the stability of the parser.

Typical errors when working with proxies:

Error | Cause | Solution
ProxyError | Proxy server is unavailable | Switch to another proxy
ConnectTimeout | Proxy does not respond in time | Increase the timeout or change the proxy
ProxyAuthenticationRequired | Invalid username or password | Check the credentials
SSLError | Problems with the SSL certificate | Disable SSL verification (not recommended)
TooManyRedirects | Proxy creates a redirect loop | Change the proxy or limit redirects

Error handling in requests:

import time

import requests
from requests.exceptions import ProxyError, ConnectTimeout, RequestException

def fetch_with_retry(url, proxies, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = requests.get(
                url,
                proxies=proxies,
                timeout=(5, 10),  # (connect timeout, read timeout)
                allow_redirects=True,
                verify=True  # SSL certificate verification
            )
            response.raise_for_status()  # Raises an exception for 4xx/5xx
            return response.json()

        except ProxyError as e:
            print(f"Attempt {attempt + 1}: Proxy unavailable - {e}")
        except ConnectTimeout as e:
            print(f"Attempt {attempt + 1}: Connection timeout - {e}")
        except requests.exceptions.HTTPError as e:
            print(f"Attempt {attempt + 1}: HTTP error {e.response.status_code}")
            if e.response.status_code == 407:  # Proxy Authentication Required
                print("Proxy authentication error!")
                break  # Do not retry on an authorization error
        except RequestException as e:
            print(f"Attempt {attempt + 1}: General error - {e}")

        if attempt < max_retries - 1:
            print("Retrying in 2 seconds...")
            time.sleep(2)

    raise Exception(f"Failed to perform the request after {max_retries} attempts")

# Usage
proxies = {'http': 'http://user:pass@proxy.example.com:8080', 'https': 'http://user:pass@proxy.example.com:8080'}
try:
    data = fetch_with_retry('https://httpbin.org/ip', proxies)
    print(data)
except Exception as e:
    print(f"Critical error: {e}")

Error handling in aiohttp:

import aiohttp
import asyncio
from aiohttp import ClientError, ClientProxyConnectionError

async def fetch_with_retry(session, url, proxy, max_retries=3):
    for attempt in range(max_retries):
        try:
            timeout = aiohttp.ClientTimeout(total=10, connect=5)
            async with session.get(url, proxy=proxy, timeout=timeout) as response:
                response.raise_for_status()
                return await response.json()
                
        except ClientProxyConnectionError as e:
            print(f"Attempt {attempt + 1}: Proxy connection error - {e}")
        except asyncio.TimeoutError:
            print(f"Attempt {attempt + 1}: Timeout")
        except aiohttp.ClientHttpProxyError as e:
            print(f"Attempt {attempt + 1}: Proxy HTTP error - {e}")
            if e.status == 407:
                print("Proxy authentication error!")
                break
        except ClientError as e:
            print(f"Attempt {attempt + 1}: General client error - {e}")
        
        if attempt < max_retries - 1:
            await asyncio.sleep(2)
    
    raise Exception(f"Failed to perform the request after {max_retries} attempts")

async def main():
    proxy = 'http://user:pass@proxy.example.com:8080'
    async with aiohttp.ClientSession() as session:
        try:
            data = await fetch_with_retry(session, 'https://httpbin.org/ip', proxy)
            print(data)
        except Exception as e:
            print(f"Critical error: {e}")

asyncio.run(main())

Setting timeouts:

Proper timeout settings are critical for stability. Recommended values:

  • Connect timeout: 5-10 seconds (time to establish a connection with the proxy)
  • Read timeout: 10-30 seconds (time to receive a response from the target site)
  • Total timeout: 30-60 seconds (total request time)

For slow residential proxies, it is recommended to increase timeouts to 20-30 seconds per connection, as routing through real providers may take longer.
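The two libraries express these settings differently: requests takes a (connect, read) tuple, while aiohttp uses a ClientTimeout object with separate phases. A sketch with the values recommended above (the exact numbers should be tuned to your proxies):

```python
import aiohttp

# requests: a (connect, read) tuple
requests_timeout = (7, 20)  # 7 s to reach the proxy, 20 s to read the response
# usage: requests.get(url, proxies=proxies, timeout=requests_timeout)

# aiohttp: a ClientTimeout object with separate phases
aiohttp_timeout = aiohttp.ClientTimeout(
    total=45,      # overall budget for the whole request
    connect=7,     # time to acquire a connection (including the proxy handshake)
    sock_read=20,  # maximum gap between received chunks
)
# usage: session.get(url, proxy=proxy, timeout=aiohttp_timeout)
```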

Best practices and optimization

Effective work with proxies requires adherence to a set of rules to minimize blocks and maximize performance.

1. Use Session to reuse connections:

# requests: Session reuses TCP connections
session = requests.Session()
session.proxies = {'http': proxy, 'https': proxy}

for url in urls:
    response = session.get(url)  # Faster than a fresh requests.get() each time

# aiohttp: a single ClientSession is shared by all requests
async def fetch(session, url):
    async with session.get(url, proxy=proxy) as response:
        return await response.text()  # read the body to release the connection

async def main():
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*(fetch(session, url) for url in urls))

2. Set realistic User-Agent and headers:

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip, deflate, br',
    'DNT': '1',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1'
}

proxies = {'http': proxy, 'https': proxy}
response = requests.get('https://example.com', headers=headers, proxies=proxies)

3. Limit the request rate (requests per second):

import time
import requests

class RateLimiter:
    def __init__(self, max_requests_per_second):
        self.max_requests = max_requests_per_second
        self.interval = 1.0 / max_requests_per_second
        self.last_request_time = 0
    
    def wait(self):
        elapsed = time.time() - self.last_request_time
        if elapsed < self.interval:
            time.sleep(self.interval - elapsed)
        self.last_request_time = time.time()

# Usage: no more than 2 requests per second
limiter = RateLimiter(2)
proxies = {'http': proxy, 'https': proxy}

for url in urls:
    limiter.wait()
    response = requests.get(url, proxies=proxies)

4. Logging and monitoring proxies:

import logging
from collections import defaultdict

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class ProxyMonitor:
    def __init__(self):
        self.stats = defaultdict(lambda: {'success': 0, 'failed': 0, 'total_time': 0})
    
    def log_request(self, proxy, success, response_time):
        stats = self.stats[proxy]
        if success:
            stats['success'] += 1
        else:
            stats['failed'] += 1
        stats['total_time'] += response_time
        
        # Logging every 10 requests
        total = stats['success'] + stats['failed']
        if total % 10 == 0:
            avg_time = stats['total_time'] / total
            success_rate = stats['success'] / total * 100
            logger.info(f"Proxy {proxy}: {total} requests, success {success_rate:.1f}%, avg {avg_time:.2f}s")

monitor = ProxyMonitor()

# In the request code
import time
start = time.time()
try:
    response = requests.get(url, proxies=proxies, timeout=10)
    monitor.log_request(proxy, True, time.time() - start)
except Exception as e:
    monitor.log_request(proxy, False, time.time() - start)
    logger.error(f"Error with proxy {proxy}: {e}")

5. DNS caching for speed:

# aiohttp with DNS caching
import asyncio
import aiohttp
from aiohttp.resolver import AsyncResolver  # requires aiodns: pip install aiodns

async def main():
    resolver = AsyncResolver(nameservers=['8.8.8.8', '8.8.4.4'])
    connector = aiohttp.TCPConnector(resolver=resolver, ttl_dns_cache=300)

    async with aiohttp.ClientSession(connector=connector) as session:
        # Requests reuse cached DNS entries for 5 minutes
        async with session.get(url, proxy=proxy) as response:
            data = await response.json()

asyncio.run(main())

6. Handling captchas and blocks:

Tip: When receiving status 403, 429, or a captcha page, it is recommended to:

  • Switch to a proxy with an IP from another subnet
  • Increase the delay between requests (up to 5-10 seconds)
  • Change User-Agent and other headers
  • Use cookies from previous successful sessions
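The first two points can be captured in a small decision function that reacts to a block indicator by switching proxies and backing off. This is an illustrative policy, not a library API; the proxy addresses are placeholders:

```python
import random

BLOCK_STATUSES = {403, 429}

def next_action(status, proxy, delay, proxy_list):
    """Decide how to react to a response status (illustrative policy).

    On a block indicator, switch to a different proxy and double the
    delay, capping it at 10 seconds; otherwise keep the current pacing.
    """
    if status in BLOCK_STATUSES:
        alternatives = [p for p in proxy_list if p != proxy]
        new_proxy = random.choice(alternatives) if alternatives else proxy
        return new_proxy, min(delay * 2, 10.0)
    return proxy, delay

proxy_list = ['http://proxy1:8080', 'http://proxy2:8080', 'http://proxy3:8080']
proxy, delay = proxy_list[0], 1.0

proxy, delay = next_action(429, proxy, delay, proxy_list)  # blocked: switch, slow down
proxy, delay = next_action(200, proxy, delay, proxy_list)  # success: keep the pacing
print(proxy, delay)
```

Changing headers and reusing cookies from successful sessions can be layered on top of the same hook.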

Comparison of requests and aiohttp for proxies

The choice between requests and aiohttp depends on the task and the volume of data. Let's look at the key differences.

Criterion | requests | aiohttp
Concurrency model | Synchronous (blocking) | Asynchronous (non-blocking)
Performance | ~10-50 requests/sec | ~100-1000 requests/sec
Code simplicity | Easier for beginners | Requires knowledge of async/await
Proxy setup | proxies dictionary | proxy parameter
SOCKS5 support | Via requests[socks] | Via aiohttp-socks
Memory usage | Lower (one thread) | Higher (many tasks)
Best suited for | Simple scripts, <100 requests | Parsers, >1000 requests

When to use requests:

  • Simple scripts for one-off tasks
  • Prototyping and testing
  • Small volume of requests (up to 100 per minute)
  • When code simplicity and readability are important
  • Integration with synchronous libraries

When to use aiohttp:

  • Scraping large volumes of data (thousands of pages)
  • Monitoring many sources in real-time
  • API services under heavy load
  • When processing speed is critical
  • Working with WebSocket through proxies

Practical performance comparison:

# Test: 100 requests through proxy

# requests (synchronous) - ~50 seconds
import requests
import time

start = time.time()
proxies = {'http': proxy, 'https': proxy}
for i in range(100):
    response = requests.get('https://httpbin.org/ip', proxies=proxies)
print(f"requests: {time.time() - start:.2f} seconds")

# aiohttp (asynchronous) - ~5 seconds
import aiohttp
import asyncio

async def fetch_all():
    async with aiohttp.ClientSession() as session:
        async def fetch():
            # Read the body so the connection is released back to the pool
            async with session.get('https://httpbin.org/ip', proxy=proxy) as response:
                await response.read()
        await asyncio.gather(*(fetch() for _ in range(100)))

start = time.time()
asyncio.run(fetch_all())
print(f"aiohttp: {time.time() - start:.2f} seconds")

When using data center proxies for high-speed scraping, aiohttp shows a 10-20 times advantage over requests due to parallel processing of requests.

Conclusion

Setting up proxies in Python through the requests and aiohttp libraries is a fundamental skill for developing parsers, automating data collection, and bypassing geographical restrictions. The requests library is suitable for simple scripts and prototyping due to its clear synchronous API, while aiohttp provides high performance when processing thousands of requests through asynchronous architecture.

Key points for effective work with proxies in Python: proper error and timeout handling, implementing IP address rotation to distribute load, using Session to reuse connections, setting realistic headers and User-Agent, and monitoring the performance of proxy servers. For SOCKS5 proxies, additional libraries are required — requests[socks] or aiohttp-socks.

When choosing the type of proxy for scraping, consider the specifics of the task: for high-load parsers with thousands of requests, fast data center proxies are suitable, for bypassing strict anti-bot systems and working with social networks, residential proxies with real IPs of home users are recommended, and for tasks requiring maximum anonymity and simulation of mobile traffic, mobile proxies with IPs from cellular operators are optimal.

If you plan to develop high-performance parsers or automate data collection from many sources, we recommend trying residential proxies — they provide a high level of anonymity, minimal risk of blocks, and stable operation with most secure web services. For technical tasks requiring high processing speeds, data center proxies with low latency and high throughput are also suitable.