AWS Lambda is a serverless platform that allows you to run code without managing servers. However, when working with website scraping, marketplace APIs, or task automation, a common issue arises: Lambda functions use AWS IP addresses, which are easily detected and blocked. In this guide, we will explore how to integrate proxies into Lambda, set up IP rotation, and avoid common mistakes.
This article is aimed at developers who automate tasks through AWS Lambda: scraping data from protected websites, monitoring competitor prices, working with social media or marketplace APIs. You will receive ready-to-use code examples in Python and Node.js that you can implement right after reading.
Why Use Proxies in AWS Lambda
By default, AWS Lambda uses IP addresses from the Amazon Web Services pool. These addresses are listed publicly and can be easily identified by bot protection systems. Here are the main scenarios when proxies become necessary:
Real Case: A developer set up Lambda to monitor prices on Wildberries every 15 minutes. After 2 days, the marketplace began returning a 403 Forbidden error: the AWS IPs had been blacklisted. After connecting residential proxies, scraping has been running smoothly for 6 months.
Main Reasons to Use Proxies in Lambda:
- Scraping Protected Websites: Many websites block requests from AWS data center IPs. Proxies allow Lambda to masquerade as regular users.
- Geolocation Restrictions: If you need to obtain data from a website that is only accessible from a specific country (e.g., regional prices on Ozon), proxies with the required geolocation solve the problem.
- Bypassing Rate Limiting: Many service APIs limit the number of requests from a single IP. Proxy rotation allows you to distribute the load.
- A/B Testing for Ads: Checking the display of advertisements from different regions for competitor analysis.
- Monitoring Marketplaces: Tracking product positions and competitor prices on Wildberries, Ozon, Avito without blocks.
Lambda functions are often triggered on a schedule (via Amazon EventBridge, formerly CloudWatch Events) or by events, making them an ideal tool for automation. However, without proxies, such tasks quickly encounter blocks from target resources.
Which Type of Proxy to Choose for Lambda
The choice of proxy type depends on the task your Lambda function is solving. Let's examine three main types and their applications in serverless architecture:
| Proxy Type | Speed | Anonymity | Best Use Cases for Lambda |
|---|---|---|---|
| Data Center Proxies | Very High (50-200 ms) | Medium | API scraping without strict protection, mass availability checks, SEO monitoring |
| Residential Proxies | Medium (300-800 ms) | Very High | Scraping protected websites (marketplaces, social networks), bypassing Cloudflare, working with Instagram/Facebook API |
| Mobile Proxies | Medium (400-1000 ms) | Maximum | Working with mobile APIs (TikTok, Instagram), testing mobile ads, bypassing the strictest protections |
Recommendations for Selection:
- For scraping Wildberries, Ozon, Avito: Use residential proxies with Russian geolocation. These platforms actively block data center IPs.
- For monitoring APIs without strict protection: Data center proxies are sufficient; they are cheaper and faster.
- For working with Instagram, Facebook, TikTok APIs: Only mobile or residential proxies, because these platforms detect and ban data center IPs.
- For bypassing Cloudflare, PerimeterX: Residential proxies with rotation, preferably with sticky sessions (keeping IP for 5-30 minutes).
Important: Lambda functions have a time limit (maximum 15 minutes). When using slow proxies (residential/mobile), account for the delays: if a request through a proxy takes 2 seconds, then in 15 minutes (900 seconds) you can make at most ~450 sequential requests.
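The arithmetic above can be sketched as a small helper. The name max_sequential_requests and the safety_margin_s parameter are assumptions made for illustration, not part of any AWS API. The estimate covers strictly sequential requests and ignores parsing time and cold starts.

```python
def max_sequential_requests(timeout_s, latency_s, safety_margin_s=0.0):
    """Estimate how many back-to-back requests fit into one Lambda invocation."""
    if latency_s <= 0:
        raise ValueError("latency_s must be positive")
    # Reserve some time for start-up and for returning the result
    usable_s = max(timeout_s - safety_margin_s, 0.0)
    return int(usable_s // latency_s)


# 15-minute limit, 2 seconds per request through the proxy
print(max_sequential_requests(900, 2))  # 450
```

Reserving a margin (for example 30 seconds) lowers the budget slightly but leaves room to return results before the function is terminated.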
Setting Up Proxies in Lambda with Python (requests, urllib3)
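Python is the most popular language for Lambda functions, especially for scraping and automation tasks. Let's consider setting up proxies with the requests library, the most common HTTP client in Python. It is not part of the standard library, so it has to be packaged with the function or supplied as a layer.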
Basic Setup for HTTP Proxy
The simplest way to connect a proxy is to pass the proxies parameter to the requests.get() method:
import os

import requests


def lambda_handler(event, context):
    # Get proxy credentials from environment variables
    proxy_host = os.environ['PROXY_HOST']  # For example: proxy.example.com
    proxy_port = os.environ['PROXY_PORT']  # For example: 8080
    proxy_user = os.environ['PROXY_USER']
    proxy_pass = os.environ['PROXY_PASS']

    # Form the proxy URL with authentication
    proxy_url = f"http://{proxy_user}:{proxy_pass}@{proxy_host}:{proxy_port}"
    proxies = {
        'http': proxy_url,
        'https': proxy_url
    }

    try:
        # Make a request through the proxy
        response = requests.get(
            'https://api.example.com/data',
            proxies=proxies,
            timeout=10  # Important! Set a timeout
        )
        return {
            'statusCode': 200,
            'body': response.text
        }
    except requests.exceptions.ProxyError as e:
        print(f"Proxy error: {e}")
        return {
            'statusCode': 500,
            'body': 'Proxy connection failed'
        }
    except requests.exceptions.Timeout as e:
        print(f"Timeout error: {e}")
        return {
            'statusCode': 504,
            'body': 'Request timeout'
        }
Key Points of This Code:
- Environment Variables: Never store proxy credentials directly in the code! Use Environment Variables in the Lambda settings.
- Timeout: Always set a timeout (10-30 seconds). Without it, Lambda may hang until the maximum execution time is reached.
- Error Handling: Proxies may be unavailable or slow, so always handle the ProxyError and Timeout exceptions.
- HTTP and HTTPS: Specify both protocols in the proxies dictionary, even if you are using only HTTPS.
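One detail the basic example glosses over: if the username or password contains characters such as @, : or /, the f-string URL becomes ambiguous. Below is a minimal sketch of a helper that percent-encodes the credentials first. The names build_proxy_url and build_proxies are hypothetical and introduced only for this example; they are not part of requests.

```python
from urllib.parse import quote


def build_proxy_url(user, password, host, port, scheme="http"):
    """Build a proxy URL with percent-encoded credentials (hypothetical helper)."""
    # safe="" also encodes '/', which quote() leaves untouched by default
    encoded_user = quote(user, safe="")
    encoded_pass = quote(password, safe="")
    return f"{scheme}://{encoded_user}:{encoded_pass}@{host}:{port}"


def build_proxies(proxy_url):
    """Return the mapping that requests expects in its proxies argument."""
    return {"http": proxy_url, "https": proxy_url}


print(build_proxy_url("user", "p@ss:w/rd", "proxy.example.com", 8080))
# http://user:p%40ss%3Aw%2Frd@proxy.example.com:8080
```

The result of build_proxies(...) can be passed directly as proxies= in the handler above.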
Setting Up SOCKS5 Proxy
SOCKS5 proxies operate at the TCP level and relay traffic without modifying it, so they do not add identifying HTTP headers such as Via or X-Forwarded-For. To use SOCKS5 with requests, install the socks extra (requests[socks]), which pulls in the PySocks package:
import os

import requests


def lambda_handler(event, context):
    proxy_host = os.environ['PROXY_HOST']
    proxy_port = os.environ['PROXY_PORT']
    proxy_user = os.environ['PROXY_USER']
    proxy_pass = os.environ['PROXY_PASS']

    # SOCKS5 proxy with authentication
    # (use the socks5h:// scheme instead to resolve DNS on the proxy side)
    proxy_url = f"socks5://{proxy_user}:{proxy_pass}@{proxy_host}:{proxy_port}"
    proxies = {
        'http': proxy_url,
        'https': proxy_url
    }

    try:
        response = requests.get(
            'https://www.wildberries.ru/catalog/12345/detail.aspx',
            proxies=proxies,
            timeout=15
        )
        # Parse data
        return {
            'statusCode': 200,
            'body': response.text
        }
    except Exception as e:
        print(f"Error: {e}")
        return {
            'statusCode': 500,
            'body': str(e)
        }
Important for Deployment in Lambda: When using SOCKS5, add to requirements.txt:
requests[socks]
PySocks
Checking IP Through Proxy
Before running the main logic, it's useful to check that the proxy is working and returning the correct IP:
import requests


def check_proxy_ip(proxies):
    """Checks the IP seen by the outside world through the proxy"""
    try:
        response = requests.get(
            'https://api.ipify.org?format=json',
            proxies=proxies,
            timeout=10
        )
        ip_data = response.json()
        print(f"Current IP through proxy: {ip_data['ip']}")
        return ip_data['ip']
    except Exception as e:
        print(f"Proxy check failed: {e}")
        return None


def lambda_handler(event, context):
    # ... proxy setup ...
    # Check IP before main work
    current_ip = check_proxy_ip(proxies)
    if not current_ip:
        return {
            'statusCode': 500,
            'body': 'Proxy verification failed'
        }
    # Main scraping logic
    # ...
Setting Up Proxies in Lambda with Node.js (axios, got)
Node.js is the second most popular language for Lambda functions, especially when high performance is needed for API work. Let's consider setting up proxies with the axios and got libraries.
Setting Up with axios
Axios is the most popular HTTP library for Node.js. To work with proxies, you'll need an additional package https-proxy-agent:
const axios = require('axios');
// https-proxy-agent v7 exposes the class as a named export
const { HttpsProxyAgent } = require('https-proxy-agent');

exports.handler = async (event) => {
  // Get credentials from environment variables
  const proxyHost = process.env.PROXY_HOST;
  const proxyPort = process.env.PROXY_PORT;
  const proxyUser = process.env.PROXY_USER;
  const proxyPass = process.env.PROXY_PASS;

  // Form the proxy URL
  const proxyUrl = `http://${proxyUser}:${proxyPass}@${proxyHost}:${proxyPort}`;

  // Create an agent for the proxy
  const agent = new HttpsProxyAgent(proxyUrl);

  try {
    const response = await axios.get('https://api.example.com/data', {
      httpsAgent: agent,
      proxy: false, // Let the agent do the proxying, not axios' built-in proxy option
      timeout: 10000 // 10 seconds
    });
    return {
      statusCode: 200,
      body: JSON.stringify(response.data)
    };
  } catch (error) {
    console.error('Request failed:', error.message);
    return {
      statusCode: 500,
      body: JSON.stringify({
        error: error.message
      })
    };
  }
};
Installing Dependencies: Add to package.json:
{
  "dependencies": {
    "axios": "^1.6.0",
    "https-proxy-agent": "^7.0.0"
  }
}
Setting Up SOCKS5 with axios
For SOCKS5 proxies, use the package socks-proxy-agent:
const axios = require('axios');
const { SocksProxyAgent } = require('socks-proxy-agent');

exports.handler = async (event) => {
  const proxyUrl = `socks5://${process.env.PROXY_USER}:${process.env.PROXY_PASS}@${process.env.PROXY_HOST}:${process.env.PROXY_PORT}`;
  const agent = new SocksProxyAgent(proxyUrl);

  try {
    const response = await axios.get('https://www.ozon.ru/api/products', {
      httpAgent: agent,
      httpsAgent: agent,
      timeout: 15000
    });
    return {
      statusCode: 200,
      body: JSON.stringify(response.data)
    };
  } catch (error) {
    console.error('Error:', error.message);
    return {
      statusCode: 500,
      body: JSON.stringify({ error: error.message })
    };
  }
};
Alternative: the got Library
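Got is a modern HTTP library. Like axios, it has no built-in proxy option, so proxying goes through agent packages (http-proxy-agent and https-proxy-agent). Note that require('got') works only up to got v11; v12 and later are ESM-only: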
// Assumes got v11 (the last CommonJS release) and v7 of both agent packages
const got = require('got');
const { HttpProxyAgent } = require('http-proxy-agent');
const { HttpsProxyAgent } = require('https-proxy-agent');

exports.handler = async (event) => {
  const proxyUrl = `http://${process.env.PROXY_USER}:${process.env.PROXY_PASS}@${process.env.PROXY_HOST}:${process.env.PROXY_PORT}`;

  try {
    const response = await got('https://api.example.com/data', {
      agent: {
        http: new HttpProxyAgent(proxyUrl),
        https: new HttpsProxyAgent(proxyUrl)
      },
      timeout: {
        request: 10000
      },
      responseType: 'json'
    });
    return {
      statusCode: 200,
      body: JSON.stringify(response.body)
    };
  } catch (error) {
    console.error('Error:', error.message);
    return {
      statusCode: 500,
      body: JSON.stringify({ error: error.message })
    };
  }
};
Proxy Rotation in Lambda: How to Change IP Automatically
Proxy rotation is critically important for tasks that require making many requests without blocks. There are two main approaches: using proxy services with automatic rotation or manually managing a pool of proxies.
Automatic Rotation via Provider
Most residential proxy providers (including ProxyCove) offer an endpoint with automatic rotation, where the IP changes on each request or every N minutes:
import json
import os

import requests


def lambda_handler(event, context):
    # Proxy with automatic rotation
    # Format: rotating.proxy.com:port
    # Each request = new IP
    proxy_url = f"http://{os.environ['PROXY_USER']}:{os.environ['PROXY_PASS']}@rotating.proxycove.com:8080"
    proxies = {
        'http': proxy_url,
        'https': proxy_url
    }
    results = []

    # Make 10 requests, each with a new IP
    for i in range(10):
        try:
            response = requests.get(
                f'https://api.wildberries.ru/products/{i}',
                proxies=proxies,
                timeout=10
            )
            results.append({
                'product_id': i,
                'status': response.status_code,
                'data': response.json()
            })
        except Exception as e:
            results.append({
                'product_id': i,
                'error': str(e)
            })

    return {
        'statusCode': 200,
        'body': json.dumps(results)
    }
Manual Rotation from Proxy Pool
If you have a list of proxies, you can implement rotation manually. This is useful when you need control over which proxy is used for each request:
import json
import random

import requests


def lambda_handler(event, context):
    # List of proxies (can be stored in DynamoDB or S3)
    proxy_pool = [
        {'host': 'proxy1.example.com', 'port': '8080', 'user': 'user1', 'pass': 'pass1'},
        {'host': 'proxy2.example.com', 'port': '8080', 'user': 'user2', 'pass': 'pass2'},
        {'host': 'proxy3.example.com', 'port': '8080', 'user': 'user3', 'pass': 'pass3'},
    ]
    results = []

    for i in range(10):
        # Select a random proxy from the pool
        proxy = random.choice(proxy_pool)
        proxy_url = f"http://{proxy['user']}:{proxy['pass']}@{proxy['host']}:{proxy['port']}"
        proxies = {
            'http': proxy_url,
            'https': proxy_url
        }
        try:
            response = requests.get(
                f'https://api.example.com/item/{i}',
                proxies=proxies,
                timeout=10
            )
            results.append({
                'item': i,
                'proxy_used': proxy['host'],
                'status': response.status_code
            })
        except Exception as e:
            results.append({
                'item': i,
                'proxy_used': proxy['host'],
                'error': str(e)
            })

    return {
        'statusCode': 200,
        'body': json.dumps(results)
    }
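Random selection can pick the same proxy several times in a row and keeps using proxies that have stopped working. As an alternative, here is a rough sketch of round-robin rotation that skips proxies after repeated failures. The ProxyRotator class and its max_failures parameter are hypothetical names made up for this example, and the failure threshold is an arbitrary assumption.

```python
import itertools


class ProxyRotator:
    """Round-robin rotation over a proxy pool that skips proxies marked as failed."""

    def __init__(self, proxy_urls, max_failures=2):
        self._urls = list(proxy_urls)
        if not self._urls:
            raise ValueError("proxy pool is empty")
        self._failures = {url: 0 for url in self._urls}
        self._max_failures = max_failures
        self._cycle = itertools.cycle(self._urls)

    def next_proxy(self):
        # Look at each proxy at most once per call
        for _ in range(len(self._urls)):
            url = next(self._cycle)
            if self._failures[url] < self._max_failures:
                return url
        raise RuntimeError("no healthy proxies left in the pool")

    def report_failure(self, url):
        self._failures[url] += 1

    def report_success(self, url):
        # A successful request resets the failure counter
        self._failures[url] = 0
```

In the loop above, proxy_url = rotator.next_proxy() would replace random.choice, with report_failure called in the except branch and report_success after a good response.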
Sticky Sessions for IP Retention
Some tasks require retaining a single IP throughout the session (e.g., logging into a website). Proxy providers offer sticky sessions through a parameter in the URL:
import os
import uuid

import requests


def lambda_handler(event, context):
    # Generate a unique session_id
    session_id = str(uuid.uuid4())

    # Proxy with sticky session (IP retained for 10 minutes)
    proxy_url = f"http://{os.environ['PROXY_USER']}-session-{session_id}:{os.environ['PROXY_PASS']}@sticky.proxycove.com:8080"
    proxies = {
        'http': proxy_url,
        'https': proxy_url
    }

    # All requests in this Lambda will be executed with the same IP
    # 1. Login
    login_response = requests.post(
        'https://example.com/login',
        data={'user': 'test', 'pass': 'test'},
        proxies=proxies,
        timeout=15
    )

    # 2. Get data (the same IP is used)
    data_response = requests.get(
        'https://example.com/dashboard',
        proxies=proxies,
        cookies=login_response.cookies,
        timeout=15
    )

    return {
        'statusCode': 200,
        'body': data_response.text
    }
Storing Proxy Credentials via Environment Variables
Never store proxy credentials (username, password, host) directly in the Lambda function code. AWS provides several secure ways to store sensitive data:
1. Environment Variables (Basic Method)
In the AWS Lambda console, open Configuration > Environment variables and add:
PROXY_HOST = proxy.example.com
PROXY_PORT = 8080
PROXY_USER = your_username
PROXY_PASS = your_password
AWS automatically encrypts Environment Variables at rest. Accessing them in code:
# Python
import os
proxy_host = os.environ['PROXY_HOST']
// Node.js
const proxyHost = process.env.PROXY_HOST;
2. AWS Secrets Manager (Recommended for Production)
For maximum security, use AWS Secrets Manager, which provides automatic secret rotation and detailed access control:
import json

import boto3
from botocore.exceptions import ClientError


def get_proxy_credentials():
    secret_name = "proxy-credentials"
    region_name = "us-east-1"

    session = boto3.session.Session()
    client = session.client(
        service_name='secretsmanager',
        region_name=region_name
    )

    try:
        get_secret_value_response = client.get_secret_value(
            SecretId=secret_name
        )
        secret = json.loads(get_secret_value_response['SecretString'])
        return secret
    except ClientError as e:
        print(f"Error retrieving secret: {e}")
        raise


def lambda_handler(event, context):
    # Get credentials from Secrets Manager
    creds = get_proxy_credentials()
    proxy_url = f"http://{creds['user']}:{creds['password']}@{creds['host']}:{creds['port']}"
    # Use the proxy
    # ...
Important: Don't forget to add IAM permissions to the Lambda function for access to Secrets Manager:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "secretsmanager:GetSecretValue"
      ],
      "Resource": "arn:aws:secretsmanager:us-east-1:123456789012:secret:proxy-credentials-*"
    }
  ]
}
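Calling Secrets Manager on every invocation adds latency and cost. Because module-level state survives between warm invocations of the same execution environment, the secret can be cached there. The sketch below is one possible approach, not an AWS-provided API: get_cached_secret, its ttl_s parameter, and the injected fetch_secret_string callable are all hypothetical names chosen for this example.

```python
import json
import time

# Module-level cache: survives between warm invocations of the same environment
_SECRET_CACHE = {"value": None, "expires_at": 0.0}


def get_cached_secret(fetch_secret_string, ttl_s=300, now=time.time):
    """Return the parsed secret, calling fetch_secret_string() only when the cache expired.

    fetch_secret_string is any zero-argument callable returning the secret as a JSON
    string, for example:
        lambda: client.get_secret_value(SecretId="proxy-credentials")["SecretString"]
    """
    current = now()
    if _SECRET_CACHE["value"] is None or current >= _SECRET_CACHE["expires_at"]:
        _SECRET_CACHE["value"] = json.loads(fetch_secret_string())
        _SECRET_CACHE["expires_at"] = current + ttl_s
    return _SECRET_CACHE["value"]
```

A short TTL (the 300 seconds here is an arbitrary choice) keeps the function from holding stale credentials for long after the secret is rotated.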
Common Errors and Their Solutions
When working with proxies in Lambda, developers often encounter the same issues. Let's discuss the most common ones and how to resolve them:
Error: ProxyError / Connection Timeout
Symptom: requests.exceptions.ProxyError: HTTPConnectionPool(host='proxy.example.com', port=8080): Max retries exceeded
Causes:
- Incorrect proxy credentials (username/password)
- Proxy server is unavailable or overloaded
- Firewall blocks outgoing connections from Lambda
- Timeout is too short
Solution:
# 1. Check credentials (log the host and user, never the password)
print(f"Using proxy: {proxy_host}:{proxy_port}")
print(f"User: {proxy_user}")

# 2. Increase timeout
response = requests.get(url, proxies=proxies, timeout=30)

# 3. Add retry logic
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry  # requests.packages.urllib3 is a legacy alias

session = requests.Session()
retry = Retry(
    total=3,
    backoff_factor=1,
    status_forcelist=[500, 502, 503, 504]
)
adapter = HTTPAdapter(max_retries=retry)
session.mount('http://', adapter)
session.mount('https://', adapter)
response = session.get(url, proxies=proxies, timeout=30)
Error: SSL Certificate Verification Failed
Symptom: SSLError: [SSL: CERTIFICATE_VERIFY_FAILED]
Cause: Some proxies (especially cheap ones) use self-signed SSL certificates.
Solution (use with caution!):
# Disable SSL verification (only for testing!)
response = requests.get(
    url,
    proxies=proxies,
    verify=False  # DO NOT use in production!
)

# Better: specify the path to the CA certificate
response = requests.get(
    url,
    proxies=proxies,
    verify='/path/to/ca-bundle.crt'
)
Important: Disabling SSL verification (verify=False) makes the connection vulnerable to man-in-the-middle attacks. Use only for debugging in a dev environment!
Error: Lambda Timeout (Task Timed Out After X Seconds)
Symptom: The Lambda function terminates with a timeout error before the proxy has responded.
Cause: Slow proxies (especially residential/mobile) + a large number of requests.
Solution:
- Increase the Lambda function timeout: Configuration > General configuration > Timeout (maximum 15 minutes)
- Reduce the number of requests per execution
- Use asynchronous requests (asyncio in Python, Promise.all in Node.js)
- Switch to faster proxies for non-critical tasks
# Python: asynchronous requests for speed
import asyncio
import json
import os

import aiohttp


async def fetch_url(session, url, proxy):
    async with session.get(url, proxy=proxy) as response:
        return await response.text()


async def lambda_handler_async(event, context):
    proxy_url = f"http://{os.environ['PROXY_USER']}:{os.environ['PROXY_PASS']}@{os.environ['PROXY_HOST']}:{os.environ['PROXY_PORT']}"
    urls = [f'https://api.example.com/item/{i}' for i in range(50)]

    # 10-second total timeout per request (aiohttp expects a ClientTimeout object)
    timeout = aiohttp.ClientTimeout(total=10)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        tasks = [fetch_url(session, url, proxy_url) for url in urls]
        # return_exceptions=True keeps one failed request from discarding the rest
        results = await asyncio.gather(*tasks, return_exceptions=True)

    failed = sum(1 for r in results if isinstance(r, Exception))
    return {
        'statusCode': 200,
        'body': json.dumps({'count': len(results), 'failed': failed})
    }


def lambda_handler(event, context):
    return asyncio.run(lambda_handler_async(event, context))
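Another way to avoid being cut off mid-batch is to check how much execution time is left and stop early. The Python Lambda context object exposes get_remaining_time_in_millis() for this. The helper below is a minimal sketch: process_until_deadline, handle_item, and the 10-second reserve are assumptions for illustration.

```python
def process_until_deadline(items, handle_item, context, reserve_ms=10_000):
    """Process items in order until less than reserve_ms of execution time remains.

    Returns (results, pending): results for the handled items and the items left over.
    """
    items = list(items)
    results = []
    for index, item in enumerate(items):
        # Stop early so there is still time to return or persist partial results
        if context.get_remaining_time_in_millis() < reserve_ms:
            return results, items[index:]
        results.append(handle_item(item))
    return results, []
```

The pending items can then be returned to the caller or handed to a later invocation, for example through a queue.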
Error: 407 Proxy Authentication Required
Symptom: HTTP 407 error when attempting to use the proxy.
Cause: Incorrect format for passing credentials, or the proxy requires IP authentication instead of username/password.
Solution:
# Check the proxy URL format
# Correct:
proxy_url = f"http://{user}:{password}@{host}:{port}"
# Incorrect (protocol missing):
proxy_url = f"{user}:{password}@{host}:{port}"  # wrong: no scheme
# If the proxy requires IP authentication:
# 1. Find out the external IP of your Lambda. Outside a VPC it is not fixed;
#    a stable address requires a VPC with a NAT gateway and an Elastic IP.
# 2. Add this IP to the proxy provider's whitelist
# 3. Use the proxy without user:pass
# Getting the external IP of Lambda:
response = requests.get('https://api.ipify.org?format=json', timeout=10)
lambda_ip = response.json()['ip']
print(f"Lambda external IP: {lambda_ip}")
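The format check above can be automated before the first request is sent. This is a rough sketch using only the standard library; validate_proxy_url, the ALLOWED_SCHEMES set, and the require_auth flag are hypothetical names for this example, and the scheme list is an assumption about what your HTTP client supports.

```python
from urllib.parse import urlsplit

# Assumed set of schemes; adjust to what your HTTP client actually supports
ALLOWED_SCHEMES = {"http", "https", "socks5", "socks5h"}


def validate_proxy_url(proxy_url, require_auth=True):
    """Return a list of problems found in proxy_url (an empty list means none found)."""
    problems = []
    parts = urlsplit(proxy_url)
    if parts.scheme not in ALLOWED_SCHEMES:
        problems.append(f"unsupported or missing scheme: {parts.scheme!r}")
    if not parts.hostname:
        problems.append("missing host")
    try:
        port = parts.port
    except ValueError:
        # urlsplit raises ValueError for non-numeric or out-of-range ports
        port = None
    if port is None:
        problems.append("missing or invalid port")
    if require_auth and (parts.username is None or parts.password is None):
        problems.append("missing username or password")
    return problems


print(validate_proxy_url("http://user:pass@proxy.example.com:8080"))  # []
```

For proxies that use IP whitelisting instead of credentials, pass require_auth=False.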
Optimizing Lambda Performance with Proxies
Using proxies adds latency to each request. Here are proven ways to minimize the impact on performance:
1. Connection Pooling
Reuse TCP connections instead of creating a new one for each request:
# Python: use Session instead of requests.get()
import os

import requests

proxy_url = (
    f"http://{os.environ['PROXY_USER']}:{os.environ['PROXY_PASS']}"
    f"@{os.environ['PROXY_HOST']}:{os.environ['PROXY_PORT']}"
)

# Create the session once, outside the handler, so warm invocations reuse it
session = requests.Session()
session.proxies = {
    'http': proxy_url,
    'https': proxy_url
}


def lambda_handler(event, context):
    # All requests reuse connections
    for i in range(100):
        response = session.get(f'https://api.example.com/item/{i}', timeout=10)
        # process response...
2. Parallel Requests
If you need to make many independent requests, execute them in parallel:
// Node.js: parallel requests with Promise.all
const axios = require('axios');
// https-proxy-agent v7 exposes the class as a named export
const { HttpsProxyAgent } = require('https-proxy-agent');

const proxyUrl = `http://${process.env.PROXY_USER}:${process.env.PROXY_PASS}@${process.env.PROXY_HOST}:${process.env.PROXY_PORT}`;
const agent = new HttpsProxyAgent(proxyUrl);

exports.handler = async (event) => {
  const urls = Array.from({length: 50}, (_, i) =>
    `https://api.example.com/item/${i}`
  );

  // All requests are executed in parallel
  const promises = urls.map(url =>
    axios.get(url, {
      httpsAgent: agent,
      proxy: false, // Let the agent do the proxying
      timeout: 10000
    })
  );

  try {
    // Promise.all rejects as soon as any single request fails
    const results = await Promise.all(promises);
    return {
      statusCode: 200,
      body: JSON.stringify({
        count: results.length,
        data: results.map(r => r.data)
      })
    };
  } catch (error) {
    console.error('Error:', error.message);
    return {
      statusCode: 500,
      body: JSON.stringify({ error: error.message })
    };
  }
};
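Firing every request at once may exceed the number of simultaneous connections a proxy plan allows (this depends on the provider, so treat it as an assumption to verify). A common middle ground is bounded parallelism. Here is a minimal Python sketch using asyncio.Semaphore, in the same style as the asyncio example earlier; gather_limited and its limit parameter are hypothetical names introduced here.

```python
import asyncio


async def gather_limited(coro_factories, limit):
    """Run the coroutines produced by coro_factories with at most `limit` in flight.

    coro_factories is an iterable of zero-argument callables that each return a
    coroutine. Results keep the input order; failures are returned as exception objects.
    """
    semaphore = asyncio.Semaphore(limit)

    async def run(factory):
        # Only `limit` coroutines can hold the semaphore at the same time
        async with semaphore:
            return await factory()

    return await asyncio.gather(
        *(run(factory) for factory in coro_factories),
        return_exceptions=True,
    )
```

With aiohttp, each factory could be something like lambda url=url: fetch_url(session, url, proxy_url), where fetch_url is the coroutine from the asyncio example.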
3. Caching Results
If the data changes infrequently, cache the results in DynamoDB or S3:
import time

import boto3
import requests

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('proxy-cache')


def get_cached_or_fetch(url, proxies, cache_ttl=3600):
    """Returns data from cache or makes a request through the proxy"""
    # Check the cache
    try:
        response = table.get_item(Key={'url': url})
        if 'Item' in response:
            item = response['Item']
            # DynamoDB returns numbers as Decimal; convert before subtracting from a float
            if time.time() - int(item['timestamp']) < cache_ttl:
                print(f"Cache hit for {url}")
                return item['data']
    except Exception as e:
        print(f"Cache error: {e}")

    # Cache is empty or expired: make a request
    print(f"Cache miss for {url}, fetching...")
    response = requests.get(url, proxies=proxies, timeout=10)
    data = response.text

    # Save to cache (a single DynamoDB item is limited to 400 KB)
    try:
        table.put_item(Item={
            'url': url,
            'data': data,
            'timestamp': int(time.time())
        })
    except Exception as e:
        print(f"Cache save error: {e}")

    return data
4. Choosing the Right Type of Proxy
Comparing the speed of different types of proxies in real conditions:
| Proxy Type | Average Latency | Requests/Minute (Lambda 1GB RAM) | Recommendation |
|---|---|---|---|
| Data Centers | 50-200 ms | 300-600 | Mass API scraping |
| Residential | 300-800 ms | 100-200 | Protected websites |
| Mobile | 500-1500 ms | 50-100 | Mobile APIs |
Conclusion: Using proxies in AWS Lambda is essential for tasks that require anonymity and reliability. By selecting the right type of proxy and implementing best practices, you can ensure successful automation and data scraping without running into blocks or timeouts.