← Back to Blog

How to Set Up a Proxy in AWS Lambda for Web Scraping and API Requests: Complete Guide with Examples

A step-by-step guide to setting up a proxy in AWS Lambda for web scraping, API requests, and automation. Code examples in Python and Node.js, solutions to common issues.

February 18, 2026
```html

AWS Lambda is a serverless platform that allows you to run code without managing servers. However, when working with website scraping, marketplace APIs, or task automation, a common issue arises: Lambda functions use AWS IP addresses, which are easily detected and blocked. In this guide, we will explore how to integrate proxies into Lambda, set up IP rotation, and avoid common mistakes.

This article is aimed at developers who automate tasks through AWS Lambda: scraping data from protected websites, monitoring competitor prices, working with social media or marketplace APIs. You will receive ready-to-use code examples in Python and Node.js that you can implement right after reading.

Why Use Proxies in AWS Lambda

By default, AWS Lambda uses IP addresses from the Amazon Web Services pool. AWS publishes these ranges in its ip-ranges.json file, so bot protection systems can identify them easily. Here are the main scenarios in which proxies become necessary:

Real Case: A developer set up Lambda to monitor prices on Wildberries every 15 minutes. After 2 days, the marketplace began returning 403 Forbidden errors because the AWS IPs had been blacklisted. After switching to residential proxies, scraping has run smoothly for 6 months.

Main Reasons to Use Proxies in Lambda:

  • Scraping Protected Websites: Many websites block requests from AWS data center IPs. Proxies allow Lambda to masquerade as regular users.
  • Geolocation Restrictions: If you need to obtain data from a website that is only accessible from a specific country (e.g., regional prices on Ozon), proxies with the required geolocation solve the problem.
  • Bypassing Rate Limiting: Many service APIs limit the number of requests from a single IP. Proxy rotation allows you to distribute the load.
  • A/B Testing for Ads: Checking the display of advertisements from different regions for competitor analysis.
  • Monitoring Marketplaces: Tracking product positions and competitor prices on Wildberries, Ozon, Avito without blocks.

Lambda functions are often triggered on a schedule (via Amazon EventBridge, formerly CloudWatch Events) or by events, making them an ideal tool for automation. However, without proxies, such tasks quickly encounter blocks from target resources.

Which Type of Proxy to Choose for Lambda

The choice of proxy type depends on the task your Lambda function is solving. Let's examine three main types and their applications in serverless architecture:

| Proxy Type | Speed | Anonymity | Best Use Cases for Lambda |
|---|---|---|---|
| Data Center Proxies | Very High (50-200 ms) | Medium | API scraping without strict protection, mass availability checks, SEO monitoring |
| Residential Proxies | Medium (300-800 ms) | Very High | Scraping protected websites (marketplaces, social networks), bypassing Cloudflare, working with Instagram/Facebook API |
| Mobile Proxies | Medium (400-1000 ms) | Maximum | Working with mobile APIs (TikTok, Instagram), testing mobile ads, bypassing the strictest protections |

Recommendations for Selection:

  • For scraping Wildberries, Ozon, Avito: Use residential proxies with Russian geolocation. These platforms actively block data center IPs.
  • For monitoring APIs without strict protection: Data center proxies are sufficient; they are cheaper and faster.
  • For working with Instagram, Facebook, TikTok APIs: Use only mobile or residential proxies, because these platforms detect and ban data center IPs.
  • For bypassing Cloudflare, PerimeterX: Residential proxies with rotation, preferably with sticky sessions (keeping IP for 5-30 minutes).

Important: Lambda functions have a time limit (maximum 15 minutes). When using slow proxies (residential/mobile), account for the delays: if a request through a proxy takes 2 seconds, a single invocation can make at most about 450 sequential requests in 15 minutes.
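
The arithmetic behind that estimate can be sketched in a few lines. This is a rough planning aid rather than a measurement: the function name and the optional overhead_s parameter are illustrative assumptions, and real throughput also depends on retries and response sizes.

```python
def max_sequential_requests(time_limit_s: float,
                            seconds_per_request: float,
                            overhead_s: float = 0.0) -> int:
    """Estimate how many back-to-back requests fit into one invocation."""
    if seconds_per_request <= 0:
        raise ValueError("seconds_per_request must be positive")
    # Time left for requests after startup, parsing, and other fixed costs
    usable = max(time_limit_s - overhead_s, 0.0)
    return int(usable // seconds_per_request)


# 15-minute limit (900 s) with 2-second proxied requests
print(max_sequential_requests(900, 2.0))  # 450
```

Reserving some overhead (for example, overhead_s=100) gives a more conservative budget and leaves room for the function to return its results before the limit is hit.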

Setting Up Proxies in Lambda with Python (requests, urllib3)

Python is one of the most popular languages for Lambda functions, especially for scraping and automation tasks. Let's set up proxies with the requests library, the most common choice for HTTP requests in Python.

Basic Setup for HTTP Proxy

The simplest way to connect a proxy is to pass the proxies parameter to the requests.get() method:

import requests
import os

def lambda_handler(event, context):
    # Get proxy credentials from environment variables
    proxy_host = os.environ['PROXY_HOST']  # For example: proxy.example.com
    proxy_port = os.environ['PROXY_PORT']  # For example: 8080
    proxy_user = os.environ['PROXY_USER']
    proxy_pass = os.environ['PROXY_PASS']
    
    # Form the proxy URL with authentication
    proxy_url = f"http://{proxy_user}:{proxy_pass}@{proxy_host}:{proxy_port}"
    
    proxies = {
        'http': proxy_url,
        'https': proxy_url
    }
    
    try:
        # Make a request through the proxy
        response = requests.get(
            'https://api.example.com/data',
            proxies=proxies,
            timeout=10  # Important! Set a timeout
        )
        
        return {
            'statusCode': 200,
            'body': response.text
        }
    
    except requests.exceptions.ProxyError as e:
        print(f"Proxy error: {e}")
        return {
            'statusCode': 500,
            'body': 'Proxy connection failed'
        }
    
    except requests.exceptions.Timeout as e:
        print(f"Timeout error: {e}")
        return {
            'statusCode': 504,
            'body': 'Request timeout'
        }

Key Points of This Code:

  • Environment Variables: Never store proxy credentials directly in the code! Use Environment Variables in the Lambda settings.
  • Timeout: Always set a timeout (10-30 seconds). requests has no default timeout, so without one Lambda may hang until the maximum execution time is reached.
  • Error Handling: Proxies may be unavailable or slow, so always handle the ProxyError and Timeout exceptions.
  • HTTP and HTTPS: Specify both protocols in the proxies dictionary, even if you are using only HTTPS.

Setting Up SOCKS5 Proxy

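SOCKS5 proxies operate at the TCP level and relay traffic without rewriting it, so they add no proxy-specific HTTP headers that some protection systems look for. To use SOCKS5 with requests, install the requests[socks] extra, which pulls in PySocks. With the socks5:// scheme, DNS is resolved locally; use socks5h:// if the proxy should resolve hostnames instead: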

import requests
import os

def lambda_handler(event, context):
    proxy_host = os.environ['PROXY_HOST']
    proxy_port = os.environ['PROXY_PORT']
    proxy_user = os.environ['PROXY_USER']
    proxy_pass = os.environ['PROXY_PASS']
    
    # SOCKS5 proxy with authentication
    proxy_url = f"socks5://{proxy_user}:{proxy_pass}@{proxy_host}:{proxy_port}"
    
    proxies = {
        'http': proxy_url,
        'https': proxy_url
    }
    
    try:
        response = requests.get(
            'https://www.wildberries.ru/catalog/12345/detail.aspx',
            proxies=proxies,
            timeout=15
        )
        
        # Parse data
        return {
            'statusCode': 200,
            'body': response.text
        }
    
    except Exception as e:
        print(f"Error: {e}")
        return {
            'statusCode': 500,
            'body': str(e)
        }

Important for Deployment in Lambda: When using SOCKS5, add to requirements.txt:

requests[socks]
PySocks

Checking IP Through Proxy

Before running the main logic, it's useful to check that the proxy is working and returning the correct IP:

import requests

def check_proxy_ip(proxies):
    """Checks the IP seen by the outside world through the proxy"""
    try:
        response = requests.get(
            'https://api.ipify.org?format=json',
            proxies=proxies,
            timeout=10
        )
        ip_data = response.json()
        print(f"Current IP through proxy: {ip_data['ip']}")
        return ip_data['ip']
    except Exception as e:
        print(f"Proxy check failed: {e}")
        return None

def lambda_handler(event, context):
    # ... proxy setup ...
    
    # Check IP before main work
    current_ip = check_proxy_ip(proxies)
    if not current_ip:
        return {
            'statusCode': 500,
            'body': 'Proxy verification failed'
        }
    
    # Main scraping logic
    # ...

Setting Up Proxies in Lambda with Node.js (axios, got)

Node.js is the second most popular language for Lambda functions, especially when high performance is needed for API work. Let's consider setting up proxies with the axios and got libraries.

Setting Up with axios

Axios is the most popular HTTP library for Node.js. To work with proxies, you'll need an additional package https-proxy-agent:

const axios = require('axios');
const { HttpsProxyAgent } = require('https-proxy-agent');  // named export in v7+

exports.handler = async (event) => {
    // Get credentials from environment variables
    const proxyHost = process.env.PROXY_HOST;
    const proxyPort = process.env.PROXY_PORT;
    const proxyUser = process.env.PROXY_USER;
    const proxyPass = process.env.PROXY_PASS;
    
    // Form the proxy URL
    const proxyUrl = `http://${proxyUser}:${proxyPass}@${proxyHost}:${proxyPort}`;
    
    // Create an agent for the proxy
    const agent = new HttpsProxyAgent(proxyUrl);
    
    try {
        const response = await axios.get('https://api.example.com/data', {
            httpsAgent: agent,
            timeout: 10000  // 10 seconds
        });
        
        return {
            statusCode: 200,
            body: JSON.stringify(response.data)
        };
    } catch (error) {
        console.error('Request failed:', error.message);
        
        return {
            statusCode: 500,
            body: JSON.stringify({
                error: error.message
            })
        };
    }
};

Installing Dependencies: Add to package.json:

{
  "dependencies": {
    "axios": "^1.6.0",
    "https-proxy-agent": "^7.0.0"
  }
}

Setting Up SOCKS5 with axios

For SOCKS5 proxies, use the package socks-proxy-agent:

const axios = require('axios');
const { SocksProxyAgent } = require('socks-proxy-agent');

exports.handler = async (event) => {
    const proxyUrl = `socks5://${process.env.PROXY_USER}:${process.env.PROXY_PASS}@${process.env.PROXY_HOST}:${process.env.PROXY_PORT}`;
    
    const agent = new SocksProxyAgent(proxyUrl);
    
    try {
        const response = await axios.get('https://www.ozon.ru/api/products', {
            httpAgent: agent,
            httpsAgent: agent,
            timeout: 15000
        });
        
        return {
            statusCode: 200,
            body: JSON.stringify(response.data)
        };
    } catch (error) {
        console.error('Error:', error.message);
        return {
            statusCode: 500,
            body: JSON.stringify({ error: error.message })
        };
    }
};

Alternative: the got Library

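Got is a modern HTTP library. It has no built-in proxy option, so proxy agents are passed through its agent option. The example uses require(), which works with got 11; got 12 and later are ESM-only and must be loaded with import:

const got = require('got');  // got@11 (CommonJS)
const { HttpProxyAgent } = require('http-proxy-agent');    // named export in v7+
const { HttpsProxyAgent } = require('https-proxy-agent');  // named export in v7+

exports.handler = async (event) => {
    const proxyUrl = `http://${process.env.PROXY_USER}:${process.env.PROXY_PASS}@${process.env.PROXY_HOST}:${process.env.PROXY_PORT}`;
    
    try {
        const response = await got('https://api.example.com/data', {
            agent: {
                http: new HttpProxyAgent(proxyUrl),
                https: new HttpsProxyAgent(proxyUrl)
            },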
            timeout: {
                request: 10000
            },
            responseType: 'json'
        });
        
        return {
            statusCode: 200,
            body: JSON.stringify(response.body)
        };
    } catch (error) {
        console.error('Error:', error.message);
        return {
            statusCode: 500,
            body: JSON.stringify({ error: error.message })
        };
    }
};

Proxy Rotation in Lambda: How to Change IP Automatically

Proxy rotation is critically important for tasks that require making many requests without blocks. There are two main approaches: using proxy services with automatic rotation or manually managing a pool of proxies.

Automatic Rotation via Provider

Most residential proxy providers (including ProxyCove) offer an endpoint with automatic rotation, where the IP changes on each request or every N minutes:

import json
import os

import requests

def lambda_handler(event, context):
    # Proxy with automatic rotation
    # Format: rotating.proxy.com:port
    # Each request = new IP
    proxy_url = f"http://{os.environ['PROXY_USER']}:{os.environ['PROXY_PASS']}@rotating.proxycove.com:8080"
    
    proxies = {
        'http': proxy_url,
        'https': proxy_url
    }
    
    results = []
    
    # Make 10 requests, each with a new IP
    for i in range(10):
        try:
            response = requests.get(
                f'https://api.wildberries.ru/products/{i}',
                proxies=proxies,
                timeout=10
            )
            results.append({
                'product_id': i,
                'status': response.status_code,
                'data': response.json()
            })
        except Exception as e:
            results.append({
                'product_id': i,
                'error': str(e)
            })
    
    return {
        'statusCode': 200,
        'body': json.dumps(results)
    }

Manual Rotation from Proxy Pool

If you have a list of proxies, you can implement rotation manually. This is useful when you need control over which proxy is used for each request:

import requests
import random
import json

def lambda_handler(event, context):
    # List of proxies (can be stored in DynamoDB or S3)
    proxy_pool = [
        {
            'host': 'proxy1.example.com',
            'port': '8080',
            'user': 'user1',
            'pass': 'pass1'
        },
        {
            'host': 'proxy2.example.com',
            'port': '8080',
            'user': 'user2',
            'pass': 'pass2'
        },
        {
            'host': 'proxy3.example.com',
            'port': '8080',
            'user': 'user3',
            'pass': 'pass3'
        }
    ]
    
    results = []
    
    for i in range(10):
        # Select a random proxy from the pool
        proxy = random.choice(proxy_pool)
        proxy_url = f"http://{proxy['user']}:{proxy['pass']}@{proxy['host']}:{proxy['port']}"
        
        proxies = {
            'http': proxy_url,
            'https': proxy_url
        }
        
        try:
            response = requests.get(
                f'https://api.example.com/item/{i}',
                proxies=proxies,
                timeout=10
            )
            results.append({
                'item': i,
                'proxy_used': proxy['host'],
                'status': response.status_code
            })
        except Exception as e:
            results.append({
                'item': i,
                'proxy_used': proxy['host'],
                'error': str(e)
            })
    
    return {
        'statusCode': 200,
        'body': json.dumps(results)
    }
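
Random selection can keep picking a proxy that has just failed. A minimal sketch of an alternative, assuming a round-robin order with a cooldown for failed proxies is acceptable for your workload; the ProxyPool class and its method names are hypothetical, not part of any library:

```python
import time


class ProxyPool:
    """Round-robin proxy pool that temporarily skips failed proxies."""

    def __init__(self, proxy_urls, cooldown_seconds=60.0, clock=time.monotonic):
        if not proxy_urls:
            raise ValueError("proxy_urls must not be empty")
        self._urls = list(proxy_urls)
        self._cooldown = cooldown_seconds
        self._clock = clock          # injectable for testing
        self._blocked_until = {}     # proxy URL -> time it becomes usable again
        self._next = 0

    def get(self):
        """Return the next usable proxy URL, or None if all are cooling down."""
        now = self._clock()
        for _ in range(len(self._urls)):
            url = self._urls[self._next]
            self._next = (self._next + 1) % len(self._urls)
            if self._blocked_until.get(url, 0.0) <= now:
                return url
        return None

    def mark_failed(self, url):
        """Exclude a proxy from rotation for the cooldown period."""
        self._blocked_until[url] = self._clock() + self._cooldown
```

In the loop above, pool.get() would replace random.choice(proxy_pool), and the except branch would call pool.mark_failed(proxy_url). Because the pool lives in memory, its state only survives while the same Lambda execution environment stays warm.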

Sticky Sessions for IP Retention

Some tasks require retaining a single IP throughout the session (e.g., logging into a website). Proxy providers offer sticky sessions through a parameter in the URL:

import os
import uuid

import requests

def lambda_handler(event, context):
    # Generate a unique session_id
    session_id = str(uuid.uuid4())
    
    # Proxy with sticky session (IP retained for 10 minutes)
    proxy_url = f"http://{os.environ['PROXY_USER']}-session-{session_id}:{os.environ['PROXY_PASS']}@sticky.proxycove.com:8080"
    
    proxies = {
        'http': proxy_url,
        'https': proxy_url
    }
    
    # All requests in this Lambda will be executed with the same IP
    # 1. Login
    login_response = requests.post(
        'https://example.com/login',
        data={'user': 'test', 'pass': 'test'},
        proxies=proxies,
        timeout=15
    )
    
    # 2. Get data (the same IP is used)
    data_response = requests.get(
        'https://example.com/dashboard',
        proxies=proxies,
        cookies=login_response.cookies,
        timeout=15
    )
    
    return {
        'statusCode': 200,
        'body': data_response.text
    }

Storing Proxy Credentials via Environment Variables

Never store proxy credentials (username, password, host) directly in the Lambda function code. AWS provides several secure ways to store sensitive data:

1. Environment Variables (Basic Method)

In the AWS Lambda console → Configuration → Environment variables, add:

  • PROXY_HOST = proxy.example.com
  • PROXY_PORT = 8080
  • PROXY_USER = your_username
  • PROXY_PASS = your_password

AWS automatically encrypts Environment Variables at rest. Accessing them in code:

# Python
import os
proxy_host = os.environ['PROXY_HOST']

// Node.js
const proxyHost = process.env.PROXY_HOST;

2. AWS Secrets Manager (Recommended for Production)

For maximum security, use AWS Secrets Manager, which provides automatic secret rotation and detailed access control:

import boto3
import json
from botocore.exceptions import ClientError

def get_proxy_credentials():
    secret_name = "proxy-credentials"
    region_name = "us-east-1"
    
    session = boto3.session.Session()
    client = session.client(
        service_name='secretsmanager',
        region_name=region_name
    )
    
    try:
        get_secret_value_response = client.get_secret_value(
            SecretId=secret_name
        )
        secret = json.loads(get_secret_value_response['SecretString'])
        return secret
    except ClientError as e:
        print(f"Error retrieving secret: {e}")
        raise e

def lambda_handler(event, context):
    # Get credentials from Secrets Manager
    creds = get_proxy_credentials()
    
    proxy_url = f"http://{creds['user']}:{creds['password']}@{creds['host']}:{creds['port']}"
    
    # Use the proxy
    # ...

Important: Don't forget to add IAM permissions to the Lambda function for access to Secrets Manager:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "secretsmanager:GetSecretValue"
      ],
      "Resource": "arn:aws:secretsmanager:us-east-1:123456789:secret:proxy-credentials-*"
    }
  ]
}
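
Calling Secrets Manager on every invocation adds latency and API cost. Since module-level state survives between invocations while an execution environment stays warm, the secret can be cached in memory. A minimal sketch, assuming a short time-to-live is acceptable for your rotation policy; get_cached_secret and its parameters are hypothetical names, and loader stands for any function that fetches the secret:

```python
import time

# Module-level cache: secret_id -> (expires_at, value)
_secret_cache = {}


def get_cached_secret(secret_id, loader, ttl_seconds=300.0, clock=time.monotonic):
    """Return a cached secret, calling loader(secret_id) only when it is stale."""
    now = clock()
    cached = _secret_cache.get(secret_id)
    if cached is not None and cached[0] > now:
        return cached[1]
    value = loader(secret_id)
    _secret_cache[secret_id] = (now + ttl_seconds, value)
    return value
```

Inside the handler this could be used as creds = get_cached_secret("proxy-credentials", lambda _id: get_proxy_credentials()). A shorter ttl_seconds picks up rotated credentials sooner at the price of more Secrets Manager calls.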

Common Errors and Their Solutions

When working with proxies in Lambda, developers often encounter the same issues. Let's discuss the most common ones and how to resolve them:

Error: ProxyError / Connection Timeout

Symptom: requests.exceptions.ProxyError: HTTPConnectionPool(host='proxy.example.com', port=8080): Max retries exceeded

Causes:

  • Incorrect proxy credentials (username/password)
  • Proxy server is unavailable or overloaded
  • Lambda has no outbound internet access (for example, it runs in a VPC subnet without a NAT gateway), or a firewall blocks the connection
  • Timeout is too short

Solution:

# 1. Check credentials
print(f"Using proxy: {proxy_host}:{proxy_port}")
print(f"User: {proxy_user}")

# 2. Increase timeout
response = requests.get(url, proxies=proxies, timeout=30)

# 3. Add retry logic
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retry = Retry(
    total=3,
    backoff_factor=1,
    status_forcelist=[500, 502, 503, 504]
)
adapter = HTTPAdapter(max_retries=retry)
session.mount('http://', adapter)
session.mount('https://', adapter)

response = session.get(url, proxies=proxies, timeout=30)

Error: SSL Certificate Verification Failed

Symptom: SSLError: [SSL: CERTIFICATE_VERIFY_FAILED]

Cause: Some proxies (especially cheap ones) use self-signed SSL certificates.

Solution (use with caution!):

# Disable SSL verification (only for testing!)
response = requests.get(
    url,
    proxies=proxies,
    verify=False  # DO NOT use in production!
)

# Better: specify the path to the CA certificate
response = requests.get(
    url,
    proxies=proxies,
    verify='/path/to/ca-bundle.crt'
)

Important: Disabling SSL verification (verify=False) makes the connection vulnerable to man-in-the-middle attacks. Use only for debugging in a dev environment!

Error: Lambda Timeout (Task Timed Out After X Seconds)

Symptom: The Lambda function terminates with a timeout error, not waiting for a response from the proxy.

Cause: Slow proxies (especially residential/mobile) + a large number of requests.

Solution:

  • Increase the Lambda function timeout: Configuration → General configuration → Timeout (maximum 15 minutes)
  • Reduce the number of requests per execution
  • Use asynchronous requests (asyncio in Python, Promise.all in Node.js)
  • Switch to faster proxies for non-critical tasks
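
# Python: asynchronous requests for speed
import asyncio
import json
import os

import aiohttp

async def fetch_url(session, url, proxy):
    # aiohttp expects a ClientTimeout object rather than a bare number
    timeout = aiohttp.ClientTimeout(total=10)
    async with session.get(url, proxy=proxy, timeout=timeout) as response:
        return await response.text()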

async def lambda_handler_async(event, context):
    proxy_url = f"http://{os.environ['PROXY_USER']}:{os.environ['PROXY_PASS']}@{os.environ['PROXY_HOST']}:{os.environ['PROXY_PORT']}"
    
    urls = [f'https://api.example.com/item/{i}' for i in range(50)]
    
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_url(session, url, proxy_url) for url in urls]
        results = await asyncio.gather(*tasks)
    
    return {
        'statusCode': 200,
        'body': json.dumps({'count': len(results)})
    }

def lambda_handler(event, context):
    return asyncio.run(lambda_handler_async(event, context))
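
Reducing the number of requests per execution does not have to be a fixed number. The Python runtime passes a context object whose get_remaining_time_in_millis() method reports how much time is left, so the function can stop early and return unfinished work instead of being killed. A minimal sketch, assuming items are processed one at a time; process_until_deadline, handle_item, and the 5-second safety margin are illustrative choices:

```python
def process_until_deadline(items, handle_item, context, safety_margin_ms=5000):
    """Process items until the Lambda deadline is close.

    Returns (results, remaining) so the caller can re-queue what is left.
    """
    results = []
    remaining = list(items)
    while remaining:
        # Stop while there is still time to return a response cleanly
        if context.get_remaining_time_in_millis() < safety_margin_ms:
            break
        item = remaining.pop(0)
        results.append(handle_item(item))
    return results, remaining
```

The remaining items could be sent to a queue or passed to a follow-up invocation. The safety margin should be larger than the per-request timeout, otherwise a single slow proxy response can still push the function past its limit.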

Error: 407 Proxy Authentication Required

Symptom: HTTP 407 error when attempting to use the proxy.

Cause: Incorrect format for passing credentials, or the proxy requires IP authentication instead of username/password.

Solution:

# Check the proxy URL format
# Correct:
proxy_url = f"http://{user}:{password}@{host}:{port}"

# Incorrect (protocol missing):
proxy_url = f"{user}:{password}@{host}:{port}"  # wrong: no scheme

# If the proxy requires IP authentication:
# 1. Find out the external IP of your Lambda (it may change!)
# 2. Add this IP to the proxy provider's whitelist
# 3. Use the proxy without user:pass

# Getting the external IP of Lambda:
response = requests.get('https://api.ipify.org?format=json')
lambda_ip = response.json()['ip']
print(f"Lambda external IP: {lambda_ip}")

Optimizing Lambda Performance with Proxies

Using proxies adds latency to each request. Here are proven ways to minimize the impact on performance:

1. Connection Pooling

Reuse TCP connections instead of creating a new one for each request:

# Python: use Session instead of requests.get()
import os

import requests

proxy_url = f"http://{os.environ['PROXY_USER']}:{os.environ['PROXY_PASS']}@{os.environ['PROXY_HOST']}:{os.environ['PROXY_PORT']}"

# Create the session once, outside the handler, so warm invocations reuse it
session = requests.Session()
session.proxies = {
    'http': proxy_url,
    'https': proxy_url
}

def lambda_handler(event, context):
    # All requests reuse connections
    for i in range(100):
        response = session.get(f'https://api.example.com/item/{i}', timeout=10)
        # process response...

2. Parallel Requests

If you need to make many independent requests, execute them in parallel:

// Node.js: parallel requests with Promise.all
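const axios = require('axios');
const { HttpsProxyAgent } = require('https-proxy-agent');  // named export in v7+

// Build the proxy URL from environment variables
const proxyUrl = `http://${process.env.PROXY_USER}:${process.env.PROXY_PASS}@${process.env.PROXY_HOST}:${process.env.PROXY_PORT}`;
const agent = new HttpsProxyAgent(proxyUrl);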

exports.handler = async (event) => {
    const urls = Array.from({length: 50}, (_, i) => 
        `https://api.example.com/item/${i}`
    );
    
    // All requests are executed in parallel
    const promises = urls.map(url => 
        axios.get(url, { 
            httpsAgent: agent,
            timeout: 10000
        })
    );
    
    try {
        const results = await Promise.all(promises);
        return {
            statusCode: 200,
            body: JSON.stringify({
                count: results.length,
                data: results.map(r => r.data)
            })
        };
    } catch (error) {
        console.error('Error:', error.message);
        return {
            statusCode: 500,
            body: JSON.stringify({ error: error.message })
        };
    }
};
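
Promise.all starts all 50 requests at once and rejects as soon as any single request fails, which can overload a proxy and discard the results that did succeed. A Python sketch of the same idea with a cap on concurrency and per-URL error capture, assuming a thread pool is acceptable for I/O-bound work; fetch_all is a hypothetical helper, and fetch stands for any function that downloads one URL (for example, a wrapper around session.get):

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, max_workers=10):
    """Run fetch(url) for every URL with at most max_workers in flight.

    Results come back in input order; failures are recorded, not raised.
    """
    def safe_fetch(url):
        try:
            return {'url': url, 'ok': True, 'result': fetch(url)}
        except Exception as exc:
            return {'url': url, 'ok': False, 'error': str(exc)}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves the order of the input URLs
        return list(pool.map(safe_fetch, urls))
```

A reasonable starting point for max_workers is the number of concurrent connections the proxy plan allows; failed entries can then be retried separately, ideally through a different proxy.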

3. Caching Results

If the data changes infrequently, cache the results in DynamoDB or S3:

import json
import time

import boto3
import requests

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('proxy-cache')

def get_cached_or_fetch(url, proxies, cache_ttl=3600):
    """Returns data from cache or makes a request through the proxy"""
    
    # Check the cache
    try:
        response = table.get_item(Key={'url': url})
        if 'Item' in response:
            item = response['Item']
            # DynamoDB returns numbers as Decimal, which cannot be subtracted from a float
            if time.time() - float(item['timestamp']) < cache_ttl:
                print(f"Cache hit for {url}")
                return item['data']
    except Exception as e:
        print(f"Cache error: {e}")
    
    # Cache is empty or expired, so make a request
    print(f"Cache miss for {url}, fetching...")
    response = requests.get(url, proxies=proxies, timeout=10)
    data = response.text
    
    # Save to cache
    try:
        table.put_item(Item={
            'url': url,
            'data': data,
            'timestamp': int(time.time())
        })
    except Exception as e:
        print(f"Cache save error: {e}")
    
    return data

4. Choosing the Right Type of Proxy

Comparing the speed of different types of proxies in real conditions:

| Proxy Type | Average Latency | Requests/Minute (Lambda 1GB RAM) | Recommendation |
|---|---|---|---|
| Data Centers | 50-200 ms | 300-600 | Mass API scraping |
| Residential | 300-800 ms | 100-200 | Protected websites |
| Mobile | 500-1500 ms | 50-100 | Mobile APIs |

Conclusion: Using proxies in AWS Lambda is essential for tasks that require anonymity and reliability. By selecting the right type of proxy and implementing best practices, you can ensure successful automation and data scraping without running into blocks or timeouts.

```