Google Cloud Functions is a serverless platform for running code without managing servers. When working with scraping, API request automation, or data collection, routing traffic through a proxy is often required to bypass blocks, rotate IPs, and achieve geographic targeting. In this guide, we will explore how to set up a proxy in Cloud Functions using Python and Node.js with practical examples.
Why Use Proxies in Cloud Functions
Google Cloud Functions operate in an isolated environment with shared IP addresses from Google data centers. When making frequent requests to external APIs or websites, several issues arise:
- IP Blocking – many services (Google, Facebook, marketplaces) recognize traffic from data centers and apply rate limiting or complete blocking.
- Geographic Restrictions – to access content available only in certain countries (e.g., scraping regional prices on Wildberries or Ozon).
- Rate Limiting – a single IP address can make a limited number of requests per minute. Proxies allow for load distribution.
- Privacy – hiding the actual source of requests when dealing with sensitive data or competitive intelligence.
Typical use cases for proxies in Cloud Functions include:
- Scraping marketplaces (Wildberries, Ozon, Amazon) for monitoring competitor prices
- Data collection from social media (Instagram, TikTok) via APIs or web scraping
- Automating ad checks in different regions
- Bulk queries to search engines (Google, Yandex) for SEO analysis
- Testing geolocation features of applications
What Types of Proxies are Suitable for Cloud Functions
The choice of proxy type depends on the task, budget, and anonymity requirements. Here's a comparison of the main options:
| Proxy Type | Speed | Anonymity | Best For |
|---|---|---|---|
| Datacenter Proxies | High (50-200 ms) | Medium | Scraping simple websites, API requests, SEO monitoring |
| Residential Proxies | Medium (200-800 ms) | High | Scraping social networks, marketplaces, bypassing anti-bot systems |
| Mobile Proxies | Medium (300-1000 ms) | Very High | Instagram, TikTok, mobile applications, Facebook API |
Selection Recommendations:
- For scraping marketplaces (Wildberries, Ozon, Amazon) – residential proxies with per-request rotation, so each request comes from a new IP.
- For API requests (Google Maps API, OpenWeatherMap) – fast datacenter proxies, if the API does not enforce strict IP restrictions.
- For social networks (Instagram, TikTok) – mobile proxies, since their IPs belong to mobile carriers and are rarely blocked.
- For SEO scraping (Google, Yandex) – residential proxies with geographic targeting to the desired region.
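These recommendations can be expressed as a simple lookup table. A minimal sketch follows; the endpoint hostnames are placeholders, not real provider addresses:

```python
# Hypothetical proxy endpoints keyed by task type; substitute your provider's hosts.
PROXY_BY_TASK = {
    'marketplace_scraping': 'http://user:pass@residential.example.com:8080',
    'api_requests':         'http://user:pass@datacenter.example.com:8080',
    'social_networks':      'http://user:pass@mobile.example.com:8080',
    'seo_scraping':         'http://user:pass@residential-ru.example.com:8080',
}

def proxies_for(task: str) -> dict:
    """Build a requests-style proxies dict for the given task type."""
    url = PROXY_BY_TASK[task]
    # The same proxy URL is used for both http and https traffic
    return {'http': url, 'https': url}
```

Centralizing the mapping like this makes it easy to swap proxy types per task without touching the scraping code itself.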
Setting Up Proxies in Python (requests, aiohttp)
Python is the most popular language for Cloud Functions when working with scraping and automation. Let's look at integrating proxies with the requests library (synchronous requests) and aiohttp (asynchronous requests).
Example with requests library (HTTP proxy)
import requests
import os

def parse_with_proxy(request):
    # Get proxy settings from environment variables
    proxy_host = os.environ.get('PROXY_HOST', 'proxy.example.com')
    proxy_port = os.environ.get('PROXY_PORT', '8080')
    proxy_user = os.environ.get('PROXY_USER', 'username')
    proxy_pass = os.environ.get('PROXY_PASS', 'password')

    # Form the proxy URL with authentication
    proxy_url = f"http://{proxy_user}:{proxy_pass}@{proxy_host}:{proxy_port}"
    proxies = {
        'http': proxy_url,
        'https': proxy_url
    }

    try:
        # Make a request through the proxy with a timeout
        response = requests.get(
            'https://api.example.com/data',
            proxies=proxies,
            timeout=10,
            headers={'User-Agent': 'Mozilla/5.0'}
        )
        # Raise on non-2xx status codes
        response.raise_for_status()
        return {
            'statusCode': 200,
            'body': response.json(),
            'ip_used': response.headers.get('X-Forwarded-For', 'unknown')
        }
    except requests.exceptions.ProxyError as e:
        return {'statusCode': 502, 'error': f'Proxy error: {str(e)}'}
    except requests.exceptions.Timeout:
        return {'statusCode': 504, 'error': 'Request timeout'}
    except requests.exceptions.RequestException as e:
        return {'statusCode': 500, 'error': f'Request failed: {str(e)}'}
Important Points:
- Environment Variables – store proxy credentials (host, port, username, password) in Secret Manager or Cloud Functions environment variables, not in code.
- Timeouts – always set a timeout to prevent the function from hanging on proxy issues.
- User-Agent – add a User-Agent header so requests look like they come from a real browser.
- Error Handling – handle ProxyError (proxy problems) and Timeout (slow proxy) separately.
Example with aiohttp (asynchronous requests)
For high-load tasks (e.g., scraping 1000+ pages), use asynchronous requests with aiohttp:
import aiohttp
import asyncio
import os

async def fetch_with_proxy(url, proxy_url):
    async with aiohttp.ClientSession() as session:
        try:
            async with session.get(
                url,
                proxy=proxy_url,
                timeout=aiohttp.ClientTimeout(total=10),
                headers={'User-Agent': 'Mozilla/5.0'}
            ) as response:
                return await response.text()
        except aiohttp.ClientProxyConnectionError:
            return {'error': 'Proxy connection failed'}
        except asyncio.TimeoutError:
            return {'error': 'Request timeout'}

async def fetch_all(urls, proxy_url):
    tasks = [fetch_with_proxy(url, proxy_url) for url in urls]
    return await asyncio.gather(*tasks)

def parse_multiple_urls(request):
    proxy_url = f"http://{os.environ['PROXY_USER']}:{os.environ['PROXY_PASS']}@{os.environ['PROXY_HOST']}:{os.environ['PROXY_PORT']}"
    urls = [
        'https://example.com/page1',
        'https://example.com/page2',
        'https://example.com/page3'
    ]
    # Run the asynchronous requests in parallel; asyncio.run creates
    # and cleans up the event loop for us
    results = asyncio.run(fetch_all(urls, proxy_url))
    return {'statusCode': 200, 'results': results}
The asynchronous approach allows for making 10-100 parallel requests through the proxy, which is critical for scraping large volumes of data within the limited execution time of Cloud Functions (up to 9 minutes).
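Launching hundreds of coroutines at once can exhaust a proxy's connection limits, so it helps to cap concurrency. A common pattern (a sketch, not part of the original example, with a stubbed-out fetch in place of a real proxied request) uses asyncio.Semaphore:

```python
import asyncio

async def fetch(url):
    # Stand-in for a real proxied aiohttp request
    await asyncio.sleep(0.01)
    return f"fetched {url}"

async def fetch_all(urls, max_concurrency=10):
    # The semaphore caps how many requests run through the proxy at once
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded_fetch(url):
        async with semaphore:
            return await fetch(url)

    # gather preserves the input order of results
    return await asyncio.gather(*(bounded_fetch(u) for u in urls))

results = asyncio.run(fetch_all([f"https://example.com/page{i}" for i in range(50)]))
```

Tuning `max_concurrency` to your proxy plan's connection limit avoids tripping the provider's own rate limiting while still staying well within the function's execution window.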
Working with SOCKS5 Proxies
Some proxy providers offer SOCKS5 proxies for more reliable handling of UDP traffic or bypassing blocks. To work with SOCKS5 in Python, install requests with the socks extra (requests[socks], which pulls in PySocks):
# Add to requirements.txt:
# requests[socks]

import requests
import os

def use_socks5_proxy(request):
    # Use socks5h:// instead of socks5:// to resolve DNS through the proxy
    proxy_url = f"socks5://{os.environ['PROXY_USER']}:{os.environ['PROXY_PASS']}@{os.environ['PROXY_HOST']}:{os.environ['PROXY_PORT']}"
    proxies = {
        'http': proxy_url,
        'https': proxy_url
    }
    response = requests.get(
        'https://api.ipify.org?format=json',
        proxies=proxies,
        timeout=10
    )
    return {'statusCode': 200, 'ip': response.json()}
Setting Up Proxies in Node.js (axios, node-fetch)
Node.js is the second most popular language for Cloud Functions. Let's explore integrating proxies with the axios and node-fetch libraries.
Example with axios
const axios = require('axios');
// https-proxy-agent v7 exports the class as a named export
const { HttpsProxyAgent } = require('https-proxy-agent');
exports.parseWithProxy = async (req, res) => {
const proxyUrl = `http://${process.env.PROXY_USER}:${process.env.PROXY_PASS}@${process.env.PROXY_HOST}:${process.env.PROXY_PORT}`;
const agent = new HttpsProxyAgent(proxyUrl);
try {
const response = await axios.get('https://api.example.com/data', {
httpsAgent: agent,
timeout: 10000,
headers: {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
}
});
res.status(200).json({
success: true,
data: response.data,
proxyUsed: proxyUrl.split('@')[1] // Return host:port without password
});
} catch (error) {
if (error.code === 'ECONNREFUSED') {
res.status(502).json({ error: 'Proxy connection refused' });
} else if (error.code === 'ETIMEDOUT') {
res.status(504).json({ error: 'Proxy timeout' });
} else {
res.status(500).json({ error: error.message });
}
}
};
Dependencies for package.json:
{
"dependencies": {
"axios": "^1.6.0",
"https-proxy-agent": "^7.0.2"
}
}
Example with node-fetch and SOCKS5
const fetch = require('node-fetch');
const { SocksProxyAgent } = require('socks-proxy-agent');
exports.fetchWithSocks5 = async (req, res) => {
const proxyUrl = `socks5://${process.env.PROXY_USER}:${process.env.PROXY_PASS}@${process.env.PROXY_HOST}:${process.env.PROXY_PORT}`;
const agent = new SocksProxyAgent(proxyUrl);
try {
const response = await fetch('https://api.ipify.org?format=json', {
agent,
timeout: 10000
});
const data = await response.json();
res.status(200).json({
success: true,
yourIP: data.ip
});
} catch (error) {
res.status(500).json({ error: error.message });
}
};
Dependencies for SOCKS5:
{
"dependencies": {
"node-fetch": "^2.7.0",
"socks-proxy-agent": "^8.0.2"
}
}
Proxy Authentication: Username/Password and IP Whitelist
There are two main methods of authentication when working with proxies:
1. Username and Password Authentication
The most common method is passing credentials in the proxy URL:
http://username:password@proxy.example.com:8080
Advantages: Easy to set up, does not require a fixed source IP.
Disadvantages: Credentials are sent with every request, slight overhead.
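One practical pitfall with URL-based credentials: characters like `@` or `:` in the username or password break URL parsing. A small helper (a sketch; the credentials here are dummies) percent-encodes them with the standard library:

```python
from urllib.parse import quote

def build_proxy_url(host, port, user, password):
    # Percent-encode credentials so characters like '@' or ':' don't break the URL
    return f"http://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"

proxy_url = build_proxy_url('proxy.example.com', 8080, 'user', 'p@ss:word')
# proxy_url == 'http://user:p%40ss%3Aword@proxy.example.com:8080'
```

The resulting URL can be passed to requests, aiohttp, or the Node.js agents unchanged.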
2. IP Whitelist Authentication
Some providers allow adding Cloud Functions IP addresses to a whitelist. The problem is that Cloud Functions use dynamic IPs from the Google Cloud pool.
Solution: Use Cloud NAT to route outgoing traffic through a static external IP:
- Create a VPC network and subnet in Google Cloud
- Set up Cloud NAT with a reserved static IP
- Connect Cloud Functions to the VPC Connector
- Add the static IP to the proxy provider's whitelist
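The first three steps above can be sketched with gcloud. This is an outline only: the network names, region, and IP ranges are placeholders you would adapt to your project.

```shell
# Sketch: route Cloud Functions egress through a static IP via Cloud NAT.
# All names, the region, and the CIDR ranges below are illustrative.
gcloud compute networks create proxy-vpc --subnet-mode=custom
gcloud compute networks subnets create proxy-subnet \
    --network=proxy-vpc --region=europe-west1 --range=10.0.0.0/28

# Reserve a static external IP and attach it to a Cloud NAT gateway
gcloud compute addresses create proxy-nat-ip --region=europe-west1
gcloud compute routers create proxy-router --network=proxy-vpc --region=europe-west1
gcloud compute routers nats create proxy-nat \
    --router=proxy-router --region=europe-west1 \
    --nat-external-ip-pool=proxy-nat-ip \
    --nat-all-subnet-ip-ranges

# Create a Serverless VPC Access connector and deploy the function through it
gcloud compute networks vpc-access connectors create proxy-connector \
    --network=proxy-vpc --region=europe-west1 --range=10.8.0.0/28
gcloud functions deploy my-function \
    --vpc-connector=proxy-connector \
    --egress-settings=all
```

With `--egress-settings=all`, every outbound request from the function leaves through the NAT's static IP, which you then whitelist with the proxy provider.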
After setup, the proxy does not require a username and password:
proxies = {
'http': 'http://proxy.example.com:8080',
'https': 'http://proxy.example.com:8080'
}
Recommendation: For most cases, use username/password authentication – it is simpler and does not incur additional costs for Cloud NAT (from $0.044/hour + traffic).
IP Rotation and Proxy Pool Management
When scraping large volumes of data, it is critical to use IP rotation to avoid blocks. There are several approaches:
1. Provider-Side Rotation (Rotating Proxies)
Many providers offer rotating proxies – a single endpoint that automatically changes the IP with each request or on a timer:
# One endpoint, the exit IP changes automatically
proxy_url = "http://username:password@rotating.proxy.com:8080"

# Each request comes from a new IP
for i in range(100):
    response = requests.get(
        'https://api.ipify.org',
        proxies={'http': proxy_url, 'https': proxy_url},
        timeout=10
    )
    print(f"Request {i}: IP = {response.text}")
Advantages: No need to manage a proxy pool manually, simple integration.
Disadvantages: No control over specific IPs, may be more expensive.
2. Manual Proxy Pool Management
If you have a list of static proxies, implement rotation at the code level:
import random
import requests

# Proxy pool (can be loaded from Secret Manager)
PROXY_POOL = [
    "http://user:pass@proxy1.example.com:8080",
    "http://user:pass@proxy2.example.com:8080",
    "http://user:pass@proxy3.example.com:8080",
]

def get_random_proxy():
    return random.choice(PROXY_POOL)

def parse_with_rotation(urls):
    results = []
    for url in urls:
        proxy = get_random_proxy()
        try:
            response = requests.get(
                url,
                proxies={'http': proxy, 'https': proxy},
                timeout=10
            )
            results.append({
                'url': url,
                'status': response.status_code,
                'proxy': proxy.split('@')[1]
            })
        except requests.RequestException:
            # If the proxy doesn't work, retry once with another one
            proxy = get_random_proxy()
            try:
                response = requests.get(
                    url,
                    proxies={'http': proxy, 'https': proxy},
                    timeout=10
                )
                results.append({'url': url, 'status': response.status_code})
            except requests.RequestException as e:
                results.append({'url': url, 'error': str(e)})
    return results
3. Session-Based Proxies (Sticky Sessions)
For tasks where you need to maintain one IP within a session (e.g., logging into a site), use a session ID in the proxy URL:
# Append a session ID to the proxy username (the exact format varies by provider)
import uuid

session_id = str(uuid.uuid4())
proxy_url = f"http://username-session-{session_id}:password@proxy.example.com:8080"
# All requests with this session_id will go through one IP
session = requests.Session()
session.proxies = {'http': proxy_url, 'https': proxy_url}
# Login
session.post('https://example.com/login', data={'user': 'test', 'pass': '123'})
# Subsequent requests in the same session
session.get('https://example.com/dashboard')
Error Handling and Timeouts
When working with proxies in Cloud Functions, it is critical to handle errors properly to avoid data loss and exceeding execution time limits.
Types of Errors and Handling Methods
| Error | Cause | Solution |
|---|---|---|
| ProxyError | Proxy is unavailable or incorrect credentials | Switch to another proxy from the pool |
| Timeout | Slow proxy or overloaded server | Set a timeout of 5-10 seconds, retry with another IP |
| 407 Proxy Authentication Required | Incorrect username/password | Check credentials in environment variables |
| 429 Too Many Requests | Rate limiting on the target site | Add a delay between requests, use more IPs |
| 403 Forbidden | Proxy IP is blocked by the site | Change IP, use residential instead of datacenter |
Example of Comprehensive Error Handling
import random
import requests
import time
from requests.exceptions import ProxyError, Timeout, RequestException

def fetch_with_retry(url, proxy_pool, max_retries=3):
    """
    Request with automatic retry and proxy switching on errors
    """
    for attempt in range(max_retries):
        proxy = random.choice(proxy_pool)
        try:
            response = requests.get(
                url,
                proxies={'http': proxy, 'https': proxy},
                timeout=10,
                headers={'User-Agent': 'Mozilla/5.0'}
            )
            # Check the status code
            if response.status_code == 200:
                return {'success': True, 'data': response.text, 'proxy': proxy}
            elif response.status_code == 429:
                # Rate limited – wait and try again
                time.sleep(2 ** attempt)  # Exponential backoff
                continue
            elif response.status_code == 403:
                # IP blocked – switch proxy
                continue
            else:
                return {'success': False, 'status': response.status_code}
        except ProxyError:
            # Proxy not working – try the next one
            print(f"Proxy {proxy} failed, trying another...")
            continue
        except Timeout:
            # Timed out – retry with another proxy
            print(f"Timeout with {proxy}, retrying...")
            continue
        except RequestException as e:
            # Other errors
            print(f"Request failed: {e}")
            if attempt == max_retries - 1:
                return {'success': False, 'error': str(e)}
            continue
    return {'success': False, 'error': 'Max retries exceeded'}
Setting Timeouts in Cloud Functions
Cloud Functions have an execution time limit (default 60 seconds, maximum 540 seconds). Consider this when setting proxy timeouts:
- Connection timeout – time to establish a connection with the proxy (recommended: 5 seconds)
- Read timeout – time to receive a response from the target server through the proxy (recommended: 10-15 seconds)
- Total timeout – total time for the entire request (must be less than the function timeout)
# Python: separate timeouts
response = requests.get(
url,
proxies=proxies,
timeout=(5, 15) # (connect timeout, read timeout)
)
# Node.js with axios
const response = await axios.get(url, {
httpsAgent: agent,
timeout: 10000 // total timeout in milliseconds
});
Best Practices and Performance Optimization
Recommendations for effective proxy use in Cloud Functions:
1. Use Environment Variables for Credentials
Never store proxy usernames and passwords in code. Use Secret Manager or environment variables:
# Create a secret in Google Cloud
gcloud secrets create proxy-credentials \
--data-file=proxy-config.json
# Grant access to Cloud Functions
gcloud secrets add-iam-policy-binding proxy-credentials \
--member=serviceAccount:PROJECT_ID@appspot.gserviceaccount.com \
--role=roles/secretmanager.secretAccessor
# Reading the secret in code
from google.cloud import secretmanager
import json
import os

PROJECT_ID = os.environ.get('GCP_PROJECT', 'my-project')

def get_proxy_config():
    client = secretmanager.SecretManagerServiceClient()
    name = f"projects/{PROJECT_ID}/secrets/proxy-credentials/versions/latest"
    response = client.access_secret_version(request={"name": name})
    return json.loads(response.payload.data.decode('UTF-8'))
2. Cache Scraping Results
Use Cloud Storage or Firestore to cache data to avoid making repeated requests through the proxy:
import hashlib
import json
import requests
from google.cloud import storage

def fetch_with_cache(url, proxy):
    # Generate a cache key from the URL
    cache_key = hashlib.md5(url.encode()).hexdigest()

    # Check the cache in Cloud Storage
    bucket = storage.Client().bucket('my-cache-bucket')
    blob = bucket.blob(f"cache/{cache_key}.json")
    if blob.exists():
        # Return cached data
        return json.loads(blob.download_as_text())

    # Make a request through the proxy
    response = requests.get(url, proxies={'http': proxy, 'https': proxy}, timeout=10)
    data = response.json()

    # Save to cache
    blob.upload_from_string(json.dumps(data))
    return data
3. Monitoring and Logging
Monitor proxy performance and error rates through Cloud Logging:
import logging
import time
import requests

def fetch_with_logging(url, proxy):
    start_time = time.time()
    try:
        response = requests.get(
            url,
            proxies={'http': proxy, 'https': proxy},
            timeout=10
        )
        duration = time.time() - start_time
        logging.info({
            'url': url,
            'proxy': proxy.split('@')[1],
            'status': response.status_code,
            'duration': duration,
            'success': True
        })
        return response
    except Exception as e:
        duration = time.time() - start_time
        logging.error({
            'url': url,
            'proxy': proxy.split('@')[1],
            'error': str(e),
            'duration': duration,
            'success': False
        })
        raise
4. Optimize Cold Start
Cloud Functions incur a cold start delay on the first invocation of a new instance. Keep dependencies minimal to shorten it:
# requirements.txt β only necessary libraries
requests==2.31.0
# Avoid heavy libraries like pandas unless critical
Use global variables to reuse connections:
# Create a session once at cold start and reuse it across invocations
import os
import requests

PROXY_URL = f"http://{os.environ['PROXY_USER']}:{os.environ['PROXY_PASS']}@{os.environ['PROXY_HOST']}:{os.environ['PROXY_PORT']}"

session = requests.Session()
session.proxies = {'http': PROXY_URL, 'https': PROXY_URL}

def parse_data(request):
    # Reuse the session (and its keep-alive connections) between calls
    response = session.get('https://api.example.com/data', timeout=10)
    return response.json()
5. Geographic Targeting of Proxies
For tasks with geographic targeting (e.g., scraping regional prices), use proxies tied to a specific country or city:
# Example with residential proxies that accept a country code in the username
proxy_url = "http://username-country-ru:password@proxy.example.com:8080"
# Or use different endpoints for different countries
PROXIES_BY_COUNTRY = {
'RU': 'http://user:pass@ru.proxy.example.com:8080',
'US': 'http://user:pass@us.proxy.example.com:8080',
'DE': 'http://user:pass@de.proxy.example.com:8080'
}
def parse_by_country(country_code):
    proxy = PROXIES_BY_COUNTRY.get(country_code)
    response = requests.get(
        'https://example.com',
        proxies={'http': proxy, 'https': proxy},
        timeout=10
    )
    return response.text
Conclusion
Integrating proxies with Google Cloud Functions opens up wide opportunities for scraping, automation, and working with APIs without IP restrictions. Key points to consider include proper error handling with retry logic, using timeouts to prevent hanging, IP rotation to avoid blocks, and securely storing credentials in Secret Manager.
For most scraping and automation tasks, the optimal choice will be residential proxies – they provide high anonymity and a low block rate due to the use of real user IPs. For working with social networks and mobile applications, we recommend mobile proxies, which have IPs from mobile operators and are rarely blocked by platforms like Instagram and TikTok.
With the right setup of Cloud Functions with proxies, you get a scalable and cost-effective solution for processing large volumes of data without the need to manage infrastructure.