Google Cloud Functions is a serverless platform for running code without managing servers. When working with scraping, API request automation, or data collection, routing traffic through a proxy is often required to bypass blocks, rotate IPs, and achieve geographic targeting. In this guide, we will explore how to set up a proxy in Cloud Functions using Python and Node.js with practical examples.
Why Use Proxies in Cloud Functions
Google Cloud Functions operate in an isolated environment with shared IP addresses from Google data centers. When making frequent requests to external APIs or websites, several issues arise:
- IP Blocking – many services (Google, Facebook, marketplaces) recognize traffic from data centers and apply rate limiting or complete blocking.
- Geographic Restrictions – to access content available only in certain countries (e.g., scraping regional prices on Wildberries or Ozon).
- Rate Limiting – a single IP address can make a limited number of requests per minute. Proxies allow for load distribution.
- Privacy – hiding the actual source of requests when dealing with sensitive data or competitive intelligence.
Typical use cases for proxies in Cloud Functions include:
- Scraping marketplaces (Wildberries, Ozon, Amazon) for monitoring competitor prices
- Data collection from social media (Instagram, TikTok) via APIs or web scraping
- Automating ad checks in different regions
- Bulk queries to search engines (Google, Yandex) for SEO analysis
- Testing geolocation features of applications
What Types of Proxies are Suitable for Cloud Functions
The choice of proxy type depends on the task, budget, and anonymity requirements. Here's a comparison of the main options:
| Proxy Type | Speed | Anonymity | Best For |
|---|---|---|---|
| Datacenter Proxies | High (50-200 ms) | Medium | Scraping simple websites, API requests, SEO monitoring |
| Residential Proxies | Medium (200-800 ms) | High | Scraping social networks, marketplaces, bypassing anti-bot systems |
| Mobile Proxies | Medium (300-1000 ms) | Very High | Instagram, TikTok, mobile applications, Facebook API |
Selection Recommendations:
- For scraping marketplaces (Wildberries, Ozon, Amazon) – residential proxies with per-request rotation, so each request comes from a new IP.
- For API requests (Google Maps API, OpenWeatherMap) – fast datacenter proxies, if the API does not enforce strict IP restrictions.
- For social networks (Instagram, TikTok) – mobile proxies, since their IPs belong to mobile carriers and are rarely blocked.
- For SEO scraping (Google, Yandex) – residential proxies with geographic targeting to the desired region.
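These recommendations can be expressed as a simple lookup table. A minimal sketch follows; the endpoint hostnames are placeholders, not real provider addresses:

```python
# Hypothetical proxy endpoints keyed by task type; substitute your provider's hosts.
PROXY_BY_TASK = {
    'marketplace_scraping': 'http://user:pass@residential.example.com:8080',
    'api_requests':         'http://user:pass@datacenter.example.com:8080',
    'social_networks':      'http://user:pass@mobile.example.com:8080',
    'seo_scraping':         'http://user:pass@residential-ru.example.com:8080',
}

def proxies_for(task: str) -> dict:
    """Build a requests-style proxies dict for the given task type."""
    url = PROXY_BY_TASK[task]
    # The same proxy URL is used for both http and https traffic
    return {'http': url, 'https': url}
```

Centralizing the mapping like this makes it easy to swap proxy types per task without touching the scraping code itself.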
Setting Up Proxies in Python (requests, aiohttp)
Python is the most popular language for Cloud Functions when working with scraping and automation. Let's look at integrating proxies with the requests library (synchronous requests) and aiohttp (asynchronous requests).
Example with requests library (HTTP proxy)
import requests
import os

def parse_with_proxy(request):
    # Get proxy settings from environment variables
    proxy_host = os.environ.get('PROXY_HOST', 'proxy.example.com')
    proxy_port = os.environ.get('PROXY_PORT', '8080')
    proxy_user = os.environ.get('PROXY_USER', 'username')
    proxy_pass = os.environ.get('PROXY_PASS', 'password')

    # Form the proxy URL with authentication
    proxy_url = f"http://{proxy_user}:{proxy_pass}@{proxy_host}:{proxy_port}"
    proxies = {
        'http': proxy_url,
        'https': proxy_url
    }

    try:
        # Make a request through the proxy with a timeout
        response = requests.get(
            'https://api.example.com/data',
            proxies=proxies,
            timeout=10,
            headers={'User-Agent': 'Mozilla/5.0'}
        )
        # Raise on non-2xx status codes
        response.raise_for_status()
        return {
            'statusCode': 200,
            'body': response.json(),
            'ip_used': response.headers.get('X-Forwarded-For', 'unknown')
        }
    except requests.exceptions.ProxyError as e:
        return {'statusCode': 502, 'error': f'Proxy error: {str(e)}'}
    except requests.exceptions.Timeout:
        return {'statusCode': 504, 'error': 'Request timeout'}
    except requests.exceptions.RequestException as e:
        return {'statusCode': 500, 'error': f'Request failed: {str(e)}'}
Important Points:
- Environment Variables – store proxy credentials (host, port, username, password) in Secret Manager or Cloud Functions environment variables, not in code.
- Timeouts – always set a timeout to prevent the function from hanging on proxy issues.
- User-Agent – add a User-Agent header so requests look like they come from a real browser.
- Error Handling – handle ProxyError (proxy problems) and Timeout (slow proxy) separately.
Example with aiohttp (asynchronous requests)
For high-load tasks (e.g., scraping 1000+ pages), use asynchronous requests with aiohttp:
import aiohttp
import asyncio
import os

async def fetch_with_proxy(url, proxy_url):
    async with aiohttp.ClientSession() as session:
        try:
            async with session.get(
                url,
                proxy=proxy_url,
                timeout=aiohttp.ClientTimeout(total=10),
                headers={'User-Agent': 'Mozilla/5.0'}
            ) as response:
                return await response.text()
        except aiohttp.ClientProxyConnectionError:
            return {'error': 'Proxy connection failed'}
        except asyncio.TimeoutError:
            return {'error': 'Request timeout'}

async def fetch_all(urls, proxy_url):
    tasks = [fetch_with_proxy(url, proxy_url) for url in urls]
    return await asyncio.gather(*tasks)

def parse_multiple_urls(request):
    proxy_url = f"http://{os.environ['PROXY_USER']}:{os.environ['PROXY_PASS']}@{os.environ['PROXY_HOST']}:{os.environ['PROXY_PORT']}"
    urls = [
        'https://example.com/page1',
        'https://example.com/page2',
        'https://example.com/page3'
    ]
    # Run the asynchronous requests in parallel; asyncio.run creates
    # and cleans up the event loop for us
    results = asyncio.run(fetch_all(urls, proxy_url))
    return {'statusCode': 200, 'results': results}
The asynchronous approach allows for making 10-100 parallel requests through the proxy, which is critical for scraping large volumes of data within the limited execution time of Cloud Functions (up to 9 minutes).
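Launching hundreds of coroutines at once can exhaust a proxy's connection limits, so it helps to cap concurrency. A common pattern (a sketch, not part of the original example, with a stubbed-out fetch in place of a real proxied request) uses asyncio.Semaphore:

```python
import asyncio

async def fetch(url):
    # Stand-in for a real proxied aiohttp request
    await asyncio.sleep(0.01)
    return f"fetched {url}"

async def fetch_all(urls, max_concurrency=10):
    # The semaphore caps how many requests run through the proxy at once
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded_fetch(url):
        async with semaphore:
            return await fetch(url)

    # gather preserves the input order of results
    return await asyncio.gather(*(bounded_fetch(u) for u in urls))

results = asyncio.run(fetch_all([f"https://example.com/page{i}" for i in range(50)]))
```

Tuning `max_concurrency` to your proxy plan's connection limit avoids tripping the provider's own rate limiting while still staying well within the function's execution window.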
Working with SOCKS5 Proxies
Some proxy providers offer SOCKS5 proxies for more reliable handling of UDP traffic or bypassing blocks. To work with SOCKS5 in Python, install requests with the socks extra (requests[socks], which pulls in PySocks):
# Add to requirements.txt:
# requests[socks]

import requests
import os

def use_socks5_proxy(request):
    # Use socks5h:// instead of socks5:// to resolve DNS through the proxy
    proxy_url = f"socks5://{os.environ['PROXY_USER']}:{os.environ['PROXY_PASS']}@{os.environ['PROXY_HOST']}:{os.environ['PROXY_PORT']}"
    proxies = {
        'http': proxy_url,
        'https': proxy_url
    }
    response = requests.get(
        'https://api.ipify.org?format=json',
        proxies=proxies,
        timeout=10
    )
    return {'statusCode': 200, 'ip': response.json()}
Setting Up Proxies in Node.js (axios, node-fetch)
Node.js is the second most popular language for Cloud Functions. Let's explore integrating proxies with the axios and node-fetch libraries.
Example with axios
const axios = require('axios');
// https-proxy-agent v7 exports the class as a named export
const { HttpsProxyAgent } = require('https-proxy-agent');
exports.parseWithProxy = async (req, res) => {
const proxyUrl = `http://${process.env.PROXY_USER}:${process.env.PROXY_PASS}@${process.env.PROXY_HOST}:${process.env.PROXY_PORT}`;
const agent = new HttpsProxyAgent(proxyUrl);
try {
const response = await axios.get('https://api.example.com/data', {
httpsAgent: agent,
timeout: 10000,
headers: {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
}
});
res.status(200).json({
success: true,
data: response.data,
proxyUsed: proxyUrl.split('@')[1] // Return host:port without password
});
} catch (error) {
if (error.code === 'ECONNREFUSED') {
res.status(502).json({ error: 'Proxy connection refused' });
} else if (error.code === 'ETIMEDOUT') {
res.status(504).json({ error: 'Proxy timeout' });
} else {
res.status(500).json({ error: error.message });
}
}
};
Dependencies for package.json:
{
"dependencies": {
"axios": "^1.6.0",
"https-proxy-agent": "^7.0.2"
}
}
Example with node-fetch and SOCKS5
const fetch = require('node-fetch');
const { SocksProxyAgent } = require('socks-proxy-agent');
exports.fetchWithSocks5 = async (req, res) => {
const proxyUrl = `socks5://${process.env.PROXY_USER}:${process.env.PROXY_PASS}@${process.env.PROXY_HOST}:${process.env.PROXY_PORT}`;
const agent = new SocksProxyAgent(proxyUrl);
try {
const response = await fetch('https://api.ipify.org?format=json', {
agent,
timeout: 10000
});
const data = await response.json();
res.status(200).json({
success: true,
yourIP: data.ip
});
} catch (error) {
res.status(500).json({ error: error.message });
}
};
Dependencies for SOCKS5:
{
"dependencies": {
"node-fetch": "^2.7.0",
"socks-proxy-agent": "^8.0.2"
}
}
Proxy Authentication: Username/Password and IP Whitelist
There are two main methods of authentication when working with proxies:
1. Username and Password Authentication
The most common method is passing credentials in the proxy URL:
http://username:password@proxy.example.com:8080
Advantages: Easy to set up, does not require a fixed source IP.
Disadvantages: Credentials are sent with every request, slight overhead.
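One practical pitfall with URL-based credentials: characters like `@` or `:` in the username or password break URL parsing. A small helper (a sketch; the credentials here are dummies) percent-encodes them with the standard library:

```python
from urllib.parse import quote

def build_proxy_url(host, port, user, password):
    # Percent-encode credentials so characters like '@' or ':' don't break the URL
    return f"http://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"

proxy_url = build_proxy_url('proxy.example.com', 8080, 'user', 'p@ss:word')
# proxy_url == 'http://user:p%40ss%3Aword@proxy.example.com:8080'
```

The resulting URL can be passed to requests, aiohttp, or the Node.js agents unchanged.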
2. IP Whitelist Authentication
Some providers allow adding Cloud Functions IP addresses to a whitelist. The problem is that Cloud Functions use dynamic IPs from the Google Cloud pool.
Solution: Use Cloud NAT to route outgoing traffic through a static external IP:
- Create a VPC network and subnet in Google Cloud
- Set up Cloud NAT with a reserved static IP
- Connect Cloud Functions to the VPC Connector
- Add the static IP to the proxy provider's whitelist
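The first three steps above can be sketched with gcloud. This is an outline only: the network names, region, and IP ranges are placeholders you would adapt to your project.

```shell
# Sketch: route Cloud Functions egress through a static IP via Cloud NAT.
# All names, the region, and the CIDR ranges below are illustrative.
gcloud compute networks create proxy-vpc --subnet-mode=custom
gcloud compute networks subnets create proxy-subnet \
    --network=proxy-vpc --region=europe-west1 --range=10.0.0.0/28

# Reserve a static external IP and attach it to a Cloud NAT gateway
gcloud compute addresses create proxy-nat-ip --region=europe-west1
gcloud compute routers create proxy-router --network=proxy-vpc --region=europe-west1
gcloud compute routers nats create proxy-nat \
    --router=proxy-router --region=europe-west1 \
    --nat-external-ip-pool=proxy-nat-ip \
    --nat-all-subnet-ip-ranges

# Create a Serverless VPC Access connector and deploy the function through it
gcloud compute networks vpc-access connectors create proxy-connector \
    --network=proxy-vpc --region=europe-west1 --range=10.8.0.0/28
gcloud functions deploy my-function \
    --vpc-connector=proxy-connector \
    --egress-settings=all
```

With `--egress-settings=all`, every outbound request from the function leaves through the NAT's static IP, which you then whitelist with the proxy provider.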
After setup, the proxy does not require a username and password:
proxies = {
'http': 'http://proxy.example.com:8080',
'https': 'http://proxy.example.com:8080'
}
Recommendation: For most cases, use username/password authentication – it is simpler and does not incur additional costs for Cloud NAT (from $0.044/hour + traffic).
IP Rotation and Proxy Pool Management
When scraping large volumes of data, it is critical to use IP rotation to avoid blocks. There are several approaches:
1. Provider-Side Rotation (Rotating Proxies)
Many providers offer rotating proxies – a single endpoint that automatically changes the IP with each request or on a timer:
# One endpoint, the exit IP changes automatically
proxy_url = "http://username:password@rotating.proxy.com:8080"

# Each request comes from a new IP
for i in range(100):
    response = requests.get(
        'https://api.ipify.org',
        proxies={'http': proxy_url, 'https': proxy_url},
        timeout=10
    )
    print(f"Request {i}: IP = {response.text}")
Advantages: No need to manage a proxy pool manually, simple integration.
Disadvantages: No control over specific IPs, may be more expensive.
2. Manual Proxy Pool Management
If you have a list of static proxies, implement rotation at the code level:
import random
import requests

# Proxy pool (can be loaded from Secret Manager)
PROXY_POOL = [
    "http://user:pass@proxy1.example.com:8080",
    "http://user:pass@proxy2.example.com:8080",
    "http://user:pass@proxy3.example.com:8080",
]

def get_random_proxy():
    return random.choice(PROXY_POOL)

def parse_with_rotation(urls):
    results = []
    for url in urls:
        proxy = get_random_proxy()
        try:
            response = requests.get(
                url,
                proxies={'http': proxy, 'https': proxy},
                timeout=10
            )
            results.append({
                'url': url,
                'status': response.status_code,
                'proxy': proxy.split('@')[1]
            })
        except requests.RequestException:
            # If the proxy doesn't work, retry once with another one
            proxy = get_random_proxy()
            try:
                response = requests.get(
                    url,
                    proxies={'http': proxy, 'https': proxy},
                    timeout=10
                )
                results.append({'url': url, 'status': response.status_code})
            except requests.RequestException as e:
                results.append({'url': url, 'error': str(e)})
    return results
3. Session-Based Proxies (Sticky Sessions)
For tasks where you need to maintain one IP within a session (e.g., logging into a site), use a session ID in the proxy URL:
# Append a session ID to the proxy username (the exact format varies by provider)
import uuid

session_id = str(uuid.uuid4())
proxy_url = f"http://username-session-{session_id}:password@proxy.example.com:8080"
# All requests with this session_id will go through one IP
session = requests.Session()
session.proxies = {'http': proxy_url, 'https': proxy_url}
# Login
session.post('https://example.com/login', data={'user': 'test', 'pass': '123'})
# Subsequent requests in the same session
session.get('https://example.com/dashboard')
Error Handling and Timeouts
When working with proxies in Cloud Functions, it is critical to handle errors properly to avoid data loss and exceeding execution time limits.
Types of Errors and Handling Methods
| Error | Cause | Solution |
|---|---|---|
| ProxyError | Proxy is unavailable or incorrect credentials | Switch to another proxy from the pool |
| Timeout | Slow proxy or overloaded server | Set a timeout of 5-10 seconds, retry with another IP |
| 407 Proxy Authentication Required | Incorrect username/password | Check credentials in environment variables |
| 429 Too Many Requests | Rate limiting on the target site | Add a delay between requests, use more IPs |
| 403 Forbidden | Proxy IP is blocked by the site | Change IP, use residential instead of datacenter |
Example of Comprehensive Error Handling
import random
import requests
import time
from requests.exceptions import ProxyError, Timeout, RequestException

def fetch_with_retry(url, proxy_pool, max_retries=3):
    """
    Request with automatic retry and proxy switching on errors
    """
    for attempt in range(max_retries):
        proxy = random.choice(proxy_pool)
        try:
            response = requests.get(
                url,
                proxies={'http': proxy, 'https': proxy},
                timeout=10,
                headers={'User-Agent': 'Mozilla/5.0'}
            )
            # Check the status code
            if response.status_code == 200:
                return {'success': True, 'data': response.text, 'proxy': proxy}
            elif response.status_code == 429:
                # Rate limited – wait and try again
                time.sleep(2 ** attempt)  # Exponential backoff
                continue
            elif response.status_code == 403:
                # IP blocked – switch proxy
                continue
            else:
                return {'success': False, 'status': response.status_code}
        except ProxyError:
            # Proxy not working – try the next one
            print(f"Proxy {proxy} failed, trying another...")
            continue
        except Timeout:
            # Timed out – retry with another proxy
            print(f"Timeout with {proxy}, retrying...")
            continue
        except RequestException as e:
            # Other errors
            print(f"Request failed: {e}")
            if attempt == max_retries - 1:
                return {'success': False, 'error': str(e)}
            continue
    return {'success': False, 'error': 'Max retries exceeded'}
Setting Timeouts in Cloud Functions
Cloud Functions have an execution time limit (default 60 seconds, maximum 540 seconds). Consider this when setting proxy timeouts:
- Connection timeout – time to establish a connection with the proxy (recommended: 5 seconds)
- Read timeout – time to receive a response from the target server through the proxy (recommended: 10-15 seconds)
- Total timeout – total time for the entire request (must be less than the function timeout)
# Python: separate timeouts
response = requests.get(
url,
proxies=proxies,
timeout=(5, 15) # (connect timeout, read timeout)
)
# Node.js with axios
const response = await axios.get(url, {
httpsAgent: agent,
timeout: 10000 // total timeout in milliseconds
});
Best Practices and Performance Optimization
Recommendations for effective proxy use in Cloud Functions:
1. Use Environment Variables for Credentials
Never store proxy usernames and passwords in code. Use Secret Manager or environment variables:
# Create a secret in Google Cloud
gcloud secrets create proxy-credentials \
--data-file=proxy-config.json
# Grant access to Cloud Functions
gcloud secrets add-iam-policy-binding proxy-credentials \
--member=serviceAccount:PROJECT_ID@appspot.gserviceaccount.com \
--role=roles/secretmanager.secretAccessor
# Reading the secret in code
from google.cloud import secretmanager
import json
import os

PROJECT_ID = os.environ.get('GCP_PROJECT', 'my-project')

def get_proxy_config():
    client = secretmanager.SecretManagerServiceClient()
    name = f"projects/{PROJECT_ID}/secrets/proxy-credentials/versions/latest"
    response = client.access_secret_version(request={"name": name})
    return json.loads(response.payload.data.decode('UTF-8'))
2. Cache Scraping Results
Use Cloud Storage or Firestore to cache data to avoid making repeated requests through the proxy:
import hashlib
import json
import requests
from google.cloud import storage

def fetch_with_cache(url, proxy):
    # Generate a cache key from the URL
    cache_key = hashlib.md5(url.encode()).hexdigest()

    # Check the cache in Cloud Storage
    bucket = storage.Client().bucket('my-cache-bucket')
    blob = bucket.blob(f"cache/{cache_key}.json")
    if blob.exists():
        # Return cached data
        return json.loads(blob.download_as_text())

    # Make a request through the proxy
    response = requests.get(url, proxies={'http': proxy, 'https': proxy}, timeout=10)
    data = response.json()

    # Save to cache
    blob.upload_from_string(json.dumps(data))
    return data
3. Monitoring and Logging
Monitor proxy performance and error rates through Cloud Logging:
import logging
import time
import requests

def fetch_with_logging(url, proxy):
    start_time = time.time()
    try:
        response = requests.get(
            url,
            proxies={'http': proxy, 'https': proxy},
            timeout=10
        )
        duration = time.time() - start_time
        logging.info({
            'url': url,
            'proxy': proxy.split('@')[1],
            'status': response.status_code,
            'duration': duration,
            'success': True
        })
        return response
    except Exception as e:
        duration = time.time() - start_time
        logging.error({
            'url': url,
            'proxy': proxy.split('@')[1],
            'error': str(e),
            'duration': duration,
            'success': False
        })
        raise
4. Optimize Cold Start
Cloud Functions incur a cold start delay on the first invocation of a new instance. Keep dependencies minimal to shorten it:
# requirements.txt β only necessary libraries
requests==2.31.0
# Avoid heavy libraries like pandas unless critical
Use global variables to reuse connections:
# Create a session once at cold start and reuse it across invocations
import os
import requests

PROXY_URL = f"http://{os.environ['PROXY_USER']}:{os.environ['PROXY_PASS']}@{os.environ['PROXY_HOST']}:{os.environ['PROXY_PORT']}"

session = requests.Session()
session.proxies = {'http': PROXY_URL, 'https': PROXY_URL}

def parse_data(request):
    # Reuse the session (and its keep-alive connections) between calls
    response = session.get('https://api.example.com/data', timeout=10)
    return response.json()
5. Geographic Targeting of Proxies
For tasks with geographic targeting (e.g., scraping regional prices), use proxies tied to a specific country or city:
# Example with residential proxies that accept a country code in the username
proxy_url = "http://username-country-ru:password@proxy.example.com:8080"
# Or use different endpoints for different countries
PROXIES_BY_COUNTRY = {
'RU': 'http://user:pass@ru.proxy.example.com:8080',
'US': 'http://user:pass@us.proxy.example.com:8080',
'DE': 'http://user:pass@de.proxy.example.com:8080'
}
def parse_by_country(country_code):
    proxy = PROXIES_BY_COUNTRY.get(country_code)
    response = requests.get(
        'https://example.com',
        proxies={'http': proxy, 'https': proxy},
        timeout=10
    )
    return response.text
Conclusion
Integrating proxies with Google Cloud Functions opens up wide opportunities for scraping, automation, and working with APIs without IP restrictions. Key points to consider include proper error handling with retry logic, using timeouts to prevent hanging, IP rotation to avoid blocks, and securely storing credentials in Secret Manager.
For most scraping and automation tasks, the optimal choice will be residential proxies – they provide high anonymity and a low block rate due to the use of real user IPs. For working with social networks and mobile applications, we recommend mobile proxies, which have IPs from mobile operators and are rarely blocked by platforms like Instagram and TikTok.
With the right setup of Cloud Functions with proxies, you get a scalable and cost-effective solution for processing large volumes of data without the need to manage infrastructure.