
Proxy Configuration in curl and wget: Complete Guide with Examples for Web Scraping

A detailed guide to using proxies in curl and wget: command examples, authentication setup, and bypassing blocks when scraping and automating requests.

📅 February 14, 2026

When scraping websites, automating API requests, or monitoring competitor prices on marketplaces, you will inevitably encounter IP blocks. The curl and wget utilities are standard tools for working with HTTP requests in the command line, and properly configuring proxies in them is critically important for bypassing restrictions. In this article, we will explore all the ways to use proxies in curl and wget: from basic commands to advanced scenarios with IP rotation and error handling.

Basic Proxy Syntax in curl and wget

Let's start with the simplest commands for connecting through a proxy. Both tools support a parameter for specifying the proxy server, but the syntax is slightly different.

Using Proxy in curl

In curl, the proxy is specified using the -x or --proxy parameter. The basic format of the command is:

curl -x http://proxy-server:port http://example.com

A specific example with a real proxy server:

curl -x http://45.130.123.45:8080 http://api.ipify.org

This command will send a request to api.ipify.org (a service that returns your IP address) through the specified proxy server. You will see the proxy's IP instead of your real address.

Using Proxy in wget

In wget, there is no dedicated proxy flag: the proxy is configured through environment variables or by passing wgetrc commands on the command line with the -e option:

wget -e use_proxy=yes -e http_proxy=http://45.130.123.45:8080 http://example.com

Or a shorter version using environment variables (more on this in the section below):

export http_proxy="http://45.130.123.45:8080"
wget http://example.com

Proxy Server Authentication

Most commercial proxy services require authentication with a username and password. This protects the proxy from unauthorized use and allows tracking of each client's traffic. Let's look at how to pass credentials in curl and wget.

Authentication in curl

In curl, the username and password can be specified directly in the proxy URL or through the separate -U (--proxy-user) parameter:

# Method 1: username and password in URL
curl -x http://username:password@proxy-server:port http://example.com

# Method 2: using the -U parameter
curl -x http://proxy-server:port -U username:password http://example.com

A specific example with credentials:

curl -x http://user123:pass456@45.130.123.45:8080 http://api.ipify.org

An important point: if the password contains special characters (@, :, /, ?), they need to be URL-encoded. For example, the @ symbol is replaced with %40:

# If the password contains @: pass@456
curl -x http://user123:pass%40456@45.130.123.45:8080 http://api.ipify.org
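If you don't want to encode passwords by hand, a small pure-bash helper can do it. This is an illustrative sketch, not the only way — `jq -rR @uri` or python3's `urllib.parse.quote` achieve the same result:

```shell
#!/bin/bash

# Percent-encode every character that is not unreserved per RFC 3986
urlencode() {
    local s="$1" out="" c i
    for (( i = 0; i < ${#s}; i++ )); do
        c="${s:i:1}"
        case "$c" in
            [a-zA-Z0-9.~_-]) out+="$c" ;;                 # safe characters pass through
            *) printf -v c '%%%02X' "'$c"                 # others become %XX (hex of the byte)
               out+="$c" ;;
        esac
    done
    printf '%s' "$out"
}

password='pass@4:56'
encoded=$(urlencode "$password")
echo "$encoded"    # pass%404%3A56
# Build the proxy URL with the encoded password:
echo "http://user123:${encoded}@45.130.123.45:8080"
```

The `"'$c"` trick makes printf treat the character as its numeric ASCII code, which is then formatted as two hex digits.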

Authentication in wget

In wget, authentication is configured using the --proxy-user and --proxy-password parameters:

wget --proxy-user=username --proxy-password=password \
     -e use_proxy=yes -e http_proxy=http://45.130.123.45:8080 \
     http://example.com

Or through environment variables with credentials in the URL:

export http_proxy="http://username:password@45.130.123.45:8080"
wget http://example.com
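wget can also read these settings from its configuration file ~/.wgetrc (or the system-wide /etc/wgetrc), which keeps credentials out of your shell history. The commands below are documented wgetrc settings; substitute your own server and credentials:

```ini
# ~/.wgetrc — applied to every wget run
use_proxy = on
http_proxy = http://45.130.123.45:8080
https_proxy = http://45.130.123.45:8080
proxy_user = username
proxy_password = password
```

Remember to restrict the file's permissions (chmod 600 ~/.wgetrc) since it contains a password in plain text.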

Working with Different Types of Proxies: HTTP, HTTPS, SOCKS5

Proxy servers operate over different protocols, and the choice of type depends on the task. HTTP proxies are suitable for simple requests, HTTPS provides encryption, and SOCKS5 operates at a lower level and supports any traffic. When scraping marketplaces like Wildberries or Ozon, residential proxies are often used, which can work over any of these protocols.

HTTP and HTTPS Proxies

HTTP proxies are the most common type. They operate at the HTTP protocol level and are suitable for most web scraping tasks:

# HTTP proxy in curl
curl -x http://proxy-server:8080 http://example.com

# HTTPS proxy in curl (for secure connections)
curl -x https://proxy-server:8080 https://example.com

Important: even if the target site uses HTTPS, the proxy itself can be plain HTTP. curl will automatically establish an encrypted tunnel through it using the CONNECT method:

# HTTP proxy for HTTPS site (works correctly)
curl -x http://proxy-server:8080 https://secure-site.com

SOCKS5 Proxies

SOCKS5 is a more versatile protocol that operates at the TCP level and supports any type of traffic (HTTP, HTTPS, FTP, even UDP). This makes SOCKS5 an ideal choice for complex automation tasks:

# SOCKS5 in curl
curl -x socks5://proxy-server:1080 http://example.com

# SOCKS5 with authentication
curl -x socks5://username:password@proxy-server:1080 http://example.com

# SOCKS5h (DNS resolution through proxy)
curl -x socks5h://proxy-server:1080 http://example.com

The difference between socks5:// and socks5h://: with the former, DNS lookups are performed on your machine; with the latter, the hostname is sent to the proxy and resolved there. Use socks5h if you want to hide your DNS queries as well, or when the target hostname only resolves from the proxy's side of the network.

wget has no native SOCKS support, so for such tasks it's better to use curl or a wrapper utility like proxychains.

Tip: For scraping marketplaces (Wildberries, Ozon, Yandex.Market), it is recommended to use residential or mobile proxies with HTTP/HTTPS protocol — they are less likely to get blocked, as they have IPs of real users.

Configuring Proxy via Environment Variables

If you regularly work through a proxy, it's more convenient to set environment variables once than to specify parameters in every command. Both curl and wget read these variables automatically. Note that curl only recognizes http_proxy in lowercase, so it's safest to use lowercase names throughout.

Configuration for the Current Session

Export the variables in the terminal (they will last until the session is closed):

# For HTTP traffic
export http_proxy="http://username:password@proxy-server:8080"

# For HTTPS traffic
export https_proxy="http://username:password@proxy-server:8080"

# For FTP traffic
export ftp_proxy="http://username:password@proxy-server:8080"

# For SOCKS5
export all_proxy="socks5://username:password@proxy-server:1080"

After this, curl and wget will automatically use the proxy:

# Proxy will be applied automatically
curl http://api.ipify.org
wget http://example.com

Permanent Configuration in .bashrc or .zshrc

To ensure proxies are applied every time the terminal is launched, add the variables to your shell's configuration file:

# Open the file in an editor
nano ~/.bashrc  # for bash
# or
nano ~/.zshrc   # for zsh

# Add to the end of the file:
export http_proxy="http://username:password@proxy-server:8080"
export https_proxy="http://username:password@proxy-server:8080"

# Save and apply the changes:
source ~/.bashrc

Exclusions: no_proxy

Sometimes you need to exclude certain addresses from proxying (for example, localhost or internal services):

export no_proxy="localhost,127.0.0.1,192.168.0.0/16,.local"

Now requests to these addresses will go directly, bypassing the proxy.
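How no_proxy entries are matched varies between tools and versions — CIDR ranges like 192.168.0.0/16, for instance, are honored only by newer curl releases, and wget matches differently again. As a rough mental model, here is a simplified sketch of the typical exact-and-suffix matching (an illustration, not any tool's actual implementation):

```shell
#!/bin/bash

# Decide whether a host should bypass the proxy, given a comma-separated
# no_proxy list. Entries starting with a dot match as domain suffixes.
matches_no_proxy() {
    local host="$1" entry
    local -a entries
    IFS=',' read -ra entries <<< "${no_proxy:-}"
    for entry in "${entries[@]}"; do
        if [ "$host" = "$entry" ] || [[ "$entry" == .* && "$host" == *"$entry" ]]; then
            return 0    # bypass the proxy
        fi
    done
    return 1            # go through the proxy
}

no_proxy="localhost,127.0.0.1,.local"
matches_no_proxy "localhost"     && echo "direct"    # direct
matches_no_proxy "printer.local" && echo "direct"    # direct
matches_no_proxy "example.com"   || echo "proxied"   # proxied
```

When in doubt about how your specific tool interprets no_proxy, test it with a verbose request (curl -v) and check which connection is actually opened.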

Proxy Rotation in Bash Scripts

When mass scraping (for example, collecting prices from thousands of product cards on Wildberries), using a single proxy will lead to blocking. The solution is IP rotation. Let's look at how to implement this in bash scripts.

Simple Rotation from a List of Proxies

Create a file proxies.txt with a list of proxy servers (one per line):

http://user1:pass1@proxy1.example.com:8080
http://user2:pass2@proxy2.example.com:8080
http://user3:pass3@proxy3.example.com:8080

A script for sequentially rotating proxies:

#!/bin/bash

# File with the list of URLs to scrape
urls_file="urls.txt"
# File with the list of proxies
proxies_file="proxies.txt"

# Read proxies into an array
mapfile -t proxies < "$proxies_file"
proxy_count=${#proxies[@]}
current_proxy=0

# Process each URL
while IFS= read -r url; do
    # Select proxy in a round-robin manner
    proxy="${proxies[$current_proxy]}"
    
    echo "Requesting $url through $proxy"
    curl -x "$proxy" -s "$url" -o "output_$(basename "$url").html"
    
    # Switch to the next proxy
    current_proxy=$(( (current_proxy + 1) % proxy_count ))
    
    # Pause between requests (1-3 seconds)
    sleep $((RANDOM % 3 + 1))
done < "$urls_file"

This script sequentially uses proxies from the list, returning to the first after the last. A random pause between requests makes the activity appear more natural.

Random Proxy Selection

For greater unpredictability, you can select proxies randomly:

#!/bin/bash

proxies_file="proxies.txt"
mapfile -t proxies < "$proxies_file"
proxy_count=${#proxies[@]}

while IFS= read -r url; do
    # Randomly select a proxy
    random_index=$((RANDOM % proxy_count))
    proxy="${proxies[$random_index]}"
    
    echo "Requesting $url through proxy #$random_index"
    curl -x "$proxy" -s "$url" -o "output_$(date +%s).html"
    
    sleep $((RANDOM % 3 + 1))
done < "urls.txt"

Automatic Rotation via Proxy Service API

Many proxy providers (including services that offer residential proxies) support automatic rotation through a single entry point: you connect to one proxy address, and the outgoing IP changes with each request or on a timer:

# Proxy with automatic rotation
# IP changes with each request
curl -x http://username:password@rotating.proxy.com:8080 http://api.ipify.org
curl -x http://username:password@rotating.proxy.com:8080 http://api.ipify.org

# The two requests above will receive different IP addresses

This is the most convenient way for large-scale scraping — there is no need to manage a proxy list manually.

Passing Headers and User-Agent through Proxy

Modern websites analyze not only the IP address but also the HTTP request headers. The absence of a User-Agent or suspicious headers can lead to blocking even when using high-quality proxies. Let's look at how to properly set headers in curl and wget.

User-Agent in curl

The User-Agent is a header that identifies the browser and operating system. By default, curl sends its own User-Agent string (curl/<version>), which immediately reveals automation. Replace it with a realistic browser string:

# Chrome on Windows
curl -x http://proxy:8080 \
     -A "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36" \
     http://example.com

# Firefox on macOS
curl -x http://proxy:8080 \
     -A "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:121.0) Gecko/20100101 Firefox/121.0" \
     http://example.com

Additional Headers

To make the request more realistic, add typical browser headers:

curl -x http://proxy:8080 \
     -H "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0.0.0" \
     -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8" \
     -H "Accept-Language: ru-RU,ru;q=0.9,en;q=0.8" \
     -H "Accept-Encoding: gzip, deflate, br" \
     -H "Connection: keep-alive" \
     -H "Upgrade-Insecure-Requests: 1" \
     http://example.com

User-Agent in wget

In wget, the User-Agent is set using the --user-agent parameter:

wget --user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0.0.0" \
     -e use_proxy=yes -e http_proxy=http://proxy:8080 \
     http://example.com

Randomizing User-Agent in Scripts

For large-scale scraping, it's useful to alternate User-Agents so that requests appear to come from different users:

#!/bin/bash

# User-Agent array
user_agents=(
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0.0.0 Safari/537.36"
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Safari/605.1.15"
    "Mozilla/5.0 (X11; Linux x86_64) Firefox/121.0"
    "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) Safari/604.1"
)

while IFS= read -r url; do
    # Random User-Agent
    random_ua=${user_agents[$RANDOM % ${#user_agents[@]}]}
    
    curl -x http://proxy:8080 -A "$random_ua" -s "$url"
    sleep 2
done < "urls.txt"

Diagnosing Problems and Error Handling

When working with proxies, errors often occur: timeouts, connection refusals, incorrect authentication. Let's look at how to diagnose and handle these situations.

Checking Proxy Functionality

The simplest way to check a proxy is to request a service that returns your IP:

# Check HTTP proxy
curl -x http://proxy:8080 http://api.ipify.org

# Check SOCKS5 proxy
curl -x socks5://proxy:1080 http://api.ipify.org

# With detailed output
curl -x http://proxy:8080 -v http://api.ipify.org

The -v (verbose) parameter will show connection details, including headers and errors.
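Beyond verbose output, curl's exit status often pinpoints the failure on its own. The codes below are taken from curl's manual; the helper function wrapping them is just an illustrative sketch for your own scripts:

```shell
#!/bin/bash

# Common curl exit codes seen when a proxy misbehaves (from curl's manual)
declare -A curl_errors=(
    [5]="could not resolve proxy host"
    [7]="failed to connect (proxy down or port closed)"
    [28]="operation timed out"
    [35]="SSL/TLS handshake failed"
    [56]="connection reset while receiving data"
)

explain_curl_exit() {
    local code=$1
    echo "curl exit $code: ${curl_errors[$code]:-see 'man curl' EXIT CODES}"
}

explain_curl_exit 7    # curl exit 7: failed to connect (proxy down or port closed)
```

In a real script you would call it as `curl ... ; explain_curl_exit $?` right after the request.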

Handling Timeouts

Slow proxies or overloaded servers can cause timeouts. Set reasonable time limits:

# Connection timeout 10 seconds, total timeout 30 seconds
curl -x http://proxy:8080 --connect-timeout 10 --max-time 30 http://example.com

# In wget
wget --timeout=30 --tries=3 -e http_proxy=http://proxy:8080 http://example.com

Automatic Error Handling in Scripts

A script for scraping with automatic switching to the next proxy upon error:

#!/bin/bash

proxies_file="proxies.txt"
mapfile -t proxies < "$proxies_file"

fetch_with_retry() {
    local url=$1
    
    for proxy in "${proxies[@]}"; do
        echo "Attempting through proxy: $proxy"
        
        if curl -x "$proxy" \
                --connect-timeout 10 \
                --max-time 30 \
                -s -f "$url" -o output.html; then
            echo "Success with proxy: $proxy"
            return 0
        else
            echo "Error with proxy: $proxy, trying next"
        fi
    done
    
    echo "All proxies unavailable for $url"
    return 1
}

# Usage
fetch_with_retry "http://example.com/page1"

The -f parameter makes curl return an error for HTTP statuses 4xx and 5xx, allowing you to handle not only network errors but also application-level blocks.
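curl also has built-in retry flags (--retry, --retry-delay), but a shell-level wrapper lets you combine retries with proxy switching and custom logging. A generic sketch — the proxied curl call in the comment uses a placeholder proxy address:

```shell
#!/bin/bash

# Run a command up to N times, sleeping between failed attempts
retry() {
    local attempts=$1 delay=$2
    shift 2
    local i
    for (( i = 1; i <= attempts; i++ )); do
        "$@" && return 0
        echo "attempt $i/$attempts failed" >&2
        if (( i < attempts )); then
            sleep "$delay"
        fi
    done
    return 1
}

# Usage with a proxied request (placeholder proxy):
# retry 3 5 curl -x http://proxy:8080 -sf http://example.com -o page.html

retry 3 0 false 2>/dev/null || echo "gave up"    # gave up
```

Because the wrapper takes an arbitrary command, the same function works for curl, wget, or anything else in your pipeline.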

Logging for Debugging

Keep detailed logs of requests for problem analysis:

# Save response headers
curl -x http://proxy:8080 -D headers.txt http://example.com

# Full log of interaction
curl -x http://proxy:8080 -v http://example.com 2>&1 | tee curl.log

# Only HTTP status
curl -x http://proxy:8080 -o /dev/null -s -w "%{http_code}\n" http://example.com

Practical Use Cases

Let's consider real tasks where curl and wget with proxies solve specific business problems.

Scraping Competitor Prices on Marketplaces

Task: collect prices for 500 competitor products from Wildberries for price strategy analysis. Wildberries actively blocks mass requests from a single IP.

Solution: use residential proxies with rotation and User-Agent randomization:

#!/bin/bash

# Proxy with automatic rotation
PROXY="http://user:pass@rotating-residential.proxy.com:8080"

# User-Agent array
user_agents=(
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0.0.0"
    "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0) Safari/604.1"
)

# Read product IDs from file
while IFS= read -r product_id; do
    ua=${user_agents[$RANDOM % ${#user_agents[@]}]}
    
    curl -x "$PROXY" \
         -A "$ua" \
         -H "Accept-Language: ru-RU,ru;q=0.9" \
         -s "https://www.wildberries.ru/catalog/${product_id}/detail.aspx" \
         -o "products/${product_id}.html"
    
    echo "Downloaded product $product_id"
    sleep $((RANDOM % 5 + 3))  # Pause 3-7 seconds
done < product_ids.txt

Monitoring API Availability from Different Regions

Task: check how your service's API works for users from different countries (geo-blocking, response speed).

Solution: proxies with IPs from the required countries:

#!/bin/bash

# Proxies from different countries
declare -A proxies=(
    ["US"]="http://user:pass@us-proxy.com:8080"
    ["DE"]="http://user:pass@de-proxy.com:8080"
    ["JP"]="http://user:pass@jp-proxy.com:8080"
)

API_URL="https://api.yourservice.com/v1/status"

for country in "${!proxies[@]}"; do
    echo "Checking from $country..."
    
    # One request measures both status and timing (two separate requests
    # could hit different IPs or moments and give inconsistent results)
    result=$(curl -x "${proxies[$country]}" \
                  -s -o /dev/null \
                  -w "%{http_code} %{time_total}" \
                  "$API_URL")
    
    http_code=${result%% *}
    response_time=${result#* }
    
    echo "$country: HTTP $http_code, response time ${response_time}s"
done

Downloading Files via wget with Proxy Rotation

Task: download an archive of files (product images, documents) from a site that limits speed for a single IP.

#!/bin/bash

proxies_file="proxies.txt"
mapfile -t proxies < "$proxies_file"
proxy_count=${#proxies[@]}
current=0

while IFS= read -r file_url; do
    proxy="${proxies[$current]}"
    filename=$(basename "$file_url")
    
    echo "Downloading $filename through proxy #$current"
    
    wget --proxy-user=username --proxy-password=password \
         -e use_proxy=yes -e http_proxy="$proxy" \
         -O "downloads/$filename" \
         "$file_url"
    
    current=$(( (current + 1) % proxy_count ))
    sleep 2
done < file_urls.txt

Testing Ad Creatives in Different GEOs

Task: check how Facebook Ads look for users from the USA, Canada, and the UK (different currencies, languages, offer availability).

#!/bin/bash

# Mobile proxies from different countries for realism
declare -A mobile_proxies=(
    ["US"]="http://user:pass@us-mobile.proxy.com:8080"
    ["CA"]="http://user:pass@ca-mobile.proxy.com:8080"
    ["GB"]="http://user:pass@gb-mobile.proxy.com:8080"
)

AD_URL="https://www.facebook.com/ads/library/?id=YOUR_AD_ID"

for country in "${!mobile_proxies[@]}"; do
    curl -x "${mobile_proxies[$country]}" \
         -A "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0) Safari/604.1" \
         -H "Accept-Language: en-US,en;q=0.9" \
         -s "$AD_URL" \
         -o "ads_preview_${country}.html"
    
    echo "Saved preview for $country"
done

For such tasks, mobile proxies are especially effective, as they mimic real smartphone users and are less likely to raise suspicions with Facebook's anti-fraud systems.

Important for arbitrageurs: When checking ad creatives through proxies, use mobile IPs and corresponding User-Agents of mobile devices. Facebook analyzes data consistency (the device type by User-Agent must match the IP type).

Automating Website Availability Checks

Task: monitor the availability of your website every 5 minutes, simulating requests from real users (not from server IP).

#!/bin/bash

PROXY="http://user:pass@residential.proxy.com:8080"
SITE_URL="https://yoursite.com"
LOG_FILE="uptime.log"

while true; do
    timestamp=$(date '+%Y-%m-%d %H:%M:%S')
    
    http_code=$(curl -x "$PROXY" \
                     -s -o /dev/null \
                     -w "%{http_code}" \
                     --max-time 10 \
                     "$SITE_URL")
    
    if [ "$http_code" -eq 200 ]; then
        echo "[$timestamp] OK - HTTP $http_code" >> "$LOG_FILE"
    else
        echo "[$timestamp] ERROR - HTTP $http_code" >> "$LOG_FILE"
        # Send alert (for example, via Telegram API)
        curl -s "https://api.telegram.org/botTOKEN/sendMessage" \
             -d "chat_id=CHAT_ID&text=Website is down: HTTP $http_code"
    fi
    
    sleep 300  # 5 minutes
done

Conclusion

Curl and wget are powerful tools for automating HTTP requests, and proper proxy configuration makes them indispensable for scraping, monitoring, and testing. We have covered all key aspects: from basic syntax to advanced scenarios with IP rotation, error handling, and header randomization.

Key takeaways from the article:

  • Use the -x parameter in curl and environment variables for proxy configuration
  • Choose the type of proxy based on the task: HTTP for simple requests, SOCKS5 for versatility
  • Always replace the default User-Agent with a realistic browser one
  • Implement proxy rotation for large-scale scraping — this is critical for bypassing blocks
  • Add error and timeout handling in production scripts
  • Use random pauses between requests to simulate human behavior

For tasks requiring a high level of anonymity and minimal risk of blocking (scraping marketplaces, checking ads, monitoring competitors), we recommend using residential proxies. They have IPs of real home users, making your requests indistinguishable from regular traffic and significantly reducing the likelihood of being blacklisted.

Now you have a complete set of tools and knowledge for effectively working with proxies in curl and wget. Apply these techniques in your projects, adapt the examples to specific tasks, and scale automation without fear of blocking.
