Why some websites don't open through a proxy: a complete guide
A proxy is an indispensable tool for web scraping, testing, SMM automation and bypassing geographic restrictions. But sometimes, instead of content, you see a 403 error, a timeout or a blank page. Let's figure out why this happens and how to fix it.
1. Proxy detection and blocking
This is the most common reason. Modern web applications use special services to detect proxy traffic. The site analyzes:
- ASN (Autonomous System Number) — many proxy providers use known ASN ranges that are easy to block
- User behavior — impossibly fast IP switching between requests, absence of cookies, strange click patterns
- TLS fingerprints — browsers send unique data about SSL version, extensions, cipher order
- WebGL and Canvas fingerprints — even JavaScript can reveal proxy usage
Example: The site sees that 100 product pages were loaded from your IP in 10 seconds. This is clearly not a human — blocking is inevitable.
2. Geographic restrictions
The site checks IP geolocation and denies access if it doesn't match expectations:
- Banks and financial services block access from certain countries
- Streaming services (Netflix, YouTube) restrict content by region
- Government websites may be unavailable from outside the country
- E-commerce platforms change language and currency based on IP
If you use a US datacenter proxy but the site only allows access from Europe, you'll get a 403 error or a redirect.
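To rule out a geo mismatch before blaming the site, you can ask a geolocation endpoint which country your proxy's exit IP resolves to. A minimal sketch using the free ip-api.com endpoint (the proxy URL is a placeholder, and the endpoint choice is an assumption for illustration):

```python
import requests

def proxies_for(proxy_url):
    # requests expects one entry per scheme; reuse the same proxy for both
    return {'http': proxy_url, 'https': proxy_url}

def proxy_country(proxy_url):
    # ip-api.com returns JSON describing the IP the request came from,
    # including a two-letter countryCode
    response = requests.get('http://ip-api.com/json',
                            proxies=proxies_for(proxy_url), timeout=10)
    return response.json().get('countryCode')

# Example (replace with a real proxy URL):
# print(proxy_country('http://your-proxy:port'))
```

If the returned country doesn't match what the site expects, no amount of header tuning will help: you need an exit IP in the right region.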
3. IP address reputation
Each IP has a history. If the address was previously used for spam, scraping or DDoS attacks, sites will block it:
- Blacklists — IP gets into databases like Project Honey Pot, Spamhaus, AbuseIPDB
- Low score in services like IPQualityScore — sites use such services for filtering
- Previous violations — if an IP was already blocked on a site, it may remain on the blacklist for a long time
You can check IP reputation on abuseipdb.com or ipqualityscore.com.
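The AbuseIPDB lookup mentioned above can also be scripted: its v2 API takes an API key in the `Key` header and returns an `abuseConfidenceScore` from 0 (clean) to 100 (heavily reported). A hedged sketch — `YOUR_API_KEY` and the 25-point threshold are illustrative choices, not recommendations from AbuseIPDB:

```python
import requests

def abuse_score(ip, api_key):
    # AbuseIPDB v2 "check" endpoint; the API key goes in the Key header
    response = requests.get(
        'https://api.abuseipdb.com/api/v2/check',
        headers={'Key': api_key, 'Accept': 'application/json'},
        params={'ipAddress': ip, 'maxAgeInDays': 90},
        timeout=10,
    )
    return response.json()['data']['abuseConfidenceScore']

def is_clean(score, threshold=25):
    # Treat anything below the threshold as usable; tune to your risk tolerance
    return score < threshold

# Example (needs a real API key):
# if is_clean(abuse_score('203.0.113.5', 'YOUR_API_KEY')):
#     print('IP looks usable')
```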
4. Incorrect headers and configuration
Often sites block requests due to missing or incorrect HTTP headers:
| Header | Problem |
|---|---|
| User-Agent | Missing or suspicious (like python-requests/2.25.1) |
| Referer | Doesn't match the logic of site navigation |
| Accept-Language | Missing or doesn't match the IP geolocation |
| X-Forwarded-For | Reveals proxy or VPN usage |
Solution: Use real browser headers. Here's an example in Python:
```python
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip, deflate',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1',
}

proxy = 'http://your-proxy:port'
response = requests.get('https://example.com',
                        headers=headers,
                        proxies={'http': proxy, 'https': proxy})
```
5. Protocol and port issues
Some proxies only support HTTP, but you're trying to access an HTTPS site. Or the port is blocked at the network level:
- HTTP vs HTTPS — make sure your proxy supports both protocols
- Ports — standard 80 (HTTP) and 443 (HTTPS), but some sites use non-standard ports
- SOCKS vs HTTP — different proxy types have different limitations
Tip: If a site doesn't open through an HTTP proxy, try SOCKS5. It operates at a lower level (TCP rather than HTTP), carries any protocol, and gets past some restrictions that HTTP proxies cannot.
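With requests, SOCKS support needs the optional dependency (`pip install requests[socks]`), and the `socks5h://` scheme resolves DNS on the proxy side, which avoids leaking lookups to your local resolver. A minimal sketch — the host and port are placeholders:

```python
def socks5_proxies(host, port, remote_dns=True):
    # socks5h:// resolves hostnames on the proxy; socks5:// resolves them locally
    scheme = 'socks5h' if remote_dns else 'socks5'
    url = f'{scheme}://{host}:{port}'
    return {'http': url, 'https': url}

# Example (placeholder address; requires requests[socks] to be installed):
# import requests
# response = requests.get('https://example.com',
#                         proxies=socks5_proxies('your-proxy', 1080))
```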
6. Rate limiting and DDoS protection
If you make many requests in a row, even through different IPs, the site may block you:
- 429 Too Many Requests — you exceeded the request limit
- Temporary blocking — usually for 1-24 hours
- Permanent blocking — if the aggressive request pattern continues
- Cloudflare and other WAFs — specialized protection systems that try to distinguish bots from humans
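When you do hit a 429, the polite reaction is to honor the server's Retry-After header if present, and otherwise back off exponentially with jitter. A sketch of that logic (the 60-second cap is an arbitrary choice):

```python
import random

def retry_delay(headers, attempt, cap=60.0):
    # Prefer the server's Retry-After (in seconds) when present and numeric
    retry_after = headers.get('Retry-After')
    if retry_after is not None:
        try:
            return min(float(retry_after), cap)
        except ValueError:
            pass  # Retry-After may also be an HTTP date; fall through to backoff
    # Exponential backoff with jitter: 1s, 2s, 4s, ... plus up to 1s of noise
    return min(2 ** attempt + random.random(), cap)
```

On each 429, sleep for `retry_delay(response.headers, attempt)` before retrying, and give up after a few attempts rather than hammering the server.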
7. Practical solutions
✓ Use residential proxies instead of datacenter proxies
Residential proxies are real IP addresses of home users. They are much harder to detect because they look like normal traffic. Datacenter IPs are often blocked outright because their ASNs are well known.
✓ Add delays between requests
```python
import time
import random

for url in urls:
    response = requests.get(url, headers=headers, proxies=proxies)
    # Random delay of 1 to 5 seconds between requests
    time.sleep(random.uniform(1, 5))
```
✓ Rotate proxies
Don't use one IP for all requests. Switch between different addresses:
```python
proxies_list = [
    'http://proxy1:port',
    'http://proxy2:port',
    'http://proxy3:port',
]

for i, url in enumerate(urls):
    proxy = proxies_list[i % len(proxies_list)]
    response = requests.get(url, proxies={'http': proxy, 'https': proxy})
```
✓ Check IP before using
Make sure the IP is not blacklisted. Note that the IPQualityScore API expects your API key in the URL path:

```python
import requests

def check_ip_reputation(ip, api_key):
    # IPQualityScore returns fraud_score from 0 (clean) to 100 (fraudulent)
    url = f'https://ipqualityscore.com/api/json/ip/{api_key}/{ip}'
    response = requests.get(url, timeout=10)
    data = response.json()
    return data.get('fraud_score', 0)

# Use only IPs with a low score (IPQS treats 75+ as suspicious)
if check_ip_reputation(proxy_ip, 'YOUR_API_KEY') < 75:
    # IP is safe
    pass
```
✓ Use browser automation for complex sites
If a site uses JavaScript and complex protection, regular HTTP requests won't help. Use Selenium or Puppeteer:
```python
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument('--proxy-server=http://proxy:port')

driver = webdriver.Chrome(options=options)
driver.get('https://example.com')
```
✓ Use mobile proxies for mobile sites
Mobile proxies work through real mobile networks (4G/5G). They are more reliable for mobile applications and are rarely blocked, because carriers share a single IP across many real subscribers, so banning it would affect legitimate users.
✓ Handle errors correctly
```python
try:
    response = requests.get(url, proxies=proxies, timeout=10)
    response.raise_for_status()
except requests.exceptions.ProxyError:
    print("Proxy error — the IP may be blocked")
except requests.exceptions.Timeout:
    print("Timeout — the server is not responding")
except requests.exceptions.HTTPError as e:
    if e.response.status_code == 403:
        print("Access denied — try a different proxy")
```
Summary
Sites block proxies for various reasons, from traffic-pattern detection to plain IP reputation. There's no universal solution, but combining good proxies, correct headers, delays and IP rotation will solve most problems.
For scraping and automation it's best to use quality residential proxies that look like real traffic from home users. They cost more than datacenter proxies, but they work more reliably and are rarely blocked.