JavaScript rendering through proxy: how to avoid blocks and errors
When you parse websites that load content through JavaScript, simple HTTP requests are no longer enough. Add bot protection and geolocation restrictions on top of that, and the task becomes considerably harder. Proxies solve some of these problems, but they require proper configuration.
Why JavaScript rendering requires a proxy
Modern websites often use frameworks like React, Vue, or Angular that load content on the client side. When you send a regular GET request, you get empty HTML with a <div id="root"></div> tag, not the ready content.
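You can see the problem by looking at the raw response. A minimal sketch (the URL is a placeholder for any client-side-rendered site; the global fetch requires Node.js 18+):

// Plain GET request: no JavaScript runs, so the server returns only the SPA shell
(async () => {
  const response = await fetch('https://spa.example.com');
  const html = await response.text();
  // For a client-side-rendered site this typically prints the empty
  // <div id="root"></div> shell, not the content the browser would render
  console.log(html);
})();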
Problems that proxies solve:
- Geolocation blocks. Websites restrict access by country. A proxy with an IP from the required region bypasses these restrictions.
- Bot protection. Cloudflare, hCaptcha and similar systems block automated requests. Residential proxies look like regular users and pass checks better.
- Rate limiting. A server can block a single IP after multiple requests. Proxies distribute traffic and avoid blocks.
- Anonymity. Hide your real IP when parsing.
Headless browsers and proxies: basics
JavaScript rendering is done with headless browsers: browsers such as Chromium running without a graphical interface. Popular options:
- Puppeteer: a Node.js library for controlling Chrome/Chromium.
- Playwright: a cross-browser alternative that supports Chromium, Firefox, and WebKit.
- Selenium: the classic choice; works with many browsers and languages.
All of them support proxies through browser launch parameters or connection options.
Practical setup
Puppeteer with proxy
Basic example of connecting a proxy to Puppeteer:
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    args: [
      '--proxy-server=http://proxy.example.com:8080'
    ]
  });
  const page = await browser.newPage();
  await page.goto('https://example.com');
  const content = await page.content();
  console.log(content);
  await browser.close();
})();
If the proxy requires authentication:
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    args: [
      '--proxy-server=http://proxy.example.com:8080'
    ]
  });
  const page = await browser.newPage();
  // Setting credentials for the proxy
  await page.authenticate({
    username: 'user',
    password: 'pass'
  });
  await page.goto('https://example.com');
  const content = await page.content();
  await browser.close();
})();
Playwright with proxy
In Playwright, the proxy is passed in the launch options (it can also be set per browser context, as shown after this example):
const { chromium } = require('playwright');

(async () => {
  const browser = await chromium.launch({
    proxy: {
      server: 'http://proxy.example.com:8080',
      username: 'user',
      password: 'pass'
    }
  });
  const context = await browser.newContext();
  const page = await context.newPage();
  await page.goto('https://example.com');
  const content = await page.content();
  await browser.close();
})();
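Playwright can also assign a proxy per browser context, which lets you use different proxies without relaunching the browser. A hedged sketch (depending on the Playwright version and platform, Chromium may additionally need a proxy passed at launch for per-context proxies to take effect):

const { chromium } = require('playwright');

(async () => {
  const browser = await chromium.launch();
  // Each context gets its own proxy
  const context = await browser.newContext({
    proxy: { server: 'http://proxy2.example.com:8080' }
  });
  const page = await context.newPage();
  await page.goto('https://example.com');
  await browser.close();
})();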
Proxy rotation for multiple requests
For large-scale parsing, you need proxy rotation. Here's a simple example with an array:
const puppeteer = require('puppeteer');

const proxies = [
  'http://proxy1.com:8080',
  'http://proxy2.com:8080',
  'http://proxy3.com:8080'
];
let proxyIndex = 0;

async function getPageWithProxy(url) {
  const currentProxy = proxies[proxyIndex % proxies.length];
  proxyIndex++;
  const browser = await puppeteer.launch({
    args: [`--proxy-server=${currentProxy}`]
  });
  const page = await browser.newPage();
  await page.goto(url);
  const content = await page.content();
  await browser.close();
  return content;
}

// Usage
(async () => {
  const urls = ['https://example.com/1', 'https://example.com/2'];
  for (const url of urls) {
    const content = await getPageWithProxy(url);
    console.log('Parsed:', url);
  }
})();
Common errors and solutions
| Error | Reason | Solution |
|---|---|---|
| ERR_TUNNEL_CONNECTION_FAILED | Proxy is unavailable or incorrect credentials | Check proxy IP:port, username/password. Test via curl |
| Timeout during loading | Slow proxy or website blocks request | Increase timeout, add User-Agent, use residential proxies |
| 403 Forbidden | Website recognized a bot | Add realistic headers, use residential proxies, add delay between requests |
| CAPTCHA on every request | Website sees the same User-Agent | Rotate User-Agent, use different proxies for each browser |
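For the timeout problems in the table, Puppeteer lets you raise the navigation timeout and wait until the network settles. A minimal sketch (60 seconds is an arbitrary value):

const page = await browser.newPage();

// Raise the default navigation timeout from 30 to 60 seconds
page.setDefaultNavigationTimeout(60000);

// Or set it per navigation and wait until the network is mostly idle
await page.goto('https://example.com', {
  waitUntil: 'networkidle2',
  timeout: 60000
});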
Adding realistic headers
const page = await browser.newPage();

await page.setUserAgent(
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
);
await page.setExtraHTTPHeaders({
  'Accept-Language': 'en-US,en;q=0.9',
  'Accept': 'text/html,application/xhtml+xml',
  'Referer': 'https://google.com'
});

await page.goto('https://example.com');
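To avoid sending the same User-Agent with every browser (the CAPTCHA row in the table above), pick one from a small list per page. A sketch with example strings, assuming an existing Puppeteer page object:

const userAgents = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36',
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36'
];

// Pick a random User-Agent for this page
const randomUA = userAgents[Math.floor(Math.random() * userAgents.length)];
await page.setUserAgent(randomUA);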
Performance optimization
Disabling unnecessary resources
Loading images and styles slows down parsing. If you only need text:
const page = await browser.newPage();

// Request interception must be enabled before handlers can abort requests
await page.setRequestInterception(true);

// Block loading of images, styles, fonts, and media
page.on('request', (request) => {
  const resourceType = request.resourceType();
  if (['image', 'stylesheet', 'font', 'media'].includes(resourceType)) {
    request.abort();
  } else {
    request.continue();
  }
});

await page.goto('https://example.com');
Using a browser pool
For parallel processing of multiple pages, reuse a single shared browser and limit how many pages it has open at once, instead of launching a new browser for each request:
const puppeteer = require('puppeteer');

let browser;
const maxPages = 5; // how many pages to process in parallel

async function initBrowser() {
  browser = await puppeteer.launch({
    args: ['--proxy-server=http://proxy.example.com:8080']
  });
}

async function parsePage(url) {
  const page = await browser.newPage();
  try {
    await page.goto(url);
    return await page.content();
  } finally {
    await page.close();
  }
}

// Usage: process URLs in batches of maxPages pages inside one shared browser
(async () => {
  await initBrowser();
  const urls = ['url1', 'url2', 'url3'];
  for (let i = 0; i < urls.length; i += maxPages) {
    const batch = urls.slice(i, i + maxPages);
    await Promise.all(batch.map(parsePage));
  }
  await browser.close();
})();
Tools and libraries
- Puppeteer Extra: a Puppeteer wrapper with plugin support (including a stealth plugin) for bypassing bot protection; see the sketch after this list.
- Cheerio: a lightweight library for parsing HTML after browser rendering.
- Axios + Proxy Agent: for simple requests through a proxy without a browser; see the sketch after this list.
- Scrapy: a Python framework with built-in proxy support and distributed parsing.
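Two of these are easy to sketch. Puppeteer Extra is used together with its stealth plugin (package names puppeteer-extra and puppeteer-extra-plugin-stealth); a minimal, hedged example:

const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

// The stealth plugin patches common headless-browser fingerprints
puppeteer.use(StealthPlugin());

(async () => {
  const browser = await puppeteer.launch({
    args: ['--proxy-server=http://proxy.example.com:8080']
  });
  const page = await browser.newPage();
  await page.goto('https://example.com');
  await browser.close();
})();

And for pages that do not need JavaScript rendering at all, Axios with a proxy agent keeps things lightweight. A sketch assuming the https-proxy-agent package (the import form varies slightly between its versions):

const axios = require('axios');
const { HttpsProxyAgent } = require('https-proxy-agent');

const agent = new HttpsProxyAgent('http://user:pass@proxy.example.com:8080');

(async () => {
  // proxy: false disables axios's own proxy handling so the agent is used instead
  const response = await axios.get('https://example.com', {
    httpsAgent: agent,
    proxy: false
  });
  console.log(response.data);
})();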
Important: When working with proxies, make sure you comply with the target website's terms of use and do not violate its robots.txt. Parsing should be ethical and not overload servers.
Conclusion
JavaScript rendering through a proxy is a powerful tool for automating parsing, but it requires attention to detail. Proper browser configuration, proxy rotation, realistic headers, and performance optimization are the keys to reliable operation.
For large-scale parsing with high anonymity requirements, residential proxies are a good fit: they look like traffic from regular users and pass most bot-protection checks.