Every day, your website is attacked by hundreds of bots: some steal prices and content, others overload the server with requests, and some look for vulnerabilities. If you own an online store on Wildberries or Ozon, run landing pages for advertising campaigns on Facebook Ads, or manage client websites, this issue directly concerns you. A reverse proxy is the first line of defense that can be set up without deep technical knowledge.
What is a Reverse Proxy and How is it Different from a Regular One
Most people know proxies as a tool for changing IP addresses: you connect to a proxy server, and websites see its address instead of yours. This is called a forward proxy: it protects the user.
A reverse proxy works from the other side. It stands in front of your server and takes all incoming requests instead of it. The visitor thinks they are communicating directly with your website, but in reality they first hit the proxy server, which checks the request, filters out suspicious activity, and only then forwards legitimate traffic to your actual server.
A Simple Analogy:
Imagine your website is a warehouse. A forward proxy is your courier who picks up goods from suppliers anonymously. A reverse proxy is the guard at the warehouse gate: he checks everyone who wants to enter, lets in customers, and turns away robbers before they reach the goods.
A key difference from a regular proxy is that your real server IP address remains hidden. Malicious actors and bots do not know where your website is physically located; they only see the reverse proxy address. This alone cuts off a significant portion of attacks.
For online store owners, marketers, and arbitrage specialists who manage dozens of landing pages, this is critically important: competitors will not be able to find your actual hosting and attack it directly, bypassing protection.
Why Bots and Scanners are Dangerous for Your Business
Many website owners underestimate the threat of bots, considering it a "technical problem." In reality, it leads to direct business losses. Let's break down specific scenarios that our readers face.
Price Scraping by Competitors
If you sell on marketplaces or run your own online store, competitors can launch a scraper that pulls your prices every 15-30 minutes and automatically lists theirs 1-2% lower. You lose sales without understanding why. Scraper bots also overload the server: hundreds of requests per minute slow down the site for real buyers, which directly affects conversion.
Click Fraud
Arbitrage specialists and marketers working with Facebook Ads and Google Ads regularly face click inflation. Clicker bots click on your ads, burning through your budget without any conversions. Research shows that up to 20-30% of ad traffic in some niches is generated by bots. A reverse proxy with the right rules helps identify and block such sources.
Vulnerability Scanners
Automated scanners like Masscan, search engines like Shodan, and hundreds of lesser-known tools constantly scour the internet for unprotected websites. They check for outdated CMS versions, open ports, and default passwords. If your site runs WordPress or any other popular platform, it is almost certainly already in their databases. Without protection, it's only a matter of time before a vulnerability is found.
DDoS Attacks and Server Overload
Competitors or simply malicious individuals can organize an attack that takes your site down at the most inconvenient moment: for example, during a sale or an active advertising campaign. Even a small DDoS attack of a few thousand requests per second can take down cheap hosting.
Real Statistics:
According to Imperva, over 47% of all internet traffic is generated by bots, and roughly 30% of all traffic comes from malicious bots. If your site receives 1,000 visitors a day, about 300 of them are automated scripts that load the server and distort analytics.
How a Reverse Proxy Protects Your Website: Mechanism of Action
A reverse proxy protects your website on several levels. Understanding these levels will help you set up the system correctly and not miss important steps.
Level 1: Hiding the Real IP
Once you route traffic through a reverse proxy, your actual server IP address stops being public. Bots and attackers cannot reach the server directly; they hit the proxy instead. This is especially important for DDoS protection: even if a malicious actor knows your website's domain, they cannot attack the server bypassing the protection.
Level 2: Analyzing and Filtering Requests
Each incoming request is checked against several parameters: User-Agent (which browser/bot is making the request), request frequency from a single IP, geolocation, behavioral patterns. Bots typically make requests too quickly, use non-standard User-Agent strings, or come from known data center IP addresses. A reverse proxy can track and block all of this.
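These checks can be sketched as a simple classifier. This is a minimal illustration of the idea, not Cloudflare's or Nginx's actual logic; the scanner list and threshold here are hypothetical examples:

```python
import re

# Hypothetical blocklist of User-Agent substrings used by common scanning tools.
SCANNER_UA_PATTERN = re.compile(r"sqlmap|nikto|nmap|masscan|zgrab", re.IGNORECASE)

def classify_request(user_agent: str, requests_last_minute: int, limit: int = 60) -> str:
    """Return 'block', 'challenge', or 'allow' for an incoming request."""
    if not user_agent:                   # real browsers always send a User-Agent
        return "block"
    if SCANNER_UA_PATTERN.search(user_agent):
        return "block"
    if requests_last_minute > limit:     # too fast for a human visitor
        return "challenge"
    return "allow"

print(classify_request("", 5))                   # block
print(classify_request("sqlmap/1.7", 5))         # block
print(classify_request("Mozilla/5.0", 500))      # challenge
print(classify_request("Mozilla/5.0", 12))       # allow
```

Real systems layer many more signals on top (TLS fingerprints, behavioral scoring), but the decision structure is the same: cheap deterministic checks first, then rate-based ones.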
Level 3: Rate Limiting
A real person cannot browse 500 pages of a website in a minute. A reverse proxy allows you to set limits: for example, no more than 60 requests per minute from a single IP. If someone exceeds the limit, they receive a temporary block or a CAPTCHA check. This effectively stops most scrapers and scanners without harming regular visitors.
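Under the hood, such limits are often implemented as a token bucket: each IP gets a bucket that refills at a fixed rate, and a request that finds the bucket empty is rejected. A minimal sketch (illustrative, not any vendor's actual implementation):

```python
class TokenBucket:
    """Allow up to `rate` requests/second with short bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Refill tokens for the time elapsed since the previous request.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=1.0, capacity=5)   # ~60 requests/minute, bursts of 5
# A burst of 6 instant requests: the first 5 pass, the 6th is rejected.
print([bucket.allow(now=0.0) for _ in range(6)])  # [True, True, True, True, True, False]
```

In production the proxy keeps one bucket per client IP (Nginx's limit_req uses a closely related leaky-bucket scheme).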
Level 4: Caching and Load Reduction
A reverse proxy caches static content (images, CSS, JavaScript) and serves it directly, without contacting your server. As a result, even if bots break through the filters, they receive cached pages, and the load on the actual server stays minimal. Plus, this speeds up the site for real visitors, positively impacting SEO and conversion.
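Conceptually the cache is just a keyed store with an expiry time: serve from the cache while an entry is fresh, and only otherwise forward to the origin. A toy sketch of that logic (real proxies also honor Cache-Control headers, sizes, and eviction, all omitted here):

```python
import time

class TinyCache:
    def __init__(self, ttl_seconds: float = 60.0):
        self.ttl = ttl_seconds
        self.store = {}  # path -> (response_body, stored_at)

    def get(self, path, fetch_from_origin, now=None):
        now = time.monotonic() if now is None else now
        entry = self.store.get(path)
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]             # cache hit: origin is not contacted
        body = fetch_from_origin(path)  # cache miss: go to the real server
        self.store[path] = (body, now)
        return body

origin_hits = []
def origin(path):
    origin_hits.append(path)
    return f"<html>{path}</html>"

cache = TinyCache(ttl_seconds=60)
cache.get("/style.css", origin, now=0.0)   # miss -> hits the origin
cache.get("/style.css", origin, now=10.0)  # hit  -> served from cache
print(len(origin_hits))  # 1
```

The second request never reaches the origin, which is exactly why bots hammering cached pages barely load the real server.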
Level 5: SSL/TLS Termination
A reverse proxy handles the encryption of traffic. Your server runs faster as it doesn't spend resources encrypting each request. And visitors see a secure HTTPS connection, which matters both for trust and for ranking in Google.
Overview of Solutions: Cloudflare, Nginx, Caddy β Which to Choose
There are several popular solutions for setting up a reverse proxy. The choice depends on your technical level, budget, and the scale of the task.
| Solution | Difficulty | Cost | For Whom | Bot Protection |
|---|---|---|---|---|
| Cloudflare | Low | Free / from $20/month | Everyone, especially beginners | ★★★★★ |
| Nginx | Medium | Free (server needed) | Technical users | ★★★★ |
| Caddy | Low to Medium | Free (server needed) | Developers, startups | ★★★ |
| AWS CloudFront | High | Pay as you go | Enterprise segment | ★★★★★ |
| HAProxy | High | Free (server needed) | High-load projects | ★★★★ |
For most website owners, marketers, and arbitrage specialists who do not want to deal with server settings, Cloudflare is the optimal choice. The free plan covers 90% of tasks for protection against bots and scanners. This is where we will start our step-by-step guide.
Setting Up Cloudflare as a Reverse Proxy: Step by Step
Cloudflare is a cloud service that becomes a reverse proxy for your website after a simple nameserver change. No programming, no server access: just settings in your browser.
Step 1: Register and Add Your Site
Go to cloudflare.com and create an account. After logging in, click the "Add a Site" button and enter your website domain (e.g., myshop.ru). Choose the Free plan: it's more than enough to start.
Cloudflare will automatically scan your DNS records and show you the list. Check that all records are in place (usually this is an A record with the server IP and MX records for email). Click "Continue."
Step 2: Change DNS Servers at Your Domain Registrar
Cloudflare will provide you with two nameserver addresses, something like alex.ns.cloudflare.com and diana.ns.cloudflare.com. Go to your domain registrar's control panel (RU-CENTER, Reg.ru, Namecheap, etc.) and replace the current nameservers with these two addresses.
DNS updates take from 15 minutes to 48 hours. Once everything updates, all traffic to your site will start passing through Cloudflare servers β the reverse proxy will work automatically.
Step 3: Enable "Under Attack" Mode if Necessary
In the Cloudflare panel, go to Security → Overview. Here you can see the current Security Level. For most websites, "Medium" is suitable. If your site is under active attack, temporarily switch to "Under Attack Mode": all visitors will undergo a JavaScript check that bots usually fail.
Step 4: Setting Up Bot Fight Mode
Go to Security → Bots. Turn on the "Bot Fight Mode" switch: this is basic protection against automated bots, available for free. Cloudflare automatically identifies and blocks traffic from known malicious bots using a database built from billions of requests.
On paid plans (Pro and above), Super Bot Fight Mode is available with more detailed settings: you can separately configure behavior for search bots (Googlebot, Yandexbot should be allowed!), for verified bots (uptime monitoring), and for suspicious automated requests.
Step 5: Creating Firewall Rules
Go to Security → WAF → Firewall Rules and create rules for your tasks. Here are some examples of rules to add right away:
Example Rule 1: Blocking Empty User-Agents
Condition: http.user_agent eq "" → Action: Block. Real browsers always send a User-Agent; an empty one is almost always a bot.
Example Rule 2: Blocking Known Scanners
Condition: http.user_agent contains "sqlmap" or http.user_agent contains "nikto" or http.user_agent contains "nmap" → Action: Block. These are vulnerability scanning tools.
Example Rule 3: Geo-blocking (if needed)
Condition: ip.geoip.country in {"CN" "KP" "IR"} → Action: Challenge (CAPTCHA). If your business operates only in Russia, you can challenge or block the countries from which most attacks come.
Step 6: Setting Up Rate Limiting
In Security → WAF → Rate limiting rules, create a rate limiting rule. For example: no more than 100 requests per minute from a single IP address. If the limit is exceeded, show a CAPTCHA or block temporarily. This stops most scrapers trying to quickly harvest all pages of your site.
Setting Up Nginx Reverse Proxy: Basic Configuration
If you have your own VPS server (for example, on Timeweb, Selectel, or DigitalOcean), you can set up Nginx as a reverse proxy. This gives you more control and flexibility, although it requires basic command line skills.
A typical setup: you have a main server with an application (for example, an online store on port 8080), and a separate server or the same server with Nginx on port 80/443, which takes all traffic and forwards it to the application.
The basic configuration of Nginx as a reverse proxy with bot protection looks like this:
```nginx
# Rate-limiting zone: 60 requests per minute per client IP, 10 MB of state.
# This directive belongs in the http {} context of nginx.conf.
limit_req_zone $binary_remote_addr zone=one:10m rate=60r/m;

server {
    listen 80;
    server_name mysite.ru www.mysite.ru;

    # Redirect to HTTPS
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl;
    server_name mysite.ru www.mysite.ru;

    ssl_certificate /etc/letsencrypt/live/mysite.ru/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/mysite.ru/privkey.pem;

    # Block empty User-Agents (bots without UA)
    if ($http_user_agent = "") {
        return 403;
    }

    # Block known scanners by User-Agent
    if ($http_user_agent ~* (sqlmap|nikto|nmap|masscan|zgrab|python-requests)) {
        return 403;
    }

    # Rate limiting: apply the limit zone
    limit_req zone=one burst=20 nodelay;

    # Forward requests to the real server
    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Hide information about the real server
        proxy_hide_header X-Powered-By;
        proxy_hide_header Server;
    }
}
```
Pay attention to the proxy_hide_header directives: they hide headers that reveal information about your real server and CMS. This complicates the work of vulnerability scanners, which can no longer tell what exactly they are probing.
The limit_req_zone directive caps each IP address at 60 requests per minute. The burst=20 parameter allows short bursts (a real user may quickly open several pages), while sustained high-rate traffic is rejected.
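After deploying the config, it's worth confirming the headers really are hidden (for example with `curl -I https://mysite.ru`). Below is a small offline sketch of such a check over a captured set of response headers; the shortlist of revealing header names is an assumption, not an exhaustive standard:

```python
# Hypothetical shortlist of headers that commonly reveal the backend stack.
REVEALING_HEADERS = {"server", "x-powered-by", "x-aspnet-version", "x-generator"}

def leaking_headers(headers: dict) -> list:
    """Return the (lowercased) names of headers that expose server details."""
    return sorted(name.lower() for name in headers if name.lower() in REVEALING_HEADERS)

# Before proxy_hide_header: the backend leaks its stack.
before = {"Server": "Apache/2.4.41", "X-Powered-By": "PHP/7.4", "Content-Type": "text/html"}
# After: only neutral headers remain.
after = {"Content-Type": "text/html", "Content-Length": "1024"}

print(leaking_headers(before))  # ['server', 'x-powered-by']
print(leaking_headers(after))   # []
```

Running this against the headers your proxy actually returns is a quick regression check whenever you change the Nginx config.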
Important for Arbitrage Specialists and Marketers:
If you use residential proxies to check how your landing pages look from different regions, make sure your own proxies do not get blocked. Whitelist the IP addresses of your proxy servers in the settings of Cloudflare or Nginx.
Bot and Scanner Blocking Rules: Checklist
After the basic setup of the reverse proxy, use this checklist to ensure that protection is set up comprehensively. Each item is a separate attack vector that needs to be closed.
✅ Basic Protection (mandatory for all)
- Bot Fight Mode is enabled in Cloudflare (or an equivalent in another solution)
- Rate Limiting is set: no more than 60-100 requests per minute from a single IP
- Requests with empty User-Agent are blocked
- Known scanners are blocked: sqlmap, nikto, nmap, masscan
- Server and X-Powered-By headers are hidden (do not show server and CMS version)
- HTTPS is configured, HTTP automatically redirects to HTTPS
- The real server IP is not exposed in public databases (Shodan, Censys)
✅ Advanced Protection (for actively attacked sites)
- A honeypot is set up: a hidden page that only bots see (real users never visit it). IPs that request the honeypot are automatically blocked
- Browser Integrity Check is enabled (a built-in Cloudflare feature)
- CAPTCHA is set up for suspicious traffic (not for everyone, only for suspicious requests)
- Geo-blocking of countries from which legitimate traffic is not expected
- Log monitoring: alerts are set up for a sharp increase in 403/429 errors
- Blocking IP ranges of major cloud providers (AWS, Azure, GCP): most bots operate from the cloud
- robots.txt is configured to disallow aggressive bots
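The honeypot item above can be automated with a small log parser: extract the IPs that requested the trap URL and feed them to your blocklist. A sketch assuming the common Nginx/Apache combined log format and a hypothetical trap path `/honeypot-trap/`:

```python
import re

# Matches the client IP and the request path in a combined-format log line.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|POST|HEAD) (\S+)')

def honeypot_visitors(log_lines, trap_path="/honeypot-trap/"):
    """Return the set of IPs that requested the hidden trap page."""
    ips = set()
    for line in log_lines:
        m = LOG_LINE.match(line)
        if m and m.group(2).startswith(trap_path):
            ips.add(m.group(1))
    return ips

logs = [
    '203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] "GET /honeypot-trap/ HTTP/1.1" 200 512',
    '198.51.100.2 - - [10/Oct/2024:13:55:40 +0000] "GET /catalog/ HTTP/1.1" 200 4096',
]
print(honeypot_visitors(logs))  # {'203.0.113.7'}
```

A cron job could run this over the last hour of logs and append the results to a Cloudflare IP Access Rule or an Nginx deny list.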
✅ Protection of Specific Pages
- Login page (/admin, /wp-admin, /login): strict rate limiting (5-10 attempts per minute) and two-factor authentication
- API endpoints: mandatory authorization via API key or token
- Price pages: additional checks for frequent requests (protection against scraping)
- Registration/application forms: CAPTCHA or an invisible honeypot trap for bots
Do Not Block Useful Bots!
Googlebot and Yandexbot must be allowed: they index your site. In Cloudflare, they automatically fall into the "Verified Bots" category and, with the right settings, are not blocked. Check this under Security → Bots → Bot Analytics.
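Google also documents a way to verify Googlebot yourself: reverse-resolve the IP, check that the hostname belongs to googlebot.com or google.com, then forward-resolve it back to the same IP. A sketch with injectable resolvers (the DNS data here is faked so the example runs offline; in production you would use socket.gethostbyaddr and socket.gethostbyname_ex):

```python
def is_verified_googlebot(ip, reverse_dns, forward_dns):
    """Double reverse-DNS check: reverse-resolve the IP, confirm the host
    belongs to Google's crawler domains, then forward-resolve it back."""
    host = reverse_dns(ip)
    if not host or not host.endswith((".googlebot.com", ".google.com")):
        return False
    return ip in (forward_dns(host) or [])

# Fake resolvers standing in for real DNS lookups.
rdns = {"66.249.66.1": "crawl-66-249-66-1.googlebot.com"}.get
fdns = {"crawl-66-249-66-1.googlebot.com": ["66.249.66.1"]}.get

print(is_verified_googlebot("66.249.66.1", rdns, fdns))  # True
print(is_verified_googlebot("203.0.113.7", rdns, fdns))  # False
```

The forward step matters: anyone can fake a reverse-DNS record on their own IP range, but they cannot make Google's zone resolve back to it.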
Proxy for Monitoring: How to Check Your Website's Protection
After setting up a reverse proxy, it is important to make sure the protection works correctly and does not block real users. This is where regular proxies come in, from the opposite side: you become the "external user" and check the website's behavior.
Why Marketers and Arbitrage Specialists Should Check Their Sites Through Proxies
Imagine a situation: you have set up bot protection, enabled geo-blocking, and suddenly notice that conversion has dropped. You may have accidentally blocked real users from certain regions or devices. You can check this by visiting the site through proxies from different countries and types of connections.
For such tasks, mobile proxies work great β they simulate a connection from a real mobile device through a carrier. If your blocking rules correctly allow mobile traffic, it means regular users on smartphones are not affected by the protection.
Practical Testing Scenarios
Scenario 1: Checking Geo-availability. You launched an ad in Facebook Ads targeting audiences from several regions of Russia. Through proxies with IPs from Novosibirsk, Yekaterinburg, and Krasnodar, check that the landing page opens correctly, does not show a CAPTCHA, and loads quickly.
Scenario 2: Testing Rate Limiting. Open several tabs through one proxy and quickly navigate between pages. If you get blocked during normal behavior, the threshold is too low and needs to be raised.
Scenario 3: Checking from Different Types of IPs. Try accessing through a data center proxy (simulating a bot/server): if your protection works, such a request should get a CAPTCHA or a block. Then access through a residential proxy (simulating an ordinary home user): access should be unimpeded.
Scenario 4: Checking Like a Competitor. If you want to see how difficult it is to scrape your site, try doing it yourself with a simple tool. If the scraper gets blocked after 10-20 requests, the protection is working. If it collects data unhindered, you need to tighten the rules.
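Scenario 4 can be scripted: fire sequential requests at your own site and note when the proxy starts answering 403/429. In the sketch below the HTTP call is injected so the example runs offline; in practice you would pass a function that performs a real request through your test proxy. The "blocks after 20" behavior is an illustrative stand-in, not a real server:

```python
def requests_until_block(fetch_status, max_requests=200):
    """Return how many requests succeeded before the first 403/429, or
    max_requests if protection never kicked in (a sign rules are too lax)."""
    for i in range(max_requests):
        if fetch_status() in (403, 429):
            return i
    return max_requests

# Fake server that starts rate-limiting after 20 requests.
state = {"count": 0}
def fake_fetch():
    state["count"] += 1
    return 200 if state["count"] <= 20 else 429

print(requests_until_block(fake_fetch))  # 20
```

If this returns a number close to your configured limit, the rules work; if it returns max_requests, your scraping protection is effectively off.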
Monitoring Protection Effectiveness
In Cloudflare, go to Security → Overview: here you can see graphs of blocked requests, types of threats, and the geography of attacks. Pay attention to:
- A sharp increase in blocked requests: a sign of an active attack or scraping
- A high percentage of traffic from one country: it may be worth adding geo-blocking
- An increase in 403/429 errors in the logs: check whether real users are being blocked
- Regular requests to /admin, /wp-login.php: hacking attempts; strengthen the protection of these paths
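The 403/429 check from the list above can run as a small scheduled job over your access log: compute the share of blocked responses in the last window and alert past a threshold. A sketch assuming combined log format; the 30% threshold is an arbitrary example, not a recommendation:

```python
import re

STATUS = re.compile(r'" (\d{3}) ')  # the status code right after the quoted request line

def blocked_ratio(log_lines):
    """Fraction of requests answered with 403 or 429."""
    codes = [m.group(1) for line in log_lines if (m := STATUS.search(line))]
    if not codes:
        return 0.0
    return sum(c in ("403", "429") for c in codes) / len(codes)

logs = [
    '1.2.3.4 - - [..] "GET / HTTP/1.1" 200 512',
    '5.6.7.8 - - [..] "GET /admin HTTP/1.1" 403 0',
    '5.6.7.8 - - [..] "GET /admin HTTP/1.1" 429 0',
    '9.9.9.9 - - [..] "GET /shop HTTP/1.1" 200 4096',
]
ratio = blocked_ratio(logs)
print(ratio)  # 0.5
if ratio > 0.3:  # illustrative alert threshold
    print("ALERT: unusually many blocked requests")
```

A spike in this ratio means either an attack in progress (good: the proxy is doing its job) or overly aggressive rules hitting real users; cross-check with conversion metrics to tell the two apart.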
Conclusion: A Reverse Proxy is Not an Option, but a Necessity
A reverse proxy has long ceased to be a tool only for large corporations. Today, even a small online store, a landing page for an arbitrage campaign, or an SMM agency's website needs basic protection against bots, scanners, and competitor scrapers.
Key takeaways from this guide: Cloudflare offers the fastest start without technical knowledge, while Nginx provides maximum control if you have your own server. In both cases, basic protection can be set up in 30-60 minutes and covers 80-90% of typical threats. Don't forget to check that useful bots (Googlebot, Yandexbot) remain whitelisted and that real users are not suffering from overly aggressive rules.
An additional point for those who use proxies in their work: if you are checking advertising campaigns, monitoring competitors, or testing the availability of your sites from different regions, residential proxies are the best fit for these tasks. They use IP addresses of real home users, so they pass through most protection systems just like regular visitors, giving an objective picture of what your audience sees.