
Walmart Scraping: How to Choose Proxies and Set Up Data Collection Without Getting Blocked

Walmart uses powerful bot protection from PerimeterX. We analyze which proxies work for scraping, how to set up rotation, and how to avoid blocks when collecting prices and stock levels.

📅 January 24, 2026

Walmart is the second-largest online store in the US after Amazon, and its data is critically important for e-commerce businesses: monitoring competitor prices, tracking stock levels, and analyzing product assortment. The problem is that Walmart runs an advanced bot-protection system, PerimeterX, which blocks roughly 90% of parser requests on the very first page.

In this guide, we will discuss which types of proxies actually work for parsing Walmart, how to set up IP address rotation, bypass browser fingerprinting, and build a stable data collection system that won't fail after an hour of operation.

Why Walmart blocks parsers: PerimeterX protection mechanisms

Walmart uses the PerimeterX protection system (now called HUMAN Security) — one of the most aggressive anti-bot systems on the market. It analyzes each request based on dozens of parameters and blocks suspicious traffic even before your parser receives the HTML code of the page.

Main protection mechanisms of Walmart:

1. IP reputation analysis

PerimeterX checks each IP address against a database of known proxy servers, datacenters, and VPNs. If your IP is in this database, you will be blocked or served a CAPTCHA. Walmart is particularly aggressive about filtering IPs from popular cloud providers (AWS, Google Cloud, DigitalOcean).

2. Behavioral analysis

The system tracks how a user interacts with the page: mouse movements, scrolling speed, clicks. Parsers using Selenium or Puppeteer often get caught here — they open pages too quickly, without natural pauses, and do not move the mouse.

3. TLS and HTTP fingerprinting

PerimeterX analyzes the TLS fingerprint of your connection (cipher order, extensions) and HTTP request headers. Standard Python libraries (requests, urllib) have unique fingerprints that are easily recognized. Even if you change the User-Agent, the system sees a mismatch between the headers and the actual browser.

4. JavaScript challenges

When a request looks suspicious, PerimeterX serves JavaScript code that runs checks in the browser: availability of the Canvas API, WebGL support, screen parameters, installed fonts. Simple HTTP parsers (without a browser engine) cannot pass these checks and get blocked.

What happens when blocked:

  • HTTP 403 Forbidden — the most common response, means that your IP or fingerprint is blacklisted
  • Redirect to a CAPTCHA page — the system is unsure, giving a chance to prove you are human
  • Empty page or JSON with an error — the server does not return content at all
  • Temporary IP ban for 15-60 minutes — during aggressive parsing from one address

The key takeaway: for successful parsing of Walmart, a comprehensive strategy is needed, where proxies are only one element. You will also need the right browser engine, human behavior emulation, and proper IP address rotation.

Which proxies work for parsing Walmart: comparison of types

Not all proxies are equally effective for bypassing Walmart's protection. Let's analyze four main types and their applicability to the parsing task.

| Proxy Type | Effectiveness for Walmart | Speed | Cost | Recommendation |
| --- | --- | --- | --- | --- |
| Residential Proxies | ⭐⭐⭐⭐⭐ Excellent — IPs of real users, minimal blocks | Average (200-800 ms) | High (from $7-15/GB) | Optimal for production |
| Mobile Proxies | ⭐⭐⭐⭐⭐ Excellent — high trust score, rare blocks | Low (500-1500 ms) | Very high (from $50-100/month per IP) | For complex cases |
| Datacenter Proxies | ⭐⭐ Poor — high likelihood of blocking (70-90%) | High (50-150 ms) | Low (from $1-3/IP) | Not recommended |
| ISP Proxies | ⭐⭐⭐⭐ Good — static residential IPs | High (80-200 ms) | Average (from $30-80/month per IP) | For long-term tasks |

More about each type:

Residential Proxies — the gold standard for Walmart

These are IP addresses from real home internet providers (Comcast, AT&T, Verizon in the US). Walmart sees them as regular customers, so the blocking rate is minimal — about 5-10% with proper configuration. The main advantage is the huge pools of addresses (millions of IPs), which allows for effective rotation.

When to use: monitoring prices on thousands of products, daily data collection, long-term projects. For parsing Walmart, residential proxies are the optimal choice in terms of efficiency and cost.

Mobile Proxies — maximum reliability

IPs from mobile operators (T-Mobile, Verizon Wireless) have the highest trust score with anti-bot systems. The reason is that one IP is used by thousands of real users (through the operator's NAT), so blocking it means blocking thousands of customers. Walmart practically does not block mobile IPs.

When to use: if residential proxies are not sufficient, if you need to parse particularly protected sections (for example, prices for specific regions), if the budget allows. Mobile proxies provide nearly 100% successful requests but are more expensive.

Datacenter Proxies — not for Walmart

IP addresses from datacenter servers (AWS, OVH, Hetzner) are instantly recognized by PerimeterX. Even if you buy "clean" IPs that have not been used for parsing before, the system still sees that it is a datacenter and not a home provider. The blocking rate is 70-90%.

The only scenario for use: testing the parser on a small volume of data (10-50 pages). They are categorically unsuitable for production.

ISP Proxies (static residential) are a hybrid: IPs from home providers, but hosted in datacenters and allocated to you for a long term (a month or more). They are faster than regular residential ones but are more expensive and have a limited pool of addresses. Suitable if you need stable IPs for long-term parsing of the same product categories.

Residential vs Datacenter Proxies: what to choose for your task

Although we have already established that residential proxies are more effective, let's look in detail at the situations where each type can be justified and calculate the real cost of ownership.

Scenario 1: Monitoring 10,000 products daily

With residential proxies:

  • Average size of a Walmart product page: ~500 KB
  • 10,000 products Ɨ 500 KB = 5 GB of traffic per day
  • Monthly traffic: 150 GB
  • Cost at $10/GB: $1,500/month
  • Percentage of successful requests: 90-95%
  • Real cost considering retries: ~$1,650/month
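The arithmetic above can be reproduced in a few lines. The $10/GB rate and the ~10% retry overhead are the assumptions of this scenario, not universal constants:

```python
# Monthly proxy cost estimate for daily monitoring of 10,000 products.
# Assumptions from the scenario above: ~500 KB per product page,
# $10/GB residential traffic, ~10% extra traffic for retried requests.
PAGE_SIZE_KB = 500
PRODUCTS_PER_DAY = 10_000
PRICE_PER_GB = 10.0
RETRY_OVERHEAD = 1.10  # 90-95% success rate -> roughly 10% of requests repeated

daily_gb = PRODUCTS_PER_DAY * PAGE_SIZE_KB / 1_000_000  # KB -> GB
monthly_gb = daily_gb * 30
base_cost = monthly_gb * PRICE_PER_GB
real_cost = base_cost * RETRY_OVERHEAD

print(f"{daily_gb} GB/day, {monthly_gb} GB/month, ${base_cost:.0f} base, ~${real_cost:.0f} with retries")
```

Plugging in different page sizes or per-GB rates from your own provider quote gives you a quick budget check before committing to a plan.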

With datacenter proxies (theoretically):

  • Cost for 100 IPs: ~$200/month
  • Percentage of successful requests: 10-30% (the rest are blocks)
  • Need to make 3-10 attempts for each product
  • Real traffic: 15-50 GB (due to retries)
  • Conclusion: the task is effectively infeasible — IPs quickly get banned, with a CAPTCHA at every step

Scenario 2: One-time data collection for 500 products

If you need to collect data for market analysis or research just once, you can try a combined approach:

  • Use datacenter proxies for initial collection of product URLs (category pages)
  • Switch to residential proxies for obtaining detailed product information
  • Cost: ~$50-100 for a one-time task
  • Execution time: 2-4 hours instead of 10-20 hours with datacenters

Key selection factors:

| Criterion | Residential | Datacenter |
| --- | --- | --- |
| Data volume | Any — from 100 to millions of pages | Only small volumes (up to 1,000 pages) |
| Frequency | Daily/weekly parsing | One-time tasks only |
| Execution speed | Stable — no delays for retries | Unpredictable — many retries |
| Reliability | High — 90-95% success | Low — 10-30% success |
| Cost of failure | Low — pay only for successful traffic | High — time and money wasted on bans |

Conclusion: For any serious Walmart parsing tasks, use residential or mobile proxies. Datacenter proxies can only be considered for testing the parser logic on 10-50 pages, but not for production. Saving on proxies will lead to a loss of time, nerves, and ultimately cost more.

IP rotation strategies: frequency of change and address pools

Even with residential proxies, you can get blocked if you do not set up IP address rotation correctly. PerimeterX tracks behavior patterns: if one IP requests 100 product pages in a minute — it is clearly a bot. The correct rotation strategy is key to stable parsing without blocks.

Three main rotation strategies:

1. Rotation on each request (Rotating Proxies)

Each HTTP request goes through a new IP address. This is the standard mode of operation for most residential proxy providers.

Pros:

  • Minimal risk of blocking — each IP makes 1-2 requests
  • Easy setup — the provider manages the pool
  • You can parse aggressively — hundreds of requests per minute

Cons:

  • Session issues — if the site uses cookies, each request = new session
  • Slower — establishing a new connection takes 200-500 ms

When to use: For parsing Walmart product pages where authorization and sessions are not needed. This is the optimal strategy for most price monitoring tasks.
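With most residential providers, per-request rotation is just a matter of pointing every request at the provider's gateway: the pool behind it assigns a fresh exit IP each time. A minimal sketch, where the gateway host, port, and credentials are placeholders rather than any real provider:

```python
import requests

# Hypothetical rotating-gateway credentials -- substitute your provider's.
PROXY = "http://USERNAME:PASSWORD@gateway.example-proxy.com:8000"

def fetch(url: str) -> requests.Response:
    """Fetch a page through the rotating gateway.

    Each call exits through a different IP, because the gateway
    itself rotates the pool on every request -- no client-side
    rotation logic is needed in this mode.
    """
    return requests.get(
        url,
        proxies={"http": PROXY, "https": PROXY},
        timeout=15,
    )
```

Because the provider handles rotation, the client code stays trivial; the trade-off, as noted above, is that every request is a brand-new session.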

2. Sticky Sessions

One IP address is used for a series of requests over a certain period (usually 5-30 minutes), then it switches to a new IP.

Pros:

  • Session and cookie preservation — can work with the cart, authorization
  • Faster — reuses the TCP connection
  • More "natural" behavior for anti-bot systems

Cons:

  • Higher risk of blocking — one IP makes 10-50 requests
  • Need to control limits — no more than 30-50 requests from one IP

When to use: If you need to parse data that requires authorization (for example, prices for registered users), or if you are emulating the behavior of a real buyer (viewing category → product → adding to cart).

3. Pool of static IPs with manual rotation

You take 50-100 static residential IPs (ISP proxies) and manage the distribution of requests among them yourself.

Pros:

  • Complete control — you know how many requests each IP has made
  • Maximum speed — static IPs are faster than rotating ones
  • You can "warm up" IPs — make legitimate requests to improve reputation

Cons:

  • Complex setup — need to write logic for request distribution
  • More expensive — ISP proxies cost $30-80 per IP per month
  • Risk of losing IPs — if one gets banned, you will need to replace it

When to use: For high-load systems with 100,000+ requests per day, where speed and stability are critical. Requires experience in parser development.
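The bookkeeping for strategy 3 can be sketched as a small pool manager that hands out IPs round-robin, counts requests per IP, and retires an address once it hits its limit or gets banned. The per-IP cap of 40 follows the guidance above; the class itself is illustrative, not a library API:

```python
from collections import deque

class ProxyPool:
    """Round-robin over static IPs, tracking per-IP usage and bans."""

    def __init__(self, proxies, max_requests_per_ip=40):
        self.available = deque(proxies)
        self.usage = {p: 0 for p in proxies}
        self.max_requests = max_requests_per_ip

    def acquire(self):
        """Return the next proxy to use, or None if the pool is exhausted."""
        while self.available:
            proxy = self.available[0]
            if self.usage[proxy] >= self.max_requests:
                self.available.popleft()   # retire an over-used IP
                continue
            self.usage[proxy] += 1
            self.available.rotate(-1)      # round-robin to the next IP
            return proxy
        return None

    def ban(self, proxy):
        """Drop an IP that returned 403 or a CAPTCHA."""
        if proxy in self.available:
            self.available.remove(proxy)
```

In production you would also want to persist the counters and re-add banned IPs after a cool-down period, but the core distribution logic is this simple.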

Recommended settings for Walmart:

For price monitoring (simple parsing of product pages):

  • Type: Rotating proxies with rotation on each request
  • Delay between requests: 2-5 seconds
  • Parallelism: 10-20 threads
  • Geolocation: USA (preferably a state where there are physical Walmart stores)

For complex parsing (with authorization, cart):

  • Type: Sticky sessions lasting 10-15 minutes
  • Request limit per IP: maximum 30-40
  • Delay between requests: 3-7 seconds (human emulation)
  • Parallelism: 5-10 threads (less aggression)

Important: Many residential proxy providers allow you to set session duration through connection parameters. For example, by adding session-15min to the username, you will get a sticky session for 15 minutes. Check this possibility with your provider.
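The username convention mentioned above can be generated programmatically. Note that the exact parameter syntax (`session-<id>-15min`) differs between providers — this sketch only illustrates the common pattern of encoding a session ID and duration into the proxy username:

```python
import random
import string

def sticky_proxy_url(user, password, host, port, minutes=15):
    """Build a proxy URL with a sticky-session tag in the username.

    Many residential providers keep the same exit IP for all requests
    that share a session ID. The 'session-<id>-<N>min' format used here
    is illustrative -- check your provider's docs for the exact syntax.
    """
    session_id = "".join(random.choices(string.ascii_lowercase + string.digits, k=8))
    username = f"{user}-session-{session_id}-{minutes}min"
    return f"http://{username}:{password}@{host}:{port}"
```

Generating a fresh random session ID whenever you want a new IP gives you sticky sessions and controlled rotation with the same gateway endpoint.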

Bypassing fingerprinting: User-Agent, headers, and TLS fingerprints

Proxies solve only half the problem — they give you a clean IP. But PerimeterX analyzes not only the IP but also the "fingerprint" of your browser or parser. Even with a residential IP, you will get blocked if your HTTP client looks like a bot.

What PerimeterX checks:

1. User-Agent and HTTP headers

Standard libraries (Python requests, Node.js axios) send headers that instantly give away a bot. For example, User-Agent: python-requests/2.28.1 — this is a 100% block.

What needs to be changed:

  • User-Agent — use fresh versions of Chrome/Firefox
  • Accept — must match the content type
  • Accept-Language — en-US for parsing Walmart US
  • Accept-Encoding — gzip, deflate, br
  • Referer — previous page (category or homepage)
  • Sec-Fetch-* — Chrome headers for CSRF protection

2. TLS Fingerprint (JA3)

Each HTTP client has a unique TLS fingerprint — the order of ciphers, TLS extensions, protocol versions. PerimeterX compares this fingerprint with the User-Agent: if you write "Chrome 120" and the TLS fingerprint is from Python — you are blocked.

Solution:

  • Use libraries that support custom TLS: curl-impersonate (Python), tls-client (Go)
  • Or use a real browser via Selenium/Puppeteer — they have a real TLS fingerprint

3. JavaScript challenges and Canvas fingerprinting

PerimeterX may send JavaScript code that checks: whether the Canvas API is available, WebGL, what fonts are installed, screen size, timezone. Simple HTTP parsers cannot execute this code.

Solution:

  • Use headless browsers: Puppeteer, Playwright, Selenium
  • Be sure to enable detection bypass mode: puppeteer-extra-plugin-stealth
  • Randomize parameters: window size, timezone, browser language

Example of correct headers for parsing Walmart:

GET /ip/Product-Name/12345678 HTTP/1.1
Host: www.walmart.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8
Accept-Language: en-US,en;q=0.9
Accept-Encoding: gzip, deflate, br
Referer: https://www.walmart.com/browse/electronics/tv-video/3944_1060825
Sec-Fetch-Dest: document
Sec-Fetch-Mode: navigate
Sec-Fetch-Site: same-origin
Sec-Fetch-User: ?1
Upgrade-Insecure-Requests: 1
Connection: keep-alive

Important details:

  • The order of headers matters — real browsers send them in a specific sequence. Use libraries that adhere to this order.
  • Cookies — if PerimeterX set the cookie _px3 or _pxvid, be sure to send it in subsequent requests. This is your session token.
  • HTTP/2 — Walmart uses HTTP/2, and lack of support for this protocol may signal a bot. Ensure your client supports HTTP/2.
  • Do not use the same headers for all requests — vary the User-Agent, use a pool of 10-20 different browser versions.
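The last point — varying the User-Agent across requests — can be done with a small pool. The version strings below are examples and will age; refresh them periodically, since a stale browser version is itself a bot signal:

```python
import random

# A small pool of realistic desktop User-Agent strings (example versions).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0",
]

def build_headers(referer: str) -> dict:
    """Browser-like headers with a randomly chosen User-Agent per request."""
    return {
        "User-Agent": random.choice(USER_AGENTS),
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
        "Accept-Encoding": "gzip, deflate, br",
        "Referer": referer,
        "Upgrade-Insecure-Requests": "1",
    }
```

Pass a plausible category or search URL as the `referer` for each product page, so the navigation path looks organic.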

Rate limiting and delays: how not to exceed request limits

Even with perfect proxies and headers, you will get blocked if you parse too aggressively. Walmart tracks request frequency and behavior patterns. A real user cannot open 100 product pages in a minute — the anti-bot system understands this.

Recommended limits for Walmart:

| Request Type | Delay between requests | Max requests from one IP | Parallelism |
| --- | --- | --- | --- |
| Product pages | 2-5 seconds | 30-50 pages (with rotation) | 10-20 threads |
| Category pages | 3-7 seconds | 20-30 pages | 5-10 threads |
| Search | 5-10 seconds | 10-15 requests | 3-5 threads |
| API endpoints | 1-3 seconds | 50-100 requests | 20-30 threads |

Why randomization of delays is important:

If you make requests exactly every 3 seconds (3.000, 6.000, 9.000...), the anti-bot system recognizes the pattern. A real person cannot be that precise — they will have variations: 2.8 sec, 3.4 sec, 2.9 sec.

Correct implementation of delay (Python):

import random
import time

# Incorrect — fixed delay
time.sleep(3)

# Correct — randomized delay
delay = random.uniform(2.0, 5.0)  # from 2 to 5 seconds
time.sleep(delay)

Load management strategies:

1. Adaptive rate limiting

Track the percentage of successful requests. If you start receiving 403 or CAPTCHA — automatically increase delays and decrease parallelism.

def adjust_rate(successful_requests, total_requests, delay_multiplier, parallel_workers):
    success_rate = successful_requests / total_requests

    if success_rate < 0.8:        # less than 80% successful
        delay_multiplier *= 1.5   # increase delays
        parallel_workers = max(1, parallel_workers - 2)  # decrease threads
    elif success_rate > 0.95:     # more than 95% successful
        delay_multiplier *= 0.9   # can speed up
        parallel_workers += 1

    return delay_multiplier, parallel_workers

2. Distribution by time of day

Parse during peak hours of real users (evening in the US, 6:00 PM - 10:00 PM EST). During this time, your traffic mixes with legitimate traffic, and the anti-bot system is less aggressive. At night (2:00 AM - 6:00 AM EST), the protection may be stricter, as there are fewer real users.

3. Warm-up IP addresses

Before starting mass parsing, "warm up" the IP addresses with legitimate requests: open the homepage, a couple of categories, perform a search. This creates an activity history and increases the trust score of the IP.

# Warm-up sequence for a new IP
1. GET https://www.walmart.com/  # homepage
2. Delay 3-5 sec
3. GET https://www.walmart.com/browse/electronics  # category
4. Delay 4-7 sec
5. GET https://www.walmart.com/search?q=laptop  # search
6. Delay 3-6 sec
# Now you can parse target products
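The sequence above translates into a short routine. The `fetch` parameter is whatever HTTP client you use (e.g. a requests session routed through your proxy); injecting it keeps the warm-up logic independent of the transport:

```python
import random
import time

# Warm-up plan: (url, min_delay_sec, max_delay_sec), mirroring the sequence above.
WARMUP_STEPS = [
    ("https://www.walmart.com/", 3, 5),
    ("https://www.walmart.com/browse/electronics", 4, 7),
    ("https://www.walmart.com/search?q=laptop", 3, 6),
]

def warm_up(fetch, sleep=time.sleep):
    """Visit each warm-up URL via `fetch`, pausing a random interval after each."""
    for url, lo, hi in WARMUP_STEPS:
        fetch(url)
        sleep(random.uniform(lo, hi))
```

Run this once per fresh IP before hitting target product pages; for sticky sessions, repeat it whenever the session (and therefore the IP) rotates.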

Critical error: Do not use the same Referer for all requests. If you are parsing 1000 products and all have the same Referer in the header — this is an obvious bot pattern. Vary the Referer for each request to avoid detection.
