If you are involved in recruiting or HR analytics, you have probably faced this situation: you need to quickly gather 500+ job vacancies from competitors, track salary trends, or extract employer contacts, and the platforms block you after just 20-30 requests. LinkedIn and HeadHunter actively protect their data, and without the right approach, scraping turns into an endless battle with CAPTCHAs and bans.
In this guide, we will explore which tools actually work in 2024, how to set up proxies for stable data collection, and what mistakes to avoid to prevent losing your account.
Why LinkedIn and HeadHunter Block Scraping
Both platforms monetize their data. LinkedIn sells access to its database through paid plans like Recruiter and Recruiter Lite, while HeadHunter does so through its API and paid job postings. When someone starts to collect this data en masse for free, the platforms respond harshly. Here are the specific protection mechanisms you will encounter:
Rate Limiting: Restricting Request Frequency
LinkedIn tracks the number of profile views and job page visits over a period of time. A free account can view about 300 profiles per month; after that, you receive a warning or a temporary block. Automated scraping without delays between requests exhausts this limit in just a few minutes. HeadHunter limits the number of search queries from a single IP; exceed that limit and you will see a CAPTCHA or be temporarily blocked.
Behavior Analysis and User-Agent
The platforms analyze behavior patterns: a real user scrolls the page, lingers on content, and clicks at irregular intervals. A bot makes requests at identical intervals, does not scroll, and sends atypical headers. LinkedIn additionally checks for an authorized session: without logging into an account, you see limited data and quickly get blocked by IP.
IP Address Blocking
This is the most common form of protection. If too many requests come from a single IP, that IP gets blacklisted. Data center IPs (AWS, Google Cloud, Hetzner) are blocked particularly quickly: the platforms know these address ranges and treat them with heightened suspicion. Home and mobile IPs are blocked less frequently because they may belong to real users.
⚠️ Important to Know
In 2023, LinkedIn significantly tightened its protection: now even when manually viewing profiles with a VPN or data center proxy, your account can be blocked. For working with LinkedIn, residential or mobile proxies are critically important.
What Recruiters and HR Analysts Scrape
Before setting up tools, determine your task: it will affect your choice of approach and type of proxy. Here are the main scenarios that HR specialists and recruiting agencies work with:
| Task | Platform | Data Volume |
|---|---|---|
| Market Salary Monitoring | HeadHunter, LinkedIn | 500–5000 vacancies/day |
| Collecting Employer Contacts | LinkedIn | 100–1000 profiles/day |
| Analyzing Candidate Requirements | HeadHunter, LinkedIn | 1000–10,000 vacancies |
| Tracking New Competitor Vacancies | HeadHunter | Daily monitoring |
| Searching for Passive Candidates | LinkedIn | 50–500 profiles/day |
The key point: tasks with a large volume of data (thousands of vacancies per day) require a pool of rotating proxies. Tasks with a small volume (monitoring 50-100 positions daily) can be solved with 1-2 static proxies while maintaining delays between requests.
Ready-made Tools for Scraping Job Vacancies
The good news: you don't need to write code from scratch. There are ready-made solutions for different tasks and levels of technical expertise. Let's break down the main categories.
No-code Tools
Apify is a cloud platform with ready-made "actors" for LinkedIn and HeadHunter, including a LinkedIn Jobs Scraper and an HH.ru Scraper. You simply specify the search parameters, and the platform does the rest. It supports connecting your own proxies. Pricing starts at $49/month, with a free tier available.
Phantombuster specializes in LinkedIn. It can collect job vacancies, profiles, and company contacts, and works through an authorized LinkedIn account. It supports proxies. Important: one LinkedIn account = one proxy profile; otherwise, you will get banned for changing IPs.
Octoparse is a visual scraper builder. It allows you to set up data collection from any site without code by pointing at the required elements with your mouse. It supports proxy rotation and works well for HeadHunter: the interface is simple and user-friendly.
Tools for Technical Users
ParseHub is a desktop application with a visual interface, but more flexible than Octoparse. It can handle dynamic content (JavaScript-rendered pages), which is critical for LinkedIn: most of its data is loaded dynamically.
Bright Data (Web Scraper IDE) is a professional platform with built-in proxies and ready-made templates for LinkedIn. Expensive, but reliable at industrial volumes.
HH.ru API is the official API of HeadHunter: free for non-commercial use, paid for businesses. If your task is monitoring job vacancies rather than mass collection of contacts, the official API is the most stable option. Limit: 50 requests per second for authorized applications.
💡 Tip
For HeadHunter, start with the official API: it is legal, stable, and free up to certain limits. For LinkedIn, you cannot do without third-party tools and proxies, as there is no official public API for job vacancies.
Why Proxies Are Needed and Which Type to Choose
Proxies are intermediary servers through which your requests are routed. The platform sees the proxy's IP, not your real address. With proxy rotation (automatic IP changes), each request appears to come from a new user, which allows you to bypass limits and blocks.
However, not all proxies are equally effective for LinkedIn and HeadHunter. The choice of proxy type critically affects the outcome:
| Proxy Type | LinkedIn | HeadHunter | Speed | Price |
|---|---|---|---|---|
| Residential | ✅ Excellent | ✅ Excellent | Average | $$ |
| Mobile | ✅ Excellent | ✅ Good | Average | $$$ |
| Data Center | ❌ Often Blocked | ⚠️ Moderate | High | $ |
Residential Proxies: The Optimal Choice for LinkedIn
Residential proxies use real IP addresses from home users. From LinkedIn's perspective, this is an ordinary person sitting at home. Such IPs rarely end up on blacklists, and the platform cannot distinguish them from real users. For scraping LinkedIn, this is the industry standard.
Key parameters when choosing residential proxies for scraping job vacancies:
- Geolocation: choose IPs from the country whose vacancies you are scraping (for HeadHunter, Russia; for LinkedIn, the relevant country)
- Rotation: automatic IP change after each request or on a timer
- IP pool: the larger, the better; a big pool reduces the risk of reusing a blocked IP
- Protocols: HTTP/HTTPS and SOCKS5 support, since most scraping tools require one of these
Mobile Proxies: For Working with LinkedIn Accounts
If you are scraping LinkedIn through an authorized account (as Phantombuster operates), mobile proxies provide an additional advantage: LinkedIn sees the mobile operator as the source and trusts such IPs even more. One mobile IP can serve thousands of real users (behind the operator's NAT), so even high activity from it does not raise suspicion.
Data Center Proxies: Only for HeadHunter
Data center proxies are fast and cheap, but LinkedIn blocks them aggressively. They work better for HeadHunter: the platform is less paranoid about data center IPs, especially if delays between requests are maintained. Suitable for budget monitoring of vacancies on HH at small volumes.
LinkedIn Scraping: Step-by-step Setup
LinkedIn is the most challenging platform for scraping. It is important to act carefully to avoid losing your account. Let's break down a working scheme using Phantombuster, one of the most popular tools among recruiters.
Step 1: Prepare Your LinkedIn Account
Never use your main work account for scraping. Create a separate account or use a secondary one: if it gets blocked, you won't lose valuable connections and history. The account should be "warmed up": a filled-out profile, several connections, and at least a week of activity before scraping begins.
Step 2: Link Proxies to the Account
Critical rule: one LinkedIn account = one IP address. If you log in today from IP 1 and tomorrow from IP 2, that is a red flag for LinkedIn's security system. Use a static residential proxy (sticky session) for each account.
In Phantombuster, the proxy setup looks like this:
- Go to Settings → Proxies in your Phantombuster account
- Click Add Proxy
- Enter the proxy details: host, port, username, password
- Select the type: HTTP or SOCKS5 (depends on your proxy provider)
- Click Test Proxy to make sure the proxy works
- Assign this proxy to a specific "phantom" (task) that works with your account
Step 3: Set Up LinkedIn Jobs Export
In Phantombuster, find the phantom "LinkedIn Jobs Search Export". Settings:
- Search URL: insert the LinkedIn job search URL with the required filters (position, city, employment type)
- Number of jobs per launch: start with 25-50. Do not set 500 from the first day
- Launch frequency: once every 2-3 hours. Do not run continuously
- Session cookie: copy the li_at cookie from your browser (instructions are available in Phantombuster)
Step 4: Set Safe Limits
LinkedIn blocks accounts for aggressive behavior, not for scraping as such. Safe limits for one account:
- No more than 80-100 job views per day
- Delay between requests: at least 3-5 seconds
- Take breaks during nighttime (simulate human behavior)
- Do not run scraping on weekends: it looks suspicious on a B2B platform
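The daily cap above can be enforced in code. Here is a minimal, hypothetical Python sketch: the scraper calls allow_view() before each job view and stops once the quota is spent (the figure of 100 follows the limit stated in this guide; tune it to your own risk tolerance):

```python
from datetime import date

DAILY_LIMIT = 100  # safe ceiling of job views per account per day

views_today = 0
current_day = date.today()

def allow_view() -> bool:
    """Return True while today's quota is not exhausted; count the view."""
    global views_today, current_day
    if date.today() != current_day:  # a new day: reset the counter
        current_day, views_today = date.today(), 0
    if views_today >= DAILY_LIMIT:
        return False  # quota spent, stop scraping until tomorrow
    views_today += 1
    return True
```

Combined with randomized delays between requests, this keeps one account inside the safe envelope described above.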
⚠️ If You Need a Large Volume of Data from LinkedIn
If you need to scrape thousands of job vacancies per day, use multiple accounts, each with its own residential proxy. One account + one IP = a maximum of 100 vacancies per day without the risk of blocking. 10 accounts × 100 = 1000 vacancies per day.
HeadHunter Scraping: Features and Setup
HeadHunter is easier to scrape than LinkedIn for two reasons: there is an official API, and the protection is less aggressive. Still, mass data collection without proper setup will get you blocked.
Option 1: Official HeadHunter API (Recommended)
If your task is to monitor job vacancies and analyze the market (without collecting contacts), use the official hh.ru API. It is completely legal and provides stable access to data.
- Register an application at dev.hh.ru
- Obtain client_id and client_secret
- Use the GET /vacancies endpoint to search for vacancies
- Filtering parameters: text, area (region), salary, experience, schedule
- Limit: 50 requests per second for authorized applications
The results come in JSON format and are easy to load into Excel or Google Sheets through tools like Zapier or Make (formerly Integromat) without writing code.
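For those comfortable with a few lines of code, here is a minimal Python sketch of such a query. The application name in the User-Agent is a placeholder (hh.ru expects a User-Agent identifying your application), and area=1 corresponds to Moscow in the hh.ru region dictionary:

```python
import requests

def fetch_vacancies(text: str, area: int = 1, per_page: int = 50) -> dict:
    """Query the official hh.ru API for vacancies matching `text`."""
    resp = requests.get(
        "https://api.hh.ru/vacancies",
        params={"text": text, "area": area, "per_page": per_page},
        # Placeholder app name: hh.ru expects a User-Agent identifying your app
        headers={"User-Agent": "hr-market-monitor/0.1"},
        timeout=15,
    )
    resp.raise_for_status()
    return resp.json()  # keys include "items", "found", "pages"
```

Each item in the "items" list carries the vacancy name, employer, and salary fields, ready to be flattened into a spreadsheet.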
Option 2: Scraping via Apify (No Code)
If you need data that is not available in the official API (for example, employer contacts or data in non-standard formats), use Apify with a ready-made actor for HH.ru:
- Go to apify.com and find the actor "HH.ru Scraper"
- Click Try for free
- In the settings, specify the search query (position, city)
- In the Proxy configuration section, select "Custom proxies" and insert your proxy details
- For HeadHunter, use residential proxies with Russian IPs: the platform is regional
- Click Start and wait for results
- Export data to CSV, JSON, or Excel
Option 3: Octoparse for Advanced Tasks
Octoparse allows you to set up scraping of any elements on the HH.ru page β including those not available in the API. For example, you can collect complete job descriptions, contact details (if visible), and links to companies.
- Download and install Octoparse
- Create a new task, insert the job search URL on hh.ru
- Use the Auto-detect mode; Octoparse will automatically determine the structure of the list
- Check that all necessary fields are highlighted (title, company, salary, city)
- In the task settings, enable IP Rotation and add your proxies
- Set a delay between requests: 2-4 seconds
- Run in the cloud (Cloud Extraction) for continuous collection
💡 Proxy Geolocation for HeadHunter
HeadHunter determines the user's region by IP and shows regional vacancies. If you want to scrape vacancies from a specific city (for example, only Moscow or St. Petersburg), use proxies with IPs from that region. For nationwide monitoring, any Russian IP is sufficient.
Common Mistakes and How to Avoid Them
Most problems when scraping LinkedIn and HeadHunter arise from the same mistakes. Here is a checklist of what not to do:
❌ Mistake 1: Using One IP for Everything
The most common beginner mistake is running scraping from your home IP or through a single proxy. As soon as the platform detects abnormal activity, the IP is blocked permanently. Solution: use rotating proxies with automatic IP changes or a pool of several static proxies.
❌ Mistake 2: Too High Request Speed
Scraping 1000 pages in 10 minutes is a sure way to get banned: a real user physically cannot browse pages at that speed. Set delays of at least 2-3 seconds between requests for HeadHunter and 5-10 seconds for LinkedIn, and add random variation (not exactly 3 seconds every time, but anywhere between 2 and 5) to simulate human behavior.
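A randomized pause like this takes only a few lines in Python. This sketch uses the 2-5 second range suggested above for HeadHunter; widen the bounds for LinkedIn:

```python
import random
import time

def human_pause(low: float = 2.0, high: float = 5.0) -> float:
    """Sleep for a random interval between low and high seconds.

    Returns the delay so callers can log it.
    """
    delay = random.uniform(low, high)  # no two pauses are identical
    time.sleep(delay)
    return delay
```

Calling human_pause() between every request breaks the machine-regular timing pattern that rate-limiting systems look for.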
❌ Mistake 3: Changing IP for a LinkedIn Account
If you use rotating proxies with an authorized LinkedIn account, each request comes from a new IP. LinkedIn interprets this as account hacking (someone logging in from different locations) and blocks it. For authorized sessions, use only sticky proxies (a fixed IP held for a long time) or static residential proxies.
❌ Mistake 4: Ignoring the User-Agent
The User-Agent is a string the browser sends to the server to identify itself. Many scraping tools by default send a User-Agent like "python-requests/2.28.0", which instantly reveals a bot. Set a realistic User-Agent of a modern browser. In Apify and Phantombuster this is done automatically; in Octoparse, it is set in the task settings.
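To see the difference in Python: the first line prints the default User-Agent that the requests library sends, and the session below replaces it with a realistic browser string (the exact Chrome version is illustrative; use a current one):

```python
import requests

# The library's default User-Agent announces automation outright:
print(requests.utils.default_user_agent())  # e.g. python-requests/2.31.0

# A session with a realistic desktop-browser User-Agent instead:
session = requests.Session()
session.headers.update({
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/124.0.0.0 Safari/537.36"
    ),
    "Accept-Language": "en-US,en;q=0.9",  # a real browser sends this too
})
```

Every request made through this session now carries browser-like headers.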
❌ Mistake 5: Scraping Without Checking robots.txt
LinkedIn prohibits scraping in its robots.txt and actively sues companies that do this on an industrial scale. This does not mean that you cannot collect data for personal analysis β but it is important to understand the legal risks of commercial use. HeadHunter is more lenient, especially if you use the official API.
❌ Mistake 6: Cheap Public Proxies
Free or very cheap proxies from public lists are a trap. They are already blocked by most platforms, work unstably, and often intercept data. For serious work, you need paid proxies from trusted providers with real residential or mobile IPs.
Checklist Before Starting Scraping
- ✅ A separate account is used (not the main work account)
- ✅ Residential or mobile proxies are connected
- ✅ For LinkedIn: one account = one fixed IP
- ✅ Delays between requests are set (at least 3 seconds)
- ✅ User-Agent is set to a real browser
- ✅ Daily request limit is capped at reasonable values
- ✅ Proxies are tested before launch
- ✅ Proxy geolocation matches the target region
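The "proxies are tested" item can be automated with a small Python check. This sketch sends a request through the proxy to a public IP-echo service (api.ipify.org) and prints the exit IP, returning False if the proxy is unreachable:

```python
import requests

def check_proxy(proxy_url: str, timeout: float = 10.0) -> bool:
    """Return True if the proxy answers and can reach an IP-echo service."""
    try:
        resp = requests.get(
            "https://api.ipify.org",  # echoes the IP the request came from
            proxies={"http": proxy_url, "https": proxy_url},
            timeout=timeout,
        )
        print("Exit IP:", resp.text)  # should be the proxy's IP, not yours
        return resp.ok
    except requests.RequestException:
        return False  # dead, slow, or misconfigured proxy
```

Run it against every proxy in your pool before launching a scraping job, and verify that the printed exit IP matches the expected geolocation.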
Conclusion
Scraping job vacancies from LinkedIn and HeadHunter is a working tool for recruiters, HR analysts, and labor market researchers. The main thing is to choose the right approach: for HeadHunter, start with the official API; for LinkedIn, use specialized tools like Phantombuster or Apify with properly configured proxies.
Key takeaways from the guide: LinkedIn requires residential or mobile proxies with a fixed IP per account, while HeadHunter is less strict but also needs proxies for large volumes. Adhere to request limits, simulate human behavior, and never use your main account for automation.
If you plan to regularly monitor job vacancies or run large-scale data collection from LinkedIn, we recommend residential proxies: they provide maximum compatibility with both platforms and minimal risk of blocks even during prolonged use.