1. HTTP request headers
Every HTTP request sent to a server carries a set of headers describing the client and its configuration. The headers a browser sends differ from the defaults sent by crawler libraries, so a mismatched or missing header set is an easy signal for anti-crawler systems and a common cause of IP bans.
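One common mitigation is to send a browser-like header set and rotate the User-Agent between requests. A minimal sketch (the User-Agent strings and header values are illustrative examples, not a definitive list):

```python
import random

# Hypothetical pool of real-browser User-Agent strings; rotating them
# keeps every request from sharing one crawler fingerprint.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
]

def browser_headers():
    """Return a header set that resembles a normal browser request."""
    return {
        "User-Agent": random.choice(USER_AGENTS),
        "Accept": "text/html,application/xhtml+xml,application/xml;"
                  "q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
        "Connection": "keep-alive",
    }
```

These headers can then be passed to whatever HTTP client the crawler uses (for example, the `headers=` argument of `requests.get`).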
2. Cookie settings
Websites track visits through cookies and may cut off access as soon as crawler-like behavior is detected, such as submitting a form unusually fast or browsing a large number of pages in a short time. While collecting a site, it is worth inspecting the cookies it sets and deciding which ones the crawler actually needs to handle.
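Once the site's cookies have been inspected, the crawler can keep only the ones it genuinely needs. A minimal sketch, assuming hypothetical cookie names (`session_id`, `csrf_token`, `_tracking_id` are illustrative, not from any specific site):

```python
def essential_cookies(all_cookies, needed=("session_id", "csrf_token")):
    """Keep only the cookies the crawler needs; drop the rest."""
    return {name: value for name, value in all_cookies.items()
            if name in needed}

# Cookies observed while browsing the target site (hypothetical values).
observed = {
    "session_id": "abc123",    # keeps the session alive between requests
    "csrf_token": "tok456",    # required for form submissions
    "_tracking_id": "xyz789",  # analytics cookie the crawler can discard
}

kept = essential_cookies(observed)
```

The filtered dict can then be attached to subsequent requests so the session looks continuous without carrying tracking cookies the crawler does not need.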
3. Access path
A crawler that always follows the same access path is easy for anti-crawler systems to recognize. Try to simulate real user behavior and visit pages in a randomized order.
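One way to randomize the path is to shuffle the target pages and sprinkle in ordinary navigation pages, so the sequence resembles real browsing. A sketch under assumed site paths (`/`, `/news`, etc. are hypothetical):

```python
import random

HOME = "/"                                  # hypothetical site paths
CATEGORIES = ["/news", "/products", "/about"]

def humanized_path(targets):
    """Shuffle the target pages and interleave occasional 'detour'
    visits to ordinary pages, mimicking a user's browsing path."""
    order = list(targets)
    random.shuffle(order)
    path = [HOME]                           # a user usually lands on the homepage
    for page in order:
        if random.random() < 0.3:           # ~30% chance of a detour first
            path.append(random.choice(CATEGORIES))
        path.append(page)
    return path
```

The returned list is the order in which the crawler then fetches pages; every run produces a different sequence.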
4. Frequency of visits
Most IP bans happen because the access frequency is too high. It is tempting to finish the crawl quickly, but pushing the speed too far gets the IP blocked, and overall efficiency drops rather than rises.
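The usual fix is to pause between requests, with random jitter so the intervals are not perfectly regular (a fixed interval is itself a fingerprint). A minimal sketch; the delay values are illustrative and should be tuned per site:

```python
import random
import time

def polite_sleep(base=2.0, jitter=3.0):
    """Sleep for a random interval between requests.

    base   -- minimum pause in seconds (illustrative default)
    jitter -- extra random pause added on top, in seconds
    Returns the delay actually used, for logging.
    """
    delay = base + random.uniform(0, jitter)
    time.sleep(delay)
    return delay
```

Calling `polite_sleep()` after each request keeps the average rate low and the timing irregular.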
These are the basic anti-crawler strategies. Stricter sites use more defenses than these, so crawler engineers have to study each target site's protections in detail. As anti-crawler measures keep upgrading, crawling strategies must be upgraded along with them; combined with efficient, high-quality proxy IPs, the crawling work can then proceed efficiently.
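Proxy IPs are typically used as a rotating pool, so a blocked address can simply be rotated out. A minimal sketch, assuming a hypothetical pool of proxy URLs (the addresses below are placeholders from a documentation range, not real proxies); the returned mapping is the shape the `requests` library's `proxies=` argument expects:

```python
import random

# Hypothetical proxy pool; in practice this would come from a proxy provider.
PROXIES = [
    "http://203.0.113.10:8080",
    "http://203.0.113.11:8080",
    "http://203.0.113.12:8080",
]

def next_proxy():
    """Pick a proxy at random so each request can go out from a
    different IP and blocked addresses can be rotated out."""
    proxy = random.choice(PROXIES)
    return {"http": proxy, "https": proxy}
```

A crawler would call `next_proxy()` per request (or per batch) and drop entries from the pool when they start failing.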