
Views: 10314 | Replies: 1

[Discussion] Website anti-crawler strategies

Posted on 7/12/2019 5:22:08 PM
1. HTTP request header

Every HTTP request sent to a server carries a set of attributes and configuration values: the HTTP request headers. The headers a browser sends differ from the defaults a crawler library sends, so the mismatch is easy for an anti-crawler system to spot and often results in the IP being blocked.
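
A hedged sketch of the idea, not from the original post: copy a real browser's headers into the crawler's requests with Python's requests library. The User-Agent string and target URL below are placeholder values.

```python
import requests

# Headers copied from a real browser session; the values here are illustrative.
headers = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/75.0.3770.100 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
}

# A crawler that omits these headers looks very different from a browser.
response = requests.get("https://example.com/page", headers=headers)
print(response.status_code)
```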

2. Cookie settings

Websites track your visit through cookies and can cut the session off as soon as they detect crawler-like behavior, such as filling out a form unusually fast or browsing a large number of pages in a short time. While building the scraper, it is worth inspecting the cookies the target site sets and working out which of them the crawler needs to handle.
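
One way to do that, as a minimal sketch (example.com stands in for the target site): use a requests.Session, which stores the cookies a site sets and sends them back on later requests, so you can also print them out and see what is being tracked.

```python
import requests

session = requests.Session()

# The first request picks up whatever cookies the site sets.
session.get("https://example.com/")

# Inspect the stored cookies to see what the site tracks between pages.
for cookie in session.cookies:
    print(cookie.name, cookie.value)

# Subsequent requests send the stored cookies back automatically.
session.get("https://example.com/listing")
```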

3. Access path

A naive crawler always follows the same access path, which anti-crawler systems recognize easily. Try to simulate real user behavior and visit pages in a randomized order.
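
A sketch of randomizing the crawl order instead of walking pages in a fixed sequence; the URL list and revisit probability are hypothetical.

```python
import random
import time

import requests

# Hypothetical list of target pages; a fixed visiting order is easy to fingerprint.
urls = [f"https://example.com/item/{i}" for i in range(1, 51)]

# Shuffle so each run visits the pages in a different order, like a human browsing.
random.shuffle(urls)

for url in urls:
    requests.get(url)
    # Occasionally revisit the home page, as a real user might.
    if random.random() < 0.1:
        requests.get("https://example.com/")
    time.sleep(random.uniform(1, 3))
```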

4. Frequency of visits

Most IP bans happen because the request rate is too high. Everyone wants the crawl finished quickly, but pushing the speed too far backfires: once the IP is blocked, overall efficiency drops.
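
A minimal sketch of throttling, assuming a hypothetical polite_get helper: pause a randomized interval after each request so they are neither too frequent nor evenly spaced. The delay bounds are illustrative, not from the post.

```python
import random
import time

import requests


def polite_get(session, url, min_delay=2.0, max_delay=5.0):
    # Fetch, then sleep a random interval before returning, so the
    # request rate stays low and irregular; tune the bounds to stay
    # under the target site's (unknown) threshold.
    response = session.get(url)
    time.sleep(random.uniform(min_delay, max_delay))
    return response


session = requests.Session()
for page in range(1, 6):
    polite_get(session, f"https://example.com/list?page={page}")
```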

Those are the basic anti-crawler measures. Stricter sites of course go beyond them, which means a crawler engineer has to study the target site's defenses one by one. As anti-crawler strategies keep being upgraded, the crawling strategy must be upgraded along with them; combined with efficient, high-quality proxy IPs, the crawl can then proceed smoothly.
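
A sketch of rotating proxy IPs with requests; the proxy addresses are placeholders for whatever pool you actually have.

```python
import random

import requests

# Placeholder proxy pool; in practice these come from a proxy provider.
PROXIES = [
    "http://10.0.0.1:8080",
    "http://10.0.0.2:8080",
    "http://10.0.0.3:8080",
]


def get_via_random_proxy(url):
    # Pick a proxy at random so consecutive requests arrive from
    # different IPs; route both schemes through the chosen proxy.
    proxy = random.choice(PROXIES)
    return requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=10)


response = get_via_random_proxy("https://example.com/")
print(response.status_code)
```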




Posted on 7/12/2019 7:01:50 PM
Crawlers simulate legitimate HTTP request data, and every anti-crawler system is working against the same thing; it comes down to whose algorithm is smarter and more efficient. You also need to set a policy that fits your own site's traffic patterns.

For example, on an ordinary content site a real user cannot make 1,000 requests in one minute, or tens of thousands in one hour. If a single IP exceeds the configured threshold, you can reject it outright or redirect it to a CAPTCHA page; after the visitor completes the slider or enters the code, access resumes normally, otherwise the IP gets blocked.
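
On the server side, that threshold check could be a per-IP sliding-window counter. A minimal in-memory sketch, where the 1,000-per-minute limit comes from the example above but the code itself is hypothetical (production systems would use Redis or similar):

```python
import time
from collections import defaultdict, deque

# Request timestamps per client IP; an in-memory stand-in for shared storage.
_hits = defaultdict(deque)

LIMIT_PER_MINUTE = 1000   # threshold from the example above
WINDOW_SECONDS = 60


def allow_request(client_ip):
    # Return True if the IP is under the per-minute threshold,
    # False if it should be rejected or sent to a CAPTCHA page.
    now = time.time()
    window = _hits[client_ip]
    # Drop timestamps that have fallen out of the sliding window.
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    if len(window) >= LIMIT_PER_MINUTE:
        return False
    window.append(now)
    return True
```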