

[Website Building Knowledge] How to block search engines from crawling website pages?

Posted on 1/13/2016 10:16:38 AM

When operating a website, and especially when optimizing its search ranking, we usually think about how to guide search engine spiders to crawl and index our pages. Sometimes, though, a site does not want to be visited by search engines at all, for example because of its user base or target region. How do we solve that problem? Let's study it together with the author, Xiao Dan.

When we want to block crawling, most SEOs first think of the robots.txt file, because as we understand it, a robots file can effectively keep search engines from crawling certain pages. This method works well, but Xiao Dan feels it is better suited to a site that is not yet finished, where it helps avoid dead links and a long review period later on.

If we only want to block a single search engine from crawling, there is no need to put any extra load on the server; a small piece of code in the page itself is enough. For example, suppose we want to block Baidu's spider from crawling a page.
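A minimal sketch of such a snippet, placed inside the page's <head> section (the exact combination of directives here is an illustrative assumption and can be adjusted as needed):

    <meta name="Baiduspider" content="noindex, nofollow, noarchive">

This tells Baidu's spider not to index the page, not to follow the links on it, and not to keep a cached snapshot of it (more on noarchive below).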

That's it. Of course, this only blocks Baidu. To block a different search engine, simply replace Baiduspider with the name of that engine's spider.

Common search engine spider names are as follows:   

1. BaiduSpider Baidu's comprehensive index spider   

2. Googlebot Google Spider   

3. Googlebot-Image Google's image-crawling spider

4. Mediapartners-Google Google ad network (AdSense) spider

5. Yahoo Slurp Yahoo Spider   

6. Yahoo! Slurp China Yahoo China spider

7. Yahoo!-AdCrawler Yahoo Ad Spider   

8. YodaoBot NetEase Youdao spider

9. Sosospider Tencent SOSO integrated spider   

10. Sogou Spider Sogou comprehensive spider   

11. MSNBot Live integrated spider   

However, if you want to block all search engines, it is simpler to use a robots.txt file:
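A minimal sketch of such a robots.txt, uploaded to the root directory of the site (the two rules below are the conventional all-blocking form, assumed here for illustration):

    User-agent: *
    Disallow: /

The asterisk matches every spider, and Disallow: / covers every path on the site. Note that robots.txt is only a request: well-behaved spiders respect it, but it is not technically enforced.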

By this point, many friends should understand that the directive in the code that prevents a web snapshot (cached copy) from being created is noarchive. So if we want to restrict certain search engines, we can add the corresponding code directly to the page, targeting just the engines whose snapshots we want to forbid; conversely, by not adding any such code, we make sure all the major search engines can visit the site normally and create snapshots.
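As an illustration of that last point (the tags below follow the common convention and are assumptions, not quotations), forbidding snapshots for every engine while still allowing indexing would look like

    <meta name="robots" content="noarchive">

while forbidding only Baidu's snapshot would scope the tag to that spider's name:

    <meta name="Baiduspider" content="noarchive">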

(When reprinting, please credit the source: www.wangzhan.net.cn/news/n1913.htm, thank you! Cherishing the fruits of other people's labor is a way of respecting yourself.)



