架构师_程序员_码农网
[Website Knowledge] Using robots.txt to avoid spider black holes

jimmy choo choo (original poster)
Posted on 2014-10-23 22:44:58

For the Baidu search engine, a "spider black hole" is a site that, at very low cost, generates a huge number of dynamic URLs whose parameters differ but whose content is largely the same. Like an endless loop, this "black hole" traps the spider, and Baiduspider wastes a great deal of crawl resources fetching worthless pages.
For example, many sites offer filtered listing pages, and these pages are often crawled heavily by search engines even though a large share of them have little search value. Take "rentals priced between 500 and 1000": first, the site (and, in practice, the market) has essentially no matching listings; second, neither the site's users nor search-engine users actually search this way. Letting search engines crawl such pages in bulk only eats into the site's limited crawl quota. So how can this situation be avoided?
Let's take a group-buying site in Beijing as an example and see how it uses robots.txt to skillfully steer clear of this spider black hole:

For ordinary filter-result pages, the site uses static links, for example: http://bj.XXXXX.com/category/zizhucan/weigongcun
When a user applies a sort order to the same filter-result page, a dynamic link with parameters is generated, and even the same sort condition (e.g. descending by sales) produces different parameters each time. For example:
http://bj.XXXXX.com/category/zizhucan/weigongcun/hot?mtt=1.index%2Fpoi.0.0.i1afqhek
http://bj.XXXXX.com/category/zizhucan/weigongcun/hot?mtt=1.index%2Fpoi.0.0.i1afqi5c

For this group-buying site, it is enough to let search engines crawl only the filter-result pages, while all the parameterized sort-result pages are refused to search engines via robots.txt rules.
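A minimal robots.txt sketch of this policy (the rule is the one the post describes; note that the * wildcard is an extension honored by Baiduspider and Googlebot, not part of the original 1994 robots.txt convention):

```
# Illustrative sketch: refuse every URL that carries query parameters,
# so static filter pages stay crawlable while parameterized sort
# pages are blocked.
User-agent: *
Disallow: /*?*
```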
robots.txt supports a rule for exactly this: "Disallow: /*?*", which tells search engines not to crawl any dynamic (parameterized) page on the site. In this way the site presents Baiduspider with its high-quality pages first and screens out the low-quality ones, giving Baiduspider a friendlier site structure and avoiding the formation of a black hole.
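To make the effect of "Disallow: /*?*" concrete, here is a small, hedged sketch of the wildcard matching that Baiduspider (and Googlebot) apply to such patterns. The function name rule_matches is my own for illustration; Python's built-in urllib.robotparser is not used because it does not handle the * wildcard.

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt Disallow pattern matches a URL path.

    Supports the * (any characters) and trailing $ (end of URL)
    wildcards, as recognized by Baiduspider and Googlebot. This is an
    illustrative sketch, not a full robots.txt implementation.
    """
    regex = ""
    for ch in pattern:
        if ch == "*":
            regex += ".*"       # * matches any run of characters
        elif ch == "$":
            regex += "$"        # $ anchors the match at the URL's end
        else:
            regex += re.escape(ch)
    # robots.txt rules match from the start of the path, like re.match
    return re.match(regex, path) is not None

# The static filter page from the example is not blocked:
print(rule_matches("/*?*", "/category/zizhucan/weigongcun"))
# The parameterized sort page is blocked:
print(rule_matches("/*?*",
                   "/category/zizhucan/weigongcun/hot?mtt=1.index%2Fpoi.0.0.i1afqhek"))
```

Running this prints False for the static link and True for the dynamic one, which is precisely the split the site wants: static filter pages stay crawlable, parameterized duplicates are refused.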

