... 4.2.7 倒排索引(Inverted Index) 4.3 网页抓取(Crawling the Web) 4.4 网站索引(Indexing the Web) ...
基于1个网页-相关网页
If crawling the Web empirically and without commercial bias is the moral duty of a search engine, Baidu is, at least in the eyes of its competitors, not a moral company.
如果说不带任何商业偏见地抓取网站是一个搜索引擎的道德义务的话,那么,至少在竞争者眼中,百度不是一个有道德的公司。
Most search engines work by crawling the Web, indexing and filtering the content they find into massive databases, and searching these databases to find results matching a particular search query.
大多数搜索引擎的工作方式是:在Web上爬行,索引并过滤在海量数据库中找到的内容,然后搜索这些数据库以找到匹配特定搜索查询的结果。
But all the major and legitimate Web crawling engines obey the requests in robots.txt.
不过,所有主要的合法Web爬虫引擎都会遵从robots . txt内的要求。
应用推荐