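There are three approaches. Crawler-based engines (such as Google) use automated software agents to visit, read, and index websites. The crawler returns all of the information it collects to a central server.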
Today, many search engines still recommend one way to reach some level of crawling efficiency: make an HTML-based site map page available for the crawler to pick up.
First, let's look at how crawler-based search engines work (both Google and Yahoo fall into this category).
Then, building on the core technologies of search engines, the three main modules of the search engine were designed around a lightweight architecture: the web crawler, the indexer, and the searcher.
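The three-module split named here (crawler, indexer, searcher) can be illustrated with a minimal Python sketch. It is not the design from the quoted sentence. It assumes a tiny in-memory "web" (the hypothetical `PAGES` dictionary) instead of real HTTP fetching, and the function names `crawl`, `build_index`, and `search` are made up for illustration. The crawl starts from an HTML site map page, since that is what a crawler would typically pick up first.

```python
from collections import defaultdict
from html.parser import HTMLParser


class _PageParser(HTMLParser):
    """Collects lowercase words and href targets from one HTML page."""

    def __init__(self):
        super().__init__()
        self.words = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.words.extend(data.lower().split())


def crawl(pages, start):
    """Crawler: follow links breadth-first from `start`; return {url: [words]}."""
    seen, queue, fetched = set(), [start], {}
    while queue:
        url = queue.pop(0)
        if url in seen or url not in pages:
            continue  # skip pages already visited or not in the toy web
        seen.add(url)
        parser = _PageParser()
        parser.feed(pages[url])
        fetched[url] = parser.words
        queue.extend(parser.links)
    return fetched


def build_index(fetched):
    """Indexer: build an inverted index mapping each word to the URLs containing it."""
    index = defaultdict(set)
    for url, words in fetched.items():
        for word in words:
            index[word].add(url)
    return index


def search(index, query):
    """Searcher: return the sorted URLs that contain every term of the query."""
    terms = query.lower().split()
    if not terms:
        return []
    return sorted(set.intersection(*(index.get(t, set()) for t in terms)))


# Hypothetical in-memory pages standing in for a real site (assumption).
PAGES = {
    "/sitemap.html": '<a href="/a.html">A</a> <a href="/b.html">B</a>',
    "/a.html": "<p>crawler based search engine</p>",
    "/b.html": "<p>search engine index</p>",
}

index = build_index(crawl(PAGES, "/sitemap.html"))
print(search(index, "search engine"))  # ['/a.html', '/b.html']
print(search(index, "crawler"))        # ['/a.html']
```

A real crawler would also need network fetching, politeness rules such as robots.txt handling, and result ranking, all of which this sketch leaves out.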