Note that the web site crawler cannot crawl secured pages.
A website crawler is a software program designed to follow hyperlinks throughout a web site, retrieving and indexing pages to document the site for searching purposes.
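As a rough sketch of that definition, the Python below (standard library only) follows hyperlinks within one site and stores the pages it retrieves. The start URL, the page limit, and treating "indexing" as simply keeping the raw HTML are illustrative assumptions, not details from the original text.

```python
# Minimal sketch of a website crawler: follow hyperlinks within one site
# and keep a simple index of the pages visited.
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

class LinkParser(HTMLParser):
    """Collect href values from <a> tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, max_pages=50):
    """Breadth-first crawl of the site that start_url belongs to."""
    site = urlparse(start_url).netloc
    queue, seen, index = [start_url], set(), {}
    while queue and len(index) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            with urlopen(url) as resp:
                html = resp.read().decode("utf-8", errors="replace")
        except (OSError, ValueError):
            continue  # skip pages that cannot be retrieved
        index[url] = html  # "indexing" here is just storing the raw page
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)
            if urlparse(absolute).netloc == site:  # stay on the same site
                queue.append(absolute)
    return index

if __name__ == "__main__":
    pages = crawl("https://example.com/")  # placeholder base URL
    print(f"Indexed {len(pages)} pages")
```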
This paper presents a crawler system for maintaining WAP sites. The system can automatically traverse a WAP site, parse every page in the site, and check for syntax and semantic errors.
Today, the approach that many search engines still recommend for some level of crawling efficiency is to make an HTML-based site map page available for a crawler to pick up.
An effective site map is built as a high-level overview of the different pages in the site and presents a large list of links to the crawler in as few pages as possible.
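A minimal sketch of generating such a site map page, assuming you already have the list of URLs to expose; the file name and the example URLs are placeholders.

```python
# Write a single HTML site-map page listing many links for a crawler to pick up.
from html import escape

def write_sitemap_page(urls, path="sitemap.html"):
    """Write one page containing a flat list of links to every given URL."""
    items = "\n".join(
        f'    <li><a href="{escape(u, quote=True)}">{escape(u)}</a></li>'
        for u in urls
    )
    page = (
        "<!DOCTYPE html>\n<html>\n<head><title>Site map</title></head>\n"
        f"<body>\n  <ul>\n{items}\n  </ul>\n</body>\n</html>\n"
    )
    with open(path, "w", encoding="utf-8") as f:
        f.write(page)

write_sitemap_page([
    "https://example.com/",
    "https://example.com/products/",
    "https://example.com/articles/archive/",
])
```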
The behavior policies define which pages the crawler will bring down to the indexer, how often to go back to a Web site to check it again, and something called a politeness policy.
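The politeness policy can be illustrated with a small sketch that spaces out requests to the same host. The 2-second delay and the hypothetical fetch() call in the usage comment are assumptions, not values taken from any particular crawler.

```python
# Sketch of a "politeness policy": never hit the same host more than once
# per delay interval.
import time
from urllib.parse import urlparse

class PolitenessPolicy:
    def __init__(self, delay_seconds=2.0):
        self.delay = delay_seconds
        self.last_request = {}  # host -> timestamp of most recent request

    def wait_for(self, url):
        """Sleep until it is polite to request this URL's host again."""
        host = urlparse(url).netloc
        elapsed = time.monotonic() - self.last_request.get(host, 0.0)
        if elapsed < self.delay:
            time.sleep(self.delay - elapsed)
        self.last_request[host] = time.monotonic()

# Usage inside a crawl loop (fetch() is hypothetical):
# policy = PolitenessPolicy(delay_seconds=2.0)
# for url in frontier:
#     policy.wait_for(url)
#     page = fetch(url)
```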
Efficient site crawls thus provide relief to both sides, the hosting Web server and the crawler, by keeping the number of GET requests for pages to a minimum.
The high-level task for this article is to build a Web crawler: given a base URL for a Web site, you collect elements from the site that you can use for some purpose.
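As one illustration of collecting a particular kind of element from a base URL, the hedged sketch below gathers image sources from the page at that URL; the URL is a placeholder, and a real crawl would combine this with the link-following sketch shown earlier.

```python
# Collect one kind of element (image sources) from the page at a base URL.
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class ImageCollector(HTMLParser):
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.images.append(urljoin(self.base_url, value))

base = "https://example.com/"  # placeholder base URL
with urlopen(base) as resp:
    collector = ImageCollector(base)
    collector.feed(resp.read().decode("utf-8", errors="replace"))
print(collector.images)
```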
Instead of converting your entire site to static URLs, pick the pages you want a crawler to index.
Use a site map to lead the crawler around your site.
Once you've verified ownership, you can go to Site configuration > Crawler access > Remove URL > Removed URLs (or > Made by others) and click "Cancel" next to any requests you wish to cancel.
Helps to optimize the entire structure of the site for a crawler by providing an alternate set of Web pages, so that crawlers can quickly access and index a large number of embedded pages.
What you can do is check what a crawler sees on your site.
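One rough way to check this is to fetch a page while sending a bot-style User-Agent header and inspect the HTML that comes back. The user-agent string and URL below are made up for illustration, and servers that vary content on other signals may still serve something different to a real crawler.

```python
# Fetch a page with a bot-style User-Agent and look at what is returned.
from urllib.request import Request, urlopen

req = Request(
    "https://example.com/",  # placeholder URL
    headers={"User-Agent": "ExampleBot/1.0 (+https://example.com/bot)"},  # illustrative UA
)
with urlopen(req) as resp:
    print(resp.status, resp.read()[:500])  # status code and the first bytes of HTML
```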
Can I control the request rate at which the Ask spider crawls my site?
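Controlling a crawl rate usually comes down to robots.txt directives such as Crawl-delay, which a given spider may or may not honor. Below is a sketch using Python's urllib.robotparser to read those directives; the site URL is a placeholder, and "Teoma" is used as the Ask spider's user-agent token only as an assumption.

```python
# Read a site's robots.txt and report the crawl limits advertised for one spider.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser("https://example.com/robots.txt")  # placeholder site
rp.read()

agent = "Teoma"  # assumed user-agent token for the Ask spider
print("may fetch /:", rp.can_fetch(agent, "https://example.com/"))
print("crawl delay:", rp.crawl_delay(agent))     # seconds, or None if unspecified
print("request rate:", rp.request_rate(agent))   # RequestRate(requests, seconds) or None
```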