Use a site map to guide the spider through your site.
So-called "session identifiers" also scare spiders away.
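One way to keep session identifiers out of the URLs a spider sees is to strip them before links are emitted. The sketch below is only an illustration: the parameter names in `SESSION_PARAMS` and the `example.com` URLs are assumptions, not something the sentences above specify.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical query parameter names that carry a session identifier.
SESSION_PARAMS = {"sessionid", "sid", "phpsessid"}


def strip_session_id(url):
    """Return the URL with any session-identifier query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


# Two visits with different session identifiers collapse to one URL.
first = strip_session_id("https://example.com/item?id=7&sid=abc123")
second = strip_session_id("https://example.com/item?id=7&sid=zzz999")
```

With the identifiers removed, both visits point at the same address, so the spider does not see the same content under many different URLs.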
The next step is to make a sample announcement document available to the crawler.
Avoid anything that makes it hard for the spider to crawl your site.
You must also pay attention to the size limits spiders impose on each page's content.
Once your spider paths are cleared, you must ensure the spider is welcome.
One sure way to push away the spider is to use long, dynamic URLs for your pages.
A default robots.txt was added to control spiders.
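To check how a robots.txt file would treat a spider, Python's standard `urllib.robotparser` module can evaluate the rules locally. The rules in `ROBOTS_TXT`, the `ExampleBot` user agent, and the URLs are all made up for this sketch; they are not the default file the sentence above refers to.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: every spider is kept out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


public_ok = is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/index.html")
private_ok = is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/private/a.html")
```

Under these assumed rules, the public page is allowed and the page under `/private/` is not.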
Spiders are automated, so they do not fill out registration forms the way human visitors do.
Web pages designed for human visitors are not friendly to search engine crawlers. There are a number of site design techniques you can use to make the search engine's time at your site both easy and meaningful.
A redirect is a technique that tells a browser and a spider that the requested URL has changed.
A sample scenario is a scheduled crawler for a website with regularly updated announcement pages.
Even if your site lets the spider in, there's no guarantee that it won't abandon your site later.
Even if you avoid troublesome technologies, you might still cause trouble for the spider.
The most obvious advice is to make sure your site is up and responding when the spider arrives.
It's the first word or words the spider encounters, and it's the title of your page's listing in the SERPs.
Spiders hate this technique because it results in the same content being displayed for hundreds or thousands of different URLs.
Moreover, if users need any of these technologies to follow the links, the spider won't be able to do so.
Instead of converting your entire site to static URLs, pick the pages you want a crawler to index.
Your site already contains paths, and probably already has the most important kind of spider path: your site map.
These tags tell the user the highlighted terms are important to the page, and the spider sees them the same way.
Creating a well-reasoned keyword strategy is the most important SEO task you can perform to meet the needs of both spiders and your potential audience.
If your Web page can't display at all without these technologies, it won't be indexed by the spider.
The Google spider will choke on a meta refresh redirect, and a 302 redirect can cause duplicate content penalties.
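A common alternative to both is a permanent redirect, HTTP status 301 ("Moved Permanently"), sent by the server. The sketch below only builds the status and headers for such a response. The `MOVED` mapping and its paths are hypothetical, and a real site would normally configure this in its web server or framework.

```python
from http import HTTPStatus

# Hypothetical mapping from retired paths to their permanent replacements.
MOVED = {"/old-page.html": "/new-page.html"}


def redirect_response(path):
    """Return (status code, reason phrase, headers) for a moved path, or None."""
    target = MOVED.get(path)
    if target is None:
        return None  # Not a moved page; serve it normally.
    status = HTTPStatus.MOVED_PERMANENTLY
    return status.value, status.phrase, {"Location": target}


moved = redirect_response("/old-page.html")
unchanged = redirect_response("/index.html")
```

The `Location` header carries the new address, so a browser or spider that receives the 301 knows where the page now lives.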
Spiders see only the HTML code, much as the screen readers used by visually impaired people do.
But, as discussed earlier, never require a cookie to display the page, or the spider won't be able to index it.
Navigation is still very useful for spiders and for people who reach your site through searches rather than by navigating through the site (the old-fashioned way).
Spiders are skittish about these dynamic sites because the combinations of parameters can be almost limitless; the spider doesn't want to get lost within your site.
Pages that seem fine in a browser can trip up a spider, which then loses or misinterprets part or all of the page.
Site maps rely on spiders coming to call on your site, but more aggressive methods can land your pages in the search index, too.