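In the implementation, the method is combined with a web crawler to preprocess HTML documents and extract their main content, which improves retrieval efficiency and precision.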
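The preprocessing step described above can be sketched as follows. This is a minimal sketch that assumes "main content" can be approximated by dropping the text inside script, style, and navigation-type elements. The source does not say which heuristic it uses, and `SKIP_TAGS`, `MainTextExtractor`, and `extract_main_text` are hypothetical names. Only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser

# Elements whose text is treated as non-content (an assumption; real pages
# may need a different list, or a density-based heuristic instead).
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside", "noscript"}


class MainTextExtractor(HTMLParser):
    """Collects visible text, dropping text nested inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def extract_main_text(html: str) -> str:
    """Return the page text with navigation, scripts, and styles removed."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Keeping the stripped text, rather than the raw HTML, is what lets a downstream indexer avoid matching menu labels and script code.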
Second, because the method is based on the HTML specification and extracts the content and structure of web pages from the same source, it is highly scalable.
This requires an effective crawling method that reduces clutter, duplication, and similar problems among the collected pages.
This paper presents the basic workflow of a search engine and discusses techniques for web page crawling, the page repository, and the analyzer/indexer.
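The analyzer/indexer stage of that workflow is commonly built around an inverted index. The following is a minimal sketch, assuming the page repository is a dictionary from URL to extracted text; `tokenize`, `build_inverted_index`, and `search` are hypothetical names, not the paper's design.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercased word tokens. Chinese text would need a word segmenter,
    # because \w+ treats a run of CJK characters as a single token.
    return re.findall(r"\w+", text.lower())


def build_inverted_index(repository: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of page URLs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in repository.items():
        for term in tokenize(text):
            index[term].add(url)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the pages that contain every query term (Boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

Ranking (for example by term frequency) would sit on top of this lookup; the sketch only shows how the repository becomes a searchable structure.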
This paper studies a technique for collecting information of interest from data-type web pages in real time.
For page retrieval, the paper analyzes multithreading, duplicate-page detection, spider traps, and page storage in a text repository.
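Two of the page-retrieval concerns listed above can be sketched with simple checks. This sketch assumes exact-duplicate detection by hashing normalized text (near-duplicate methods such as shingling or SimHash are not shown) and a URL-shape heuristic for spider traps. The names and all thresholds (`max_len`, `max_depth`, `max_repeat`) are hypothetical choices, not values from the source.

```python
import hashlib
import re
from urllib.parse import urlsplit


def content_fingerprint(text: str) -> str:
    # Collapse whitespace and lowercase so trivial formatting differences
    # do not defeat duplicate detection.
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class DuplicateFilter:
    """Detects exact duplicates of page text that was already seen."""

    def __init__(self):
        self.seen: set[str] = set()

    def is_duplicate(self, text: str) -> bool:
        fp = content_fingerprint(text)
        if fp in self.seen:
            return True
        self.seen.add(fp)  # remember this page for later comparisons
        return False


def looks_like_trap(url: str, max_len: int = 256, max_depth: int = 10,
                    max_repeat: int = 2) -> bool:
    """Heuristic spider-trap check based only on the shape of the URL."""
    if len(url) > max_len:
        return True
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if len(segments) > max_depth:
        return True
    # A segment repeated many times (e.g. /a/b/a/b/a/b) often signals a
    # relative-link loop or an endlessly generated page sequence.
    return any(segments.count(s) > max_repeat for s in set(segments))
```

A crawler would typically call `looks_like_trap` before fetching a URL and `is_duplicate` before writing the fetched text to the repository.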
Incremental web crawling is an important research direction in information retrieval.
An incremental crawler aims to gather changed pages, new pages, and dead pages, of which the new pages are the most important.
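The three page categories above can be illustrated by comparing two crawl snapshots. This minimal sketch assumes each snapshot maps a URL to a content fingerprint and that a URL missing from the newer snapshot counts as dead (for example, it returned 404). `CrawlDiff` and `diff_crawls` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class CrawlDiff:
    new: set[str]
    changed: set[str]
    dead: set[str]


def diff_crawls(previous: dict[str, str], current: dict[str, str]) -> CrawlDiff:
    """Classify URLs by comparing two URL -> content-fingerprint snapshots.

    A URL absent from `current` is treated as dead; this is an assumption
    about how the snapshots are built.
    """
    prev_urls, curr_urls = set(previous), set(current)
    new = curr_urls - prev_urls        # seen now, not seen before
    dead = prev_urls - curr_urls       # seen before, gone now
    changed = {u for u in prev_urls & curr_urls if previous[u] != current[u]}
    return CrawlDiff(new=new, changed=changed, dead=dead)
```

In practice an incremental crawler cannot afford a full second crawl, so the "current" side is usually built only from revisited and newly discovered URLs; the set logic stays the same.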
Because an information crawling system must collect an enormous number of pages, how effectively it gathers high-quality pages directly affects system performance.
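One common way to favor high-quality pages is a best-first frontier that always fetches the highest-scoring URL next. This is a minimal sketch, assuming some external quality score per URL (for example, a link-based estimate); the source does not specify one, and `PriorityFrontier` is a hypothetical name.

```python
import heapq
import itertools


class PriorityFrontier:
    """Best-first URL frontier: pops the highest-scoring queued URL first."""

    def __init__(self):
        self._heap: list[tuple[float, int, str]] = []
        self._seen: set[str] = set()
        self._counter = itertools.count()  # tie-breaker: earlier push wins

    def push(self, url: str, score: float) -> None:
        if url in self._seen:
            return  # never enqueue the same URL twice
        self._seen.add(url)
        # heapq is a min-heap, so store the negated score.
        heapq.heappush(self._heap, (-score, next(self._counter), url))

    def pop(self) -> str:
        _, _, url = heapq.heappop(self._heap)
        return url

    def __len__(self) -> int:
        return len(self._heap)
```

Ignoring re-pushes of a seen URL keeps the sketch simple; a real crawler might instead raise the priority of a URL when new inbound links are found.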
In our collection of the best free templates, we selected the 15 highest-quality free CSS, Flash, HTML, and Joomla templates.