对面向主题的信息采集技术进行了探索性研究。
An exploratory study of subject-oriented web information acquisition techniques is presented.
为保证主题搜索引擎返回信息的准确性,开展对承担主题相关信息采集任务的主题爬虫系统研究具有重要意义。
To ensure that a topical search engine returns accurate information, it is important to study the topical crawler system responsible for collecting topic-relevant information.
同时,勘探数据具有显著的多源、多类、多维、多量、多时态和多主题特征,给数据的采集、存储、管理和应用带来了巨大困难。
At the same time, exploration data are markedly multi-source, multi-type, multi-dimensional, high-volume, multi-temporal, and multi-topic, which makes their acquisition, storage, management, and application very difficult.
建设主题网关,是综合风险防范研究的重要组成部分,其中对信息采集技术的研究尤为重要。
Building a subject gateway is an important part of integrated risk prevention research, and within it the study of information acquisition techniques is especially important.
在技术实现中,通过与网页采集程序结合,对HTML文档进行预处理,提取出主题内容,提高了检索效率和查准率。
In the technical implementation, the method is combined with a web page crawler to preprocess HTML documents and extract their topical content, which improves retrieval efficiency and precision.