传统的主题抽取方法单纯依靠分析网页内容来自动获取网页主题,其分析结果并不十分精确。
Traditional topic extraction methods identify a web page's topic automatically by analyzing the page content alone, and their results are not very accurate.
实际应用结果表明,基于正则式匹配方法的全自动包装器可以精确抽取CNKI网页的全部检索内容。
Practical application shows that the fully automatic wrapper based on regular-expression matching can accurately extract all of the retrieved content from CNKI web pages.
该算法能比较准确地抽取到相关网页的内容,同时,本文亦对其可能存在的不足进行了说明。
This algorithm can extract the content of relevant web pages fairly accurately. Its possible drawbacks and directions for further work are also discussed.