Traditional topic extraction methods derive a web page's topic automatically by analyzing only the page's content, so their results are not very accurate.
This paper presents a new method for extracting the main content of web pages based on statistics and content features.
Analyzing the content structure of web pages plays an important role in applications such as Web information retrieval, classification, and information extraction.