In this paper, in light of a National High-tech R&D Program (863 Program) project and the requirements of pervasive computing environments, we study content extraction from Chinese web pages in depth and obtain the following main results: 1. We systematically analyze and compare existing content extraction methods.
本文结合国家863计划课题和普适计算环境下的需求,对中文网页的正文抽取技术进行了比较深入的研究,取得了以下主要研究成果:1.系统分析和比较了现有的正文抽取方法。
Source: 面向普适计算的正文抽取技术的研究与设计 (Research and Design of Content Extraction Technology for Pervasive Computing)
Adds the character set needed to display Simplified Chinese Web pages.
添加显示简体中文网页所需的字符集。
Evaluating feature selection methods on plain English text and on Chinese web pages yields consistent experimental results.
使用普通英文文本和中文网页评测特征选取方法的结果是一致的。
In this paper, we propose a method based on "semantic fingerprints" for finding and deleting duplicate Chinese web pages.
本论文提出了一种基于语义指纹的大规模网页去重的算法。