Data from 2,447,543 papers; some of the data is sourced from NoteExpress.
This paper proposes an algorithm, based on "semantic fingerprints", for detecting and removing duplicate Chinese web pages at large scale.
This paper proposes a deduplication algorithm for news web pages that learns the news content from the elements of the news topic.
The experimental results show that the method can deduplicate news web pages based on their news content, and that it achieves high recall and precision.