针对当前相似重复记录检测方法中存在的问题,提出一种改进方法。
To address the problems in existing methods for detecting approximately duplicate records, an improved method is proposed.
在不同领域的数据集上的实验结果表明,该方法能够提高重复记录检测的精度,且具有良好的噪声数据抑制能力。
Experimental results on datasets from different domains show that our approach significantly improves duplicate detection accuracy over traditional techniques and suppresses noisy data well.
本文设计并实现了数据获取系统,主要研究数据获取中的两个关键技术:数据源增量数据获取技术和相似重复记录检测技术。
This paper designs and implements a data acquisition system, focusing on two key techniques in data acquisition: incremental data acquisition from data sources and approximately duplicate record detection.