This paper discusses problems in existing methods for detecting approximately duplicate records and proposes an improved method.
Experimental results on datasets from several different domains show that the method improves duplicate-detection accuracy significantly over traditional techniques and effectively suppresses the influence of noisy data.
This paper designs and implements a data extraction system, focusing on two key technologies in data extraction: incremental extraction of data from the data sources, and detection of approximately duplicate records.
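The source does not say how incremental extraction is performed. One common approach, assumed here purely for illustration, is a timestamp "high-water mark": each run pulls only the rows whose last-modified time is later than the mark saved by the previous run. The `Row` type, its `updated_at` column, and `extract_incremental` are hypothetical names, not part of the described system.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Tuple


@dataclass
class Row:
    key: int
    name: str
    updated_at: datetime  # hypothetical "last modified" column on the source table


def extract_incremental(
    rows: Iterable[Row], watermark: datetime
) -> Tuple[List[Row], datetime]:
    """Return rows changed since `watermark`, plus the watermark for the next run."""
    changed = [r for r in rows if r.updated_at > watermark]
    # Advance the watermark to the newest change seen; keep it if nothing changed.
    new_watermark = max((r.updated_at for r in changed), default=watermark)
    return changed, new_watermark


source = [
    Row(1, "张三", datetime(2024, 1, 1)),
    Row(2, "李四", datetime(2024, 1, 5)),
]
batch, mark = extract_incremental(source, datetime(2024, 1, 2))
print([r.key for r in batch], mark)  # [2] 2024-01-05 00:00:00
```

A timestamp filter is cheap but cannot see deleted rows; trigger- or log-based change capture is the usual alternative when deletions matter.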
Experiments show that the proposed approach effectively detects approximately duplicate records in Chinese-language databases.
The method partitions the record set according to the values of the relation's decision attribute, and then detects approximately duplicate records within each decision-attribute value class.
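A minimal sketch of the partition-then-compare idea, assuming records are dictionaries of strings: records are grouped by the value of a chosen decision attribute, and pairwise comparison happens only inside each group. The record similarity used here (the average of `difflib.SequenceMatcher` ratios over selected fields) and the 0.8 threshold are stand-ins chosen for illustration, not the measure from the source; `detect_by_decision_class` is a hypothetical name.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, List, Tuple

Record = Dict[str, str]


def similarity(a: Record, b: Record, fields: List[str]) -> float:
    # Average per-field string similarity (stand-in measure, not the paper's).
    scores = [SequenceMatcher(None, a[f], b[f]).ratio() for f in fields]
    return sum(scores) / len(scores)


def detect_by_decision_class(
    records: List[Record],
    decision_attr: str,
    fields: List[str],
    threshold: float = 0.8,  # hypothetical cut-off
) -> List[Tuple[int, int]]:
    # Step 1: partition record indices by their decision-attribute value.
    classes: Dict[str, List[int]] = defaultdict(list)
    for i, rec in enumerate(records):
        classes[rec[decision_attr]].append(i)
    # Step 2: compare record pairs only within the same class.
    pairs: List[Tuple[int, int]] = []
    for members in classes.values():
        for i, j in combinations(members, 2):
            if similarity(records[i], records[j], fields) >= threshold:
                pairs.append((i, j))
    return pairs


records = [
    {"dept": "销售", "name": "王小明", "addr": "北京市海淀区"},
    {"dept": "销售", "name": "王晓明", "addr": "北京市海淀区"},
    {"dept": "财务", "name": "王小明", "addr": "北京市海淀区"},
]
print(detect_by_decision_class(records, "dept", ["name", "addr"]))  # [(0, 1)]
```

Partitioning cuts the number of comparisons from all pairs to pairs within each class, at the cost of never pairing records whose decision-attribute values differ (records 0 and 2 above are not compared).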
For record matching, an optimized algorithm based on valid weights and length filtering is proposed; it reduces the running time of duplicate-record detection and improves the efficiency of the original algorithm.
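The source does not define "valid weight" or the exact filter, so the following is a sketch under stated assumptions. Each field has a weight; the valid weight of a record pair is assumed to be the total weight of fields that are non-empty in both records, and the match score is the matched weight divided by that valid weight. Field similarity is assumed to be normalized edit distance, `1 - ed(a, b) / max(len(a), len(b))`. Because `ed(a, b) >= |len(a) - len(b)|`, that similarity can never exceed `min / max` of the two lengths, so when `min < threshold * max` the quadratic edit-distance computation can be skipped safely. All names and the 0.8 threshold are hypothetical.

```python
from typing import Dict, Optional


def edit_distance(a: str, b: str) -> int:
    # Standard Levenshtein DP, keeping one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def fields_match(a: str, b: str, threshold: float = 0.8) -> bool:
    longest = max(len(a), len(b))
    if longest == 0:
        return True  # two empty strings are identical
    # Length filter: similarity <= min/max, so reject without running the DP.
    if min(len(a), len(b)) < threshold * longest:
        return False
    return 1 - edit_distance(a, b) / longest >= threshold


def record_score(
    r1: Dict[str, str], r2: Dict[str, str], weights: Dict[str, float],
    threshold: float = 0.8,
) -> Optional[float]:
    # Assumed "valid" fields: those non-empty in both records.
    valid = [f for f in weights if r1.get(f) and r2.get(f)]
    valid_weight = sum(weights[f] for f in valid)
    if valid_weight == 0:
        return None  # nothing comparable
    matched = sum(weights[f] for f in valid if fields_match(r1[f], r2[f], threshold))
    return matched / valid_weight


weights = {"name": 0.5, "phone": 0.25, "addr": 0.25}
r1 = {"name": "王小明", "phone": "13800000000", "addr": "北京市海淀区中关村大街1号"}
r2 = {"name": "王小明", "phone": "", "addr": "北京市海淀区中关村大街1号"}
r3 = {"name": "李四", "phone": "13800000000", "addr": "北京市海淀区中关村大街1号"}
print(record_score(r1, r2, weights), record_score(r1, r3, weights))  # 1.0 0.5
```

Normalizing by the valid weight keeps an empty field (the missing phone in `r2`) from counting as a mismatch, while the length check rejects `"王小明"` vs `"李四"` before any edit distance is computed.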