...字: 数据挖掘;数据清洗;重复记录;SNM算法[gap=942]Key words: data mining; data cleaning; approximately duplicate records; SNM algorithm...
Problems with existing methods for detecting approximately duplicate records are discussed, and an improved method is proposed.
针对当前相似重复记录检测方法中存在的问题,提出一种改进方法。
The proposed method partitions the record set according to the decision attribute values of the relational table, and then detects approximately duplicate records within each decision-attribute class.
该方法根据关系表的决定属性值划分记录集,并在每个决定属性值类中检测相似重复记录。
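The partition-then-detect idea in this sentence can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `partitioned_duplicates`, the `decision_attr` and `compare_attr` parameters, the all-pairs comparison inside each class, and the use of `difflib.SequenceMatcher` with a 0.85 threshold are hypothetical choices, not details given by the source.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def partitioned_duplicates(records, decision_attr, compare_attr, threshold=0.85):
    """Return index pairs of likely duplicates, comparing only within a class.

    records       -- list of dicts (rows of a relational table)
    decision_attr -- attribute whose value defines the partition (assumed name)
    compare_attr  -- attribute compared for similarity (assumed name)
    """
    # Step 1: partition record indices by decision attribute value.
    classes = defaultdict(list)
    for idx, rec in enumerate(records):
        classes[rec[decision_attr]].append(idx)

    # Step 2: detect approximate duplicates inside each class only.
    # Comparing every pair in a class is an assumption made for brevity.
    pairs = []
    for members in classes.values():
        for i, j in combinations(members, 2):
            sim = SequenceMatcher(
                None, records[i][compare_attr], records[j][compare_attr]
            ).ratio()
            if sim >= threshold:
                pairs.append((i, j))
    return pairs
```

Records with different decision attribute values are never compared, which is where the saving over a comparison across the whole table would come from.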
To mark the approximately duplicate records in a data table, the common approach is to first index all records by a chosen key and then compare records pairwise within a fixed-length window.
要把数据表中的相似重复记录标识出来,常用的方法是先将所有记录按照某个关键字进行索引,然后在一个固定长度的窗口范围内进行记录的两两比对。
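The sort-then-slide-a-window approach described in this sentence (the idea behind the sorted neighborhood method, SNM) might look like the minimal sketch below. The function name `snm_duplicates`, the `key` callable, the default window size, and the `difflib.SequenceMatcher` similarity with a 0.85 threshold are assumptions made for illustration; the source does not specify them.

```python
from difflib import SequenceMatcher


def snm_duplicates(records, key, window=3, threshold=0.85):
    """Return index pairs of likely duplicates among string records.

    records   -- list of strings
    key       -- function mapping a record to its sort key (assumed)
    window    -- fixed window length; each record is compared with the
                 next window - 1 records in sorted order
    threshold -- minimum similarity ratio to report a pair (assumed)
    """
    # Step 1: order the record indices by the chosen key.
    order = sorted(range(len(records)), key=lambda i: key(records[i]))

    # Step 2: slide a fixed-length window over the sorted order and
    # compare records pairwise inside it.
    pairs = []
    for pos, i in enumerate(order):
        for j in order[pos + 1 : pos + window]:
            sim = SequenceMatcher(None, records[i], records[j]).ratio()
            if sim >= threshold:
                pairs.append((min(i, j), max(i, j)))
    return pairs


people = [
    "John Smith, 12 Oak St",
    "Jon Smith, 12 Oak St",
    "Mary Jones, 5 Elm Rd",
    "John Smith, 12 Oak St.",
]
likely_duplicates = snm_duplicates(people, key=str.lower)
```

A fixed window keeps the number of comparisons roughly proportional to the number of records times the window length, at the cost of missing duplicates whose keys sort far apart.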