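...A common situation is that a single real-world entity is represented by several records that are not exactly identical; such records are called approximately duplicated records (相似重复记录). Detecting and eliminating approximately duplicated records is one of the main problems to be solved in data cleaning and in improving data quality.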
approximately duplicate Chinese records (汉语相似重复记录)
· 2,447,543 paper records; part of the data comes from NoteExpress
Experiments show that the proposed method detects approximately duplicate Chinese records effectively.
To address the problems in current methods for detecting approximately duplicate records, an improved method is proposed.
The method partitions the record set according to the values of the relation table's decision attribute, and then detects approximately duplicate records within each decision-attribute value class.
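The partition-then-detect idea in the sentence above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the method from the cited paper: the sample records, the choice of `city` as the partitioning attribute, the character-level similarity from Python's `difflib`, and the 0.85 threshold are all hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def record_similarity(a, b, fields):
    """Mean character-level similarity ratio over the compared fields.

    Assumption: the source does not say which similarity measure is used;
    difflib's ratio is a stand-in.
    """
    ratios = [SequenceMatcher(None, a[f], b[f]).ratio() for f in fields]
    return sum(ratios) / len(ratios)


def find_approximate_duplicates(records, decision_attr, fields, threshold=0.85):
    """Return index pairs of records judged approximately duplicate.

    Records are first partitioned by the value of `decision_attr`;
    pairwise comparison then happens only inside each class.
    """
    # Step 1: partition record indices by the attribute value.
    classes = defaultdict(list)
    for idx, rec in enumerate(records):
        classes[rec[decision_attr]].append(idx)

    # Step 2: compare pairs only within the same class.
    pairs = []
    for members in classes.values():
        for i, j in combinations(members, 2):
            if record_similarity(records[i], records[j], fields) >= threshold:
                pairs.append((i, j))
    return pairs


# Hypothetical sample data for illustration only.
records = [
    {"city": "Beijing", "name": "Zhang Wei", "address": "12 Chang'an Avenue"},
    {"city": "Beijing", "name": "Zhang  Wei", "address": "12 Changan Avenue"},
    {"city": "Shanghai", "name": "Zhang Wei", "address": "12 Chang'an Avenue"},
    {"city": "Beijing", "name": "Li Na", "address": "88 Wangfujing Street"},
]
duplicates = find_approximate_duplicates(records, "city", ["name", "address"])
print(duplicates)
```

Partitioning keeps the number of pairwise comparisons small, but a pair that falls into different classes (records 0 and 2 here) is never compared, so the choice of partitioning attribute matters.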