A common approach to identifying approximately duplicate records in a data table is to first index all records by a chosen keyword and then compare the records pairwise within a fixed-length window.
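The procedure described above can be sketched roughly as follows. This is a minimal illustration and not a definitive implementation: the function names `similarity` and `find_approximate_duplicates`, the `window` and `threshold` parameters, and the use of Python's standard `difflib.SequenceMatcher` as the similarity measure are all assumptions made for the example, since the sentence does not say how two records are compared.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity score in [0, 1] for two record strings.

    Assumption: the source does not specify a measure, so difflib's
    ratio is used here purely for illustration.
    """
    return SequenceMatcher(None, a, b).ratio()


def find_approximate_duplicates(records, key, window=3, threshold=0.85):
    """Mark approximately duplicate records using a fixed-length window.

    records   -- list of record strings
    key       -- function mapping a record to its sort keyword
    window    -- number of consecutive sorted records in one window
    threshold -- hypothetical cut-off above which a pair counts as duplicate
    """
    # Step 1: order the record indices by the chosen keyword.
    order = sorted(range(len(records)), key=lambda i: key(records[i]))

    pairs = []
    # Step 2: compare each record only with the next (window - 1)
    # records in sorted order, instead of with every other record.
    for pos, i in enumerate(order):
        for j in order[pos + 1 : pos + window]:
            if similarity(records[i], records[j]) >= threshold:
                pairs.append((min(i, j), max(i, j)))
    return pairs
```

The function returns pairs of original record positions. Because only records that fall in the same window are compared, the number of comparisons grows with the number of records times the window length rather than with the square of the number of records. The trade-off is that duplicates whose keywords sort far apart are never compared.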