For processing Chinese address information, a metadata database containing word-segmentation rules is established, and an approximate-duplicate detection model is proposed.
针对中文地址类信息的处理,建立了包含分词规则的元数据库,提出一种相似重复检测模型。
So if you need to cleanse millions of addresses within a given time window, you can apply the rules in a way that lets you segment the data.
所以,如果您要在给定时间范围内清理数百万个地址,可通过支持您对数据进行分段的方式来应用规则。
Building on a study of a data-mining algorithm based on multi-segment support, a parallel algorithm for discovering association rules is presented.
在研究多段支持度数据挖掘算法的基础上提出并行挖掘相联规则的算法。