在进行并行关联规则挖掘时,数据偏斜和工作量平衡这两个数据分布特征影响着剪枝的有效性。
In parallel association rule mining, two data distribution characteristics, data skew and workload balance, affect the effectiveness of pruning.
分析和实验表明,该算法适合于海量数据查询并能有效地解决机群并行环境下数据偏斜所造成的查询性能低下的问题。
Analysis and experiments show that the algorithm is suitable for querying massive data and effectively resolves the poor query performance caused by data skew in a parallel cluster environment.
该方法首先通过在加权最小二乘支持向量机的基础上加入对数据偏斜的处理,解决了元信息分类时关键词特征稀疏和样本高度不均衡问题;
The method first adds data-skew handling to a weighted least squares support vector machine (LS-SVM), which resolves the sparse keyword features and highly imbalanced samples in meta-information classification.