提出了一种基于聚类和粗糙集的数据挖掘模型。
We propose a data mining model based on clustering and rough sets.
然后对数据先进行聚类,再在聚类结果中发掘频繁项目集;
Next, the data are clustered, and frequent itemsets are then mined from the clustering results.
在公开数据集和人工数据集上的实验结果表明,DP算法能快速高效地找到接近于真实聚类中心的数据点作为初始聚类中心。
Experiments on both public and synthetic datasets show that the DP algorithm quickly and efficiently finds data points close to the true cluster centers to serve as initial cluster centers.
基于网格的多密度聚类算法不仅能够对数据集进行正确的聚类,同时还能有效的进行孤立点检测,有效的解决了传统多密度聚类算法中不能有效识别孤立点和噪声的缺陷。
The grid-based multi-density clustering algorithm GDD not only clusters the dataset correctly but also detects outliers effectively, overcoming the inability of traditional multi-density clustering algorithms to identify outliers and noise.
BIRCH算法是针对大规模数据集的聚类算法。
The BIRCH algorithm is a clustering algorithm designed for large-scale datasets.
分类和聚类都是常用的数据挖掘方法,分类的优点是准确率较高,但需要带有类别标注的训练集;
Classification and clustering are both commonly used data mining methods. Classification offers relatively high accuracy but requires a labeled training set.
数据集的聚类结果是否合理的问题属于聚类有效性问题。
Whether the clustering result for a dataset is reasonable is a question of cluster validity.
为了提高模糊支持向量机在数据集上的训练效率,提出一种改进的基于密度聚类(DBSCAN)的模糊支持向量机算法。
To improve the training efficiency of the fuzzy support vector machine (FSVM) on datasets, an improved FSVM algorithm based on density clustering (DBSCAN) is proposed.
现有的半监督聚类方法较少利用数据集空间结构信息,限制了聚类算法的性能。
Existing semi-supervised clustering methods make little use of the spatial structure information of the dataset, which limits the performance of the clustering algorithms.
实验结果显示,该算法在不同结构和维数的数据集上都取得了更稳定的聚类精度。
Experimental results show that the algorithm achieves more stable clustering accuracy on datasets of different structures and dimensionalities.
提供了用来剖析复杂数据集的聚类、机器学习和分类的很多内置方法。
Many built-in methods for clustering, machine learning and classification are provided for dissecting complex datasets.
将该种模型运用于公开的白血病基因表达数据集进行实验,实验表明该方法能自动获取基因表达数据的聚类数,并得到较高的分类准确率。
The model was applied to a public leukemia gene expression dataset. The experiments show that the method automatically determines the number of clusters in the gene expression data and achieves high classification accuracy.
引入了一种新的基于网格的数据压缩方法,并应用该方法对处理大型空间数据集的聚类算法SGRIDS进行研究。
A new grid-based data compression method is introduced and applied in a study of SGRIDS, a clustering algorithm for large spatial datasets.
CD-HIT是用来聚类和比较大的生物学序列数据集的一个广泛使用的程序。
CD-HIT is a widely used program for clustering and comparing large biological sequence datasets.
高维数据的稀疏性和“维灾”问题使得多数传统聚类算法失去作用,因此研究高维数据集的聚类算法已成为当前的一个热点。
The sparsity of high-dimensional data and the curse of dimensionality render most traditional clustering algorithms ineffective, so clustering algorithms for high-dimensional datasets have become an active research topic.
提出了一种基于粗集和模糊聚类相结合的协同过滤推荐算法,通过粗集理论自动填补空缺评分降低数据稀疏性;
A collaborative filtering recommendation algorithm combining rough sets and fuzzy clustering is proposed. It uses rough set theory to fill in missing ratings automatically, which reduces data sparsity.
利用聚类概念,对激光告警器测量数据进行集类,并对各个类对应的目标状态进行空间-时间融合。
The clustering concept is used to cluster the measurements of the laser warning receiver, and space-time fusion is performed on the target states corresponding to each cluster.
通过系统聚类和粗糙集两种方法进行数据约简,使数据得到横向和纵向两个方向上的约简。
The data are reduced in both horizontal and vertical directions by using hierarchical clustering and rough set methods.
主要工作和成果如下:①对谱聚类基本原理和典型算法做了较为全面的分析和研究,利用谱聚类的特性实现了在复杂数据集上的聚类。
The main work and results are as follows: ① The basic principles and typical algorithms of spectral clustering are analyzed and studied fairly comprehensively, and the properties of spectral clustering are exploited to cluster complex datasets.
在KDDCUP 1999数据集上进行实验,结果表明,与聚类支持向量机方法相比,该方法能简化训练样本,提高SVM的训练和检测速度。
Experimental results on the KDD CUP 1999 dataset show that, compared with the clustering SVM method, this method reduces the training samples and improves the training and detection speed of the SVM.
实验结果表明,该算法对聚类边界不清晰的数据集可获得较精确的聚类划分,同时具有很强的噪声抑制能力。
The experimental results show that the algorithm produces fairly accurate cluster partitions on datasets with unclear cluster boundaries and is highly robust to noise.
该基于超图的高维聚类算法具有以下特点:1)能处理大数据集;
This hypergraph-based high-dimensional clustering algorithm has the following features: 1) it can handle large datasets; 2) it can handle high-dimensional datasets;