Abstract
Cluster ensembles offer a solution to challenges inherent to clustering that arise from its ill-posed nature. By leveraging the consensus across multiple clustering results, cluster ensembles can provide robust and stable solutions, averaging out spurious structures induced by the particular biases to which each participating algorithm is tuned. In this article, we address the problem of combining multiple weighted clusters that belong to different subspaces of the input space. We leverage the diversity of the input clusterings to generate a consensus partition that is superior to the participating ones. Since we are dealing with weighted clusters, our consensus functions make use of the weight vectors associated with the clusters. We demonstrate the effectiveness of our techniques through experiments with several real datasets, including high-dimensional text data. Furthermore, we investigate in depth the issue of diversity and accuracy for our ensemble methods. Our analysis and experimental results show that the proposed techniques are capable of producing a partition that is as good as or better than the best individual clustering.
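To make the consensus idea concrete, the sketch below shows a *standard, unweighted* evidence-accumulation (co-association) consensus in the style of Fred and Jain: points that are grouped together by many of the input clusterings end up in the same final cluster. Note this is only a baseline illustration under simplifying assumptions; the consensus functions proposed in this article additionally exploit the per-cluster weight vectors, which this sketch does not model. The function name and the toy ensemble are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def coassociation_consensus(labelings, k):
    """Combine an ensemble of clusterings into k consensus clusters.

    labelings: array-like of shape (m, n) -- m label vectors over n points.
    Returns an array of n consensus labels in 1..k.
    """
    labelings = np.asarray(labelings)
    m, n = labelings.shape
    # Co-association matrix S: fraction of clusterings placing each pair together.
    S = np.zeros((n, n))
    for labels in labelings:
        S += (labels[:, None] == labels[None, :]).astype(float)
    S /= m
    # Average-link agglomerative clustering on the induced distance 1 - S.
    D = 1.0 - S
    np.fill_diagonal(D, 0.0)
    Z = linkage(squareform(D, checks=False), method="average")
    return fcluster(Z, t=k, criterion="maxclust")

# Three noisy clusterings of six points; the consensus recovers two groups.
ensemble = [[0, 0, 0, 1, 1, 1],
            [0, 0, 1, 1, 1, 1],
            [1, 1, 1, 0, 0, 0]]
labels = coassociation_consensus(ensemble, k=2)
```

A weighted variant along the lines this article studies would scale each pairwise vote by the similarity of the points under the feature weights of the cluster casting it, rather than counting all votes equally.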
Index Terms
- Weighted cluster ensembles: Methods and analysis