research-article

Weighted cluster ensembles: Methods and analysis

Published: 16 January 2009

Abstract

Cluster ensembles offer a way to address challenges that stem from the ill-posed nature of clustering. By leveraging the consensus across multiple clustering results, they can provide robust and stable solutions while averaging out spurious structures that emerge from the different biases to which each participating algorithm is tuned. In this article, we address the problem of combining multiple weighted clusters that belong to different subspaces of the input space. We leverage the diversity of the input clusterings to generate a consensus partition that is superior to the participating ones. Since we are dealing with weighted clusters, our consensus functions make use of the weight vectors associated with the clusters. We demonstrate the effectiveness of our techniques through experiments on several real datasets, including high-dimensional text data. Furthermore, we investigate in depth the relationship between diversity and accuracy in our ensemble methods. Our analysis and experimental results show that the proposed techniques can produce a partition that is as good as or better than the best individual clustering.
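The general idea of turning agreement across several clusterings into one consensus partition can be illustrated with a minimal Python sketch. It assumes hard label vectors as input and uses the simple, unweighted co-association (evidence accumulation) scheme associated with Fred and Jain, with a majority threshold of 0.5 chosen here as an assumption. It does not use the per-cluster weight vectors that the article's consensus functions rely on, so it is a baseline illustration, not the authors' method. The helper names `co_association`, `consensus`, and `nmi` are hypothetical; `nmi` (normalized mutual information in the geometric-mean form popularized by Strehl and Ghosh) is included only as one common way to compare partitions when reasoning about diversity and accuracy.

```python
from collections import Counter
from itertools import combinations
from math import log, sqrt


def co_association(partitions):
    """Fraction of input partitions that put each pair of points in the same cluster."""
    n, m = len(partitions[0]), len(partitions)
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                counts[i][j] += 1
                counts[j][i] += 1
    co = [[c / m for c in row] for row in counts]
    for i in range(n):
        co[i][i] = 1.0
    return co


def consensus(partitions, threshold=0.5):
    """Link points whose co-association exceeds `threshold` (assumed majority rule)
    and return the connected components as the consensus labels."""
    co = co_association(partitions)
    n = len(co)
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:  # depth-first search over the thresholded graph
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


def nmi(a, b):
    """Normalized mutual information I(A;B) / sqrt(H(A) * H(B)) between two labelings."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    h_a = -sum(c / n * log(c / n) for c in ca.values())
    h_b = -sum(c / n * log(c / n) for c in cb.values())
    if h_a == 0 or h_b == 0:  # a single-cluster labeling carries no information
        return 1.0 if h_a == h_b else 0.0
    mi = sum(c / n * log((c / n) / ((ca[x] / n) * (cb[y] / n)))
             for (x, y), c in cab.items())
    return mi / sqrt(h_a * h_b)


if __name__ == "__main__":
    parts = [
        [0, 0, 0, 1, 1, 1],
        [0, 0, 1, 1, 1, 1],  # point 2 placed in the other cluster
        [1, 1, 1, 0, 0, 0],  # same grouping as the first, labels swapped
    ]
    labels = consensus(parts)
    print(labels)  # [0, 0, 0, 1, 1, 1]
    print([round(nmi(labels, p), 3) for p in parts])
```

In this toy run, the second clustering disagrees on one point and the third only permutes labels; the majority vote recovers the shared two-group structure. Building the co-association matrix costs O(m n²) for m clusterings of n points, which is one reason weighted and graph-based consensus functions are preferred on larger data.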



        • Published in

          ACM Transactions on Knowledge Discovery from Data, Volume 2, Issue 4
          January 2009
          154 pages
          ISSN:1556-4681
          EISSN:1556-472X
          DOI:10.1145/1460797

          Copyright © 2009 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 16 January 2009
          • Accepted: 1 August 2008
          • Revised: 1 June 2008
          • Received: 1 August 2007


          Qualifiers

          • research-article
          • Research
          • Refereed
