Abstract
Cluster ensembles offer a solution to challenges inherent to clustering that arise from its ill-posed nature. By leveraging the consensus across multiple clustering results, cluster ensembles can provide robust and stable solutions, averaging out spurious structures induced by the particular biases to which each participating algorithm is tuned. In this article, we address the problem of combining multiple weighted clusters that belong to different subspaces of the input space. We leverage the diversity of the input clusterings to generate a consensus partition that is superior to the participating ones. Since we are dealing with weighted clusters, our consensus functions make use of the weight vectors associated with the clusters. We demonstrate the effectiveness of our techniques through experiments with several real datasets, including high-dimensional text data. Furthermore, we investigate in depth the issue of diversity and accuracy for our ensemble methods. Our analysis and experimental results show that the proposed techniques are capable of producing a partition that is as good as or better than the best individual clustering.
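To make the consensus idea concrete, the sketch below shows a *standard, unweighted* evidence-accumulation (co-association) consensus in the style of Fred and Jain: points that are grouped together by many of the input clusterings end up in the same final cluster. Note this is only a baseline illustration under simplifying assumptions; the consensus functions proposed in this article additionally exploit the per-cluster weight vectors, which this sketch does not model. The function name and the toy ensemble are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def coassociation_consensus(labelings, k):
    """Combine an ensemble of clusterings into k consensus clusters.

    labelings: array-like of shape (m, n) -- m label vectors over n points.
    Returns an array of n consensus labels in 1..k.
    """
    labelings = np.asarray(labelings)
    m, n = labelings.shape
    # Co-association matrix S: fraction of clusterings placing each pair together.
    S = np.zeros((n, n))
    for labels in labelings:
        S += (labels[:, None] == labels[None, :]).astype(float)
    S /= m
    # Average-link agglomerative clustering on the induced distance 1 - S.
    D = 1.0 - S
    np.fill_diagonal(D, 0.0)
    Z = linkage(squareform(D, checks=False), method="average")
    return fcluster(Z, t=k, criterion="maxclust")

# Three noisy clusterings of six points; the consensus recovers two groups.
ensemble = [[0, 0, 0, 1, 1, 1],
            [0, 0, 1, 1, 1, 1],
            [1, 1, 1, 0, 0, 0]]
labels = coassociation_consensus(ensemble, k=2)
```

A weighted variant along the lines this article studies would scale each pairwise vote by the similarity of the points under the feature weights of the cluster casting it, rather than counting all votes equally.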
Index Terms
- Weighted cluster ensembles: Methods and analysis