Published in: Journal of Classification 3/2019

09-08-2019

MCC: a Multiple Consensus Clustering Framework

Authors: Tao Li, Yi Zhang, Dingding Wang, Jian Xu

Abstract

Consensus clustering has emerged as an important extension of the classical clustering problem. Given a set of input clusterings of a dataset, consensus clustering aims to find a single final clustering that is, in some sense, a better fit than the existing clusterings. Generating a single consensus clustering has a significant drawback, however, because the input clusterings can differ substantially from one another. In this paper, we develop a new framework, called Multiple Consensus Clustering (MCC), to explore multiple clustering views of a given dataset from a set of input clusterings. Instead of generating a single consensus, we propose two sets of approaches for obtaining multiple consensuses. One employs the meta clustering method; the other builds a hierarchical tree structure and then applies a dynamic programming algorithm to generate a flat partition from the tree using the modularity measure. Multiple consensuses are finally obtained by applying consensus clustering algorithms to each cluster of the partition. Extensive experimental results on 11 real-world datasets and a case study on a Protein-Protein Interaction (PPI) dataset demonstrate the effectiveness of the MCC framework.
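
The general idea of the abstract (group the input clusterings, then build one consensus per group) can be illustrated with a small sketch. This is not the authors' implementation. It makes several assumptions of its own: input clusterings are compared by one minus the Rand index, they are grouped by plain average-linkage agglomeration into a user-chosen number of groups, and each group's consensus comes from thresholding a co-association count and labelling connected components. The function names (`rand_distance`, `group_clusterings`, `consensus`, `multiple_consensus`) are hypothetical, and the tree-based variant with dynamic programming over modularity is not covered.

```python
from itertools import combinations


def rand_distance(a, b):
    """1 - Rand index between two label lists over the same points."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return 1.0 - agree / len(pairs)


def group_clusterings(clusterings, n_groups):
    """Assumed grouping step: average-linkage agglomeration of the
    input clusterings (by index) until n_groups groups remain."""
    groups = [[i] for i in range(len(clusterings))]

    def avg_dist(g, h):
        total = sum(rand_distance(clusterings[i], clusterings[j])
                    for i in g for j in h)
        return total / (len(g) * len(h))

    while len(groups) > n_groups:
        # Merge the closest pair of groups; x < y, so deleting y is safe.
        x, y = min(combinations(range(len(groups)), 2),
                   key=lambda p: avg_dist(groups[p[0]], groups[p[1]]))
        groups[x] += groups[y]
        del groups[y]
    return groups


def consensus(clusterings, threshold=0.5):
    """Assumed consensus step: link two points when more than `threshold`
    of the clusterings put them together, then label connected components."""
    n = len(clusterings[0])
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1:
                    together = sum(c[i] == c[j] for c in clusterings)
                    if together / len(clusterings) > threshold:
                        labels[j] = current
                        stack.append(j)
        current += 1
    return labels


def multiple_consensus(clusterings, n_groups):
    """One consensus clustering per group of similar input clusterings."""
    return [consensus([clusterings[i] for i in g])
            for g in group_clusterings(clusterings, n_groups)]


# Toy input: two "views" of six points, each given twice with swapped labels.
inputs = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1, 0],
]
views = multiple_consensus(inputs, n_groups=2)
print(views)  # [[0, 0, 0, 1, 1, 1], [0, 1, 0, 1, 0, 1]]
```

On this toy input, a single consensus over all four clusterings would have to blend two incompatible partitions, whereas grouping first recovers each view separately. The number of groups is fixed by hand here; choosing it automatically is the role the abstract assigns to the modularity-based partition of the hierarchical tree.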

Metadata
Title
MCC: a Multiple Consensus Clustering Framework
Authors
Tao Li
Yi Zhang
Dingding Wang
Jian Xu
Publication date
09-08-2019
Publisher
Springer US
Published in
Journal of Classification / Issue 3/2019
Print ISSN: 0176-4268
Electronic ISSN: 1432-1343
DOI
https://doi.org/10.1007/s00357-019-09318-4
