
2015 | OriginalPaper | Chapter

A Parallel Consensus Clustering Algorithm

Authors : Olgierd Unold, Tadeusz Tagowski

Published in: Machine Learning, Optimization, and Big Data

Publisher: Springer International Publishing


Abstract

Consensus clustering is a stability-based algorithm whose predictive power is far better than that of other internal validation measures. Unfortunately, the method is reported to be slow and hard to scale. Here we present a consensus clustering algorithm optimized for multi-core processors, and we show that it is possible to obtain scalable performance from this compute-intensive algorithm on high-dimensional data such as gene expression microarrays.
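The abstract does not describe the parallel scheme itself. The Python sketch below is only an illustration of why resampling-based consensus clustering lends itself to multi-core execution, and it is not the authors' implementation. Each resampling run (draw a random subsample, cluster it) is independent of every other run, so the runs can be handed to a worker pool. Their co-membership counts are then reduced into the consensus matrix M(i, j), defined as the number of runs in which items i and j were clustered together divided by the number of runs in which both were drawn. The helper names (`kmeans`, `one_resample`, `consensus_matrix`), the plain Lloyd's k-means, the subsampling fraction of 0.8, and the use of `concurrent.futures` are all assumptions made for this sketch.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def sqdist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, rng, n_iter=20):
    """Plain Lloyd's k-means with random initial centroids; returns labels."""
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        for i, p in enumerate(points):
            labels[i] = min(range(k), key=lambda c: sqdist(p, centroids[c]))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def one_resample(task):
    """Cluster one random subsample; shares no state with other resamples."""
    data, k, frac, seed = task
    rng = random.Random(seed)  # per-run seed keeps results reproducible
    m = max(k, int(frac * len(data)))
    idx = rng.sample(range(len(data)), m)
    labels = kmeans([data[i] for i in idx], k, rng)
    return idx, labels


def consensus_matrix(data, k, n_resamples=50, frac=0.8, seed=0,
                     executor_cls=ThreadPoolExecutor, max_workers=None):
    """M[i][j] = (# runs with i, j co-clustered) / (# runs with both drawn)."""
    n = len(data)
    together = [[0] * n for _ in range(n)]  # co-clustering counts
    sampled = [[0] * n for _ in range(n)]   # co-sampling counts
    tasks = [(data, k, frac, seed + h) for h in range(n_resamples)]
    # Map step: independent resampling runs execute in the worker pool.
    with executor_cls(max_workers=max_workers) as pool:
        # Reduce step: fold each run's co-membership into the counters.
        for idx, labels in pool.map(one_resample, tasks):
            for a, i in enumerate(idx):
                for b, j in enumerate(idx):
                    sampled[i][j] += 1
                    if labels[a] == labels[b]:
                        together[i][j] += 1
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n)] for i in range(n)]
```

`ThreadPoolExecutor` is the default so that the sketch runs anywhere. Because pure-Python k-means is CPU-bound and constrained by the GIL, actually spreading the runs across cores would require passing `executor_cls=ProcessPoolExecutor` from a script guarded by `if __name__ == "__main__":`. The worker functions are defined at module top level, so they can be pickled. On two well-separated groups of points, `consensus_matrix(points, k=2)` should give entries of 1 within each group and 0 across groups. In the standard formulation, this matrix is computed for each candidate k and the results are compared to select the number of clusters.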


Metadata
Title: A Parallel Consensus Clustering Algorithm
Authors: Olgierd Unold, Tadeusz Tagowski
Copyright Year: 2015
DOI: https://doi.org/10.1007/978-3-319-27926-8_28
