
2015 | OriginalPaper | Book chapter

A Parallel Consensus Clustering Algorithm

Authors: Olgierd Unold, Tadeusz Tagowski

Published in: Machine Learning, Optimization, and Big Data

Publisher: Springer International Publishing


Abstract

Consensus clustering is a stability-based algorithm with far better predictive power than other internal measures. Unfortunately, the method is reported to be slow and hard to scale. We present a consensus clustering algorithm optimized for multi-core processors and show that this compute-intensive algorithm can achieve scalable performance on high-dimensional data such as gene expression microarrays.
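
The abstract does not describe the implementation, so the following is only a minimal sketch of the general idea, not the authors' code. It assumes the usual resampling formulation of consensus clustering: cluster many random subsamples, then record how often each pair of items lands in the same cluster. The resampling runs are independent, which is what makes them easy to spread across workers. All names here (`kmeans`, `one_run`, `consensus_matrix`, `pool_cls`) are hypothetical, and plain k-means is an assumed base clusterer.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def one_run(args):
    """Cluster one random subsample; return (sampled indices, labels)."""
    data, k, frac, seed = args
    rng = random.Random(seed)  # per-run seed keeps results reproducible
    size = max(k, int(frac * len(data)))
    idx = sorted(rng.sample(range(len(data)), size))
    return idx, kmeans([data[i] for i in idx], k, rng)


def consensus_matrix(data, k, runs=50, frac=0.8, workers=4,
                     pool_cls=ThreadPoolExecutor):
    """Fraction of co-sampled runs in which each pair shared a cluster."""
    n = len(data)
    together = [[0] * n for _ in range(n)]  # times i and j were co-clustered
    sampled = [[0] * n for _ in range(n)]   # times i and j were co-sampled
    jobs = [(data, k, frac, seed) for seed in range(runs)]
    with pool_cls(max_workers=workers) as pool:
        # The runs are independent, so they can execute concurrently.
        for idx, labels in pool.map(one_run, jobs):
            for a in range(len(idx)):
                for b in range(a + 1, len(idx)):
                    i, j = idx[a], idx[b]
                    sampled[i][j] += 1
                    if labels[a] == labels[b]:
                        together[i][j] += 1
    matrix = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if sampled[i][j]:
                matrix[i][j] = matrix[j][i] = together[i][j] / sampled[i][j]
    return matrix


if __name__ == "__main__":
    blobs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
             (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
    for row in consensus_matrix(blobs, k=2, runs=20):
        print(" ".join(f"{v:.2f}" for v in row))
```

A thread pool is used here only so the sketch runs anywhere. In CPython, pure-Python work like this is serialized by the global interpreter lock, so a real multi-core speedup would need a process pool (for example `concurrent.futures.ProcessPoolExecutor` passed as `pool_cls`, with the functions defined in an importable module) or a compiled implementation.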


Metadata
Title
A Parallel Consensus Clustering Algorithm
Authors
Olgierd Unold
Tadeusz Tagowski
Copyright year
2015
DOI
https://doi.org/10.1007/978-3-319-27926-8_28