
2014 | OriginalPaper | Book chapter

14. Analysis of Multiple DNA Microarray Datasets

Authors: Veselka Boeva, Elena Tsiporkova, Elena Kostadinova

Published in: Springer Handbook of Bio-/Neuroinformatics

Publisher: Springer Berlin Heidelberg


Abstract

In contrast to conventional clustering algorithms, which use a single dataset to produce a clustering solution, we introduce a MapReduce approach for clustering datasets generated in multiple-experiment settings. It is inspired by the map and reduce functions commonly used in functional programming and consists of two distinct phases. First, the selected clustering algorithm is applied (mapped) to each experiment separately, producing a list of clustering solutions, one per experiment. These solutions are then transformed (reduced) into a single clustering solution by partitioning the cluster centers. The resulting partition is not disjoint with respect to the participating genes (a gene may belong to more than one cluster), and it is further analyzed and refined by applying formal concept analysis.
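The abstract describes the method only at the level of its two phases. The Python sketch below illustrates one possible reading under stated assumptions, not the authors' implementation: plain k-means with a deterministic first-k seeding stands in for "the selected clustering algorithm"; all experiments are assumed to measure the same number of conditions, so their cluster centers live in one common space; the reduce step runs k-means again on the pooled centers and merges the gene sets of centers that are grouped together; and the formal concept analysis step simply enumerates all formal concepts of a gene × final-cluster context. The function names (`map_phase`, `reduce_phase`, `gene_cluster_context`, `formal_concepts`), the parameters `k` and `k_final`, and the toy data are hypothetical.

```python
"""Hypothetical sketch: map-reduce clustering over several experiments, then FCA."""
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of profiles."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's k-means; seeds are the first k points (deterministic, not robust)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centers = [list(map(float, p)) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous center
                centers[c] = mean(members)
    return labels, centers


def map_phase(experiments, k):
    """Map: cluster each experiment separately -> one list of (center, genes) per experiment."""
    solutions = []
    for genes, profiles in experiments:
        labels, centers = kmeans(profiles, k)
        clusters = [[g for g, lab in zip(genes, labels) if lab == c] for c in range(k)]
        solutions.append(list(zip(centers, clusters)))
    return solutions


def reduce_phase(solutions, k_final):
    """Reduce: partition the pooled centers and merge the genes of grouped centers.

    A gene can land in several final clusters, so the result is not disjoint.
    """
    pooled = [pair for solution in solutions for pair in solution]
    labels, _ = kmeans([center for center, _genes in pooled], k_final)
    merged = defaultdict(set)
    for (_center, genes), lab in zip(pooled, labels):
        merged[lab].update(genes)
    return [merged[c] for c in sorted(merged)]


def gene_cluster_context(clusters):
    """Formal context: each gene (object) -> indices of the final clusters (attributes) holding it."""
    genes = set().union(*clusters)
    return {g: frozenset(i for i, members in enumerate(clusters) if g in members) for g in genes}


def formal_concepts(context):
    """All (extent, intent) pairs of a small context, closing object intents under intersection."""
    intents = {frozenset().union(*context.values())}  # the full attribute set is always an intent
    for obj_intent in context.values():
        intents |= {intent & obj_intent for intent in intents}
    return [
        (frozenset(g for g, gi in context.items() if intent <= gi), intent)
        for intent in intents
    ]


if __name__ == "__main__":
    # Invented toy data: 4 genes measured under the same 2 conditions in 2 experiments.
    experiments = [
        (["g1", "g2", "g3", "g4"], [[0, 0], [0, 1], [10, 10], [10, 11]]),
        (["g1", "g2", "g3", "g4"], [[0, 0], [10, 9], [10, 10], [10, 11]]),
    ]
    clusters = reduce_phase(map_phase(experiments, k=2), k_final=2)
    print([sorted(c) for c in clusters])  # [['g1', 'g2'], ['g2', 'g3', 'g4']]
    for extent, intent in formal_concepts(gene_cluster_context(clusters)):
        print(sorted(extent), sorted(intent))
```

In this toy run, gene `g2` groups with `g1` in the first experiment but with `g3` and `g4` in the second, so merging gene sets by center group places it in both final clusters. The formal concepts then group genes by the exact set of final clusters they share, which is one simple way to expose such overlaps; the intersection-closure enumeration is only practical for small contexts.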

Metadata
Copyright year
2014
DOI
https://doi.org/10.1007/978-3-642-30574-0_14