2015 | OriginalPaper | Book chapter

Latent Forests to Model Genetical Data for the Purpose of Multilocus Genome-Wide Association Studies. Which Clustering Should Be Chosen?

Authors: Duc-Thanh Phan, Philippe Leray, Christine Sinoquet

Published in: Biomedical Engineering Systems and Technologies

Publisher: Springer International Publishing

Abstract

The aim of genetic association studies, and in particular genome-wide association studies (GWASs), is to unravel the genetics of complex diseases. In this domain, machine learning offers an attractive alternative to classical statistical approaches. The seminal work of Mourad et al. [1] led to the proposal of a novel class of probabilistic graphical models, the forest of latent trees (FLTM). The design of this model was motivated by the need to model genetical data at the genome scale, prior to a multilocus GWAS. A multilocus GWAS fully exploits information about the complex dependencies within genetical data to help detect the loci associated with the studied pathology. The FLTM framework also allows data dimension reduction. The FLTM model is a hierarchical Bayesian network with latent variables. Central to the FLTM construction is the recursive clustering of variables in a bottom-up subsuming process. This article analyzes the impact of the choice of the clustering method used in the FLTM learning algorithm, in a GWAS context. We rely on a real GWAS data set describing 41,400 variables for each of 3,004 controls and 2,005 cases affected by Crohn’s disease, and compare the impact of three clustering methods. We compare statistics about data dimension reduction as well as trends concerning the ability to split or group putative causal SNPs in agreement with the underlying biological reality. To assess the risk of missing significant association results due to subsumption, we also compare the clustering methods through the corresponding FLTM-based GWASs. In the GWAS context and in this framework, the choice of the clustering method does not affect the satisfactory performance of the GWAS.
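
The bottom-up subsuming process mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' FLTM learning algorithm: the clustering step (connected components over thresholded pairwise mutual information), the `threshold` value, the function names, and the way a latent variable is derived (an integer code per joint child configuration instead of a learned latent variable) are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import log


def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def cluster_variables(data, threshold):
    """Hypothetical clustering: connected components of the graph that links
    two variables when their pairwise mutual information is >= threshold."""
    parent = {v: v for v in data}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path compression
            v = parent[v]
        return v

    for u, v in combinations(data, 2):
        if mutual_information(data[u], data[v]) >= threshold:
            parent[find(u)] = find(v)

    groups = {}
    for v in data:
        groups.setdefault(find(v), []).append(v)
    return list(groups.values())


def subsume(data, cluster):
    """Stand-in for latent variable learning (an assumption of this sketch):
    each distinct joint configuration of the children gets an integer code."""
    rows = zip(*(data[v] for v in cluster))
    codes = {}
    return [codes.setdefault(row, len(codes)) for row in rows]


def build_latent_forest(data, threshold=0.3, max_layers=10):
    """Build {latent_name: [children]} layer by layer, bottom-up.
    Variables left in singleton clusters are carried to the next layer."""
    forest, current = {}, dict(data)
    for layer in range(1, max_layers + 1):
        clusters = [c for c in cluster_variables(current, threshold) if len(c) > 1]
        if not clusters:
            break  # nothing left to subsume
        for i, cluster in enumerate(clusters):
            name = f"H{layer}_{i}"
            latent = subsume(current, cluster)
            forest[name] = cluster
            for child in cluster:
                del current[child]  # children are replaced by their latent parent
            current[name] = latent
    return forest


# Toy genotypes for 8 individuals: two pairs of perfectly correlated SNPs.
toy = {
    "snp1": [0, 0, 0, 0, 1, 1, 1, 1],
    "snp2": [0, 0, 0, 0, 1, 1, 1, 1],
    "snp3": [0, 0, 1, 1, 0, 0, 1, 1],
    "snp4": [0, 0, 1, 1, 0, 0, 1, 1],
}
forest = build_latent_forest(toy)
print(forest)
```

On this toy input the two SNP pairs are each subsumed into one latent variable, and the two latent variables are independent of each other, so the result is two separate trees rather than a single one. Swapping `cluster_variables` for another partitioning routine is the kind of change whose impact the chapter studies.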

References
1. Mourad, R., Sinoquet, C., Leray, P.: A hierarchical Bayesian network approach for linkage disequilibrium modeling and data-dimensionality reduction prior to genome-wide association studies. BMC Bioinformatics 12, 16 (2011)
2. International Monetary Fund: Macro-fiscal Implications of Health Care Reform in Advanced and Emerging Economies. IMF Policy Paper, Washington (2010)
3. Hechter, E.: On Genetic Variants Underlying Common Disease. Ph.D. thesis, University of Oxford (2011)
4. Gibbs, R.A., Belmont, J.W., Hardenbol, P., et al.: The international HapMap project. Nature 426(6968), 789–796 (2003)
5. The 1000 Genomes Project Consortium: A map of human genome variation from population-scale sequencing. Nature 467(7319), 1061–1073 (2010)
6. Balding, D.J.: A tutorial on statistical methods for population association studies. Nat. Rev. Genet. 7(10), 781–791 (2006)
7. Pritchard, J.K., Przeworski, M.: Linkage disequilibrium in humans: models and data. Am. J. Hum. Genet. 69(1), 1–14 (2001)
8. Patil, N., Berno, A.J., Hinds, D.A., Barrett, W.A., Doshi, J.M., Hacker, C.R., Kautzer, C.R., Lee, D.H., Marjoribanks, C., McDonough, D.P., et al.: Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. Science 294(5547), 1719–1723 (2001)
9. Abel, H.J., Thomas, A.: Accuracy and computational efficiency of a graphical modeling approach to linkage disequilibrium estimation. Stat. Appl. Genet. Mol. Biol. 10(1), Article 5 (2011)
10. Verzilli, C.J., Stallard, N., Whittaker, J.C.: Bayesian graphical models for genome-wide association studies. Am. J. Hum. Genet. 79, 100–112 (2006)
11. Browning, B.L., Browning, S.R.: Efficient multilocus association testing for whole genome association studies using localized haplotype clustering. Genet. Epidemiol. 31, 365–375 (2007)
12. Ackerman, M., Ben-David, S.: Clusterability: A theoretical study. In: 12th International Conference on Artificial Intelligence and Statistics, vol. 5, pp. 1–8 (2009). J. Mach. Learn. Res.
13. Hubert, L., Arabie, P.: Comparing partitions. J. Classif. 2(1), 193–218 (1985)
14. Robinson, R.W.: Counting unlabeled acyclic digraphs. In: Little, C.H.C. (ed.) Combinatorial Mathematics V. Lecture Notes in Mathematics, vol. 622, pp. 28–43. Springer, New York (1977)
15. Zhang, N.L.: Hierarchical latent class models for cluster analysis. J. Mach. Learn. Res. 5(6), 697–723 (2004)
16. Ben-Dor, A., Shamir, R., Yakhini, Z.: Clustering gene expression patterns. In: 3rd Annual International Conference on Computational Molecular Biology, pp. 33–42 (1999)
17. Cahill, J.: Error-Tolerant Clustering of Gene Microarray Data. Bachelor’s Honors thesis, Boston College, Massachusetts (2002)
18. Ester, M., Kriegel, H.-P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: 2nd International Conference on Knowledge Discovery and Data Mining, pp. 226–231. AAAI Press (1996)
19. Meila, M.: Comparing clusterings: an axiomatic view. In: 22nd International Conference on Machine Learning, pp. 577–584 (2005)
20. Rand, W.M.: Objective criteria for the evaluation of clustering methods. J. Am. Stat. Assoc. 66(336), 846–850 (1971)
21. Mirkin, B.: Mathematical classification and clustering: from how to what and why. J. Classifi. 2(1), 193–218 (1998)
22. Fowlkes, E.B., Mallows, C.L.: A method for comparing two hierarchical clusterings. J. Am. Stat. Assoc. 78(383), 553–569 (1983)
23. Purcell, S., Neale, B., Todd-Brown, K., et al.: PLINK: a toolset for whole-genome association and population-based linkage analysis. Am. J. Hum. Genet. 81(3), 559–575 (2007)
24. Gabriel, S.B., Schaffner, S.F., Moore, J.M., Roy, J., Blumenstiel, B., Higgins, J., DeFelice, M., Lochner, A., Faggart, M., Liu-Cordero, S.N., Rotimi, C., Adeyemo, A., Cooper, R., Ward, R., Lander, E.S., Daly, M.J., Altshuler, D.: The structure of haplotype blocks in the human genome. Science 296(5576), 2225–2229 (2002)
25. Wang, N., Akey, J.M., Zhang, K., Chakraborty, R., Jin, L.: Distribution of recombination crossovers and the origin of haplotype blocks: the interplay of population history, recombination, and mutation. Am. J. Hum. Genet. 71(5), 1227–1234 (2002)
26. Wellcome Trust Case Control Consortium: Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447(7145), 661–678 (2007)
27. Barrett, J.C., Hansoul, S., Nicolae, D.L., et al.: Genome-wide association defines more than 30 distinct susceptibility loci for Crohn’s disease. Nat. Genet. 40(8), 955–962 (2008)
Metadata
Title
Latent Forests to Model Genetical Data for the Purpose of Multilocus Genome-Wide Association Studies. Which Clustering Should Be Chosen?
Authors
Duc-Thanh Phan
Philippe Leray
Christine Sinoquet
Copyright year
2015
DOI
https://doi.org/10.1007/978-3-319-27707-3_11