2015 | OriginalPaper | Book chapter

Latent Forests to Model Genetical Data for the Purpose of Multilocus Genome-Wide Association Studies. Which Clustering Should Be Chosen?

Authors: Duc-Thanh Phan, Philippe Leray, Christine Sinoquet

Published in: Biomedical Engineering Systems and Technologies

Publisher: Springer International Publishing

Abstract

The aim of genetic association studies, and in particular genome-wide association studies (GWASs), is to unravel the genetics of complex diseases. In this domain, machine learning offers an attractive alternative to classical statistical approaches. The seminal work of Mourad et al. [1] led to the proposal of a novel class of probabilistic graphical models, the forest of latent tree models (FLTM). The design of this model was motivated by the need to model genetic data at the genome scale prior to a multilocus GWAS. A multilocus GWAS fully exploits information about the complex dependencies existing within genetic data to help detect the loci associated with the studied pathology. The FLTM framework also allows data dimension reduction. The FLTM model is a hierarchical Bayesian network with latent variables. Central to the FLTM construction is the recursive clustering of variables in a bottom-up subsuming process. This article analyzes the impact of the choice of the clustering method used in the FLTM learning algorithm, in a GWAS context. We rely on a real GWAS data set describing 41,400 variables for each of 3,004 controls and 2,005 cases affected by Crohn’s disease, and compare the impact of three clustering methods. We compare statistics about data dimension reduction, as well as trends concerning the ability to split or group putative causal single-nucleotide polymorphisms (SNPs) in agreement with the underlying biological reality. To assess the risk of missing significant association results due to subsumption, we also compare the clustering methods through the corresponding FLTM-based GWASs. In this GWAS context and within the FLTM framework, the choice of the clustering method does not affect the satisfactory performance of the GWAS.
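
The abstract describes the FLTM construction only at a high level: variables are clustered, each cluster is subsumed by a new latent variable, and the process repeats bottom-up. The Python sketch below illustrates the shape of such a loop with the clustering method passed in as a parameter, which is the design choice this chapter studies. It is not the authors' implementation. The correlation-threshold clustering (`threshold_clustering`), the majority-vote stand-in used instead of actually learning a latent variable (`placeholder_latent`), the stopping rule, and all names and parameters are hypothetical assumptions for illustration only; in FLTM each latent variable is learned from the data as part of a Bayesian network with latent variables.

```python
from typing import Callable, List, Tuple

import numpy as np

Partition = List[List[int]]  # each inner list holds the column indices of one cluster


def threshold_clustering(data: np.ndarray, threshold: float = 0.5) -> Partition:
    """Hypothetical stand-in clustering: greedily groups columns whose absolute
    Pearson correlation with a seed column reaches `threshold`."""
    corr = np.abs(np.corrcoef(data, rowvar=False))
    unassigned = list(range(data.shape[1]))
    clusters: Partition = []
    while unassigned:
        seed = unassigned.pop(0)
        members = [seed] + [j for j in unassigned if corr[seed, j] >= threshold]
        unassigned = [j for j in unassigned if j not in members]
        clusters.append(members)
    return clusters


def placeholder_latent(children: np.ndarray) -> np.ndarray:
    """Hypothetical placeholder for latent-variable learning: the per-individual
    majority value of the child variables (ties go to the smallest value)."""
    return np.array([np.bincount(row).argmax() for row in children])


def build_latent_forest(
    data: np.ndarray,
    cluster_fn: Callable[[np.ndarray], Partition],
    max_layers: int = 10,
) -> Tuple[List[Partition], int]:
    """Bottom-up subsumption: cluster the current variables, replace every
    multi-variable cluster by one new variable, and repeat on the new layer
    until no cluster with two or more variables is found."""
    current = np.asarray(data, dtype=int)
    layers: List[Partition] = []
    for _ in range(max_layers):
        clusters = cluster_fn(current)
        if all(len(c) == 1 for c in clusters):
            break  # nothing left to subsume: the remaining variables are roots
        layers.append(clusters)
        # Singletons move up unchanged; larger clusters get one new variable.
        new_columns = [
            current[:, c[0]] if len(c) == 1 else placeholder_latent(current[:, c])
            for c in clusters
        ]
        current = np.column_stack(new_columns)
    return layers, current.shape[1]


# Toy genotype matrix (values 0/1/2): columns 0-2 copy g1, 3-4 copy g2, 5 is g3.
rng = np.random.default_rng(0)
n = 500
g1, g2, g3 = (rng.integers(0, 3, size=n) for _ in range(3))
genotypes = np.column_stack([g1, g1, g1, g2, g2, g3])

layers, n_top = build_latent_forest(genotypes, threshold_clustering)
print(layers)
print(f"{genotypes.shape[1]} observed variables -> {n_top} top-level variables")
```

Because `cluster_fn` is an argument, swapping `threshold_clustering` for any other function with the same signature changes only the clustering step. Comparing the number of top-level variables with the number of observed variables then gives a crude dimension-reduction statistic of the kind the abstract mentions.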

References

1. Mourad, R., Sinoquet, C., Leray, P.: A hierarchical Bayesian network approach for linkage disequilibrium modeling and data-dimensionality reduction prior to genome-wide association studies. BMC Bioinformatics 12, 16 (2011)

2. International Monetary Fund: Macro-fiscal Implications of Health Care Reform in Advanced and Emerging Economies. IMF Policy Paper, Washington (2010)

3. Hechter, E.: On Genetic Variants Underlying Common Disease. Ph.D. thesis, University of Oxford (2011)

4. Gibbs, R.A., Belmont, J.W., Hardenbol, P., et al.: The International HapMap Project. Nature 426(6968), 789–796 (2003)

5. The 1000 Genomes Project Consortium: A map of human genome variation from population-scale sequencing. Nature 467(7319), 1061–1073 (2010)

6. Balding, D.J.: A tutorial on statistical methods for population association studies. Nat. Rev. Genet. 7(10), 781–791 (2006)

7. Pritchard, J.K., Przeworski, M.: Linkage disequilibrium in humans: models and data. Am. J. Hum. Genet. 69(1), 1–14 (2001)

8. Patil, N., Berno, A.J., Hinds, D.A., Barrett, W.A., Doshi, J.M., Hacker, C.R., Kautzer, C.R., Lee, D.H., Marjoribanks, C., McDonough, D.P., et al.: Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. Science 294(5547), 1719–1723 (2001)

9. Abel, H.J., Thomas, A.: Accuracy and computational efficiency of a graphical modeling approach to linkage disequilibrium estimation. Stat. Appl. Genet. Mol. Biol. 10(1), Article 5 (2011)

10. Verzilli, C.J., Stallard, N., Whittaker, J.C.: Bayesian graphical models for genome-wide association studies. Am. J. Hum. Genet. 79, 100–112 (2006)

11. Browning, B.L., Browning, S.R.: Efficient multilocus association testing for whole genome association studies using localized haplotype clustering. Genet. Epidemiol. 31, 365–375 (2007)

12. Ackerman, M., Ben-David, S.: Clusterability: A theoretical study. In: 12th International Conference on Artificial Intelligence and Statistics, J. Mach. Learn. Res., vol. 5, pp. 1–8 (2009)

13. Hubert, L., Arabie, P.: Comparing partitions. J. Classif. 2(1), 193–218 (1985)

14. Robinson, R.W.: Counting unlabeled acyclic digraphs. In: Little, C.H.C. (ed.) Combinatorial Mathematics V. Lecture Notes in Mathematics, vol. 622, pp. 28–43. Springer, New York (1977)

15. Zhang, N.L.: Hierarchical latent class models for cluster analysis. J. Mach. Learn. Res. 5(6), 697–723 (2004)

16. Ben-Dor, A., Shamir, R., Yakhini, Z.: Clustering gene expression patterns. In: 3rd Annual International Conference on Computational Molecular Biology, pp. 33–42 (1999)

17. Cahill, J.: Error-Tolerant Clustering of Gene Microarray Data. Bachelor’s Honors thesis, Boston College, Massachusetts (2002)

18. Ester, M., Kriegel, H.-P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: 2nd International Conference on Knowledge Discovery and Data Mining, pp. 226–231. AAAI Press (1996)

19. Meila, M.: Comparing clusterings: an axiomatic view. In: 22nd International Conference on Machine Learning, pp. 577–584 (2005)

20. Rand, W.M.: Objective criteria for the evaluation of clustering methods. J. Am. Stat. Assoc. 66(336), 846–850 (1971)

21. Mirkin, B.: Mathematical classification and clustering: from how to what and why. J. Classif. 2(1), 193–218 (1998)

22. Fowlkes, E.B., Mallows, C.L.: A method for comparing two hierarchical clusterings. J. Am. Stat. Assoc. 78(383), 553–569 (1983)

23. Purcell, S., Neale, B., Todd-Brown, K., et al.: PLINK: a toolset for whole-genome association and population-based linkage analysis. Am. J. Hum. Genet. 81(3), 559–575 (2007)

24. Gabriel, S.B., Schaffner, S.F., Moore, J.M., Roy, J., Blumenstiel, B., Higgins, J., DeFelice, M., Lochner, A., Faggart, M., Liu-Cordero, S.N., Rotimi, C., Adeyemo, A., Cooper, R., Ward, R., Lander, E.S., Daly, M.J., Altshuler, D.: The structure of haplotype blocks in the human genome. Science 296(5576), 2225–2229 (2002)

25. Wang, N., Akey, J.M., Zhang, K., Chakraborty, R., Jin, L.: Distribution of recombination crossovers and the origin of haplotype blocks: the interplay of population history, recombination, and mutation. Am. J. Hum. Genet. 71(5), 1227–1234 (2002)

26. Wellcome Trust Case Control Consortium: Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447(7145), 661–678 (2007)

27. Barrett, J.C., Hansoul, S., Nicolae, D.L., et al.: Genome-wide association defines more than 30 distinct susceptibility loci for Crohn’s disease. Nat. Genet. 40(8), 955–962 (2008)

Print ISBN: 978-3-319-27706-6
Electronic ISBN: 978-3-319-27707-3
Copyright year: 2015
DOI: https://doi.org/10.1007/978-3-319-27707-3_11
