
2015 | OriginalPaper | Chapter

Predicting the Metagenomics Content with Multiple CART Trees

Authors: Dante Travisany, Diego Galarce, Alejandro Maass, Rodrigo Assar

Published in: Mathematical Models in Biology

Publisher: Springer International Publishing


Abstract

Metagenomics is a technique for characterizing and identifying microbial genomes through direct isolation of genomic DNA from the environment, without cultivation. One of the key steps in this process is the taxonomic classification and clustering of the DNA fragments, a process also known as binning. To date, the most common practice is to classify reads through alignments to public databases. When a representative species is present in the database, the process is simple and successful; if not, taxonomic abundances are underestimated. In this work we propose an alignment-free method capable of assigning a taxon to each read in the sample by analyzing the statistical properties of the reads. Given an environment, we collect genomes from publicly available databases and generate libraries of genomic fragments. Then, statistics of k-mer frequencies, GC ratio and GC skew are computed for each read and stored in an environment-associated dataset, which is used to build a robust machine learning procedure based on multiple CART trees. Finally, for each read, the CART trees are queried for its taxon and the most voted taxa are selected. The method was tested using simulated and public human gut microbiome data sets. The database was constructed using 98 genera present in the gastrointestinal tract and available from the Human Microbiome Project. A multiple-CART predictor with 558 trees was generated, capable of estimating the genus and abundance in the sample with 47 % accuracy in read assignments. Performance rates are comparable with those of semi-supervised methods, and computation times were reduced owing to the alignment-free methodology. When restricted to 17 early considered genera, the accuracy of our method increases to 77 %.
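The per-read statistics and the voting step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that "GC ratio" means the fraction of G and C bases in a read, that GC skew is (G − C) / (G + C), and that each tree contributes one taxon label per read. The function names (`gc_ratio`, `gc_skew`, `kmer_frequencies`, `read_features`, `majority_vote`) and the default `k=2` are hypothetical choices made for illustration only.

```python
from collections import Counter
from itertools import product


def gc_ratio(read):
    """Fraction of G and C bases in the read (assumed meaning of 'GC ratio')."""
    read = read.upper()
    return (read.count("G") + read.count("C")) / len(read) if read else 0.0


def gc_skew(read):
    """(G - C) / (G + C); returns 0.0 when the read has no G or C."""
    read = read.upper()
    g, c = read.count("G"), read.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def kmer_frequencies(read, k=2):
    """Relative frequency of each of the 4**k DNA k-mers over overlapping windows.

    Windows containing symbols other than A, C, G, T (e.g. N) are skipped.
    """
    read = read.upper()
    windows = (read[i:i + k] for i in range(len(read) - k + 1))
    counts = Counter(w for w in windows if set(w) <= set("ACGT"))
    total = sum(counts.values())
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return {m: (counts[m] / total if total else 0.0) for m in all_kmers}


def read_features(read, k=2):
    """Feature vector for one read: GC ratio, GC skew, then k-mer frequencies
    in alphabetical k-mer order."""
    freqs = kmer_frequencies(read, k)
    return [gc_ratio(read), gc_skew(read)] + [freqs[m] for m in sorted(freqs)]


def majority_vote(votes):
    """Most frequent label among per-tree predictions.

    Ties are broken in favor of the label that appears first in `votes`.
    """
    return Counter(votes).most_common(1)[0][0]


if __name__ == "__main__":
    features = read_features("ACGTGGCA", k=2)
    print(len(features))  # 2 summary statistics + 16 dinucleotide frequencies
    # Hypothetical labels returned by three trees for the same read.
    print(majority_vote(["Bacteroides", "Escherichia", "Bacteroides"]))
```

In a full pipeline, vectors such as those returned by `read_features` would form the training table for the trees, and `majority_vote` would combine the labels that the trained trees return for a read. How the authors select more than one "most voted" taxon is not specified in the abstract, so the sketch returns only the single top label.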


Metadata

Title: Predicting the Metagenomics Content with Multiple CART Trees
Authors: Dante Travisany, Diego Galarce, Alejandro Maass, Rodrigo Assar
Copyright Year: 2015
DOI: https://doi.org/10.1007/978-3-319-23497-7_11
