2018 | OriginalPaper | Chapter

Improving Metagenomic Assemblies Through Data Partitioning: A GC Content Approach

Authors: Fábio Miranda, Cassio Batista, Artur Silva, Jefferson Morais, Nelson Neto, Rommel Ramos

Published in: Bioinformatics and Biomedical Engineering

Publisher: Springer International Publishing

Abstract

Assembling metagenomic data sequenced by NGS platforms poses significant computational challenges, especially due to large volumes of data, sequencing errors, and variations in the size, complexity, diversity and abundance of the organisms present in a given metagenome. To overcome these problems, this work proposes an open-source bioinformatics tool called GCSplit, which partitions metagenomic sequences into subsets using a computationally inexpensive metric: the GC content. Experiments performed on real data show that preprocessing short reads with GCSplit prior to assembly reduces memory consumption and generates higher-quality results: the size of the largest contig and the N50 metric increase, while both the L50 value and the total number of contigs produced in the assembly decrease. GCSplit is available at https://github.com/mirand863/gcsplit.
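
To make the quantities named in the abstract concrete, the following is a minimal Python sketch of GC-content partitioning and of the N50 and L50 metrics. It is not GCSplit's implementation: the function names (gc_content, partition_by_gc, n50_l50) are hypothetical, and the rule of cutting the GC-sorted reads into equal-sized slices is an assumption made for illustration, since the abstract does not say how the subsets are formed. N50 and L50 follow their usual definitions.

```python
from typing import List, Tuple


def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C (case-insensitive)."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def partition_by_gc(reads: List[str], n_parts: int) -> List[List[str]]:
    """Sort reads by GC content and cut them into n_parts contiguous slices.

    ASSUMPTION: near-equal slice sizes over the GC-sorted reads. The actual
    partitioning rule used by GCSplit is not described in the abstract.
    """
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    ordered = sorted(reads, key=gc_content)
    base, extra = divmod(len(ordered), n_parts)
    parts: List[List[str]] = []
    start = 0
    for i in range(n_parts):
        # The first `extra` slices take one additional read each.
        size = base + (1 if i < extra else 0)
        parts.append(ordered[start:start + size])
        start += size
    return parts


def n50_l50(contig_lengths: List[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a list of contig lengths.

    N50 is the length of the shortest contig in the smallest set of longest
    contigs that covers at least half of the total assembly length; L50 is
    the number of contigs in that set.
    """
    total = sum(contig_lengths)
    running = 0
    ordered = sorted(contig_lengths, reverse=True)
    for count, length in enumerate(ordered, start=1):
        running += length
        if 2 * running >= total:
            return length, count
    return 0, 0  # empty input


if __name__ == "__main__":
    reads = ["GGCC", "ATAT", "ATGC", "GCGA"]
    print(partition_by_gc(reads, 2))
    print(n50_l50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Under these assumptions, each subset could be assembled separately. A higher N50 together with a lower L50 means that fewer, longer contigs account for half of the assembly, which is the direction of improvement the abstract reports.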

Metadata
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-78723-7_36
