Skip to main content

2017 | OriginalPaper | Buchkapitel

GAVGA: A Genetic Algorithm for Viral Genome Assembly

verfasst von : Renato R. M. Oliveira, Filipe Damasceno, Ronald Souza, Reginaldo Santos, Manoel Lima, Regiane Kawasaki, Claudomiro Sales

Erschienen in: Progress in Artificial Intelligence

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Bioinformatics has grown considerably since the development of the first sequencing machine, being today intensively used with the next generation DNA sequencers. Viral genomes represent a great challenge to bioinformatics due to its high mutation rate, forming quasispecies in the same infected host. In this paper, we implement and evaluate the performance of a genetic algorithm, named GAVGA, through the quality of a viral genome assembly. The assembly process works by first clustering the reads that share a common substring called seed and for each cluster, checks if there are overlapping reads with a given similarity percentage using a genetic algorithm. The assembled data are then compared to Newbler, SPAdes and ABySS assemblers, and also to a viral assembler such as VICUNA, which confirms the feasibility of our approach. GAVGA was implemented in python 2.7+ and can be downloaded at https://​sourceforge.​net/​projects/​gavga-assembler/​.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Nagarajan, N., Pop, M.: Sequence assembly demystified. Nat. Rev. Genet. 14(3), 157–167 (2013)CrossRef Nagarajan, N., Pop, M.: Sequence assembly demystified. Nat. Rev. Genet. 14(3), 157–167 (2013)CrossRef
2.
Zurück zum Zitat Palmer, L.E., Dejori, M., Bolanos, R., Fasulo, D.: Improving de novo sequence assembly using machine learning and comparative genomics for overlap correction. BMC Bioinform. 11(1), 33 (2010)CrossRef Palmer, L.E., Dejori, M., Bolanos, R., Fasulo, D.: Improving de novo sequence assembly using machine learning and comparative genomics for overlap correction. BMC Bioinform. 11(1), 33 (2010)CrossRef
3.
Zurück zum Zitat Goés, F., Alves, R., Corrêa, L., Chaparro, C., Thom, L.: Towards an ensemble learning strategy for metagenomic gene prediction. In: Campos, S. (ed.) BSB 2014. LNCS, vol. 8826, pp. 17–24. Springer, Cham (2014). doi:10.1007/978-3-319-12418-6_3CrossRef Goés, F., Alves, R., Corrêa, L., Chaparro, C., Thom, L.: Towards an ensemble learning strategy for metagenomic gene prediction. In: Campos, S. (ed.) BSB 2014. LNCS, vol. 8826, pp. 17–24. Springer, Cham (2014). doi:10.​1007/​978-3-319-12418-6_​3CrossRef
4.
Zurück zum Zitat Peltola, H., Söderlund, H., Tarhio, J., Ukkonen, E.: Algorithms for some string matching problems arising in molecular genetics. In: IFIP Congress, pp. 59–64 (1983) Peltola, H., Söderlund, H., Tarhio, J., Ukkonen, E.: Algorithms for some string matching problems arising in molecular genetics. In: IFIP Congress, pp. 59–64 (1983)
5.
Zurück zum Zitat Henn, M.R., Boutwell, C.L., Charlebois, P., Lennon, N.J., Power, K.A., Macalalad, A.R., Zody, M.C.: Whole genome deep sequencing of HIV-1 reveals the impact of early minor variants upon immune recognition during acute infection. PLoS Pathog. 8(3), e1002529 (2012)CrossRef Henn, M.R., Boutwell, C.L., Charlebois, P., Lennon, N.J., Power, K.A., Macalalad, A.R., Zody, M.C.: Whole genome deep sequencing of HIV-1 reveals the impact of early minor variants upon immune recognition during acute infection. PLoS Pathog. 8(3), e1002529 (2012)CrossRef
6.
Zurück zum Zitat Yang, X., Charlebois, P., Gnerre, S., Coole, M.G., Lennon, N.J., Levin, J.Z., Henn, M.R.: De novo assembly of highly diverse viral populations. BMC Genom. 13(1), 475 (2012)CrossRef Yang, X., Charlebois, P., Gnerre, S., Coole, M.G., Lennon, N.J., Levin, J.Z., Henn, M.R.: De novo assembly of highly diverse viral populations. BMC Genom. 13(1), 475 (2012)CrossRef
7.
Zurück zum Zitat Hunt, M., Gall, A., Ong, S.H., Brener, J., Ferns, B., Goulder, P., Otto, T.D.: IVA: accurate de novo assembly of RNA virus genomes. Bioinformatics 31(14), 2374–2376 (2015)CrossRef Hunt, M., Gall, A., Ong, S.H., Brener, J., Ferns, B., Goulder, P., Otto, T.D.: IVA: accurate de novo assembly of RNA virus genomes. Bioinformatics 31(14), 2374–2376 (2015)CrossRef
8.
Zurück zum Zitat Ruby, J.G., Bellare, P., DeRisi, J.L.: PRICE: software for the targeted assembly of components of (Meta) genomic sequence data. G3: Genes Genomes Genet. 3(5), 865–880 (2013)CrossRef Ruby, J.G., Bellare, P., DeRisi, J.L.: PRICE: software for the targeted assembly of components of (Meta) genomic sequence data. G3: Genes Genomes Genet. 3(5), 865–880 (2013)CrossRef
9.
Zurück zum Zitat Yamashita, A., Sekizuka, T., Kuroda, M.: VirusTAP: Viral genome-targeted assembly pipeline. Frontiers in microbiology 7, 32 (2016) Yamashita, A., Sekizuka, T., Kuroda, M.: VirusTAP: Viral genome-targeted assembly pipeline. Frontiers in microbiology 7, 32 (2016)
10.
Zurück zum Zitat Gusfield, D. Algorithms on strings, trees and sequences: computer science and computational biology. Cambridge university press (1997) Gusfield, D. Algorithms on strings, trees and sequences: computer science and computational biology. Cambridge university press (1997)
11.
Zurück zum Zitat Szwarcfiter, J.L.: Grafos e algoritimos computacionais, 2nd edn. Rio de Janeiro (1998) Szwarcfiter, J.L.: Grafos e algoritimos computacionais, 2nd edn. Rio de Janeiro (1998)
12.
Zurück zum Zitat Hoffman, A.J., Wolfe, J., Garfinkel, R.S., Johnson, D.S., Papadimitriou, C.H., Gilmore, P.C., Golden, B.L.: The Traveling Salesman Problem: A Guided Tour of Combinatorial Optimization. Wiley, Hoboken (1986) Hoffman, A.J., Wolfe, J., Garfinkel, R.S., Johnson, D.S., Papadimitriou, C.H., Gilmore, P.C., Golden, B.L.: The Traveling Salesman Problem: A Guided Tour of Combinatorial Optimization. Wiley, Hoboken (1986)
13.
Zurück zum Zitat Bertossi, A.A.: The edge Hamiltonian path problem is NP-complete. Inf. Process. Lett. 13(4–5), 157–159 (1981)MathSciNetCrossRef Bertossi, A.A.: The edge Hamiltonian path problem is NP-complete. Inf. Process. Lett. 13(4–5), 157–159 (1981)MathSciNetCrossRef
14.
Zurück zum Zitat Héam, P.C., Hugot, V., Kouchnarenko, O.: The emptiness problem for tree automata with at least one global disequality constraint is NP-hard. Inf. Process. Lett. 118, 6–9 (2017)MathSciNetCrossRef Héam, P.C., Hugot, V., Kouchnarenko, O.: The emptiness problem for tree automata with at least one global disequality constraint is NP-hard. Inf. Process. Lett. 118, 6–9 (2017)MathSciNetCrossRef
15.
Zurück zum Zitat Davis, L.: Handbook of Genetic Algorithms. Van Nostrand Reinhold, New York (1991) Davis, L.: Handbook of Genetic Algorithms. Van Nostrand Reinhold, New York (1991)
17.
Zurück zum Zitat Sherry, S.: NCBI SRA Toolkit technology for next generation sequence data. In: Plant and Animal Genome XX Conference (2012) Sherry, S.: NCBI SRA Toolkit technology for next generation sequence data. In: Plant and Animal Genome XX Conference (2012)
18.
Zurück zum Zitat Zhang, J., Kobert, K., Flouri, T., Stamatakis, A.: PEAR: a fast and accurate Illumina Paired-End reAd mergeR. Bioinformatics 30(5), 614–620 (2014)CrossRef Zhang, J., Kobert, K., Flouri, T., Stamatakis, A.: PEAR: a fast and accurate Illumina Paired-End reAd mergeR. Bioinformatics 30(5), 614–620 (2014)CrossRef
19.
Zurück zum Zitat Chu, J., Sadeghi, S., Raymond, A., Jackman, S.D., Nip, K.M., Mar, R., Birol, I.: BioBloom tools: fast, accurate and memory-efficient host species sequence screening using bloom filters. Bioinformatics 30(23), 3402–3404 (2014)CrossRef Chu, J., Sadeghi, S., Raymond, A., Jackman, S.D., Nip, K.M., Mar, R., Birol, I.: BioBloom tools: fast, accurate and memory-efficient host species sequence screening using bloom filters. Bioinformatics 30(23), 3402–3404 (2014)CrossRef
20.
Zurück zum Zitat Schmieder, R., Edwards, R.: Quality control and preprocessing of metagenomic datasets. Bioinformatics 27(6), 863–864 (2011)CrossRef Schmieder, R., Edwards, R.: Quality control and preprocessing of metagenomic datasets. Bioinformatics 27(6), 863–864 (2011)CrossRef
21.
Zurück zum Zitat Vavak, F., Fogarty, T.C.: Comparison of steady state and generational genetic algorithms for use in nonstationary environments. In: Proceedings of IEEE International Conference on Evolutionary Computation, pp. 192–195. IEEE (1996) Vavak, F., Fogarty, T.C.: Comparison of steady state and generational genetic algorithms for use in nonstationary environments. In: Proceedings of IEEE International Conference on Evolutionary Computation, pp. 192–195. IEEE (1996)
22.
Zurück zum Zitat Rechenberg, I.: Evolution Strategy: Optimization of Technical systems by means of biological evolution. Fromman-Holzboog, Stuttgart, 104 (1973) Rechenberg, I.: Evolution Strategy: Optimization of Technical systems by means of biological evolution. Fromman-Holzboog, Stuttgart, 104 (1973)
23.
Zurück zum Zitat Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J.: Basic local alignment search tool. J. Mol. Biol. 215(3), 403–410 (1990)CrossRef Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J.: Basic local alignment search tool. J. Mol. Biol. 215(3), 403–410 (1990)CrossRef
24.
Zurück zum Zitat Parsons, R.J., Forrest, S., Burks, C.: Genetic algorithms, operators, and DNA fragment assembly. Mach. Learn. 21(1), 11–33 (1995) Parsons, R.J., Forrest, S., Burks, C.: Genetic algorithms, operators, and DNA fragment assembly. Mach. Learn. 21(1), 11–33 (1995)
25.
Zurück zum Zitat Fraga, J.S. Algoritmos genéticos e o problema de montagem de reads (Master’s thesis) (2014) Fraga, J.S. Algoritmos genéticos e o problema de montagem de reads (Master’s thesis) (2014)
26.
Zurück zum Zitat Kikuchi, S., Chakraborty, G.: Heuristically tuned GA to solve genome fragment assembly problem. In: IEEE Congress on Evolutionary Computation, CEC 2006. pp. 1491–1498. IEEE (2006) Kikuchi, S., Chakraborty, G.: Heuristically tuned GA to solve genome fragment assembly problem. In: IEEE Congress on Evolutionary Computation, CEC 2006. pp. 1491–1498. IEEE (2006)
27.
Zurück zum Zitat Indumathy, R., Maheswari, S.U.: Nature inspired algorithms to solve DNA fragment assembly problem: a survey. Int. J. Bioinform. Res. Appl. 2(2), 45–50 (2012) Indumathy, R., Maheswari, S.U.: Nature inspired algorithms to solve DNA fragment assembly problem: a survey. Int. J. Bioinform. Res. Appl. 2(2), 45–50 (2012)
28.
Zurück zum Zitat Hughes, J., Houghten, S., Mallen-Fullerton, G. M., Ashlock, D.: Recentering and restarting genetic algorithm variations for dna fragment assembly. In: 2014 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology, pp. 1–8. IEEE (2014) Hughes, J., Houghten, S., Mallen-Fullerton, G. M., Ashlock, D.: Recentering and restarting genetic algorithm variations for dna fragment assembly. In: 2014 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology, pp. 1–8. IEEE (2014)
29.
Zurück zum Zitat Huang, W., Li, L., Myers, J.R., Marth, G.T.: ART: a next-generation sequencing read simulator. Bioinformatics 28(4), 593–594 (2012)CrossRef Huang, W., Li, L., Myers, J.R., Marth, G.T.: ART: a next-generation sequencing read simulator. Bioinformatics 28(4), 593–594 (2012)CrossRef
30.
Zurück zum Zitat Bankevich, A., Nurk, S., Antipov, D., Gurevich, A.A., Dvorkin, M., Kulikov, A.S., Pyshkin, A.V.: SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19(5), 455–477 (2012)MathSciNetCrossRef Bankevich, A., Nurk, S., Antipov, D., Gurevich, A.A., Dvorkin, M., Kulikov, A.S., Pyshkin, A.V.: SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19(5), 455–477 (2012)MathSciNetCrossRef
31.
Zurück zum Zitat Simpson, J.T., Wong, K., Jackman, S.D., Schein, J.E., Jones, S.J., Birol, I.: ABySS: a parallel assembler for short read sequence data. Genome Res. 19(6), 1117–1123 (2009)CrossRef Simpson, J.T., Wong, K., Jackman, S.D., Schein, J.E., Jones, S.J., Birol, I.: ABySS: a parallel assembler for short read sequence data. Genome Res. 19(6), 1117–1123 (2009)CrossRef
Metadaten
Titel
GAVGA: A Genetic Algorithm for Viral Genome Assembly
verfasst von
Renato R. M. Oliveira
Filipe Damasceno
Ronald Souza
Reginaldo Santos
Manoel Lima
Regiane Kawasaki
Claudomiro Sales
Copyright-Jahr
2017
DOI
https://doi.org/10.1007/978-3-319-65340-2_33