2021 | OriginalPaper | Chapter

Deep Learning for Taxonomic Classification of Biological Bacterial Sequences

Authors: Marwah A. Helaly, Sherine Rady, Mostafa M. Aref

Published in: Machine Learning and Big Data Analytics Paradigms: Analysis, Applications and Challenges

Publisher: Springer International Publishing

Abstract

Biological sequence classification is a key task in bioinformatics. For research labs today, classifying unknown biological sequences is essential for identifying, grouping and studying organisms and their evolution. This work focuses on the taxonomic classification of bacterial species into their hierarchical taxonomic ranks. Barcode sequences from the 16S rRNA dataset, which are known for their relatively short lengths and highly discriminative characteristics, are used for classification. Several combinations of sequence representation and CNN architecture are considered, each tested with the aim of finding the best approaches for efficient and effective taxonomic classification. The sequence representations include k-mer based representations, integer encoding, one-hot encoding and the use of embedding layers in the CNN. Experimental results and comparisons show that representations which retain some sequential information about a sequence perform much better than a raw representation. A maximum accuracy of 91.7% was achieved with a deeper CNN when the employed sequence representation captured more of the sequence's structure. With less informative representations, however, a wide and shallow network was still able to extract information efficiently and reach a reasonable accuracy of 90.6%.
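
As an illustration of the sequence representations named in the abstract, the following minimal Python sketch shows one common way to build integer, one-hot and k-mer count encodings of a DNA fragment. It is not the authors' preprocessing code: the four-letter alphabet, the integer mapping, the value of k, the function names and the toy fragment are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"  # assumption: ambiguous bases (e.g. N) get no symbol of their own


def integer_encode(seq):
    """Map each nucleotide to an integer (A=1, C=2, G=3, T=4; 0 for anything else)."""
    lookup = {base: i + 1 for i, base in enumerate(ALPHABET)}
    return [lookup.get(base, 0) for base in seq.upper()]


def one_hot_encode(seq):
    """One row per position, one column per nucleotide; unknown bases give an all-zero row."""
    return [[1 if base == ref else 0 for ref in ALPHABET] for base in seq.upper()]


def kmer_counts(seq, k=3):
    """Fixed-length vector of overlapping k-mer counts, in lexicographic k-mer order."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    vocabulary = ["".join(p) for p in product(ALPHABET, repeat=k)]
    return [counts.get(kmer, 0) for kmer in vocabulary]


fragment = "ACGTAC"  # hypothetical toy fragment, not taken from the 16S rRNA dataset
print(integer_encode(fragment))
print(one_hot_encode(fragment)[:2])
print(kmer_counts(fragment, k=2))
```

The integer and one-hot encodings keep the order of the nucleotides, and an integer-encoded sequence is also the usual input to an embedding layer. A plain k-mer count vector, by contrast, has a fixed length of 4^k regardless of sequence length but discards where in the sequence each k-mer occurred.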

Metadata
Title: Deep Learning for Taxonomic Classification of Biological Bacterial Sequences
Authors: Marwah A. Helaly, Sherine Rady, Mostafa M. Aref
Copyright Year: 2021
DOI: https://doi.org/10.1007/978-3-030-59338-4_20
