
2018 | OriginalPaper | Chapter

A Revamp Approach for Training of HMM to Accelerate Classification of 16S rRNA Gene Sequences

Authors : Prakash Choudhary, M. P. Kurhekar

Published in: Transactions on Computational Science XXXIII

Publisher: Springer Berlin Heidelberg


Abstract

In the era of information technology, bioinformatics is growing rapidly, with active research across many related topics. Biological databases now hold far more data than is being used. Automatic classification of biological information is therefore one of the critical problems in bioinformatics: the enormous amount of novel data must be organized and managed so that this valuable biological information stays accessible. The central difficulty in classifying biological information is annotating biological sequences with functional features, and annotating the large and rapidly growing volume of genomic sequence data requires computational tools for classifying genes in DNA sequences. This paper presents a computational method for classifying highly conserved 16S rRNA gene sequences. Motivated by the problem of biological sequence classification, we present a methodology that uses Hidden Markov Models (HMMs) for this task. The paper describes the algorithms used to implement the three phases of an HMM (training, decoding, and evaluation) in order to classify sequences into clusters with known, similar functional properties. In implementing the training phase, we address practical issues such as the selection of initial HMM parameters and the computational cost of large data sets. Finally, we show that the methodology achieves a classification accuracy of 91% for Bacillus and 97% for Clostridia.
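The three HMM phases named above, and the idea of assigning a sequence to the cluster whose model fits it best, can be illustrated with a small sketch. This is not the authors' implementation and does not reproduce their handling of initial parameters or large data sets. It assumes a discrete HMM over the nucleotide alphabet, a scaled forward algorithm for evaluation, the Viterbi algorithm for decoding, a single Baum-Welch re-estimation step for training, and classification by highest log-likelihood. The names `DiscreteHMM`, `classify`, and `ALPHABET` are hypothetical, introduced only for this example.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical observation encoding; assumes sequences contain only A, C, G, T.
ALPHABET = {"A": 0, "C": 1, "G": 2, "T": 3}


def _log(x: float) -> float:
    """Natural log that maps 0 to -inf instead of raising."""
    return math.log(x) if x > 0 else float("-inf")


class DiscreteHMM:
    """Hypothetical discrete HMM: start probs pi (N), transitions A (N x N), emissions B (N x M).

    All entries of pi, A and B are assumed strictly positive; a zero probability
    can make a forward scale factor 0 and break the log-likelihood computation.
    """

    def __init__(self, pi: List[float], A: List[List[float]], B: List[List[float]]):
        self.pi, self.A, self.B = pi, A, B
        self.N = len(pi)

    def _forward(self, obs: Sequence[int]) -> Tuple[List[List[float]], List[float]]:
        # Scaled forward pass: alpha_t is renormalised to sum to 1 at every step
        # and the normaliser c_t is stored, so log P(O) = sum_t log c_t.
        alphas: List[List[float]] = []
        scales: List[float] = []
        for t, o in enumerate(obs):
            if t == 0:
                a = [self.pi[i] * self.B[i][o] for i in range(self.N)]
            else:
                prev = alphas[-1]
                a = [sum(prev[j] * self.A[j][i] for j in range(self.N)) * self.B[i][o]
                     for i in range(self.N)]
            c = sum(a)
            alphas.append([x / c for x in a])
            scales.append(c)
        return alphas, scales

    def log_likelihood(self, obs: Sequence[int]) -> float:
        """Evaluation: log P(O | model) from the forward scale factors."""
        return sum(math.log(c) for c in self._forward(obs)[1])

    def viterbi(self, obs: Sequence[int]) -> List[int]:
        """Decoding: most probable hidden-state path, computed in log space."""
        delta = [_log(self.pi[i]) + _log(self.B[i][obs[0]]) for i in range(self.N)]
        backptrs: List[List[int]] = []
        for o in obs[1:]:
            # Best predecessor of each state i, then the updated path scores.
            ptr = [max(range(self.N), key=lambda j: delta[j] + _log(self.A[j][i]))
                   for i in range(self.N)]
            delta = [delta[ptr[i]] + _log(self.A[ptr[i]][i]) + _log(self.B[i][o])
                     for i in range(self.N)]
            backptrs.append(ptr)
        state = max(range(self.N), key=lambda i: delta[i])
        path = [state]
        for ptr in reversed(backptrs):
            state = ptr[state]
            path.append(state)
        return path[::-1]

    def baum_welch_step(self, obs: Sequence[int]) -> None:
        """Training: one Baum-Welch (EM) re-estimation of pi, A, B on one sequence."""
        N, M, T = self.N, len(self.B[0]), len(obs)
        alphas, c = self._forward(obs)
        # Scaled backward pass reusing the forward scale factors c_t.
        betas = [[1.0] * N for _ in range(T)]
        for t in range(T - 2, -1, -1):
            betas[t] = [sum(self.A[i][j] * self.B[j][obs[t + 1]] * betas[t + 1][j]
                            for j in range(N)) / c[t + 1] for i in range(N)]
        # gamma_t(i) = P(state i at time t | O); with this scaling it is alpha_hat * beta_hat.
        gamma = [[alphas[t][i] * betas[t][i] for i in range(N)] for t in range(T)]
        # Expected transition counts i -> j, i.e. xi_t(i, j) summed over t.
        xi = [[sum(alphas[t][i] * self.A[i][j] * self.B[j][obs[t + 1]] * betas[t + 1][j]
                   / c[t + 1] for t in range(T - 1)) for j in range(N)] for i in range(N)]
        self.pi = gamma[0][:]
        self.A = [[xi[i][j] / sum(xi[i]) for j in range(N)] for i in range(N)]
        # Symbols absent from obs get emission probability 0 here; pooling many
        # sequences or adding pseudocounts (not shown) would avoid that.
        self.B = [[sum(g[i] for g, o in zip(gamma, obs) if o == k) / sum(g[i] for g in gamma)
                   for k in range(M)] for i in range(N)]


def classify(seq: str, models: Dict[str, DiscreteHMM]) -> str:
    """Hypothetical helper: pick the class whose HMM gives seq the highest log-likelihood."""
    obs = [ALPHABET[ch] for ch in seq.upper()]
    return max(models, key=lambda name: models[name].log_likelihood(obs))
```

With one model per class (for instance, one trained on Bacillus sequences and one on Clostridia sequences), `classify(seq, {"Bacillus": m1, "Clostridia": m2})` returns the name of the class whose model scores `seq` highest. Scaling the forward variables matters in practice: a 16S rRNA gene is roughly 1,500 nt long, and an unscaled product of that many probabilities underflows double precision.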


Metadata
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-662-58039-4_3
