2018 | OriginalPaper | Book chapter
Published in:
Transactions on Computational Science XXXIII
Bioinformatics is growing rapidly, with active research across many related topics. The volume of biological data now grows far faster than it can be consumed, so organizing and managing this flood of new information, and keeping it accessible, has become a critical issue. Automatic classification of biological information is one of the central problems in the field. At its core lies the annotation of biological sequences with functional features: annotating the large and rapidly increasing amount of genomic sequence data requires computational tools that classify genes in DNA sequences. This paper presents a computational method for classifying highly conserved 16S rRNA sequences using Hidden Markov Models (HMMs). We describe the algorithms used to implement the three phases of an HMM (training, decoding, and evaluation) in order to classify sequences into clusters with known, similar functional properties. In implementing the training phase, we address practical issues such as the selection of initial HMM parameters and computational limitations on large data sets. The method achieves a classification accuracy of 91% for Bacillus and 97% for Clostridia.
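The evaluation phase mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the authors' algorithms or parameters, so what follows is the generic textbook forward algorithm, computed in log space to avoid numerical underflow on long sequences. It is not the paper's implementation. The two toy models (`gc_rich`, `at_rich`), their probabilities, and all function names are hypothetical and chosen only for illustration. A sequence is assigned to whichever class model gives it the highest log-likelihood.

```python
import math

ALPHABET = "ACGT"  # DNA symbols; index in this string = emission column


def safe_log(p):
    """log(p), with log(0) mapped to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_likelihood(seq, start, trans, emit):
    """Forward algorithm in log space: returns log P(seq | model)."""
    n = len(start)
    first = ALPHABET.index(seq[0])
    alpha = [safe_log(start[i]) + safe_log(emit[i][first]) for i in range(n)]
    for ch in seq[1:]:
        k = ALPHABET.index(ch)
        alpha = [
            logsumexp([alpha[i] + safe_log(trans[i][j]) for i in range(n)])
            + safe_log(emit[j][k])
            for j in range(n)
        ]
    return logsumexp(alpha)


def classify(seq, models):
    """Score seq under every class model; return (best class, all scores)."""
    scores = {name: log_likelihood(seq, *params) for name, params in models.items()}
    return max(scores, key=scores.get), scores


# Hypothetical two-state toy models: (start, transition, emission) per class.
TOY_MODELS = {
    "gc_rich": (
        [0.5, 0.5],
        [[0.9, 0.1], [0.1, 0.9]],
        [[0.1, 0.4, 0.4, 0.1], [0.2, 0.3, 0.3, 0.2]],
    ),
    "at_rich": (
        [0.5, 0.5],
        [[0.9, 0.1], [0.1, 0.9]],
        [[0.4, 0.1, 0.1, 0.4], [0.3, 0.2, 0.2, 0.3]],
    ),
}

if __name__ == "__main__":
    label, scores = classify("GCGCGGCC", TOY_MODELS)
    print(label, scores)
```

In a real setting, each class model would first be trained (for example with Baum-Welch) on sequences of known class. Working in log space matters here because the raw probability of a sequence of a few thousand bases falls below the smallest representable floating-point number.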
- Title
- A Revamp Approach for Training of HMM to Accelerate Classification of 16S rRNA Gene Sequences
- DOI
- https://doi.org/10.1007/978-3-662-58039-4_3
- Authors:
- Prakash Choudhary
- M. P. Kurhekar
- Publisher
- Springer Berlin Heidelberg
- Sequence number
- 3