
2020 | OriginalPaper | Book chapter

Identification of Coding Regions in Prokaryotic DNA Sequences Using Bayesian Classification

Written by: Mohammad Al Bataineh

Published in: Bioinformatics and Biomedical Engineering

Publisher: Springer International Publishing


Abstract

The identification of protein-coding regions in genomic DNA sequences is a well-known problem in computational genomics, and various computational algorithms can be employed for it. Rapid advances in this field have motivated the development of innovative engineering methods that allow further analysis and modeling of many processes in molecular biology. The proposed algorithm uses well-known concepts from communications theory, such as correlation, the maximal ratio combining (MRC) algorithm, and filtering techniques, to create a signal whose maxima and minima indicate coding and noncoding regions, respectively. The algorithm is applied to several prokaryotic genome sequences, and two Bayesian classifiers are designed to test and evaluate its performance. The simulation results show that the algorithm can detect protein-coding regions efficiently and accurately, as demonstrated by sensitivity and specificity values comparable to those of well-known gene detection methods for prokaryotes. The results further support the correctness and biological relevance of using communications theory concepts for genomic sequence analysis.
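
The abstract reports performance as sensitivity and specificity but does not give the formulas used. As a rough illustration only, the sketch below computes both at the nucleotide level from two boolean masks, assuming the standard definitions Sn = TP/(TP+FN) and Sp = TN/(TN+FP). Note that parts of the gene-prediction literature instead define specificity as TP/(TP+FP); which variant the chapter uses is not stated here. The function name and the toy masks are hypothetical and are not taken from the chapter.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    predicted: Sequence[bool], annotated: Sequence[bool]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for per-nucleotide coding labels.

    Assumed definitions (not confirmed by the source):
      sensitivity = TP / (TP + FN)
      specificity = TN / (TN + FP)
    """
    if len(predicted) != len(annotated):
        raise ValueError("masks must have the same length")

    pairs = list(zip(predicted, annotated))
    tp = sum(1 for p, a in pairs if p and a)          # coding, predicted coding
    tn = sum(1 for p, a in pairs if not p and not a)  # noncoding, predicted noncoding
    fp = sum(1 for p, a in pairs if p and not a)      # noncoding, predicted coding
    fn = sum(1 for p, a in pairs if not p and a)      # coding, predicted noncoding

    # Return NaN when a ratio is undefined (no positives or no negatives).
    sn = tp / (tp + fn) if (tp + fn) else float("nan")
    sp = tn / (tn + fp) if (tn + fp) else float("nan")
    return sn, sp


# Toy example: 10 positions, True marks a coding nucleotide.
annotated = [False, False, True, True, True, True, False, False, False, False]
predicted = [False, True, True, True, True, False, False, False, False, False]
sn, sp = sensitivity_specificity(predicted, annotated)
print(f"sensitivity={sn:.3f} specificity={sp:.3f}")
```

In this toy case there are 3 true positives, 1 false negative, 1 false positive, and 5 true negatives, so the sensitivity is 0.75 and the specificity is 5/6.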


Metadata
Title
Identification of Coding Regions in Prokaryotic DNA Sequences Using Bayesian Classification
Written by
Mohammad Al Bataineh
Copyright year
2020
DOI
https://doi.org/10.1007/978-3-030-45385-5_1