2016 | OriginalPaper | Book chapter

Advances in Soft Computing Approaches for Gene Prediction: A Bioinformatics Approach

Authors: Minu Kesheri, Rajeshwar P. Sinha, Swarna Kanchan

Published in: Medical Imaging in Clinical Applications

Publisher: Springer International Publishing

Abstract

The flood of gene sequencing projects has led to the deposition of large amounts of genomic data in public databases. These databases contribute to genome annotation, which helps in discovering new genes and determining their function. Owing to the lack of genome annotation and of high-throughput experimental approaches, computational gene prediction has always been one of the challenging areas for bioinformatics and computational biology scientists. Gene finding is more difficult in eukaryotes than in prokaryotes because of the presence of introns. Gene prediction is crucial, especially for disease identification in humans, where it can greatly help biomedical research. Ab initio gene prediction is a difficult method that uses signal and content sensors to make predictions, while the homology-based method makes use of homology with known genes. This chapter describes various gene structure prediction programs that are based on individual or hybrid soft computing approaches. Soft computing approaches include the genetic algorithm, hidden Markov model, fast Fourier transform, support vector machine, dynamic programming, and artificial neural network. Hybrid soft computing approaches combine the results of several soft computing programs to deliver better accuracy than individual soft computing approaches. Moreover, various parameters for measuring the accuracy of gene prediction programs are also discussed.

Metadata
Title
Advances in Soft Computing Approaches for Gene Prediction: A Bioinformatics Approach
Authors
Minu Kesheri
Rajeshwar P. Sinha
Swarna Kanchan
Copyright year
2016
DOI
https://doi.org/10.1007/978-3-319-33793-7_17