Published in: Neural Computing and Applications 2/2005

01.07.2005 | Original Article

A neural network based multi-classifier system for gene identification in DNA sequences

Authors: Romesh Ranawana, Vasile Palade

Abstract

The paper presents a neural network based multi-classifier system for identifying Escherichia coli promoter sequences in strings of DNA. Because each gene in DNA is preceded by a promoter sequence, successfully locating an E. coli promoter leads to the identification of the corresponding E. coli gene in the DNA sequence. A set of 324 known E. coli promoters and a set of 429 known non-promoter sequences were encoded using four different encoding methods. The encoded sequences were then used to train four different neural networks. The classification results of the four individual networks were combined through an aggregation function that used a variation of the logarithmic opinion pool method, with the weights of this function determined by a genetic algorithm. The multi-classifier system was then tested on 159 known promoter sequences and 171 non-promoter sequences not contained in the training set. The results show that the same data set, when presented to neural networks in different encoded forms, can yield slightly different results. They also show that when the opinions of several classifiers on the same input data are integrated within a multi-classifier system, the combined results are better than those of the individual neural networks. Our multi-classifier system outperforms the other prediction systems for E. coli promoters developed so far.
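
The aggregation step described in the abstract can be illustrated with a minimal sketch of a standard weighted logarithmic opinion pool for a two-class problem. This is not the authors' implementation: the abstract says only that a variation of the method was used, so the exact form here is an assumption. The function name `log_opinion_pool`, the example network outputs, and the weight values are all hypothetical; in the paper the weights are found by a genetic algorithm, which is not reproduced here.

```python
import math


def log_opinion_pool(probs, weights, eps=1e-12):
    """Combine per-classifier promoter probabilities with a weighted
    logarithmic opinion pool (standard two-class form, assumed here).

    probs   -- each classifier's estimate of P(promoter | sequence)
    weights -- one non-negative weight per classifier (hypothetical values;
               the paper tunes these with a genetic algorithm)
    """
    if len(probs) != len(weights):
        raise ValueError("need exactly one weight per classifier")

    # Clamp probabilities away from 0 and 1 so that log() stays finite.
    clipped = [min(max(p, eps), 1.0 - eps) for p in probs]

    # Weighted sum of log-probabilities for each class.
    log_pos = sum(w * math.log(p) for w, p in zip(weights, clipped))
    log_neg = sum(w * math.log(1.0 - p) for w, p in zip(weights, clipped))

    # Normalise over the two classes; subtract the max for numerical stability.
    shift = max(log_pos, log_neg)
    pos = math.exp(log_pos - shift)
    neg = math.exp(log_neg - shift)
    return pos / (pos + neg)


# Hypothetical outputs of four networks for one candidate sequence.
network_outputs = [0.9, 0.8, 0.6, 0.7]
# Hypothetical weights standing in for the values a genetic algorithm would find.
weights = [0.4, 0.3, 0.1, 0.2]

combined = log_opinion_pool(network_outputs, weights)
is_promoter = combined >= 0.5
```

With weights that sum to one, classifiers that all report the same probability yield that probability unchanged, which is a quick sanity check on the pooling rule. The 0.5 decision threshold is likewise an assumption for illustration.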

Metadata
Title
A neural network based multi-classifier system for gene identification in DNA sequences
Authors
Romesh Ranawana
Vasile Palade
Publication date
01.07.2005
Published in
Neural Computing and Applications / Issue 2/2005
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-004-0447-7
