Published in: Neural Computing and Applications 6/2009

01.09.2009 | Original Article

Integrating genomic binding site predictions using real-valued meta classifiers

Authors: Yi Sun, Mark Robinson, Rod Adams, Rene te Boekhorst, Alistair G. Rust, Neil Davey

Abstract

Currently the best algorithms for predicting transcription factor binding sites in DNA sequences are severely limited in accuracy. There is good reason to believe that predictions from different classes of algorithms could be used in conjunction to improve the quality of predictions. In this paper, we apply single-layer networks, rule sets, support vector machines and the AdaBoost algorithm to predictions from 12 key real-valued algorithms. Furthermore, we use a ‘window’ of consecutive results as the input vector in order to contextualise the neighbouring results. We improve the classification result with the aid of under- and over-sampling techniques. We find that support vector machines and the AdaBoost algorithm outperform the original individual algorithms and the other classifiers employed in this work. In particular, they give a better trade-off between recall and precision.
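
The windowing idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the real-valued predictions are stored as an array with one row per sequence position and one column per algorithm, and the function name `window_features`, the zero padding at the sequence ends and the window width are hypothetical choices made here for illustration.

```python
import numpy as np


def window_features(scores: np.ndarray, width: int) -> np.ndarray:
    """Concatenate each position's scores with those of its neighbours.

    scores: array of shape (n_positions, n_algorithms) holding the
            real-valued predictions of the base algorithms.
    width:  odd window size; sequence ends are zero-padded
            (an assumption, the abstract does not say how edges are handled).
    Returns an array of shape (n_positions, width * n_algorithms).
    """
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd integer")
    half = width // 2
    # Pad only along the position axis so every position has a full window.
    padded = np.pad(scores, ((half, half), (0, 0)), mode="constant")
    n = len(scores)
    # Row i of the result holds original positions i - half .. i + half.
    return np.hstack([padded[k:k + n] for k in range(width)])


# Toy example: 3 positions scored by 2 hypothetical algorithms.
toy = np.arange(6, dtype=float).reshape(3, 2)
windowed = window_features(toy, width=3)
```

With 12 base algorithms and a window of, say, 7 positions, each input vector passed to a meta classifier would have 84 components. Any windowing of this kind would need to be applied before under- or over-sampling, since resampling changes which rows are adjacent.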

Metadata
Title
Integrating genomic binding site predictions using real-valued meta classifiers
Authors
Yi Sun
Mark Robinson
Rod Adams
Rene te Boekhorst
Alistair G. Rust
Neil Davey
Publication date
01.09.2009
Publisher
Springer-Verlag
Published in
Neural Computing and Applications / Issue 6/2009
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-008-0204-4
