Published in: Soft Computing 6/2015

01-06-2015 | Methodologies and Application

Effect of simple ensemble methods on protein secondary structure prediction

Authors: Hafida Bouziane, Belhadri Messabih, Abdallah Chouarfia


Abstract

Ensemble methods for building improved classifier models have been an important topic in machine learning, pattern recognition and data mining, where they have shown great promise. Their robustness has driven their application to many practical classification problems, especially when there is significant diversity among the ensemble members. They now replace traditional machine learning techniques in many applications, and special attention has been devoted to them as a means of improving prediction accuracy for problems of high complexity. Several combination rules have been investigated in this context. However, it is claimed that no single rule is always better than the others for reaching an optimal decision. The present study evaluates the performance of two different ensemble methods for protein secondary structure prediction. We focus on weighted opinion pooling and the most common aggregation rules for decision inference. The ensemble members are accurate single-model protein secondary structure predictors, namely Multi-Class Support Vector Machines and Artificial Neural Networks. Experiments are carried out using cross-validation tests on the RS126 and CB513 benchmark datasets. Our results clearly confirm that ensembles are more accurate than a single model. The experimental comparison of the investigated ensemble schemes demonstrates that the newly introduced rule, called Exponential Opinion Pool, competes well against state-of-the-art fixed rules, especially the sum rule, which in some cases achieves better performance.
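To make the idea of a combination rule concrete, the following is a minimal sketch of three fixed rules (sum, product, majority vote) and a weighted linear opinion pool, applied to the class-probability outputs of three ensemble members for a single residue. The function names, the three-state labels H/E/C, the probability values and the weights are all illustrative assumptions and are not taken from the paper. The Exponential Opinion Pool is not defined in the abstract, so it is deliberately not implemented here.

```python
import numpy as np

# Assumed three-state labels: helix (H), strand (E), coil (C).
CLASSES = ("H", "E", "C")


def sum_rule(posteriors):
    """Sum rule: average the members' probability vectors per class."""
    return np.mean(posteriors, axis=0)


def product_rule(posteriors):
    """Product rule: multiply per-class probabilities, then renormalize."""
    prod = np.prod(posteriors, axis=0)
    return prod / prod.sum()


def majority_vote(posteriors):
    """Count how many members pick each class as their top choice."""
    votes = np.argmax(posteriors, axis=1)
    return np.bincount(votes, minlength=posteriors.shape[1])


def linear_opinion_pool(posteriors, weights):
    """Weighted average of the members' probability vectors."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so the pooled vector still sums to 1
    return w @ posteriors


# Hypothetical outputs of three members for one residue (rows sum to 1).
posteriors = np.array([
    [0.7, 0.2, 0.1],
    [0.2, 0.6, 0.2],
    [0.5, 0.4, 0.1],
])
# Hypothetical weights that trust the second member most.
weights = [1.0, 3.0, 1.0]

if __name__ == "__main__":
    for name, scores in [
        ("sum rule", sum_rule(posteriors)),
        ("product rule", product_rule(posteriors)),
        ("majority vote", majority_vote(posteriors)),
        ("linear opinion pool", linear_opinion_pool(posteriors, weights)),
    ]:
        print(f"{name}: {scores} -> {CLASSES[int(np.argmax(scores))]}")
```

With these made-up numbers the three fixed rules select H, while the weighted pool selects E because the second member carries most of the weight. This is only meant to show how the choice of rule can change the ensemble decision, not to reproduce the paper's results.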


Metadata
Title
Effect of simple ensemble methods on protein secondary structure prediction
Authors
Hafida Bouziane
Belhadri Messabih
Abdallah Chouarfia
Publication date
01-06-2015
Publisher
Springer Berlin Heidelberg
Published in
Soft Computing / Issue 6/2015
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-014-1355-0
