Published in: The Journal of Supercomputing 5/2021

02-11-2020

Deep learning model with ensemble techniques to compute the secondary structure of proteins

Authors: Rayed AlGhamdi, Azra Aziz, Mohammed Alshehri, Kamal Raj Pardasani, Tarique Aziz

Abstract

Protein secondary structure is the local conformation assigned to a protein sequence on the basis of its three-dimensional structure. Assigning this local conformation is computationally demanding. The literature on protein secondary structure prediction is extensive (more than 20 techniques), but to date none of the existing techniques is entirely accurate. There is therefore ample room for new prediction models that improve accuracy. In the present study, deep learning models combined with the ensemble techniques AdaBoost and Bagging are proposed to predict protein secondary structure. Data from the standard datasets CB513, RS126, PTOP742, PSA472, and MANESH are used for training and testing. Each of these datasets has less than 25% redundancy. The model is evaluated using the following performance measures: Q8 and Q3 cross-validation accuracy, class precision, class recall, kappa factor, and a blind test, i.e., testing on a dataset that is not used for training. Combining ensembling with the variability across datasets can remove the bias of each individual dataset by balancing it and making the features more distinguishable, which improves accuracy compared with conventional and existing techniques. In the blind test, the proposed model shows an average improvement over existing methods of ~2% and ~3% in Q8 and Q3 accuracy, respectively.
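The Q8, Q3, and kappa measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code. It assumes the commonly used DSSP reduction from eight states to three (H, G, I to helix; E, B to strand; everything else to coil), which the abstract does not specify, and the function names and toy label strings are hypothetical.

```python
from collections import Counter

# Assumed 8-state -> 3-state reduction (a common DSSP convention; the
# abstract does not state which reduction the authors used).
Q8_TO_Q3 = {
    "H": "H", "G": "H", "I": "H",            # helix types -> H
    "E": "E", "B": "E",                      # strand and bridge -> E
    "T": "C", "S": "C", "L": "C", "-": "C",  # turn, bend, loop -> coil
}


def reduce_q8_to_q3(labels):
    """Map a string of 8-state labels to 3-state labels."""
    return "".join(Q8_TO_Q3[s] for s in labels)


def q_accuracy(true, pred):
    """Fraction of residues whose predicted state matches the true state."""
    if len(true) != len(pred) or not true:
        raise ValueError("label strings must be non-empty and of equal length")
    return sum(t == p for t, p in zip(true, pred)) / len(true)


def cohen_kappa(true, pred):
    """Cohen's kappa: agreement corrected for chance agreement."""
    n = len(true)
    p_observed = q_accuracy(true, pred)
    true_counts, pred_counts = Counter(true), Counter(pred)
    # Chance agreement from the marginal class frequencies.
    p_expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_expected == 1.0:
        return float("nan")  # undefined when both sides use a single class
    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy example (made-up labels, not data from the paper).
true_q8 = "HHHGEEBTSL"
pred_q8 = "HHHHEEETTL"

q8 = q_accuracy(true_q8, pred_q8)
q3 = q_accuracy(reduce_q8_to_q3(true_q8), reduce_q8_to_q3(pred_q8))

print(f"Q8 = {q8:.2f}")                                   # Q8 = 0.70
print(f"Q3 = {q3:.2f}")                                   # Q3 = 1.00
print(f"kappa (Q8) = {cohen_kappa(true_q8, pred_q8):.2f}")  # kappa (Q8) = 0.62
```

In this toy case all three Q8 errors fall within the same reduced class, so Q3 is higher than Q8. This is why the two measures are reported separately.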

Metadata
Title
Deep learning model with ensemble techniques to compute the secondary structure of proteins
Authors
Rayed AlGhamdi
Azra Aziz
Mohammed Alshehri
Kamal Raj Pardasani
Tarique Aziz
Publication date
02-11-2020
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 5/2021
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-020-03467-9
