Published in: The Journal of Supercomputing 5/2021

2 November 2020

Deep learning model with ensemble techniques to compute the secondary structure of proteins

Authors: Rayed AlGhamdi, Azra Aziz, Mohammed Alshehri, Kamal Raj Pardasani, Tarique Aziz

Abstract

Protein secondary structure is the local conformation assigned to a protein sequence on the basis of its three-dimensional structure. Assigning this local conformation requires substantial computational work. The literature on protein secondary structure prediction is vast (more than 20 techniques), yet to date none of the existing techniques is entirely accurate, so there is considerable room for new models that improve prediction accuracy. In the present study, deep learning models based on the ensemble techniques AdaBoost and Bagging are proposed to predict protein secondary structure. Data from the standard datasets CB513, RS126, PTOP742, PSA472, and MANESH are used for training and testing; these datasets have less than 25% redundancy. The model is evaluated using Q8 and Q3 cross-validation accuracy, class precision, class recall, the kappa factor, and a blind test, i.e., testing on a dataset that is not used for training. The ensembling technique, combined with the variability across datasets, can remove the bias of each dataset by balancing it and making the features more distinguishable, which improves accuracy compared with conventional and existing techniques. In the blind test, the proposed model shows an average improvement of approximately 2% and 3% over the existing methods for Q8 and Q3 accuracy, respectively.
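The evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes that predictions and ground truth are given as strings of one-letter DSSP states, and that the 8-state labels are reduced to 3 states with a commonly used mapping (H, G, I to helix; E, B to strand; everything else to coil). The abstract does not state which reduction the authors used, and the function names `accuracy`, `to_q3`, and `cohen_kappa` are hypothetical, chosen here for illustration only.

```python
from collections import Counter

# Assumed 8-to-3 state reduction; a common convention, not confirmed by the source.
Q8_TO_Q3 = {
    "H": "H", "G": "H", "I": "H",  # helix types -> H
    "E": "E", "B": "E",            # strand and bridge -> E
    "T": "C", "S": "C", "C": "C",  # turn, bend, coil -> C
}


def accuracy(true, pred):
    """Fraction of residues whose predicted state equals the true state."""
    if len(true) != len(pred) or not true:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(t == p for t, p in zip(true, pred)) / len(true)


def to_q3(states):
    """Reduce a string of 8-state labels to 3-state labels."""
    return "".join(Q8_TO_Q3[s] for s in states)


def cohen_kappa(true, pred):
    """Cohen's kappa: agreement corrected for agreement expected by chance."""
    n = len(true)
    p_observed = accuracy(true, pred)
    true_counts, pred_counts = Counter(true), Counter(pred)
    # Chance agreement: product of marginal class frequencies, summed over classes.
    p_expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_expected == 1.0:
        # Degenerate case (a single class on both sides); this sketch returns 1.0.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy example with made-up labels (not data from the study).
true_q8 = "HHHHEEEECCTT"
pred_q8 = "HHGHEEBECCTS"

q8 = accuracy(true_q8, pred_q8)                 # 9 of 12 residues match
q3 = accuracy(to_q3(true_q8), to_q3(pred_q8))   # all residues match after reduction
kappa_q8 = cohen_kappa(true_q8, pred_q8)
```

In this toy example the Q3 score is higher than the Q8 score because confusions within a group (for instance H versus G) disappear after the reduction, which is why the two accuracies are reported separately.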

Metadata
Title
Deep learning model with ensemble techniques to compute the secondary structure of proteins
Authors
Rayed AlGhamdi
Azra Aziz
Mohammed Alshehri
Kamal Raj Pardasani
Tarique Aziz
Publication date
2 November 2020
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 5/2021
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-020-03467-9
