
2015 | OriginalPaper | Chapter

Sequence-Based Random Projection Ensemble Approach to Identify Hotspot Residues from Whole Protein Sequence

Authors: Peng Chen, ShanShan Hu, Bing Wang, Jun Zhang

Published in: Intelligent Computing Theories and Methodologies

Publisher: Springer International Publishing


Abstract

Hot spot residues of proteins are key to performing specific functions in many biological processes. However, identifying hot spots by experimental methods is costly and time-consuming. Computational methods offer an alternative, identifying hot spots from sequence and structural information. However, structural information for a protein is not always available. In this paper, the issue of identifying hot spots is addressed by using only statistical physicochemical properties of amino acids. First, 34 relatively independent physicochemical properties are extracted from the 544 properties in AAindex1. The hot spot data set is extremely imbalanced: the ratio of the number of hot spots to that of non-hot spots is about 1.4 %. Therefore, the hot spot set and a subset of non-hot spots of roughly the same size form an initial input matrix. A random projection of this matrix yields the input to a REPTree classifier. Several random projections and different subsets of non-hot spots are combined to build an ensemble REPTree system. Experimental results on the commonly used hot spot benchmark sets showed that, although our method performed worse, it is a complement to experiments on hot spot determination.
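
The pipeline outlined in the abstract (a balanced subset of non-hot spots, a random projection of the input matrix, and an ensemble built from several projections and subsets) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The nearest-centroid base learner is a simple stand-in for REPTree, the data are synthetic, and every name and parameter value here (`make_toy_data`, `train_ensemble`, 11 members, 5 projected dimensions, majority voting) is an assumption made for illustration only.

```python
import random


def make_toy_data(n_hot=20, n_non=1400, dim=34, seed=1):
    """Synthetic, imbalanced data (about 1.4 % positives); not real residues."""
    rng = random.Random(seed)
    hot = [[rng.gauss(2.0, 1.0) for _ in range(dim)] for _ in range(n_hot)]
    non_hot = [[rng.gauss(-2.0, 1.0) for _ in range(dim)] for _ in range(n_non)]
    return hot, non_hot


def make_projection(in_dim, out_dim, rng):
    """Return a function mapping a row onto out_dim random Gaussian directions."""
    scale = out_dim ** -0.5
    R = [[rng.gauss(0.0, 1.0) * scale for _ in range(out_dim)]
         for _ in range(in_dim)]

    def project(row):
        return [sum(x * R[i][j] for i, x in enumerate(row))
                for j in range(out_dim)]

    return project


def centroid(rows):
    """Column-wise mean of a list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_member(hot, non_hot, out_dim, rng):
    """One ensemble member: a balanced sample plus its own random projection."""
    # Undersample non-hot spots to roughly the number of hot spots.
    sampled = rng.sample(non_hot, min(len(hot), len(non_hot)))
    project = make_projection(len(hot[0]), out_dim, rng)
    # Stand-in base learner (assumption): nearest centroid instead of REPTree.
    c_hot = centroid([project(r) for r in hot])
    c_non = centroid([project(r) for r in sampled])

    def predict(row):
        z = project(row)
        return 1 if sq_dist(z, c_hot) < sq_dist(z, c_non) else 0

    return predict


def train_ensemble(hot, non_hot, n_members=11, out_dim=5, seed=0):
    """Build members from different projections and non-hot-spot subsets."""
    rng = random.Random(seed)
    return [train_member(hot, non_hot, out_dim, rng) for _ in range(n_members)]


def predict_majority(ensemble, row):
    """Combine member outputs by majority vote (an assumed combination rule)."""
    votes = sum(member(row) for member in ensemble)
    return 1 if 2 * votes > len(ensemble) else 0


if __name__ == "__main__":
    hot, non_hot = make_toy_data()
    ensemble = train_ensemble(hot, non_hot)
    print(predict_majority(ensemble, [2.0] * 34))
```

Each member sees all hot spots but a different random subset of non-hot spots and a different projection matrix, which is one plausible way to obtain diverse classifiers from a heavily imbalanced set. An odd number of members avoids tied votes.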


Metadata
Title
Sequence-Based Random Projection Ensemble Approach to Identify Hotspot Residues from Whole Protein Sequence
Authors
Peng Chen
ShanShan Hu
Bing Wang
Jun Zhang
Copyright Year
2015
DOI
https://doi.org/10.1007/978-3-319-22186-1_37
