2017 | OriginalPaper | Book chapter

Protein Features Identification for Machine Learning-Based Prediction of Protein-Protein Interactions

Written by: Khalid Raza

Published in: Information, Communication and Computing Technology

Publisher: Springer Singapore

Abstract

A long-standing challenge of the post-genomic era and of systems biology research is the computational prediction of protein-protein interactions (PPIs), which ultimately leads to the prediction of protein functions. The important research questions are how protein complexes with known sequence and structure can be used to identify and classify protein binding sites, and how to infer knowledge from these classifications, such as predicting PPIs of proteins with unknown sequence and structure. Several machine learning techniques have been applied to the prediction of PPIs, but the accuracy of their predictions depends wholly on the number of features used for training. In this paper, we survey the protein features used for the prediction of PPIs. Open research challenges and opportunities in the area are also discussed.

Metadata
Title
Protein Features Identification for Machine Learning-Based Prediction of Protein-Protein Interactions
Written by
Khalid Raza
Copyright year
2017
Publisher
Springer Singapore
DOI
https://doi.org/10.1007/978-981-10-6544-6_28