2015 | OriginalPaper | Book chapter

Consensus-Based Prediction of RNA and DNA Binding Residues from Protein Sequences

Authors: Jing Yan, Lukasz Kurgan

Published in: Pattern Recognition and Machine Intelligence

Publisher: Springer International Publishing

Abstract

Computational prediction of RNA- and DNA-binding residues from protein sequences offers a high-throughput and accurate way to functionally annotate the avalanche of protein sequence data. Although many predictors exist, efforts to improve predictive performance with consensus methods have so far been limited. We explore and empirically compare a comprehensive set of consensus designs, including simple approaches that combine binary predictions and more sophisticated machine learning models. We consider both DNA- and RNA-binding, motivated by similarities in these interactions, which should lead to similar conclusions. We observe that the simple consensuses do not improve predictive performance when applied to sequences that share low similarity with the datasets used to build their input predictors. However, machine learning models, such as linear regression, Support Vector Machine and Naïve Bayes, improve predictive performance over the best individual predictors for the prediction of DNA- and RNA-binding residues.
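
The two families of consensus named in the abstract can be illustrated with a minimal sketch. It assumes each input predictor emits one binary label per residue (1 = binding, 0 = non-binding). The function names, the toy data layout, and the choice of a Bernoulli Naïve Bayes with Laplace smoothing are illustrative assumptions, not the design or implementation used in the chapter.

```python
from math import log

def majority_vote(binary_preds):
    """Simple consensus: a residue is called binding when more than half
    of the input predictors call it binding.
    binary_preds: one list of 0/1 labels per predictor, aligned by residue."""
    n_predictors = len(binary_preds)
    return [1 if 2 * sum(votes) > n_predictors else 0
            for votes in zip(*binary_preds)]

def train_naive_bayes(binary_preds, labels, alpha=1.0):
    """Machine learning consensus (hypothetical sketch): Bernoulli Naive Bayes
    that learns how reliable each input predictor is for each class.
    alpha is the Laplace smoothing constant (an assumption of this sketch)."""
    n_predictors = len(binary_preds)
    residues = list(zip(*binary_preds))
    model = {}
    for cls in (0, 1):
        rows = [votes for votes, y in zip(residues, labels) if y == cls]
        prior = (len(rows) + alpha) / (len(labels) + 2 * alpha)
        # Smoothed estimate of P(predictor j outputs 1 | true class = cls)
        p_one = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                 for j in range(n_predictors)]
        model[cls] = (prior, p_one)
    return model

def predict_naive_bayes(model, binary_preds):
    """Label each residue with the class that has the higher log-posterior."""
    consensus = []
    for votes in zip(*binary_preds):
        score = {}
        for cls, (prior, p_one) in model.items():
            s = log(prior)
            for v, p in zip(votes, p_one):
                s += log(p if v else 1.0 - p)
            score[cls] = s
        consensus.append(1 if score[1] > score[0] else 0)
    return consensus
```

The majority vote treats every input predictor equally, whereas the trained model can weight predictors by how well their outputs agree with the annotated residues in the training data. That difference is one plausible reason a learned consensus could behave differently from a simple one, although the abstract itself does not state the mechanism.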

Metadata
Title
Consensus-Based Prediction of RNA and DNA Binding Residues from Protein Sequences
Authors
Jing Yan
Lukasz Kurgan
Copyright year
2015
DOI
https://doi.org/10.1007/978-3-319-19941-2_48