Published in: Neural Computing and Applications 17/2021

19.01.2021 | Original Article

Granular multiple kernel learning for identifying RNA-binding protein residues via integrating sequence and structure information

Authors: Chao Yang, Yijie Ding, Qiaozhen Meng, Jijun Tang, Fei Guo

Abstract

RNA-binding proteins play important roles in many biological processes. However, traditional experimental techniques for identifying RNA-binding residues are time-consuming and expensive, so effective computational approaches offer a practical alternative. In recent years, most computational approaches have been built on protein sequence information alone, without considering protein structure. In this paper, we propose a novel computational model for predicting RNA-binding residues that uses both protein sequence and structure information. Our hybrid features are encoded by local sequence and structure feature extraction models, and our predictor is built with the Granular Multiple Kernel Support Vector Machine with Repetitive Under-sampling (GMKSVM-RU). Under fivefold cross-validation on the RBP129 data set, our method achieves an MCC of 0.3367 and an accuracy of 88.84%. To further evaluate the model, we use an independent data set (RBP60), on which our method achieves an MCC of 0.3921 and an accuracy of 87.52%. These results demonstrate that integrating sequence and structure information improves the prediction of RNA-binding residues.
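
The abstract reports both MCC (Matthews correlation coefficient) and accuracy. Because RNA-binding residues are a small minority of all residues, accuracy alone can look high even for a trivial predictor, whereas MCC uses all four confusion-matrix cells: MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The following is a minimal Python sketch of these standard definitions; the function names and the toy counts are hypothetical illustrations and are not taken from the paper or its data sets.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count (TP, TN, FP, FN) for binary labels: 1 = RNA-binding, 0 = non-binding."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of residues whose label is predicted correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when any marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical imbalanced set: 10 binding residues out of 100.
    y_true = [1] * 10 + [0] * 90

    # Trivial predictor that calls every residue non-binding.
    c = confusion_counts(y_true, [0] * 100)
    print(accuracy(*c), mcc(*c))  # 0.9 0.0

    # Predictor that finds half the binding residues with 5 false positives.
    y_pred = [1] * 5 + [0] * 5 + [1] * 5 + [0] * 85
    c2 = confusion_counts(y_true, y_pred)
    print(accuracy(*c2), round(mcc(*c2), 3))  # 0.9 0.444
```

Both toy predictors have the same 90% accuracy, but only the second has a positive MCC, which is why MCC is the more informative figure for imbalanced residue-level prediction.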

Metadata
Title
Granular multiple kernel learning for identifying RNA-binding protein residues via integrating sequence and structure information
Authors
Chao Yang
Yijie Ding
Qiaozhen Meng
Jijun Tang
Fei Guo
Publication date
19.01.2021
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 17/2021
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-020-05573-4