Published in: Neural Computing and Applications 17/2020

13-03-2020 | Original Article

SulSite-GTB: identification of protein S-sulfenylation sites by fusing multiple feature information and gradient tree boosting

Authors: Minghui Wang, Xiaowen Cui, Bin Yu, Cheng Chen, Qin Ma, Hongyan Zhou


Abstract

Protein cysteine S-sulfenylation is an essential, reversible post-translational modification that plays a crucial role in transcriptional regulation, stress response, cell signaling and protein function. Studies have shown that S-sulfenylation is involved in many human diseases, such as cancer, diabetes and arteriosclerosis. However, experimental identification of protein S-sulfenylation sites is generally expensive and time-consuming. In this study, we propose SulSite-GTB, a new method for predicting protein S-sulfenylation sites. First, seven feature extraction methods are fused to obtain the initial feature space: amino acid composition, dipeptide composition, encoding based on grouped weight, K nearest neighbors, position-specific amino acid propensity, position-weighted amino acid composition and pseudo-position specific score matrix. Second, the synthetic minority oversampling technique (SMOTE) is used to handle the class imbalance, and the least absolute shrinkage and selection operator (LASSO) is employed to remove redundant and irrelevant features. Finally, the optimal feature subset is input into a gradient tree boosting classifier to predict S-sulfenylation sites, and five-fold cross-validation and an independent test set are used to evaluate the prediction performance of the model. The overall prediction accuracy is 92.86% on the training set and 88.53% on the independent test set, with AUC values of 0.9706 and 0.9425, respectively. Comparisons show that SulSite-GTB is significantly superior to other state-of-the-art methods, and it provides a new idea for predicting other protein post-translational modification sites. The source code and all datasets are available at https://github.com/QUST-AIBBDRC/SulSite-GTB/.
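
To make the first two steps of the abstract more concrete, the following is a minimal sketch, not the authors' implementation (which is available at the repository linked above). It fuses only two of the seven descriptors, amino acid composition and dipeptide composition, and then oversamples the minority class with a simplified SMOTE-style interpolation. The function names, the 11-residue windows centered on cysteine, and the values of `k` and `seed` are all illustrative assumptions and are not taken from the paper.

```python
import random
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def aac(peptide):
    """Amino acid composition: fraction of each of the 20 residues."""
    residues = [r for r in peptide if r in AMINO_ACIDS]
    n = len(residues) or 1  # avoid division by zero on empty input
    return [residues.count(a) / n for a in AMINO_ACIDS]


def dc(peptide):
    """Dipeptide composition: fraction of each of the 400 adjacent pairs."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(peptide) - 1):
        pair = peptide[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard symbols
            counts[pair] += 1
            total += 1
    total = total or 1
    return [counts[d] / total for d in DIPEPTIDES]


def fuse_features(peptide):
    """Feature fusion by concatenation: 20 AAC values + 400 DC values."""
    return aac(peptide) + dc(peptide)


def smote(minority, n_new, k=2, seed=0):
    """Simplified SMOTE: each synthetic sample lies on the line segment
    between a minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbours = sorted(
            (m for m in minority if m is not x),
            key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


# Hypothetical positive windows with the candidate cysteine at the center.
positives = ["AKLDECGHTRS", "MNPQECAVLKD", "GGSTECRRWYF"]
X_pos = [fuse_features(p) for p in positives]
X_syn = smote(X_pos, n_new=4)
```

In a full pipeline along the lines described in the abstract, the balanced feature matrix would then pass through LASSO-based feature selection and a gradient tree boosting classifier evaluated with five-fold cross-validation; those steps are omitted here to keep the sketch dependency-free.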


Metadata
Title
SulSite-GTB: identification of protein S-sulfenylation sites by fusing multiple feature information and gradient tree boosting
Authors
Minghui Wang
Xiaowen Cui
Bin Yu
Cheng Chen
Qin Ma
Hongyan Zhou
Publication date
13-03-2020
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 17/2020
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-020-04792-z
