2019 | Original Paper | Book Chapter

Improving Identification of Essential Proteins by a Novel Ensemble Method

Authors: Wei Dai, Xia Li, Wei Peng, Jurong Song, Jiancheng Zhong, Jianxin Wang

Published in: Bioinformatics Research and Applications

Publisher: Springer International Publishing

Abstract

Essential proteins are indispensable for cell survival, and their identification plays a critical role in biological and pharmaceutical design research. Recently, several machine learning methods have been proposed that introduce effective protein features or employ powerful classifiers. Few of them have focused on improving prediction accuracy by designing efficient strategies for ensembling different classifiers. In this work, a novel ensemble learning framework called Tri-ensemble is proposed to integrate different classifiers. It selects three weak classifiers and trains each of them by continually adding the samples that the other two classifiers predict to have abnormally high or abnormally low properties. We applied Tri-ensemble to predicting the essential proteins of yeast and E. coli. The results show that our approach achieves better performance than both the individual classifiers and the other ensemble learning methods.
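The training loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: "abnormally high or abnormally low" is read as both other classifiers producing a score above a threshold `hi` or below a threshold `lo`; the weak learner `MeanThresholdClassifier`, the functions `tri_ensemble`, `predict` and `make_toy_data`, the averaging of scores for the final vote, and the synthetic data are all hypothetical. The chapter presumably combines three different classifier types, whereas this sketch uses one simple learner on three different features as a stand-in.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class MeanThresholdClassifier:
    """Hypothetical weak learner: scores one feature against the class-mean midpoint."""
    feature: int
    midpoint: float = 0.0
    scale: float = 1.0

    def fit(self, X, y):
        pos = [x[self.feature] for x, label in zip(X, y) if label == 1]
        neg = [x[self.feature] for x, label in zip(X, y) if label == 0]
        mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
        self.midpoint = (mean_pos + mean_neg) / 2
        # Signed scale: positive when class 1 has the larger mean on this feature.
        self.scale = (mean_pos - mean_neg) / 2 or 1e-9
        return self

    def predict_proba(self, x):
        z = (x[self.feature] - self.midpoint) / self.scale
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() in range
        return 1.0 / (1.0 + math.exp(-z))


def tri_ensemble(X_lab, y_lab, X_unlab, hi=0.8, lo=0.2, rounds=5):
    """Train three weak learners; each one receives extra samples that the
    other two both score above `hi` (added as class 1) or below `lo` (class 0).
    Assumes every sample has at least three features."""
    models = [MeanThresholdClassifier(feature=f) for f in range(3)]
    train = [(list(X_lab), list(y_lab)) for _ in models]  # one copy per model
    for model, (X, y) in zip(models, train):
        model.fit(X, y)
    used = [set() for _ in models]  # unlabeled indices already given to model i
    for _ in range(rounds):
        added = False
        for i in range(len(models)):
            others = [m for j, m in enumerate(models) if j != i]
            for idx, x in enumerate(X_unlab):
                if idx in used[i]:
                    continue
                scores = [m.predict_proba(x) for m in others]
                if all(s >= hi for s in scores):
                    label = 1
                elif all(s <= lo for s in scores):
                    label = 0
                else:
                    continue  # the other two are not both confident
                train[i][0].append(x)
                train[i][1].append(label)
                used[i].add(idx)
                added = True
        if not added:
            break
        # Refit all three models on their enlarged training sets.
        for model, (X, y) in zip(models, train):
            model.fit(X, y)
    return models


def predict(models, x):
    """Final decision: average the three scores and threshold at 0.5."""
    return int(sum(m.predict_proba(x) for m in models) / len(models) >= 0.5)


def make_toy_data(n_per_class, seed):
    """Synthetic 3-feature data (illustrative only, not protein data)."""
    rng = random.Random(seed)
    X = [[rng.gauss(mu, 1.0) for _ in range(3)]
         for mu in (3.0, 0.0) for _ in range(n_per_class)]
    y = [1] * n_per_class + [0] * n_per_class
    return X, y
```

As a usage example, `X_lab, y_lab = make_toy_data(10, seed=0)` and `X_unlab, _ = make_toy_data(50, seed=1)` give a small labeled set and a larger pool to draw from; `models = tri_ensemble(X_lab, y_lab, X_unlab)` then trains the three learners, and `predict(models, x)` classifies a new sample. Requiring agreement from both of the other classifiers is a deliberately conservative choice in this sketch, since a wrongly labeled sample would otherwise be fed back into training.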

Metadata
Title
Improving Identification of Essential Proteins by a Novel Ensemble Method
Authors
Wei Dai
Xia Li
Wei Peng
Jurong Song
Jiancheng Zhong
Jianxin Wang
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-20242-2_13