Skip to main content
Top

Hint

Swipe to navigate through the chapters of this book

2019 | OriginalPaper | Chapter

Improving Identification of Essential Proteins by a Novel Ensemble Method

Authors : Wei Dai, Xia Li, Wei Peng, Jurong Song, Jiancheng Zhong, Jianxin Wang

Published in: Bioinformatics Research and Applications

Publisher: Springer International Publishing

share
SHARE

Abstract

Essential proteins are indispensable for cell survival, and the identification of essential proteins plays a critical role in biological and pharmaceutical design research. Recently, some machine learning methods have been proposed by introducing effective protein features or by employing powerful classifiers. Seldom of them focused on improving the prediction accuracy by designing efficient strategies to ensemble different classifiers. In this work, a novel ensemble learning framework called by Tri-ensemble was proposed to integrate different classifiers, which selected three weak classifiers and trained these classifiers by continually adding the samples that are predicted to have abnormally high or abnormally low properties by the other two classifiers. We applied Tri-ensemble on predicting the essential protein of Yeast and E.coli. The results show that our approach achieves better performance than both individual classifiers and the other ensemble learning methods.
Literature
1.
go back to reference Ren, Z., Yan, L.: DEG 50, a database of essential genes in both prokaryotes and eukaryotes. Nucleic Acids Res. 37(Database issue), D455 (2009) Ren, Z., Yan, L.: DEG 50, a database of essential genes in both prokaryotes and eukaryotes. Nucleic Acids Res. 37(Database issue), D455 (2009)
2.
go back to reference Jeong, H., Mason, S.P., Barabasi, A.L., Oltvai, Z.N.: Lethality and centrality in protein networks. Nature 411(6833), 41–42 (2001) CrossRef Jeong, H., Mason, S.P., Barabasi, A.L., Oltvai, Z.N.: Lethality and centrality in protein networks. Nature 411(6833), 41–42 (2001) CrossRef
3.
go back to reference Freeman, L.C.: A set of measures of centrality based on betweenness. Sociometry 40(1), 35–41 (1977) CrossRef Freeman, L.C.: A set of measures of centrality based on betweenness. Sociometry 40(1), 35–41 (1977) CrossRef
4.
go back to reference Joy, M.P., Brock, A., Ingber, D.E., Huang, S.: High-betweenness proteins in the yeast protein interaction network. J. Biomed. Biotechnol. 2005(2), 96 (2014) CrossRef Joy, M.P., Brock, A., Ingber, D.E., Huang, S.: High-betweenness proteins in the yeast protein interaction network. J. Biomed. Biotechnol. 2005(2), 96 (2014) CrossRef
6.
go back to reference Vallabhajosyula, R.R., Deboki, C., Samina, L., Animesh, R., Alpan, R.: Identifying hubs in protein interaction networks. PLoS ONE 4(4), e5344 (2009) CrossRef Vallabhajosyula, R.R., Deboki, C., Samina, L., Animesh, R., Alpan, R.: Identifying hubs in protein interaction networks. PLoS ONE 4(4), e5344 (2009) CrossRef
7.
go back to reference Bonacich, P.: Power and centrality: a family of measures. Am. J. Sociol. 92(5), 1170–1182 (1987) CrossRef Bonacich, P.: Power and centrality: a family of measures. Am. J. Sociol. 92(5), 1170–1182 (1987) CrossRef
8.
9.
go back to reference Wang, J., Li, M., Wang, H., Pan, Y.: Identification of essential proteins based on edge clustering coefficient. IEEE/ACM Trans. Comput. Biol. Bioinf. 9(4), 1070–1080 (2012) CrossRef Wang, J., Li, M., Wang, H., Pan, Y.: Identification of essential proteins based on edge clustering coefficient. IEEE/ACM Trans. Comput. Biol. Bioinf. 9(4), 1070–1080 (2012) CrossRef
10.
go back to reference Ernesto, E., Rodríguez-Velázquez, J.A.: Subgraph centrality in complex networks. Phys. Rev. E Stat. Nonlinear Soft Matter Phys. 71(5 Pt 2), 056103 (2005) MathSciNet Ernesto, E., Rodríguez-Velázquez, J.A.: Subgraph centrality in complex networks. Phys. Rev. E Stat. Nonlinear Soft Matter Phys. 71(5 Pt 2), 056103 (2005) MathSciNet
11.
go back to reference Li, M., Zhang, H., Fei, Y.: Essential protein discovery method based on integration of PPI and gene expression data. J. Cent. South Univ. 44(3), 1024–1029 (2013) Li, M., Zhang, H., Fei, Y.: Essential protein discovery method based on integration of PPI and gene expression data. J. Cent. South Univ. 44(3), 1024–1029 (2013)
12.
go back to reference Tang, X., Wang, J., Yi, P.: Identifying essential proteins via integration of protein interaction and gene expression data (2012) Tang, X., Wang, J., Yi, P.: Identifying essential proteins via integration of protein interaction and gene expression data (2012)
13.
go back to reference Jordan, I.K., Rogozin, I.B., Wolf, Y.I., Koonin, E.V.: Essential genes are more evolutionarily conserved than are nonessential genes in bacteria. Genome Res. 12(6), 962 (2002) CrossRef Jordan, I.K., Rogozin, I.B., Wolf, Y.I., Koonin, E.V.: Essential genes are more evolutionarily conserved than are nonessential genes in bacteria. Genome Res. 12(6), 962 (2002) CrossRef
14.
go back to reference Hart, G.T., Lee, I., Marcotte, E.M.: A high-accuracy consensus map of yeast protein complexes reveals modular nature of gene essentiality. BMC Bioinform. 8(1), 1–11 (2007) CrossRef Hart, G.T., Lee, I., Marcotte, E.M.: A high-accuracy consensus map of yeast protein complexes reveals modular nature of gene essentiality. BMC Bioinform. 8(1), 1–11 (2007) CrossRef
15.
go back to reference Peng, W., Wang, J., Wang, W., Liu, Q., Wu, F.X., Pan, Y.: Iteration method for predicting essential proteins based on orthology and protein-protein interaction networks. BMC Syst. Biol. 6(1), 1–17 (2012) CrossRef Peng, W., Wang, J., Wang, W., Liu, Q., Wu, F.X., Pan, Y.: Iteration method for predicting essential proteins based on orthology and protein-protein interaction networks. BMC Syst. Biol. 6(1), 1–17 (2012) CrossRef
16.
go back to reference Gustafson, A.M., Snitkin, E.S., Parker, S.C., Delisi, C., Kasif, S.: Towards the identification of essential genes using targeted genome sequencing and comparative analysis. BMC Genom. 7(1), 265 (2006) CrossRef Gustafson, A.M., Snitkin, E.S., Parker, S.C., Delisi, C., Kasif, S.: Towards the identification of essential genes using targeted genome sequencing and comparative analysis. BMC Genom. 7(1), 265 (2006) CrossRef
17.
go back to reference Hwang, Y.C., Lin, C.C., Chang, J.Y., Mori, H., Juan, H.F., Huang, H.C.: Predicting essential genes based on network and sequence analysis. Mol. BioSyst. 5(12), 1672–1678 (2009) CrossRef Hwang, Y.C., Lin, C.C., Chang, J.Y., Mori, H., Juan, H.F., Huang, H.C.: Predicting essential genes based on network and sequence analysis. Mol. BioSyst. 5(12), 1672–1678 (2009) CrossRef
18.
go back to reference Zhong, J., Wang, J., Peng, W., Zhang, Z., Pan, Y.: Prediction of essential proteins based on gene expression programming. BMC Genom. 14(S4), S7 (2013) CrossRef Zhong, J., Wang, J., Peng, W., Zhang, Z., Pan, Y.: Prediction of essential proteins based on gene expression programming. BMC Genom. 14(S4), S7 (2013) CrossRef
19.
go back to reference Acencio, M.L., Lemke, N.: Towards the prediction of essential genes by integration of network topology, cellular localization and biological process information. BMC Bioinform. 10(1), 290 (2009) CrossRef Acencio, M.L., Lemke, N.: Towards the prediction of essential genes by integration of network topology, cellular localization and biological process information. BMC Bioinform. 10(1), 290 (2009) CrossRef
20.
go back to reference Deng, J., et al.: Investigating the predictability of essential genes across distantly related organisms using an integrative approach. Nucleic Acids Res. 39(3), 795–807 (2011) MathSciNetCrossRef Deng, J., et al.: Investigating the predictability of essential genes across distantly related organisms using an integrative approach. Nucleic Acids Res. 39(3), 795–807 (2011) MathSciNetCrossRef
21.
go back to reference Chen, Y., Xu, D.: Understanding protein dispensability through machine-learning analysis of high-throughput data. Bioinformatics 21(5), 575–581 (2005) CrossRef Chen, Y., Xu, D.: Understanding protein dispensability through machine-learning analysis of high-throughput data. Bioinformatics 21(5), 575–581 (2005) CrossRef
23.
go back to reference Schapire, R.E., Singer, Y., Singhal, A.: Boosting and Rocchio applied to text filtering. In: SIGIR Proceedings of Annual International Conference on Research & Development in Information Retrieval, pp. 215–223 (1998) Schapire, R.E., Singer, Y., Singhal, A.: Boosting and Rocchio applied to text filtering. In: SIGIR Proceedings of Annual International Conference on Research & Development in Information Retrieval, pp. 215–223 (1998)
24.
go back to reference Chen, T., Guestrin, C.: XGBoost: a scalable tree boosting system (2016) Chen, T., Guestrin, C.: XGBoost: a scalable tree boosting system (2016)
26.
go back to reference Li, M., Zhou, Z.-H.: Tri-training exploiting unlabeled data using three classifiers. IEEE Trans. Knowl. Data Eng. 17(11), 1529–1541 (2005) CrossRef Li, M., Zhou, Z.-H.: Tri-training exploiting unlabeled data using three classifiers. IEEE Trans. Knowl. Data Eng. 17(11), 1529–1541 (2005) CrossRef
27.
go back to reference Mewes, F.D., et al.: MIPS: analysis and annotation of proteins from whole genomes in 2005. Nucleic Acids Res. 34(Database issue), 169–172 (2004) Mewes, F.D., et al.: MIPS: analysis and annotation of proteins from whole genomes in 2005. Nucleic Acids Res. 34(Database issue), 169–172 (2004)
28.
go back to reference Cherry, J.M., et al.: SGD: saccharomyces genome database. Nucleic Acids Res. 26(1), 73–79 (1998) CrossRef Cherry, J.M., et al.: SGD: saccharomyces genome database. Nucleic Acids Res. 26(1), 73–79 (1998) CrossRef
30.
go back to reference Xenarios, I., Salwinski, L., Duan, X.J., Higney, P., Kim, S.M., Eisenberg, D.: DIP, the database of interacting proteins: a research tool for studying cellular networks of protein interactions. Nucleic Acids Res. 30(1), 303 (2002) CrossRef Xenarios, I., Salwinski, L., Duan, X.J., Higney, P., Kim, S.M., Eisenberg, D.: DIP, the database of interacting proteins: a research tool for studying cellular networks of protein interactions. Nucleic Acids Res. 30(1), 303 (2002) CrossRef
31.
go back to reference Tang, Y., Li, M., Wang, J., Pan, Y., Wu, F.X.: CytoNCA: a cytoscape plugin for centrality analysis and evaluation of protein interaction networks. Biosystems 127, 67–72 (2015) CrossRef Tang, Y., Li, M., Wang, J., Pan, Y., Wu, F.X.: CytoNCA: a cytoscape plugin for centrality analysis and evaluation of protein interaction networks. Biosystems 127, 67–72 (2015) CrossRef
32.
go back to reference Gabriel, O., et al.: InParanoid 7: new algorithms and tools for eukaryotic orthology analysis. Nucleic Acids Res. 38(Database issue), D196 (2010) Gabriel, O., et al.: InParanoid 7: new algorithms and tools for eukaryotic orthology analysis. Nucleic Acids Res. 38(Database issue), D196 (2010)
33.
go back to reference Tu, B.P., Andrzej, K., Maga, R., Mcknight, S.L.: Logic of the yeast metabolic cycle: temporal compartmentalization of cellular processes. Science 310(5751), 1152 (2005) CrossRef Tu, B.P., Andrzej, K., Maga, R., Mcknight, S.L.: Logic of the yeast metabolic cycle: temporal compartmentalization of cellular processes. Science 310(5751), 1152 (2005) CrossRef
34.
go back to reference Andea, P., Pier Luigi, M., Piero, F., Rita, C.: eSLDB: eukaryotic subcellular localization database. Nucl. Acids Res. 35(Database issue), 208–212 (2007) Andea, P., Pier Luigi, M., Piero, F., Rita, C.: eSLDB: eukaryotic subcellular localization database. Nucl. Acids Res. 35(Database issue), 208–212 (2007)
Metadata
Title
Improving Identification of Essential Proteins by a Novel Ensemble Method
Authors
Wei Dai
Xia Li
Wei Peng
Jurong Song
Jiancheng Zhong
Jianxin Wang
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-20242-2_13

Premium Partner