2019 | OriginalPaper | Chapter

Improving Identification of Essential Proteins by a Novel Ensemble Method

Authors: Wei Dai, Xia Li, Wei Peng, Jurong Song, Jiancheng Zhong, Jianxin Wang

Published in: Bioinformatics Research and Applications

Publisher: Springer International Publishing

Abstract

Essential proteins are indispensable for cell survival, and identifying them plays a critical role in biological research and pharmaceutical design. Recently, several machine learning methods have been proposed that introduce effective protein features or employ powerful classifiers. Few of them, however, have focused on improving prediction accuracy by designing efficient strategies for ensembling different classifiers. In this work, we propose a novel ensemble learning framework, called Tri-ensemble, to integrate different classifiers. It selects three weak classifiers and trains each of them by continually adding the samples that the other two classifiers predict to have abnormally high or abnormally low properties. We applied Tri-ensemble to predicting essential proteins in yeast and E. coli. The results show that our approach achieves better performance than both the individual classifiers and other ensemble learning methods.
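The abstract describes Tri-ensemble only at a high level, so the following is a minimal Python sketch of one plausible reading, modeled loosely on the general tri-training idea: each of three weak classifiers is retrained on extra samples that the other two score as extreme. Everything concrete here is an assumption, not the authors' implementation: the hypothetical nearest-centroid weak learner `CentroidScorer`, its `steepness` parameter, the per-classifier `feature_subsets`, the pool of extra samples `X_pool`, the fixed cutoffs `hi`/`lo` standing in for "abnormally high or low", and averaging the three scores for the final prediction.

```python
import math
from typing import List, Sequence, Set, Tuple

Vector = Sequence[float]


def _sigmoid(z: float) -> float:
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class CentroidScorer:
    """Hypothetical weak classifier: nearest centroid on a feature subset."""

    def __init__(self, features: Sequence[int], steepness: float = 5.0):
        self.features = list(features)
        self.steepness = steepness  # hypothetical: sharpens the score curve
        self.n_train = 0

    def _project(self, x: Vector) -> List[float]:
        return [x[f] for f in self.features]

    def _centroid(self, rows: List[Vector]) -> List[float]:
        cols = zip(*(self._project(r) for r in rows))
        return [sum(c) / len(rows) for c in cols]

    def fit(self, X: Sequence[Vector], y: Sequence[int]) -> "CentroidScorer":
        # Assumes both classes (1 = essential, 0 = non-essential) are present.
        self.pos = self._centroid([x for x, t in zip(X, y) if t == 1])
        self.neg = self._centroid([x for x, t in zip(X, y) if t == 0])
        self.n_train = len(X)
        return self

    def score(self, x: Vector) -> float:
        # Value in (0, 1); above 0.5 when x is nearer the essential centroid.
        p = self._project(x)
        gap = math.dist(p, self.neg) - math.dist(p, self.pos)
        return _sigmoid(self.steepness * gap)


def tri_ensemble(X_lab, y_lab, X_pool, feature_subsets, hi=0.9, lo=0.1, max_rounds=10):
    """Sketch of a tri-training-style loop; hi/lo are assumed cutoffs."""
    assert len(feature_subsets) == 3, "the framework uses exactly three classifiers"
    clfs = [CentroidScorer(fs) for fs in feature_subsets]
    train: List[Tuple[list, list]] = [(list(X_lab), list(y_lab)) for _ in clfs]
    added: List[Set[int]] = [set() for _ in clfs]
    for clf, (X, y) in zip(clfs, train):
        clf.fit(X, y)

    for _ in range(max_rounds):
        # Choose new samples for every classifier from the current models,
        # then retrain each classifier that received any.
        picks = []
        for i in range(3):
            j, k = [m for m in range(3) if m != i]
            chosen = []
            for idx, x in enumerate(X_pool):
                if idx in added[i]:
                    continue
                s = (clfs[j].score(x) + clfs[k].score(x)) / 2  # mean of the other two
                if s >= hi:
                    chosen.append((idx, 1))  # abnormally high: label essential
                elif s <= lo:
                    chosen.append((idx, 0))  # abnormally low: label non-essential
            picks.append(chosen)
        if not any(picks):
            break  # no classifier gained samples, so training has converged
        for i, chosen in enumerate(picks):
            for idx, label in chosen:
                train[i][0].append(X_pool[idx])
                train[i][1].append(label)
                added[i].add(idx)
            if chosen:
                clfs[i].fit(*train[i])
    return clfs


def predict(clfs: Sequence[CentroidScorer], x: Vector) -> float:
    """Final ensemble score: the plain average of the three classifiers."""
    return sum(c.score(x) for c in clfs) / len(clfs)
```

In this sketch, samples for all three classifiers are selected from the same snapshot of models before any of them is retrained, so the loop order does not favour one classifier; samples whose mean score sits between `lo` and `hi` are never added. The loop ends when no classifier gains new samples or after `max_rounds`.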

Print ISBN: 978-3-030-20241-5
Electronic ISBN: 978-3-030-20242-2
Copyright Year: 2019
DOI: https://doi.org/10.1007/978-3-030-20242-2_13
