2023 | OriginalPaper | Chapter

Escherichia coli: Analysis of Features for Protein Localization Classification Employing Fusion Data

Authors: Alvaro David Orjuela-Cañon, Diana C. Rodriguez, Oscar Perdomo

Published in: Applications of Computational Intelligence

Publisher: Springer Nature Switzerland

Abstract

Machine learning models can be used to assess the relevance of features in classification systems. Interest in protein analysis based on biomolecular information has grown rapidly. In this work, two sources of such information were compared for determining protein localization in Escherichia coli cells. Support vector machines, artificial neural networks, and random forests were compared for the prediction of protein localization. The models were trained on two sources of data, targeting-signal information and protein-sequence information, to determine the localization sites of the protein. A third scenario used a fusion of both sources. Four classes were established according to the subcellular localization of the protein: cytoplasm, periplasmic space, outer membrane, and inner membrane. Balanced accuracy ranged from 77% to 92%. The best-performing models were based on random forests and support vector machines. In terms of features, the first source, based on the targeting signal, performed best in terms of relevance for the classification.
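As a minimal sketch of the two ideas the abstract relies on, the Python below computes balanced accuracy and fuses two feature sources. It assumes the common definition of balanced accuracy (the mean of per-class recall) and assumes fusion by simple concatenation of the two feature vectors; the abstract does not state which definition or fusion scheme the chapter uses. The function names, class labels, and toy predictions are hypothetical and are not data from the chapter.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes that occur in y_true."""
    total = defaultdict(int)    # samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


def fuse_features(signal_features, sequence_features):
    """Feature-level fusion (assumed): concatenate both vectors per protein."""
    return [list(a) + list(b) for a, b in zip(signal_features, sequence_features)]


# Made-up, imbalanced toy labels for the four localization classes.
y_true = (["cytoplasm"] * 6 + ["periplasm"] * 2
          + ["outer membrane"] + ["inner membrane"])
y_pred = (["cytoplasm"] * 6 + ["periplasm", "cytoplasm"]
          + ["outer membrane"] + ["cytoplasm"])

plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain)                              # plain accuracy: 0.8
print(balanced_accuracy(y_true, y_pred))  # balanced accuracy: 0.625
print(fuse_features([[0.1, 0.2]], [[0.7]]))
```

In this toy case the plain accuracy (0.8) is higher than the balanced accuracy (0.625) because the majority class dominates the former, which is the usual reason for reporting balanced accuracy when class sizes differ.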


Metadata
Title: Escherichia coli: Analysis of Features for Protein Localization Classification Employing Fusion Data
Authors: Alvaro David Orjuela-Cañon, Diana C. Rodriguez, Oscar Perdomo
Copyright Year: 2023
DOI: https://doi.org/10.1007/978-3-031-29783-0_3