
2023 | OriginalPaper | Book chapter

Exploring Machine Learning Algorithms and Protein Language Models Strategies to Develop Enzyme Classification Systems

Authors: Diego Fernández, Álvaro Olivera-Nappa, Roberto Uribe-Paredes, David Medina-Ortiz

Published in: Bioinformatics and Biomedical Engineering

Publisher: Springer Nature Switzerland


Abstract

Discovering the functions of uncharacterized enzymes is one of the most common tasks in bioinformatics. Functional annotation methods based on phylogenetic properties have been the gold standard in genome annotation. However, these methods succeed only when the minimum requirements for establishing similarity or homology are met. Machine learning and deep learning methods have proven to be a useful alternative, yielding functional classification systems for a range of bioinformatics tasks. Nevertheless, there is still no clear strategy for building predictive models or for deciding how amino acid sequences should be represented numerically. In this work, we address the functional classification of enzyme sequences (EC numbers) with machine learning methods, exploring several alternatives for training predictive models and for the numerical representation of sequences. The results show that the best performance is achieved with representations derived from pre-trained models. However, a clear strategy for training the models is still lacking. Exploring several alternatives, we therefore observe that the CNN-based architectures proposed in this work learn and extract patterns from these complex systems more readily, achieving performance above 97% with a binary cross-entropy loss below 0.05. Finally, we discuss the strategies explored and outline future work toward integrated methods for functional classification and the discovery of new enzymes that complement current bioinformatics tools.
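The abstract describes a pipeline in which sequences are converted into numerical representations, passed through a CNN, and scored with binary cross-entropy. The following is a minimal, dependency-free Python sketch of such a forward pass under stated assumptions; it is not the authors' architecture. Assumptions: one-hot encoding stands in for the pre-trained protein language model embeddings that the abstract reports as best; the single filter, the motif "GHE", the two output classes, and all weights are hypothetical and hand-set rather than learned; no training loop is shown. Each EC class is treated as an independent binary output, which is one plausible reading of a binary cross-entropy objective, not something the abstract specifies.

```python
import math

# The 20 canonical amino acids; any other symbol (X, B, Z, U, ...) encodes to all zeros.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(sequence):
    """Encode a protein sequence as a list of 20-dimensional one-hot vectors."""
    encoded = []
    for residue in sequence.upper():
        vec = [0.0] * len(AMINO_ACIDS)
        if residue in AA_INDEX:
            vec[AA_INDEX[residue]] = 1.0
        encoded.append(vec)
    return encoded


def conv1d_max(encoded, kernel):
    """Slide one convolution filter (width x 20 weights) along the sequence
    and return the largest activation (global max pooling)."""
    width = len(kernel)
    if len(encoded) < width:
        raise ValueError("sequence is shorter than the filter width")
    best = -math.inf
    for start in range(len(encoded) - width + 1):
        activation = sum(
            kernel[k][c] * encoded[start + k][c]
            for k in range(width)
            for c in range(len(AMINO_ACIDS))
        )
        best = max(best, activation)
    return best


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def predict(sequence, kernels, weights, biases):
    """Forward pass: conv filters -> max-pooled features -> one sigmoid per class.
    Each output is an independent probability, so several classes can be active."""
    encoded = one_hot(sequence)
    features = [conv1d_max(encoded, kernel) for kernel in kernels]
    return [
        sigmoid(sum(w * f for w, f in zip(row, features)) + b)
        for row, b in zip(weights, biases)
    ]


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy over independent labels (multi-label loss)."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)


# Toy usage with hand-set (not learned) parameters: one filter that fires on the
# hypothetical motif "GHE", and two hypothetical output classes.
kernels = [one_hot("GHE")]
weights = [[2.0], [-2.0]]
biases = [-3.0, 3.0]
probs = predict("MKGHEL", kernels, weights, biases)
print(probs, binary_cross_entropy([1, 0], probs))
```

In a real system the filters and output weights would typically be learned by gradient descent with a deep learning framework, and the one-hot matrix would be replaced by per-residue or pooled embeddings from a pre-trained model; the shape of the forward pass and the loss stays the same.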



Print ISBN: 978-3-031-34952-2

Electronic ISBN: 978-3-031-34953-9

Copyright year: 2023
DOI: https://doi.org/10.1007/978-3-031-34953-9_24
