
2023 | Original paper | Book chapter

Degree-Normalization Improves Random-Walk-Based Embedding Accuracy in PPI Graphs

Authors: Luca Cappelletti, Stefano Taverni, Tommaso Fontana, Marcin P. Joachimiak, Justin Reese, Peter Robinson, Elena Casiraghi, Giorgio Valentini

Published in: Bioinformatics and Biomedical Engineering

Publisher: Springer Nature Switzerland


Abstract

Among the many solutions proposed for graph embedding, traditional random-walk (RW)-based embedding methods have shown promise in several fields. However, when a graph contains high-degree nodes, random walks tend to step through those nodes and neglect low- and middle-degree ones. As a result, the random-walk samples give a very accurate topological picture of the neighbourhoods around high-degree nodes but only a coarse-grained picture of the neighbourhoods around middle- and low-degree nodes. This in turn degrades the downstream predictive models, which tend to overfit high-degree nodes and the edges incident to them. We propose a solution to this problem based on degree normalization. Experiments with popular RW-based embedding methods applied to edge prediction on eight protein-protein interaction (PPI) graphs from the STRING database show the effectiveness of the proposed approach: degree normalization not only improves predictions but also yields more stable results, suggesting that it has a regularization effect that leads to more robust convergence.
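The abstract does not spell out the normalization formula, so the following Python sketch assumes one simple form purely for illustration: at each step the walk moves to a neighbour v with probability proportional to 1/deg(v) rather than uniformly. This is not necessarily the authors' scheme, and the function names and the toy graph are hypothetical (they are not part of the GRAPE API). On a connected undirected graph a uniform walk spends a fraction deg(v)/2|E| of its steps on node v, which is why hubs dominate the samples. The assumed normalized walk is equivalent to a walk on symmetric edge weights w(u, v) = 1/(deg(u) deg(v)), so its long-run visit frequencies also have a closed form that can be checked against a sampled walk.

```python
import random
from collections import Counter


def transition_probs(graph, node, normalize_by_degree):
    """Return the neighbours of `node` and the probability of stepping to each."""
    neighbours = graph[node]
    if normalize_by_degree:
        # ASSUMPTION: illustrative normalization, not necessarily the paper's
        # exact scheme. Each neighbour is weighted by 1 / deg(neighbour), so hubs
        # are no longer favoured merely for having many edges.
        weights = [1.0 / len(graph[v]) for v in neighbours]
    else:
        # Plain DeepWalk-style step: every neighbour is equally likely.
        weights = [1.0] * len(neighbours)
    total = sum(weights)
    return neighbours, [w / total for w in weights]


def random_walk(graph, start, length, rng, normalize_by_degree=False):
    """Sample a walk containing `length` nodes, starting from `start`."""
    walk = [start]
    for _ in range(length - 1):
        neighbours, probs = transition_probs(graph, walk[-1], normalize_by_degree)
        walk.append(rng.choices(neighbours, weights=probs, k=1)[0])
    return walk


def stationary_distribution(graph, normalize_by_degree=False):
    """Exact long-run fraction of steps spent on each node (connected graph)."""
    if normalize_by_degree:
        # The assumed normalized walk is a walk on symmetric edge weights
        # w(u, v) = 1 / (deg(u) * deg(v)); node u's mass is sum_v w(u, v).
        mass = {
            u: sum(1.0 / (len(nbrs) * len(graph[v])) for v in nbrs)
            for u, nbrs in graph.items()
        }
    else:
        # Uniform walk: stationary probability is proportional to degree.
        mass = {u: float(len(nbrs)) for u, nbrs in graph.items()}
    total = sum(mass.values())
    return {u: m / total for u, m in mass.items()}


def empirical_frequencies(graph, steps, normalize_by_degree, seed=0):
    """Fraction of positions in one long sampled walk occupied by each node."""
    rng = random.Random(seed)
    walk = random_walk(graph, 0, steps, rng, normalize_by_degree)
    counts = Counter(walk)
    return {u: counts[u] / steps for u in graph}


# Hypothetical toy graph: hub 0 (degree 5), two triangles, and a tail 5-6-7.
GRAPH = {
    0: [1, 2, 3, 4, 5],
    1: [0, 2],
    2: [0, 1],
    3: [0, 4],
    4: [0, 3],
    5: [0, 6],
    6: [5, 7],
    7: [6],
}

if __name__ == "__main__":
    for flag in (False, True):
        exact = stationary_distribution(GRAPH, flag)
        sampled = empirical_frequencies(GRAPH, 100_000, flag)
        print(f"normalize_by_degree={flag}")
        for node in GRAPH:
            print(f"  node {node}: exact={exact[node]:.3f} sampled={sampled[node]:.3f}")
```

On this toy graph the hub's long-run share of visits drops from 5/18 (about 0.28) under the uniform walk to 1/7 (about 0.14) under the assumed normalization, while the pendant node 7 rises from 1/18 to 1/7, so the walk samples low-degree regions far more evenly.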


The bias-correction scheme proposed here is available in Rust, with Python bindings, as part of the GRAPE library for graph machine learning [3]. Besides new implementations of DeepWalk, Node2Vec, and Walklets, GRAPE integrates the random forest implementation from scikit-learn [9].

References

1. Ben-Israel, A., Greville, T.N.E.: Generalized Inverses: Theory and Applications, vol. 15. Springer, Heidelberg (2003). https://doi.org/10.1007/b97366

2. Campbell, S.L., Meyer, C.D.: Generalized Inverses of Linear Transformations. SIAM (2009)

3. Cappelletti, L., et al.: GRAPE: fast and scalable graph processing and embedding. arXiv preprint arXiv:2110.06196 (2022)

4. Chicco, D., Jurman, G.: The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genom. 21(1), 1–13 (2020)

5. Cuzzocrea, A., Cappelletti, L., Valentini, G.: A neural model for the prediction of pathogenic genomic variants in mendelian diseases. In: Proceedings of the 1st International Conference on Advances in Signal Processing and Artificial Intelligence (ASPAI 2019), Barcelona, Spain, pp. 34–38 (2019)

6. Grover, A., Leskovec, J.: node2vec: Scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 855–864 (2016)

7. Li, M.M., Huang, K., Zitnik, M.: Graph representation learning in biomedicine and healthcare. Nat. Biomed. Eng. 6(12), 1353–1369 (2022)

8. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)

9. Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)

10. Perozzi, B., Al-Rfou, R., Skiena, S.: DeepWalk: online learning of social representations. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 701–710 (2014)

11. Perozzi, B., Kulkarni, V., Chen, H., Skiena, S.: Don’t walk, skip! Online learning of multi-scale network embeddings. In: Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017, pp. 258–265 (2017)

12. Petrini, A., et al.: parSMURF, a high-performance computing tool for the genome-wide detection of pathogenic variants. GigaScience 9(5), giaa052 (2020)

13. Radhakrishna Rao, C., Mitra, S.K., et al.: Generalized inverse of a matrix and its applications. In: Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 601–620. University of California Press, Oakland (1972)

14. Szklarczyk, D., et al.: The STRING database in 2021: customizable protein-protein networks, and functional characterization of user-uploaded gene/measurement sets. Nucleic Acids Res. 49(D1), D605–D612 (2021)

15. Yi, H.-C., You, Z.-H., Huang, D.-S., Kwoh, C.K.: Graph representation learning in bioinformatics: trends, methods and applications. Brief. Bioinform. 23(1), bbab340 (2021)

16. Yue, X., et al.: Graph embedding on biomedical networks: methods, applications and evaluations. Bioinformatics 36(4), 1241–1251 (2019)

Print ISBN: 978-3-031-34959-1

Electronic ISBN: 978-3-031-34960-7

Copyright year: 2023
DOI: https://doi.org/10.1007/978-3-031-34960-7_26
