Skip to main content

2023 | OriginalPaper | Buchkapitel

Degree-Normalization Improves Random-Walk-Based Embedding Accuracy in PPI Graphs

verfasst von : Luca Cappelletti, Stefano Taverni, Tommaso Fontana, Marcin P. Joachimiak, Justin Reese, Peter Robinson, Elena Casiraghi, Giorgio Valentini

Erschienen in: Bioinformatics and Biomedical Engineering

Verlag: Springer Nature Switzerland

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Among the many proposed solutions in graph embedding, traditional random walk-based embedding methods have shown their promise in several fields. However, when the graph contains high-degree nodes, random walks often neglect low- or middle-degree nodes and tend to prefer stepping through high-degree ones instead. This results in random-walk samples providing a very accurate topological representation of neighbourhoods surrounding high-degree nodes, which contrasts with a coarse-grained representation of neighbourhoods surrounding middle and low-degree nodes. This in turn affects the performance of the subsequent predictive models, which tend to overfit high-degree nodes and/or edges having high-degree nodes as one of the vertices. We propose a solution to this problem, which relies on a degree normalization approach. Experiments with popular RW-based embedding methods applied to edge prediction problems involving eight protein-protein interaction (PPI) graphs from the STRING database show the effectiveness of the proposed approach: degree normalization not only improves predictions but also provides more stable results, suggesting that our proposal has a regularization effect leading to a more robust convergence.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Anhänge
Nur mit Berechtigung zugänglich
Fußnoten
1
The bias-correction schema proposed here is made available in Rust with Python bindings as part of the GRAPE library for graph machine learning [3]. Besides novel implementations of DeepWalk, Node2Vec, and Walklets, GRAPE integrates the random forest implementation from sklearn [9].
 
Literatur
2.
Zurück zum Zitat Campbell, S.L., Meyer, C.D.: Generalized Inverses of Linear Transformations. SIAM (2009) Campbell, S.L., Meyer, C.D.: Generalized Inverses of Linear Transformations. SIAM (2009)
3.
4.
Zurück zum Zitat Chicco, D., Jurman, G.: The advantages of the Matthews correlation coefficient (MCC) over f1 score and accuracy in binary classification evaluation. BMC Genom. 21(1), 1–13 (2020)CrossRef Chicco, D., Jurman, G.: The advantages of the Matthews correlation coefficient (MCC) over f1 score and accuracy in binary classification evaluation. BMC Genom. 21(1), 1–13 (2020)CrossRef
5.
Zurück zum Zitat Cuzzocrea, A., Cappelletti, L., Valentini, G.: A neural model for the prediction of pathogenic genomic variants in mendelian diseases. In: Proceedings of the 1st International Conference on Advances in Signal Processing and Artificial Intelligence (ASPAI 2019), Barcelona, Spain, pp. 34–38 (2019) Cuzzocrea, A., Cappelletti, L., Valentini, G.: A neural model for the prediction of pathogenic genomic variants in mendelian diseases. In: Proceedings of the 1st International Conference on Advances in Signal Processing and Artificial Intelligence (ASPAI 2019), Barcelona, Spain, pp. 34–38 (2019)
6.
Zurück zum Zitat Grover, A., Leskovec, J.: node2vec: Scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 855–864 (2016) Grover, A., Leskovec, J.: node2vec: Scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 855–864 (2016)
7.
Zurück zum Zitat Li, M.M., Huang, K., Zitnik, M.: Graph representation learning in biomedicine and healthcare. Nat. Biomed. Eng. 6(12), 1353–1369 (2022)CrossRefPubMed Li, M.M., Huang, K., Zitnik, M.: Graph representation learning in biomedicine and healthcare. Nat. Biomed. Eng. 6(12), 1353–1369 (2022)CrossRefPubMed
8.
Zurück zum Zitat Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013) Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:​1301.​3781 (2013)
9.
Zurück zum Zitat Pedregosa, F., et al.: Scikit-learn: machine learning in python. J. Mach. Learn. Res. 12, 2825–2830 (2011) Pedregosa, F., et al.: Scikit-learn: machine learning in python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
10.
Zurück zum Zitat Perozzi, B., Al-Rfou, R., Skiena, S.: DeepWalk: online learning of social representations. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 701–710 (2014) Perozzi, B., Al-Rfou, R., Skiena, S.: DeepWalk: online learning of social representations. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 701–710 (2014)
11.
Zurück zum Zitat Perozzi, B., Kulkarni, V., Chen, H., Skiena, S.: Don’t walk, skip! Online learning of multi-scale network embeddings. In: Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017, pp. 258–265 (2017) Perozzi, B., Kulkarni, V., Chen, H., Skiena, S.: Don’t walk, skip! Online learning of multi-scale network embeddings. In: Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017, pp. 258–265 (2017)
12.
Zurück zum Zitat Petrini, A., et al.: parSMURF, a high-performance computing tool for the genome-wide detection of pathogenic variants. GigaScience 9(5), giaa052 (2020) Petrini, A., et al.: parSMURF, a high-performance computing tool for the genome-wide detection of pathogenic variants. GigaScience 9(5), giaa052 (2020)
13.
Zurück zum Zitat Radhakrishna Rao, C., Mitra, S.K., et al.: Generalized inverse of a matrix and its applications. In: Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 601–620. University of California Press, Oakland (1972) Radhakrishna Rao, C., Mitra, S.K., et al.: Generalized inverse of a matrix and its applications. In: Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 601–620. University of California Press, Oakland (1972)
14.
Zurück zum Zitat Szklarczyk, D., et al.: The STRING database in 2021: customizable protein-protein networks, and functional characterization of user-uploaded gene/measurement sets. Nucleic Acids Res. 49(D1), D605–D612 (2021)CrossRefPubMed Szklarczyk, D., et al.: The STRING database in 2021: customizable protein-protein networks, and functional characterization of user-uploaded gene/measurement sets. Nucleic Acids Res. 49(D1), D605–D612 (2021)CrossRefPubMed
15.
Zurück zum Zitat Yi, H.-C., You, Z.-H., Huang, D.-S., Kwoh, C.K.: Graph representation learning in bioinformatics: trends, methods and applications. Brief. Bioinform. 23(1), bbab340 (2021)CrossRef Yi, H.-C., You, Z.-H., Huang, D.-S., Kwoh, C.K.: Graph representation learning in bioinformatics: trends, methods and applications. Brief. Bioinform. 23(1), bbab340 (2021)CrossRef
16.
Zurück zum Zitat Yue, X., et al.: Graph embedding on biomedical networks: methods, applications and evaluations. Bioinformatics 36(4), 1241–1251 (2019)CrossRefPubMedCentral Yue, X., et al.: Graph embedding on biomedical networks: methods, applications and evaluations. Bioinformatics 36(4), 1241–1251 (2019)CrossRefPubMedCentral
Metadaten
Titel
Degree-Normalization Improves Random-Walk-Based Embedding Accuracy in PPI Graphs
verfasst von
Luca Cappelletti
Stefano Taverni
Tommaso Fontana
Marcin P. Joachimiak
Justin Reese
Peter Robinson
Elena Casiraghi
Giorgio Valentini
Copyright-Jahr
2023
DOI
https://doi.org/10.1007/978-3-031-34960-7_26

Premium Partner