
Knowledge and Information Systems

Published: 18-02-2022 | Regular Paper

A hidden challenge of link prediction: which pairs to check?

Authors: Caleb Belth, Alican Büyükçakır, Danai Koutra

Published in: Knowledge and Information Systems | Issue 3/2022


Abstract

The traditional setup of link prediction in networks assumes that a test set of node pairs, usually balanced, is available over which to predict the presence of links. In practice, however, there is no test set: the ground truth is unknown, so the number of possible pairs to predict over is quadratic in the number of nodes in the graph. Moreover, because graphs are sparse, most of these possible pairs will not be links. Link prediction methods, which often rely on proximity-preserving embeddings or heuristic notions of node similarity, therefore face a vast search space containing many pairs that are in close proximity but should not be linked. To mitigate this issue, we introduce LinkWaldo, a framework for choosing, from this quadratic, massively skewed search space of node pairs, a concise set of candidate pairs that are not only in close proximity but also structurally resemble the observed edges. This allows it to ignore some high-proximity but low-resemblance pairs, and also to identify high-resemblance, lower-proximity pairs. Our framework is built on a model that theoretically combines stochastic block models (SBMs) with node proximity models. The block structure of the SBM maps out where in the search space new links are expected to fall, and the proximity identifies the most plausible links within these blocks, using locality sensitive hashing to avoid expensive exhaustive search. LinkWaldo can use any node representation learning method or heuristic definition of proximity and can generate candidate pairs for any link prediction method, allowing the representation power of current and future methods to be realized for link prediction in practice. We evaluate LinkWaldo on 13 networks across multiple domains and show that, on average, it returns candidate sets containing 7–33% more missing and future links than the candidate sets of both embedding-based and heuristic baselines. Our code is available at https://github.com/GemsLab/LinkWaldo.
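The core idea above, using block structure to decide where candidates should come from and proximity to decide which pairs to take within each block, can be illustrated with a small Python sketch. It is not the authors' implementation (that is in the GitHub repository named in the abstract): the grouping of nodes by log2-binned degree, the common-neighbor proximity score, the proportional budget rule, and the names `degree_group`, `common_neighbors`, `choose_candidates` and `recall` are all illustrative assumptions. The sketch also enumerates pairs exhaustively, which is exactly what becomes infeasible at scale: a graph with 10,000 nodes already has 10,000 × 9,999 / 2 = 49,995,000 node pairs.

```python
import math
from collections import Counter
from itertools import combinations


def degree_group(deg):
    # Hypothetical grouping: log2-binned degree (1 -> 0, 2-3 -> 1, 4-7 -> 2, ...).
    return int(math.log2(deg)) if deg > 0 else -1


def common_neighbors(edges):
    # Heuristic proximity: number of neighbors shared by u and v.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return lambda u, v: len(adj.get(u, set()) & adj.get(v, set()))


def choose_candidates(nodes, edges, proximity, k):
    """Return up to k unobserved pairs. The budget is split across group pairs
    ("blocks") in proportion to the observed edges in each block, and each
    block's share is filled with its highest-proximity pairs."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    group = {n: degree_group(deg[n]) for n in nodes}
    observed = {frozenset(e) for e in edges}

    def block(u, v):
        return tuple(sorted((group[u], group[v])))

    # Empirical block structure: how many observed edges fall in each block.
    block_counts = Counter(block(u, v) for u, v in edges)
    total = sum(block_counts.values())

    # Exhaustive bucketing of unobserved pairs: O(n^2), the step LSH would replace.
    buckets = {}
    for u, v in combinations(nodes, 2):
        if frozenset((u, v)) not in observed:
            buckets.setdefault(block(u, v), []).append((u, v))

    candidates = []
    for b, count in block_counts.most_common():
        quota = round(k * count / total)  # proportional share, rounded
        ranked = sorted(buckets.get(b, []), key=lambda p: proximity(*p), reverse=True)
        candidates.extend(ranked[:quota])
    return candidates[:k]  # rounding can overshoot k, so trim


def recall(candidates, held_out):
    # Fraction of held-out (missing or future) edges contained in the candidate set.
    held = {frozenset(e) for e in held_out}
    return sum(frozenset(p) in held for p in candidates) / len(held)


# Toy example (made-up graph, for illustration only).
nodes = list(range(6))
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5)]
cands = choose_candidates(nodes, edges, common_neighbors(edges), k=3)
print(cands)                            # [(0, 3), (1, 3), (2, 4)]
print(recall(cands, [(1, 3), (2, 5)]))  # 0.5
```

On this toy graph, the pair (4, 5) shares a neighbor but is never proposed, because no observed edge joins two degree-1 nodes; in miniature, this is how block structure lets a candidate generator skip high-proximity but low-resemblance pairs. Rounding can also leave the total slightly under `k`, which a fuller version would redistribute.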
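The abstract states that LinkWaldo uses locality sensitive hashing to avoid exhaustive search. As a rough, generic illustration only, the sketch below uses random-hyperplane hashing (SimHash), a standard LSH family for cosine similarity over embeddings. This is a swapped-in technique: the hash family, the number of hyperplanes `n_planes`, and the names `random_hyperplanes`, `signature` and `lsh_candidate_pairs` are assumptions, not details taken from LinkWaldo. Each embedding is reduced to a bit signature recording which side of each random hyperplane it falls on, and only nodes with identical signatures are compared.

```python
import random
from itertools import combinations


def random_hyperplanes(dim, n_planes, seed=0):
    # Each hyperplane through the origin is given by a random Gaussian normal vector.
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]


def signature(vec, planes):
    # One bit per hyperplane: which side of the plane the vector lies on.
    return tuple(int(sum(a * b for a, b in zip(vec, p)) >= 0) for p in planes)


def lsh_candidate_pairs(embeddings, n_planes=4, seed=0):
    """Compare only nodes whose signatures collide, instead of all pairs."""
    dim = len(next(iter(embeddings.values())))
    planes = random_hyperplanes(dim, n_planes, seed)
    buckets = {}
    for node, vec in embeddings.items():
        buckets.setdefault(signature(vec, planes), []).append(node)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs


# Toy embeddings (made up): "a" and "b" point the same way, "c" the opposite way.
emb = {"a": [1.0, 0.0, 0.5], "b": [2.0, 0.0, 1.0], "c": [-1.0, 0.0, -0.5]}
print(lsh_candidate_pairs(emb))  # {('a', 'b')}
```

Vectors pointing in the same direction always receive the same signature, while opposite vectors disagree on every bit (barring a dot product of exactly zero). Adding hyperplanes shrinks buckets and therefore the number of comparisons, at the cost of separating some genuinely similar pairs; practical LSH indexes usually offset this by using several independent hash tables.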



Title: A hidden challenge of link prediction: which pairs to check?
Authors: Caleb Belth, Alican Büyükçakır, Danai Koutra
Publication date: 18-02-2022
Publisher: Springer London
Published in: Knowledge and Information Systems / Issue 3/2022
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI: https://doi.org/10.1007/s10115-021-01632-x
