Skip to main content
Top
Published in: Knowledge and Information Systems 3/2022

18-02-2022 | Regular Paper

A hidden challenge of link prediction: which pairs to check?

Authors: Caleb Belth, Alican Büyükçakır, Danai Koutra

Published in: Knowledge and Information Systems | Issue 3/2022

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

The traditional setup of link prediction in networks assumes that a test set of node pairs, which is usually balanced, is available over which to predict the presence of links. However, in practice, there is no test set: the ground truth is not known, so the number of possible pairs to predict over is quadratic in the number of nodes in the graph. Moreover, because graphs are sparse, most of these possible pairs will not be links. Thus, link prediction methods, which often rely on proximity-preserving embeddings or heuristic notions of node similarity, face a vast search space, with many pairs that are in close proximity, but that should not be linked. To mitigate this issue, we introduce LinkWaldo, a framework for choosing from this quadratic, massively skewed search space of node pairs, a concise set of candidate pairs that, in addition to being in close proximity, also structurally resemble the observed edges. This allows it to ignore some high-proximity but low-resemblance pairs, and also identify high-resemblance, lower-proximity pairs. Our framework is built on a model that theoretically combines stochastic block models (SBMs) with node proximity models. The block structure of the SBM maps out where in the search space new links are expected to fall, and the proximity identifies the most plausible links within these blocks, using locality sensitive hashing to avoid expensive exhaustive search. LinkWaldo can use any node representation learning or heuristic definition of proximity and can generate candidate pairs for any link prediction method, allowing the representation power of current and future methods to be realized for link prediction in practice. We evaluate LinkWaldo on 13 networks across multiple domains and show that on average it returns candidate sets containing 7–33% more missing and future links than both embedding-based and heuristic baselines’ sets. Our code is available at https://​github.​com/​GemsLab/​LinkWaldo.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Appendix
Available only for authorised users
Literature
1.
go back to reference Adamic LA, Adar E (2003) Friends and neighbors on the web. Soc Netw 25(3):211–230CrossRef Adamic LA, Adar E (2003) Friends and neighbors on the web. Soc Netw 25(3):211–230CrossRef
2.
go back to reference Alivisatos AP, Chun M, Church GM, Greenspan RJ, Roukes ML, Yuste R (2012) The brain activity map project and the challenge of functional connectomics. Neuron 74(6):970–974CrossRef Alivisatos AP, Chun M, Church GM, Greenspan RJ, Roukes ML, Yuste R (2012) The brain activity map project and the challenge of functional connectomics. Neuron 74(6):970–974CrossRef
3.
go back to reference Bawa M, Condie T, Ganesan P (2005) Lsh forest: self-tuning indexes for similarity search. In WWW, pp 651–660 Bawa M, Condie T, Ganesan P (2005) Lsh forest: self-tuning indexes for similarity search. In WWW, pp 651–660
4.
go back to reference Belth C, Büyükçakır A, Koutra D (2020) A hidden challenge of link prediction: Which pairs to check? In: ICDM, pp 831–840. IEEE Belth C, Büyükçakır A, Koutra D (2020) A hidden challenge of link prediction: Which pairs to check? In: ICDM, pp 831–840. IEEE
5.
go back to reference Belth C, Zheng X, Koutra D (2020) Mining persistent activity in continually evolving networks. In: KDD Belth C, Zheng X, Koutra D (2020) Mining persistent activity in continually evolving networks. In: KDD
6.
go back to reference Charikar MS (2002) Similarity estimation techniques from rounding algorithms. In: STOC Charikar MS (2002) Similarity estimation techniques from rounding algorithms. In: STOC
7.
go back to reference Donnat C, Zitnik M, Hallac D, Leskovec J (2018) Learning structural node embeddings via diffusion wavelets. In: KDD, pp 1320–1329. ACM Donnat C, Zitnik M, Hallac D, Leskovec J (2018) Learning structural node embeddings via diffusion wavelets. In: KDD, pp 1320–1329. ACM
8.
go back to reference Duan L, Ma S, Aggarwal C, Ma T, Huai J (2017) An ensemble approach to link prediction. In: IEEE TKDE 29(11) Duan L, Ma S, Aggarwal C, Ma T, Huai J (2017) An ensemble approach to link prediction. In: IEEE TKDE 29(11)
9.
go back to reference Gao M, Chen L, He X, Aoying Z (2018) Bipartite network embedding. In: SIGIR, Bine Gao M, Chen L, He X, Aoying Z (2018) Bipartite network embedding. In: SIGIR, Bine
10.
go back to reference Grover A, Leskovec J (2016) node2vec: Scalable feature learning for networks. In: KDD, pp 855–864. ACM Grover A, Leskovec J (2016) node2vec: Scalable feature learning for networks. In: KDD, pp 855–864. ACM
11.
go back to reference Hamilton WL, Ying R, Leskovec J (2017) Representation learning on graphs: methods and applications. IEEE Data Eng Bull 40(3):52–74 Hamilton WL, Ying R, Leskovec J (2017) Representation learning on graphs: methods and applications. IEEE Data Eng Bull 40(3):52–74
12.
go back to reference Heimann M, Shen H, Safavi T, Danai K (2018) Representation learning-based graph alignment. In: CIKM, REGAL Heimann M, Shen H, Safavi T, Danai K (2018) Representation learning-based graph alignment. In: CIKM, REGAL
13.
go back to reference Di Jin, Heimann M, Safavi, T Wang M, Lee W, Snider L, Koutra D (2019) Smart roles: inferring professional roles in email networks. In: KDD, pp 2923–2933. ACM Di Jin, Heimann M, Safavi, T Wang M, Lee W, Snider L, Koutra D (2019) Smart roles: inferring professional roles in email networks. In: KDD, pp 2923–2933. ACM
14.
go back to reference Joshi U, Urbani J (2020)Searching for embeddings in a haystack: link prediction on knowledge graphs with subgraph pruning. In: WebConf Joshi U, Urbani J (2020)Searching for embeddings in a haystack: link prediction on knowledge graphs with subgraph pruning. In: WebConf
15.
go back to reference Kipf TN, Welling M (2016) Variational graph auto-encoders. In: NeurIPS workshop on Bayesian deep learning Kipf TN, Welling M (2016) Variational graph auto-encoders. In: NeurIPS workshop on Bayesian deep learning
16.
go back to reference Kunegis J (2013) Konect: the koblenz network collection. In: WWW Kunegis J (2013) Konect: the koblenz network collection. In: WWW
17.
go back to reference Latouche P, Birmelé E, Ambroise C et al (2011) Overlapping stochastic block models with application to the French political blogosphere. Ann Appl Stat 5(1):309–336MathSciNetCrossRef Latouche P, Birmelé E, Ambroise C et al (2011) Overlapping stochastic block models with application to the French political blogosphere. Ann Appl Stat 5(1):309–336MathSciNetCrossRef
19.
go back to reference Levin DA, Peres Y (2017) Markov chains and mixing times, volume 107. American Mathematical Soc Levin DA, Peres Y (2017) Markov chains and mixing times, volume 107. American Mathematical Soc
20.
go back to reference Liben-Nowell D, Kleinberg J (2007) The link-prediction problem for social networks. ASIS&T 58(7):1019–1031 Liben-Nowell D, Kleinberg J (2007) The link-prediction problem for social networks. ASIS&T 58(7):1019–1031
21.
go back to reference Martínez V, Berzal F, Cubero J-C (2016) A survey of link prediction in complex networks. CSUR 49(4):1–33CrossRef Martínez V, Berzal F, Cubero J-C (2016) A survey of link prediction in complex networks. CSUR 49(4):1–33CrossRef
22.
go back to reference Mehta N, Carin L, Rai P (2019) Stochastic blockmodels meet graph neural networks. In: ICML Mehta N, Carin L, Rai P (2019) Stochastic blockmodels meet graph neural networks. In: ICML
23.
go back to reference Miller K, Michael IJ, Thomas LG (2009) Nonparametric latent feature models for link prediction. In: NeurIPS Miller K, Michael IJ, Thomas LG (2009) Nonparametric latent feature models for link prediction. In: NeurIPS
24.
go back to reference Newman MEJ (2003) Mixing patterns in networks. Phys Rev E 67(2) Newman MEJ (2003) Mixing patterns in networks. Phys Rev E 67(2)
25.
go back to reference Nowicki K, Snijders TAB (2001) Estimation and prediction for stochastic blockstructures. ASIS&T 96(455):1077–1087MathSciNetMATH Nowicki K, Snijders TAB (2001) Estimation and prediction for stochastic blockstructures. ASIS&T 96(455):1077–1087MathSciNetMATH
26.
go back to reference Pachev B, Webb B (2018) Fast link prediction for large networks using spectral embedding. J Complex Netw 6(1):79–94MathSciNetCrossRef Pachev B, Webb B (2018) Fast link prediction for large networks using spectral embedding. J Complex Netw 6(1):79–94MathSciNetCrossRef
27.
go back to reference Perozzi B, Al-Rfou R, Skiena S (2014) Deepwalk: online learning of social representations. In: KDD, pp 701–710. ACM Perozzi B, Al-Rfou R, Skiena S (2014) Deepwalk: online learning of social representations. In: KDD, pp 701–710. ACM
28.
go back to reference Qiu J, Dong Y, Ma H, Li J, Wang K, Tang J (2018) Network embedding as matrix factorization: unifying deepwalk, line, pte, and node2vec. In: WSDM, pp 459–467. ACM Qiu J, Dong Y, Ma H, Li J, Wang K, Tang J (2018) Network embedding as matrix factorization: unifying deepwalk, line, pte, and node2vec. In: WSDM, pp 459–467. ACM
29.
go back to reference Ribeiro LFR, Saverese PHP, Figueiredo DR (2017) struc2vec: learning node representations from structural identity. In: KDD, pp 385–394. ACM Ribeiro LFR, Saverese PHP, Figueiredo DR (2017) struc2vec: learning node representations from structural identity. In: KDD, pp 385–394. ACM
30.
go back to reference Rossi R, Ahmed N (2015) The network data repository with interactive graph analytics and visualization. In: AAAI Rossi R, Ahmed N (2015) The network data repository with interactive graph analytics and visualization. In: AAAI
31.
go back to reference Rossi RA, Di J, Kim S, Ahmed S, Koutra D, Lee JB (2020) On proximity and structural role-based embeddings in networks: Misconceptions, techniques, and applications. TKDD Rossi RA, Di J, Kim S, Ahmed S, Koutra D, Lee JB (2020) On proximity and structural role-based embeddings in networks: Misconceptions, techniques, and applications. TKDD
32.
go back to reference Safavi T, Koutra D, Meij E (2020) Evaluating the calibration of knowledge graph embeddings for trustworthy link prediction. In: EMNLP Safavi T, Koutra D, Meij E (2020) Evaluating the calibration of knowledge graph embeddings for trustworthy link prediction. In: EMNLP
33.
go back to reference Song D, Meyer DA, Tao D (2015) Top-k link recommendation in social networks. In: ICDM, pp 389–398. IEEE Song D, Meyer DA, Tao D (2015) Top-k link recommendation in social networks. In: ICDM, pp 389–398. IEEE
34.
go back to reference Sporns O, Tononi G, Kötter R (2005) The human connectome: a structural description of the human brain. PLoS Comput Biol 1(4):e42CrossRef Sporns O, Tononi G, Kötter R (2005) The human connectome: a structural description of the human brain. PLoS Comput Biol 1(4):e42CrossRef
35.
go back to reference Tang J, Qu M, Mei Q (2015) Pte: predictive text embedding through large-scale heterogeneous text networks. In: KDD, pp 1165–1174. ACM Tang J, Qu M, Mei Q (2015) Pte: predictive text embedding through large-scale heterogeneous text networks. In: KDD, pp 1165–1174. ACM
36.
go back to reference Tang J, Qu M, Wang M, Zhang M, Yan J, Mei Q (2015) Line: large-scale information network embedding. In: WWW, pp 1067–1077. ACM Tang J, Qu M, Wang M, Zhang M, Yan J, Mei Q (2015) Line: large-scale information network embedding. In: WWW, pp 1067–1077. ACM
37.
go back to reference Tsybakov AB (2008) Introduction to nonparametric estimation. Springer Science & Business Media, Berlin Tsybakov AB (2008) Introduction to nonparametric estimation. Springer Science & Business Media, Berlin
38.
go back to reference Varshney LR, Chen BL, Paniagua E, Hall DH, Chklovskii DB (2011) Structural properties of the caenorhabditis elegans neuronal network. PLoS Comput Biol 7(2):e1001066CrossRef Varshney LR, Chen BL, Paniagua E, Hall DH, Chklovskii DB (2011) Structural properties of the caenorhabditis elegans neuronal network. PLoS Comput Biol 7(2):e1001066CrossRef
40.
go back to reference Zhang M, Chen Y (2018) Link prediction based on graph neural networks. In: NeurIPS, pp 5165–5175 Zhang M, Chen Y (2018) Link prediction based on graph neural networks. In: NeurIPS, pp 5165–5175
41.
go back to reference Zhu J, Xingyu L, Heimann M, Koutra D (2021) Node proximity is all you need: Unified structural and positional node and graph embedding. In: SDM, SIAM Zhu J, Xingyu L, Heimann M, Koutra D (2021) Node proximity is all you need: Unified structural and positional node and graph embedding. In: SDM, SIAM
Metadata
Title
A hidden challenge of link prediction: which pairs to check?
Authors
Caleb Belth
Alican Büyükçakır
Danai Koutra
Publication date
18-02-2022
Publisher
Springer London
Published in
Knowledge and Information Systems / Issue 3/2022
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-021-01632-x

Other articles of this Issue 3/2022

Knowledge and Information Systems 3/2022 Go to the issue

Premium Partner