Skip to main content
Erschienen in: Knowledge and Information Systems 3/2015

01.12.2015 | Regular Paper

Evaluating link prediction methods

verfasst von: Yang Yang, Ryan N. Lichtenwalter, Nitesh V. Chawla

Erschienen in: Knowledge and Information Systems | Ausgabe 3/2015

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Link prediction is a popular research area with important applications in a variety of disciplines, including biology, social science, security, and medicine. The fundamental requirement of link prediction is the accurate and effective prediction of new links in networks. While there are many different methods proposed for link prediction, we argue that the practical performance potential of these methods is often unknown because of challenges in the evaluation of link prediction, which impact the reliability and reproducibility of results. We describe these challenges, provide theoretical proofs and empirical examples demonstrating how current methods lead to questionable conclusions, show how the fallacy of these conclusions is illuminated by methods we propose, and develop recommendations for consistent, standard, and applicable evaluation metrics. We also recommend the use of precision-recall threshold curves and associated areas in lieu of receiver operating characteristic curves due to complications that arise from extreme imbalance in the link prediction classification problem.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Abu-Mostafa YS, Magdon-Ismail M, Lin HT (2012) Learning from data: a short course. AMLBook Abu-Mostafa YS, Magdon-Ismail M, Lin HT (2012) Learning from data: a short course. AMLBook
2.
Zurück zum Zitat Adamic L (2001) Friends and neighbors on the web. Soc Netw 25(3):211–230CrossRef Adamic L (2001) Friends and neighbors on the web. Soc Netw 25(3):211–230CrossRef
3.
Zurück zum Zitat Al-Hasan M, Chaoji V, Salem S, Zaki M (2006) Link prediction using supervised learning. In: Proceedings of SDM’06 workshop on link analysis, counterterrorism and security Al-Hasan M, Chaoji V, Salem S, Zaki M (2006) Link prediction using supervised learning. In: Proceedings of SDM’06 workshop on link analysis, counterterrorism and security
4.
Zurück zum Zitat Backstrom L, Leskovec J (2011) Supervised random walks: predicting and recommending links in social networks. In: Proceedings of the fourth ACM international conference on Web search and data mining, pp 635–644 Backstrom L, Leskovec J (2011) Supervised random walks: predicting and recommending links in social networks. In: Proceedings of the fourth ACM international conference on Web search and data mining, pp 635–644
6.
Zurück zum Zitat Barabási A-L, Jeong H, Néda Z, Ravasz E, Schubert A, Vicsek T (2002) Evolution of the social network of scientific collaboration. Phys A Stat Mech Appl 311(3–4):590–614CrossRefMATH Barabási A-L, Jeong H, Néda Z, Ravasz E, Schubert A, Vicsek T (2002) Evolution of the social network of scientific collaboration. Phys A Stat Mech Appl 311(3–4):590–614CrossRefMATH
7.
Zurück zum Zitat Clauset A, Moore C, Newman MEJ (2008) Hierarchical structure and the prediction of missing links in networks. Nature 453(7191):98–101CrossRef Clauset A, Moore C, Newman MEJ (2008) Hierarchical structure and the prediction of missing links in networks. Nature 453(7191):98–101CrossRef
9.
Zurück zum Zitat Davis D, Lichtenwalter R, Chawla NV (2011) Multi-relational link prediction in heterogeneous information networks. In: Proceedings of the 2011 international conference on advances in social networks analysis and mining, pp 281–288 Davis D, Lichtenwalter R, Chawla NV (2011) Multi-relational link prediction in heterogeneous information networks. In: Proceedings of the 2011 international conference on advances in social networks analysis and mining, pp 281–288
10.
Zurück zum Zitat Davis J, Goadrich M (2006) The relationship between precision-recall and ROC curves. In: Proceedings of the 23rd international conference on machine learning, pp 233–240 Davis J, Goadrich M (2006) The relationship between precision-recall and ROC curves. In: Proceedings of the 23rd international conference on machine learning, pp 233–240
11.
Zurück zum Zitat Deng H, Han J, Zhao B, Yu Y, Lin CX (2011) Probabilistic topic models with biased propagation on heterogeneous information networks. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining, pp 1271–1279 Deng H, Han J, Zhao B, Yu Y, Lin CX (2011) Probabilistic topic models with biased propagation on heterogeneous information networks. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining, pp 1271–1279
12.
Zurück zum Zitat Dong Y, Tang J, Wu S, Tian J, Chawla NV, Rao J, Cao H (2012) Link prediction and recommendation across heterogeneous social networks. In: Proceedings of the 2012 international conference on data mining, pp 181–190 Dong Y, Tang J, Wu S, Tian J, Chawla NV, Rao J, Cao H (2012) Link prediction and recommendation across heterogeneous social networks. In: Proceedings of the 2012 international conference on data mining, pp 181–190
13.
Zurück zum Zitat Drummond C, Holte RC (2006) Cost curves: an improved method for visualizing classifier performance. Mach Learn 65(1):95–130CrossRef Drummond C, Holte RC (2006) Cost curves: an improved method for visualizing classifier performance. Mach Learn 65(1):95–130CrossRef
14.
Zurück zum Zitat Fawcett T (2004) ROC graphs: notes and practical considerations for researchers. ReCALL 31(HPL-2003-4), pp 1–38 Fawcett T (2004) ROC graphs: notes and practical considerations for researchers. ReCALL 31(HPL-2003-4), pp 1–38
15.
Zurück zum Zitat Fletcher RJ, Acevedo MA, Reichert BE, Pias KE, Kitchens WM (2011) Social network models predict movement and connectivity in ecological landscapes. Proc Natl Acad Sci 108(48):19282–19287CrossRef Fletcher RJ, Acevedo MA, Reichert BE, Pias KE, Kitchens WM (2011) Social network models predict movement and connectivity in ecological landscapes. Proc Natl Acad Sci 108(48):19282–19287CrossRef
17.
Zurück zum Zitat Goldberg DS, Roth FP (2003) Assessing experimentally derived interactions in a small world. Proc Natl Acad Sci 100(8):4372–4376MathSciNetCrossRefMATH Goldberg DS, Roth FP (2003) Assessing experimentally derived interactions in a small world. Proc Natl Acad Sci 100(8):4372–4376MathSciNetCrossRefMATH
18.
Zurück zum Zitat Hand DJ (2009) Measuring classifier performance: a coherent alternative to the area under the ROC curve. Mach Learn 77(1):103–123CrossRef Hand DJ (2009) Measuring classifier performance: a coherent alternative to the area under the ROC curve. Mach Learn 77(1):103–123CrossRef
19.
20.
Zurück zum Zitat Hopcroft J, Lou T, Tang J (2011) Who will follow you back? Reciprocal relationship prediction. In: Proceedings of the 20th ACM international conference on Information and knowledge management, pp 1137–1146 Hopcroft J, Lou T, Tang J (2011) Who will follow you back? Reciprocal relationship prediction. In: Proceedings of the 20th ACM international conference on Information and knowledge management, pp 1137–1146
21.
Zurück zum Zitat Huang Z, Li X, Chen H (2005) Link prediction approach to collaborative filtering. In: Proceedings of the 5th ACM/IEEE-CS joint in proceedings on digital libraries, pp 7–11 Huang Z, Li X, Chen H (2005) Link prediction approach to collaborative filtering. In: Proceedings of the 5th ACM/IEEE-CS joint in proceedings on digital libraries, pp 7–11
22.
Zurück zum Zitat Leroy V, Cambazoglu BB, Bonchi F (2010) Cold start link prediction. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining, pp 393–402 Leroy V, Cambazoglu BB, Bonchi F (2010) Cold start link prediction. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining, pp 393–402
23.
Zurück zum Zitat Leskovec J, Backstrom L, Kumar R, Tomkins A (2008) Microscopic evolution of social networks. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining, pp 462–470 Leskovec J, Backstrom L, Kumar R, Tomkins A (2008) Microscopic evolution of social networks. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining, pp 462–470
24.
Zurück zum Zitat Leskovec J, Lang K, Dasgupta A, Mahoney M (2009) Community structure in large networks: natural cluster sizes and the absence of large well-defined clusters. Internet Math 6(1):29–123MathSciNetCrossRefMATH Leskovec J, Lang K, Dasgupta A, Mahoney M (2009) Community structure in large networks: natural cluster sizes and the absence of large well-defined clusters. Internet Math 6(1):29–123MathSciNetCrossRefMATH
25.
Zurück zum Zitat Leskovec J, Huttenlocher D, Kleinberg J (2010) Predicting positive and negative links in online social networks. In: Proceedings of the 19th international conference on world wide web, pp 641–650 Leskovec J, Huttenlocher D, Kleinberg J (2010) Predicting positive and negative links in online social networks. In: Proceedings of the 19th international conference on world wide web, pp 641–650
26.
Zurück zum Zitat Lichtenwalter RN, Lussier JT, Chawla NV (2010) New perspectives and methods in link prediction. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining, pp 243–252 Lichtenwalter RN, Lussier JT, Chawla NV (2010) New perspectives and methods in link prediction. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining, pp 243–252
27.
Zurück zum Zitat Lichtenwalter RN, Chawla NV (2012) Link prediction: fair and effective evaluation. In: IEEE/ACM international conference on social networks analysis and mining, pp 376–383 Lichtenwalter RN, Chawla NV (2012) Link prediction: fair and effective evaluation. In: IEEE/ACM international conference on social networks analysis and mining, pp 376–383
28.
Zurück zum Zitat Liben-Nowell D, Kleinberg J (2003) The link prediction problem for social networks. In: Proceedings of the twelfth international conference on information and knowledge management, pp 556–559 Liben-Nowell D, Kleinberg J (2003) The link prediction problem for social networks. In: Proceedings of the twelfth international conference on information and knowledge management, pp 556–559
29.
Zurück zum Zitat Liben-Nowell D, Kleinberg J (2007) The link-prediction problem for social networks. J Am Soc Inf Sci Technol 58(7):1019–1031CrossRef Liben-Nowell D, Kleinberg J (2007) The link-prediction problem for social networks. J Am Soc Inf Sci Technol 58(7):1019–1031CrossRef
30.
Zurück zum Zitat Liu X, He Q, Tian Y, Lee WC, McPherson J, Han J (2012) Event-based social networks: linking the online and offline social worlds. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining, pp 1032–1040 Liu X, He Q, Tian Y, Lee WC, McPherson J, Han J (2012) Event-based social networks: linking the online and offline social worlds. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining, pp 1032–1040
32.
Zurück zum Zitat Martinez ND, Hawkins BA, Dawah HA, Feifarek BP (1999) Effects of sampling effort on characterization of food-web structure. Ecology 80:1044–1055CrossRef Martinez ND, Hawkins BA, Dawah HA, Feifarek BP (1999) Effects of sampling effort on characterization of food-web structure. Ecology 80:1044–1055CrossRef
33.
Zurück zum Zitat Murata T, Moriyasu S (2007) Link prediction of social networks based on weighted proximity measures. In: Proceedings of the IEEE/WIC/ACM international conference on web intelligence, pp 85–88 Murata T, Moriyasu S (2007) Link prediction of social networks based on weighted proximity measures. In: Proceedings of the IEEE/WIC/ACM international conference on web intelligence, pp 85–88
34.
Zurück zum Zitat Mason SJ, Graham NE (2002) Areas beneath the relative operating characteristics (roc) and relative operating levels (rol) curves: statistical significance and interpretation. Q J R Meteorol Soc 2002:2145–2166CrossRef Mason SJ, Graham NE (2002) Areas beneath the relative operating characteristics (roc) and relative operating levels (rol) curves: statistical significance and interpretation. Q J R Meteorol Soc 2002:2145–2166CrossRef
35.
Zurück zum Zitat Narayanan A, Shi E, Rubinstein BIP (2011) Link prediction by de-anonymization: how we won the Kaggle social network challenge. Arxiv preprint arXiv:1102.4374 Narayanan A, Shi E, Rubinstein BIP (2011) Link prediction by de-anonymization: how we won the Kaggle social network challenge. Arxiv preprint arXiv:​1102.​4374
36.
Zurück zum Zitat Newman MEJ (2001) Clustering and preferential attachment in growing networks. Phys Rev Lett E 64(2):025102CrossRef Newman MEJ (2001) Clustering and preferential attachment in growing networks. Phys Rev Lett E 64(2):025102CrossRef
37.
Zurück zum Zitat Newman MEJ (2001) The structure of scientific collaboration networks. Proc Natl Acad Sci 98:404–409CrossRefMATH Newman MEJ (2001) The structure of scientific collaboration networks. Proc Natl Acad Sci 98:404–409CrossRefMATH
38.
Zurück zum Zitat O’Madadhain J, Hutchins J, Smyth P (2005) Prediction and ranking algorithms for event-based network data. ACM SIGKDD Explor Newsl 7(2):23–30CrossRef O’Madadhain J, Hutchins J, Smyth P (2005) Prediction and ranking algorithms for event-based network data. ACM SIGKDD Explor Newsl 7(2):23–30CrossRef
39.
Zurück zum Zitat O’Madadhain J, Smyth P, Adamic L (2005) Learning predictive models for link formation. In: International sunbelt social network conference O’Madadhain J, Smyth P, Adamic L (2005) Learning predictive models for link formation. In: International sunbelt social network conference
40.
Zurück zum Zitat Papadopoulos F, Kitsak M, Serrano M, Boguna M, Krioukov D (2012) Popularity versus similarity in growing networks. Nature 489(7417):537–540CrossRef Papadopoulos F, Kitsak M, Serrano M, Boguna M, Krioukov D (2012) Popularity versus similarity in growing networks. Nature 489(7417):537–540CrossRef
41.
Zurück zum Zitat Raeder T, Hoens TR, Chawla NV (2010) Consequences of variability in classifier performance estimates. In: Proceedings of the 10th IEEE international conference on data mining, pp 421–430 Raeder T, Hoens TR, Chawla NV (2010) Consequences of variability in classifier performance estimates. In: Proceedings of the 10th IEEE international conference on data mining, pp 421–430
42.
Zurück zum Zitat Sarukkai RR (2000) Link prediction and path analysis using Markov Chains. In: Proceedings of the 9th international WWW inproceedings on computer networks: the international journal of computer and telecommunications networking, pp 377–386 Sarukkai RR (2000) Link prediction and path analysis using Markov Chains. In: Proceedings of the 9th international WWW inproceedings on computer networks: the international journal of computer and telecommunications networking, pp 377–386
43.
Zurück zum Zitat Scellato S, Mascolo C, Musolesi M, Latora V (2010) Distance matters: geo-social metrics for online social networks. In: Proceedings of the 3rd conference on online social networks, pp 8–8 Scellato S, Mascolo C, Musolesi M, Latora V (2010) Distance matters: geo-social metrics for online social networks. In: Proceedings of the 3rd conference on online social networks, pp 8–8
44.
Zurück zum Zitat Scellato S, Noulas A, Mascolo C (2011) Exploring place features in link prediction on location-based social networks. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining, pp 1046–1054 Scellato S, Noulas A, Mascolo C (2011) Exploring place features in link prediction on location-based social networks. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining, pp 1046–1054
45.
Zurück zum Zitat Scripps J, Tan PN, Chen F, Esfahanian AH (2008) A matrix alignment approach for link prediction. In: Proceedings of the 19th international conference on pattern recognition, pp 1–4 Scripps J, Tan PN, Chen F, Esfahanian AH (2008) A matrix alignment approach for link prediction. In: Proceedings of the 19th international conference on pattern recognition, pp 1–4
46.
Zurück zum Zitat Stumpf MPH, Wiuf C, May RM (2005) Subnets of scale-free networks are not scale-free: sampling properties of networks. Proc Natl Acad Sci 102(12):4221–4224CrossRef Stumpf MPH, Wiuf C, May RM (2005) Subnets of scale-free networks are not scale-free: sampling properties of networks. Proc Natl Acad Sci 102(12):4221–4224CrossRef
47.
Zurück zum Zitat Sprinzak E, Sattath S, Margalit H (2003) How reliable are experimental protein–protein interaction data? J Mol Biol 327(5):919–923CrossRef Sprinzak E, Sattath S, Margalit H (2003) How reliable are experimental protein–protein interaction data? J Mol Biol 327(5):919–923CrossRef
48.
Zurück zum Zitat Sun Y, Han J, Aggarwal CC, Chawla NV (2012) When will it happen? Relationship prediction in heterogeneous information networks. In: Proceedings of the fifth ACM international conference on web search and data mining, pp 663–672 Sun Y, Han J, Aggarwal CC, Chawla NV (2012) When will it happen? Relationship prediction in heterogeneous information networks. In: Proceedings of the fifth ACM international conference on web search and data mining, pp 663–672
49.
Zurück zum Zitat Szilágyi A, Grimm V, Arakaki AK, Skolnick J (2005) Prediction of physical protein–protein interactions. Phys Biol 2(2):S1–16CrossRef Szilágyi A, Grimm V, Arakaki AK, Skolnick J (2005) Prediction of physical protein–protein interactions. Phys Biol 2(2):S1–16CrossRef
50.
Zurück zum Zitat Taskar B, Wong MF, Abbeel P, Koller D (2003) Link prediction in relational data. In: Proceedings of the conference on neural information processing systems Taskar B, Wong MF, Abbeel P, Koller D (2003) Link prediction in relational data. In: Proceedings of the conference on neural information processing systems
51.
Zurück zum Zitat Viswanath B, Mislove A, Cha M, Gummadi KP (2009) On the evolution of user interaction in facebook. In: Proceeding of the 2nd ACM SIGCOMM workshop on social networks, pp 37–42 Viswanath B, Mislove A, Cha M, Gummadi KP (2009) On the evolution of user interaction in facebook. In: Proceeding of the 2nd ACM SIGCOMM workshop on social networks, pp 37–42
52.
Zurück zum Zitat Wang C, Satuluri V, Parthasarathy S (2007) Local probabilistic models for link prediction. In: Proceedings of the IEEE international conference on data mining, pp 322–331 Wang C, Satuluri V, Parthasarathy S (2007) Local probabilistic models for link prediction. In: Proceedings of the IEEE international conference on data mining, pp 322–331
53.
Zurück zum Zitat Wang D, Pedreschi D, Song C, Giannotti F, Barabasi AL (2011) Human mobility, social ties, and link prediction. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining, pp 1100–1108 Wang D, Pedreschi D, Song C, Giannotti F, Barabasi AL (2011) Human mobility, social ties, and link prediction. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining, pp 1100–1108
54.
Zurück zum Zitat Wittie MP, Pejovic V, Deek L, Almeroth KC, Zhao BY (2010) Exploiting locality of interest in online social networks. In: Co-NEXT ’10 proceedings of the 6th international conference Wittie MP, Pejovic V, Deek L, Almeroth KC, Zhao BY (2010) Exploiting locality of interest in online social networks. In: Co-NEXT ’10 proceedings of the 6th international conference
55.
Zurück zum Zitat Yang Y, Chawla NV, Sun Y, Han J (2012) Predicting links in multi-relational and heterogeneous networks. In: Proceedings of the 12th IEEE international conference on data mining, pp 755–764 Yang Y, Chawla NV, Sun Y, Han J (2012) Predicting links in multi-relational and heterogeneous networks. In: Proceedings of the 12th IEEE international conference on data mining, pp 755–764
56.
Zurück zum Zitat Yin Z, Gupta M, Weninger T, Han J (2010) A unified framework for link recommendation using random walks. In: Proceedings of the 2010 international conference on advances in social networks analysis and mining, pp 152–159 Yin Z, Gupta M, Weninger T, Han J (2010) A unified framework for link recommendation using random walks. In: Proceedings of the 2010 international conference on advances in social networks analysis and mining, pp 152–159
Metadaten
Titel
Evaluating link prediction methods
verfasst von
Yang Yang
Ryan N. Lichtenwalter
Nitesh V. Chawla
Publikationsdatum
01.12.2015
Verlag
Springer London
Erschienen in
Knowledge and Information Systems / Ausgabe 3/2015
Print ISSN: 0219-1377
Elektronische ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-014-0789-0

Weitere Artikel der Ausgabe 3/2015

Knowledge and Information Systems 3/2015 Zur Ausgabe