Skip to main content
Top
Published in:

01-12-2016 | Original Article

Reducing seed noise in personalized PageRank

Authors: Shengyu Huang, Xinsheng Li, K. Selçuk Candan, Maria Luisa Sapino

Published in: Social Network Analysis and Mining | Issue 1/2016

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Network-based recommendation systems leverage the topology of the underlying graph and the current user context to rank objects in the database. Random walk-based techniques, such as PageRank, encode the structure of the graph in the form of a transition matrix of a stochastic process from which the significances of the nodes in the graph are inferred. Personalized PageRank (PPR) techniques complement this with a seed node set which serves as the personalization context. In this paper, we note (and experimentally show) that PPR algorithms that do not differentiate among the seed nodes may not properly rank nodes in situations where the seed set is incomplete and/or noisy. To tackle this problem, we propose alternative robust personalized PageRank (RPR) strategies, which are insensitive to noise in the set of seed nodes and in which the rankings are not overly biased towards the seed nodes. In particular, we show that novel teleportation-discounting and seed-set maximal PPR techniques help eliminate harmful bias of individual seed nodes and provide effective seed differentiation to lead to more accurate rankings. We also show that the proposed techniques lead to efficient implementations, where existing approximation algorithms and/or parallel implementations for computing the PPR scores can be easily leveraged. Moreover, the proposed formulations are reuse-promoting in the sense that, it is possible to divide the work relative to individual seed nodes and cache the intermediary results obtained during the computation, and especially in systems with large query throughputs, it may be possible to cluster queries based on the partial overlaps between the seed sets and reduce the overall robust PPR computation costs. Experiment results show that the proposed techniques are efficient and highly effective in improving recommendations and eliminating unwanted bias due to imperfections in the seed set.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Footnotes
1
Otherwise the seed set would have been constructed differently.
 
Literature
go back to reference Andersen R, Borgs C, Chayes J, Feige U, Flaxman A, Kalai A, Mirrokni V, Tennenholtz M (2008) Trust-based recommendation systems: an axiomatic approach. In: Proceedings of the 17th international conference on World Wide Web. ACM, New York, pp 199–208 Andersen R, Borgs C, Chayes J, Feige U, Flaxman A, Kalai A, Mirrokni V, Tennenholtz M (2008) Trust-based recommendation systems: an axiomatic approach. In: Proceedings of the 17th international conference on World Wide Web. ACM, New York, pp 199–208
go back to reference Avrachenkov K, Litvak N, Nemirovsky D, Smirnova E, Sokol M (2011) Quick detection of top-k personalized PageRank lists. In: Algorithms and Models for the Web Graph. Springer, Berlin Heidelberg, pp 50–61 Avrachenkov K, Litvak N, Nemirovsky D, Smirnova E, Sokol M (2011) Quick detection of top-k personalized PageRank lists. In: Algorithms and Models for the Web Graph. Springer, Berlin Heidelberg, pp 50–61
go back to reference Bahmani B, Chakrabarti K, Xin D (2011) Fast personalized PageRank on MapReduce. In: Proceedings of the 2011 ACM SIGMOD international conference on management of data, pp 973–984 Bahmani B, Chakrabarti K, Xin D (2011) Fast personalized PageRank on MapReduce. In: Proceedings of the 2011 ACM SIGMOD international conference on management of data, pp 973–984
go back to reference Bahmani B, Chowdhury A, Goel A (2010) Fast incremental and personalized PageRank. In: Proceedings of the VLDB Endowment, vol 4, pp 173–184 Bahmani B, Chowdhury A, Goel A (2010) Fast incremental and personalized PageRank. In: Proceedings of the VLDB Endowment, vol 4, pp 173–184
go back to reference Balmin A, Hristidis V, Papakonstantinou Y (2004) ObjectRank: authority-based keyword search in databases. In: Proceedings of the Thirtieth international conference on very large data bases. Morgan Kaufman, San Francisco, pp 564–575 Balmin A, Hristidis V, Papakonstantinou Y (2004) ObjectRank: authority-based keyword search in databases. In: Proceedings of the Thirtieth international conference on very large data bases. Morgan Kaufman, San Francisco, pp 564–575
go back to reference Berkhin P (2007) Bookmark-coloring approach to personalized pagerank computing. Int Math 3(1):41–62MathSciNetMATH Berkhin P (2007) Bookmark-coloring approach to personalized pagerank computing. Int Math 3(1):41–62MathSciNetMATH
go back to reference Boldi P, Rosa M, Vigna S (2011) HyperANF: approximating the neighbourhood function of very large graphs on a budget. In: Proceedings of the 20th international conference on World Wide Web. ACM, New York, pp 625–634 Boldi P, Rosa M, Vigna S (2011) HyperANF: approximating the neighbourhood function of very large graphs on a budget. In: Proceedings of the 20th international conference on World Wide Web. ACM, New York, pp 625–634
go back to reference Borgs C, Brautbar M, Chayes J, Teng SH (2014) Multiscale matrix sampling and sublinear-time pagerank computation. Int Math 10(1–2):20–48MathSciNet Borgs C, Brautbar M, Chayes J, Teng SH (2014) Multiscale matrix sampling and sublinear-time pagerank computation. Int Math 10(1–2):20–48MathSciNet
go back to reference Borgatti MG, Jones C, Everett MG (1998) Network measures of social capital. Connections 21(2):27–36 Borgatti MG, Jones C, Everett MG (1998) Network measures of social capital. Connections 21(2):27–36
go back to reference Brin S, Page L (1998) The anatomy of a large-scale hypertextual Web search engine. Comput Netw ISDN Syst 30:107–117CrossRef Brin S, Page L (1998) The anatomy of a large-scale hypertextual Web search engine. Comput Netw ISDN Syst 30:107–117CrossRef
go back to reference Buckley C, Voorhees EM (2004) Retrieval evaluation with incomplete information. In: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, New York, pp 25–32 Buckley C, Voorhees EM (2004) Retrieval evaluation with incomplete information. In: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, New York, pp 25–32
go back to reference Candan KS, Li WS (2000) Using random walks for mining web document associations. In: Proceedings of the fourth European conference on machine learning and principles and practice of knowledge discovery in databases, pp 294–305 Candan KS, Li WS (2000) Using random walks for mining web document associations. In: Proceedings of the fourth European conference on machine learning and principles and practice of knowledge discovery in databases, pp 294–305
go back to reference Candan KS, Li WS (2002) Reasoning for Web document associations and its applications in site map construction. Data Knowl Eng 43(2):121–150CrossRefMATH Candan KS, Li WS (2002) Reasoning for Web document associations and its applications in site map construction. Data Knowl Eng 43(2):121–150CrossRefMATH
go back to reference Chakrabarti S (2007) Dynamic personalized pagerank in entity-relation graphs. In: Proceedings of the 16th international conference on World Wide Web. ACM, New York, pp 571–580 Chakrabarti S (2007) Dynamic personalized pagerank in entity-relation graphs. In: Proceedings of the 16th international conference on World Wide Web. ACM, New York, pp 571–580
go back to reference Chen M, Liu J, Tang X (2008) Clustering via random walk hitting time on directed graphs. In: Proceedings of the 23rd national conference on Artificial intelligence, pp 616–621 Chen M, Liu J, Tang X (2008) Clustering via random walk hitting time on directed graphs. In: Proceedings of the 23rd national conference on Artificial intelligence, pp 616–621
go back to reference Csalogany K, Fogaras D, Rácz B, Sarlós T (2005) Towards scaling fully personalized PageRank: algorithms, lower bounds, and experiments. Int Math 2(3):333–358MathSciNetMATH Csalogany K, Fogaras D, Rácz B, Sarlós T (2005) Towards scaling fully personalized PageRank: algorithms, lower bounds, and experiments. Int Math 2(3):333–358MathSciNetMATH
go back to reference Davis TA (2006) Direct methods for sparse linear systems. SIAM, Philadephia, PA, pp 1–211CrossRefMATH Davis TA (2006) Direct methods for sparse linear systems. SIAM, Philadephia, PA, pp 1–211CrossRefMATH
go back to reference Foster KC, Muth SQ, Potterat JJ, Rothenberg RB (2001) A faster Katz status score algorithm. Comput Math Organ Theo 7(4):275–285CrossRef Foster KC, Muth SQ, Potterat JJ, Rothenberg RB (2001) A faster Katz status score algorithm. Comput Math Organ Theo 7(4):275–285CrossRef
go back to reference Fouss F, Pirotte A, Renders JM, Saerens M (2007) Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation. IEEE Transact Knowl Data Eng 5:1041–4347 Fouss F, Pirotte A, Renders JM, Saerens M (2007) Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation. IEEE Transact Knowl Data Eng 5:1041–4347
go back to reference Fujiwara Y, Nakatsuji M, Onizuka M, Kitsuregawa M (2012) Fast and exact top-k search for random walk with restart. In: Proceedings of the VLDB Endowment, vol 5, pp 442–453 Fujiwara Y, Nakatsuji M, Onizuka M, Kitsuregawa M (2012) Fast and exact top-k search for random walk with restart. In: Proceedings of the VLDB Endowment, vol 5, pp 442–453
go back to reference Guan Z, Bu J, Mei Q, Chen C, Wang C (2009) Personalized tag recommendation using graph-based ranking on multi-type interrelated objects. In: Proceedings of the 32nd international ACM SIGIR conference on research and development in information retrieval. ACM, New York, pp 540–547 Guan Z, Bu J, Mei Q, Chen C, Wang C (2009) Personalized tag recommendation using graph-based ranking on multi-type interrelated objects. In: Proceedings of the 32nd international ACM SIGIR conference on research and development in information retrieval. ACM, New York, pp 540–547
go back to reference Gupta M, Pathak A, Chakrabarti S (2008) Fast algorithms for top-k personalized PageRank queries. In: Proceedings of the 17th international conference on World Wide Web, pp 1225–1226 Gupta M, Pathak A, Chakrabarti S (2008) Fast algorithms for top-k personalized PageRank queries. In: Proceedings of the 17th international conference on World Wide Web, pp 1225–1226
go back to reference Haveliwala TH (2002) Topic-sensitive PageRank. In: Proceedings of the 11th international conference on World Wide Web. ACM, New York, pp 517–526 Haveliwala TH (2002) Topic-sensitive PageRank. In: Proceedings of the 11th international conference on World Wide Web. ACM, New York, pp 517–526
go back to reference Jeh G, Widom J (2003) Scaling personalized web. In: Proceedings of the 12th international conference on World Wide Web. ACM, New York, pp 271–279 Jeh G, Widom J (2003) Scaling personalized web. In: Proceedings of the 12th international conference on World Wide Web. ACM, New York, pp 271–279
go back to reference Katz L (1953) A new status index derived from sociometric analysis. Psychometrika 18:39–43CrossRefMATH Katz L (1953) A new status index derived from sociometric analysis. Psychometrika 18:39–43CrossRefMATH
go back to reference Kamvar SD, Haveliwala T, Manning CD, Golub G (2003) Extrapolation methods for accelerating PageRank computations. In: Proceedings of the 12th international conference on World Wide Web. ACM, New York, pp 261–270 Kamvar SD, Haveliwala T, Manning CD, Golub G (2003) Extrapolation methods for accelerating PageRank computations. In: Proceedings of the 12th international conference on World Wide Web. ACM, New York, pp 261–270
go back to reference Kim HJ, Candan KS, Sapino ML (2013) LR-PPR: locality-sensitive, re-use promoting, approximate personalized PageRank computation. In: Proceedings of the 22nd ACM international conference on information & knowledge management, pp 1801–1806 Kim HJ, Candan KS, Sapino ML (2013) LR-PPR: locality-sensitive, re-use promoting, approximate personalized PageRank computation. In: Proceedings of the 22nd ACM international conference on information & knowledge management, pp 1801–1806
go back to reference Kim HN, El-Saddik A (2011) Personalized PageRank vectors for tag recommendations: inside FolkRank. In: Proceedings of the fifth ACM conference on recommender systems. ACM, New York, pp 45–52 Kim HN, El-Saddik A (2011) Personalized PageRank vectors for tag recommendations: inside FolkRank. In: Proceedings of the fifth ACM conference on recommender systems. ACM, New York, pp 45–52
go back to reference Langville AN, Meyer CD (2004) Updating pagerank with iterative aggregation. In: Proceedings of the 13th international World Wide Web conference on alternate track papers & posters. ACM, New York, pp 392–393 Langville AN, Meyer CD (2004) Updating pagerank with iterative aggregation. In: Proceedings of the 13th international World Wide Web conference on alternate track papers & posters. ACM, New York, pp 392–393
go back to reference Maehara T, Akiba T et al (2014) Computing personalized PageRank quickly by exploiting graph structures. In: Proceedings of the VLDB endowment, vol 7, pp 1023–1034 Maehara T, Akiba T et al (2014) Computing personalized PageRank quickly by exploiting graph structures. In: Proceedings of the VLDB endowment, vol 7, pp 1023–1034
go back to reference Malewicz G, Austern MH, Bik AJC, Dehnert JC, Horn I, Leiser N, Czajkowski G (2010) Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD international conference on management of data. ACM, New York, pp 135–146 Malewicz G, Austern MH, Bik AJC, Dehnert JC, Horn I, Leiser N, Czajkowski G (2010) Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD international conference on management of data. ACM, New York, pp 135–146
go back to reference Mei Q, Zhou D, Church K (2008) Query suggestion using hitting time. In: Proceedings of the 17th ACM conference on information and knowledge management. ACM, New York, pp 469–478 Mei Q, Zhou D, Church K (2008) Query suggestion using hitting time. In: Proceedings of the 17th ACM conference on information and knowledge management. ACM, New York, pp 469–478
go back to reference Palmer C, Gibbons P, Faloutsos C (2002) Anf: a fast and scalable tool for data mining in massive graphs. In: Proceedings of the eighth ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 81–90 Palmer C, Gibbons P, Faloutsos C (2002) Anf: a fast and scalable tool for data mining in massive graphs. In: Proceedings of the eighth ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 81–90
go back to reference Perozzi B, McCubbin C, Halbert JT (2014) Scalable graph clustering with parallel approximate PageRank. Soc Netw Anal Min 4:179–189CrossRef Perozzi B, McCubbin C, Halbert JT (2014) Scalable graph clustering with parallel approximate PageRank. Soc Netw Anal Min 4:179–189CrossRef
go back to reference Sarkar P, Moore AW, Prakash A (2008) Fast incremental proximity search in large graphs. In: Proceedings of the 25th international conference on machine learning. ACM, pp 896–903 Sarkar P, Moore AW, Prakash A (2008) Fast incremental proximity search in large graphs. In: Proceedings of the 25th international conference on machine learning. ACM, pp 896–903
go back to reference Sarma AD, Molla AR, Pandurangan G, Upfal E (2013) Fast distributed PageRank computation. In: Proceedings of 14th international conference on distributed computing and networking, pp 11–26 Sarma AD, Molla AR, Pandurangan G, Upfal E (2013) Fast distributed PageRank computation. In: Proceedings of 14th international conference on distributed computing and networking, pp 11–26
go back to reference Tong H, Faloutsos C (2006) Center-piece subgraphs: problem definition and fast solutions. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining, pp 404–413 Tong H, Faloutsos C (2006) Center-piece subgraphs: problem definition and fast solutions. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining, pp 404–413
go back to reference Tong H, Faloutsos C, Koren Y (2007) Fast direction-aware proximity for graph mining. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining, pp 747–756 Tong H, Faloutsos C, Koren Y (2007) Fast direction-aware proximity for graph mining. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining, pp 747–756
go back to reference Tong H, Faloutsos C, Pan JY (2006) Fast random walk with restart and its applications. In: Proceedings of the sixth international conference on data mining, pp 613–622 Tong H, Faloutsos C, Pan JY (2006) Fast random walk with restart and its applications. In: Proceedings of the sixth international conference on data mining, pp 613–622
go back to reference Wei F (2010) Tedi: efficient shortest path query answering on graphs. In: Proceedings of the 2010 ACM SIGMOD international conference on management of data. ACM, New York, pp 99–110 Wei F (2010) Tedi: efficient shortest path query answering on graphs. In: Proceedings of the 2010 ACM SIGMOD international conference on management of data. ACM, New York, pp 99–110
go back to reference White DR, Borgatti SP (1994) Betweenness centrality measures for directed graphs. Soc Netw 16:335–346CrossRef White DR, Borgatti SP (1994) Betweenness centrality measures for directed graphs. Soc Netw 16:335–346CrossRef
go back to reference Xiao Y, Wu W, Pei J, Wang W, He Z (2009) Efficiently indexing shortest paths by exploiting symmetry in graphs. In: Proceedings of the 12th international conference on extending database technology: advances in database technology. ACM, pp 493–504 Xiao Y, Wu W, Pei J, Wang W, He Z (2009) Efficiently indexing shortest paths by exploiting symmetry in graphs. In: Proceedings of the 12th international conference on extending database technology: advances in database technology. ACM, pp 493–504
go back to reference Zhou L, Chen L, Ozsu MT (2009) Distance-join: pattern match query in a large graph. In: Proceedings of the VLDB endowment, vol 2, pp 886–897 Zhou L, Chen L, Ozsu MT (2009) Distance-join: pattern match query in a large graph. In: Proceedings of the VLDB endowment, vol 2, pp 886–897
Metadata
Title
Reducing seed noise in personalized PageRank
Authors
Shengyu Huang
Xinsheng Li
K. Selçuk Candan
Maria Luisa Sapino
Publication date
01-12-2016
Publisher
Springer Vienna
Published in
Social Network Analysis and Mining / Issue 1/2016
Print ISSN: 1869-5450
Electronic ISSN: 1869-5469
DOI
https://doi.org/10.1007/s13278-015-0309-6

Premium Partner