Skip to main content
Erschienen in: Social Network Analysis and Mining 1/2016

01.12.2016 | Original Article

Reducing seed noise in personalized PageRank

verfasst von: Shengyu Huang, Xinsheng Li, K. Selçuk Candan, Maria Luisa Sapino

Erschienen in: Social Network Analysis and Mining | Ausgabe 1/2016

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Network-based recommendation systems leverage the topology of the underlying graph and the current user context to rank objects in the database. Random walk-based techniques, such as PageRank, encode the structure of the graph in the form of a transition matrix of a stochastic process from which the significances of the nodes in the graph are inferred. Personalized PageRank (PPR) techniques complement this with a seed node set which serves as the personalization context. In this paper, we note (and experimentally show) that PPR algorithms that do not differentiate among the seed nodes may not properly rank nodes in situations where the seed set is incomplete and/or noisy. To tackle this problem, we propose alternative robust personalized PageRank (RPR) strategies, which are insensitive to noise in the set of seed nodes and in which the rankings are not overly biased towards the seed nodes. In particular, we show that novel teleportation-discounting and seed-set maximal PPR techniques help eliminate harmful bias of individual seed nodes and provide effective seed differentiation to lead to more accurate rankings. We also show that the proposed techniques lead to efficient implementations, where existing approximation algorithms and/or parallel implementations for computing the PPR scores can be easily leveraged. Moreover, the proposed formulations are reuse-promoting in the sense that, it is possible to divide the work relative to individual seed nodes and cache the intermediary results obtained during the computation, and especially in systems with large query throughputs, it may be possible to cluster queries based on the partial overlaps between the seed sets and reduce the overall robust PPR computation costs. Experiment results show that the proposed techniques are efficient and highly effective in improving recommendations and eliminating unwanted bias due to imperfections in the seed set.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Fußnoten
1
Otherwise the seed set would have been constructed differently.
 
Literatur
Zurück zum Zitat Andersen R, Borgs C, Chayes J, Feige U, Flaxman A, Kalai A, Mirrokni V, Tennenholtz M (2008) Trust-based recommendation systems: an axiomatic approach. In: Proceedings of the 17th international conference on World Wide Web. ACM, New York, pp 199–208 Andersen R, Borgs C, Chayes J, Feige U, Flaxman A, Kalai A, Mirrokni V, Tennenholtz M (2008) Trust-based recommendation systems: an axiomatic approach. In: Proceedings of the 17th international conference on World Wide Web. ACM, New York, pp 199–208
Zurück zum Zitat Avrachenkov K, Litvak N, Nemirovsky D, Smirnova E, Sokol M (2011) Quick detection of top-k personalized PageRank lists. In: Algorithms and Models for the Web Graph. Springer, Berlin Heidelberg, pp 50–61 Avrachenkov K, Litvak N, Nemirovsky D, Smirnova E, Sokol M (2011) Quick detection of top-k personalized PageRank lists. In: Algorithms and Models for the Web Graph. Springer, Berlin Heidelberg, pp 50–61
Zurück zum Zitat Bahmani B, Chakrabarti K, Xin D (2011) Fast personalized PageRank on MapReduce. In: Proceedings of the 2011 ACM SIGMOD international conference on management of data, pp 973–984 Bahmani B, Chakrabarti K, Xin D (2011) Fast personalized PageRank on MapReduce. In: Proceedings of the 2011 ACM SIGMOD international conference on management of data, pp 973–984
Zurück zum Zitat Bahmani B, Chowdhury A, Goel A (2010) Fast incremental and personalized PageRank. In: Proceedings of the VLDB Endowment, vol 4, pp 173–184 Bahmani B, Chowdhury A, Goel A (2010) Fast incremental and personalized PageRank. In: Proceedings of the VLDB Endowment, vol 4, pp 173–184
Zurück zum Zitat Balmin A, Hristidis V, Papakonstantinou Y (2004) ObjectRank: authority-based keyword search in databases. In: Proceedings of the Thirtieth international conference on very large data bases. Morgan Kaufman, San Francisco, pp 564–575 Balmin A, Hristidis V, Papakonstantinou Y (2004) ObjectRank: authority-based keyword search in databases. In: Proceedings of the Thirtieth international conference on very large data bases. Morgan Kaufman, San Francisco, pp 564–575
Zurück zum Zitat Berkhin P (2007) Bookmark-coloring approach to personalized pagerank computing. Int Math 3(1):41–62MathSciNetMATH Berkhin P (2007) Bookmark-coloring approach to personalized pagerank computing. Int Math 3(1):41–62MathSciNetMATH
Zurück zum Zitat Boldi P, Rosa M, Vigna S (2011) HyperANF: approximating the neighbourhood function of very large graphs on a budget. In: Proceedings of the 20th international conference on World Wide Web. ACM, New York, pp 625–634 Boldi P, Rosa M, Vigna S (2011) HyperANF: approximating the neighbourhood function of very large graphs on a budget. In: Proceedings of the 20th international conference on World Wide Web. ACM, New York, pp 625–634
Zurück zum Zitat Borgs C, Brautbar M, Chayes J, Teng SH (2014) Multiscale matrix sampling and sublinear-time pagerank computation. Int Math 10(1–2):20–48MathSciNet Borgs C, Brautbar M, Chayes J, Teng SH (2014) Multiscale matrix sampling and sublinear-time pagerank computation. Int Math 10(1–2):20–48MathSciNet
Zurück zum Zitat Borgatti MG, Jones C, Everett MG (1998) Network measures of social capital. Connections 21(2):27–36 Borgatti MG, Jones C, Everett MG (1998) Network measures of social capital. Connections 21(2):27–36
Zurück zum Zitat Brin S, Page L (1998) The anatomy of a large-scale hypertextual Web search engine. Comput Netw ISDN Syst 30:107–117CrossRef Brin S, Page L (1998) The anatomy of a large-scale hypertextual Web search engine. Comput Netw ISDN Syst 30:107–117CrossRef
Zurück zum Zitat Buckley C, Voorhees EM (2004) Retrieval evaluation with incomplete information. In: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, New York, pp 25–32 Buckley C, Voorhees EM (2004) Retrieval evaluation with incomplete information. In: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, New York, pp 25–32
Zurück zum Zitat Candan KS, Li WS (2000) Using random walks for mining web document associations. In: Proceedings of the fourth European conference on machine learning and principles and practice of knowledge discovery in databases, pp 294–305 Candan KS, Li WS (2000) Using random walks for mining web document associations. In: Proceedings of the fourth European conference on machine learning and principles and practice of knowledge discovery in databases, pp 294–305
Zurück zum Zitat Candan KS, Li WS (2002) Reasoning for Web document associations and its applications in site map construction. Data Knowl Eng 43(2):121–150CrossRefMATH Candan KS, Li WS (2002) Reasoning for Web document associations and its applications in site map construction. Data Knowl Eng 43(2):121–150CrossRefMATH
Zurück zum Zitat Chakrabarti S (2007) Dynamic personalized pagerank in entity-relation graphs. In: Proceedings of the 16th international conference on World Wide Web. ACM, New York, pp 571–580 Chakrabarti S (2007) Dynamic personalized pagerank in entity-relation graphs. In: Proceedings of the 16th international conference on World Wide Web. ACM, New York, pp 571–580
Zurück zum Zitat Chen M, Liu J, Tang X (2008) Clustering via random walk hitting time on directed graphs. In: Proceedings of the 23rd national conference on Artificial intelligence, pp 616–621 Chen M, Liu J, Tang X (2008) Clustering via random walk hitting time on directed graphs. In: Proceedings of the 23rd national conference on Artificial intelligence, pp 616–621
Zurück zum Zitat Cohen E, Halperin E, Kaplan H, Zwick U (2003) Reachability and distance queries via 2-hop labels. SIAM J Comput 32(5):1338–1355MathSciNetCrossRefMATH Cohen E, Halperin E, Kaplan H, Zwick U (2003) Reachability and distance queries via 2-hop labels. SIAM J Comput 32(5):1338–1355MathSciNetCrossRefMATH
Zurück zum Zitat Csalogany K, Fogaras D, Rácz B, Sarlós T (2005) Towards scaling fully personalized PageRank: algorithms, lower bounds, and experiments. Int Math 2(3):333–358MathSciNetMATH Csalogany K, Fogaras D, Rácz B, Sarlós T (2005) Towards scaling fully personalized PageRank: algorithms, lower bounds, and experiments. Int Math 2(3):333–358MathSciNetMATH
Zurück zum Zitat Davis TA (2006) Direct methods for sparse linear systems. SIAM, Philadephia, PA, pp 1–211CrossRefMATH Davis TA (2006) Direct methods for sparse linear systems. SIAM, Philadephia, PA, pp 1–211CrossRefMATH
Zurück zum Zitat Foster KC, Muth SQ, Potterat JJ, Rothenberg RB (2001) A faster Katz status score algorithm. Comput Math Organ Theo 7(4):275–285CrossRef Foster KC, Muth SQ, Potterat JJ, Rothenberg RB (2001) A faster Katz status score algorithm. Comput Math Organ Theo 7(4):275–285CrossRef
Zurück zum Zitat Fouss F, Pirotte A, Renders JM, Saerens M (2007) Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation. IEEE Transact Knowl Data Eng 5:1041–4347 Fouss F, Pirotte A, Renders JM, Saerens M (2007) Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation. IEEE Transact Knowl Data Eng 5:1041–4347
Zurück zum Zitat Fujiwara Y, Nakatsuji M, Onizuka M, Kitsuregawa M (2012) Fast and exact top-k search for random walk with restart. In: Proceedings of the VLDB Endowment, vol 5, pp 442–453 Fujiwara Y, Nakatsuji M, Onizuka M, Kitsuregawa M (2012) Fast and exact top-k search for random walk with restart. In: Proceedings of the VLDB Endowment, vol 5, pp 442–453
Zurück zum Zitat Guan Z, Bu J, Mei Q, Chen C, Wang C (2009) Personalized tag recommendation using graph-based ranking on multi-type interrelated objects. In: Proceedings of the 32nd international ACM SIGIR conference on research and development in information retrieval. ACM, New York, pp 540–547 Guan Z, Bu J, Mei Q, Chen C, Wang C (2009) Personalized tag recommendation using graph-based ranking on multi-type interrelated objects. In: Proceedings of the 32nd international ACM SIGIR conference on research and development in information retrieval. ACM, New York, pp 540–547
Zurück zum Zitat Gupta M, Pathak A, Chakrabarti S (2008) Fast algorithms for top-k personalized PageRank queries. In: Proceedings of the 17th international conference on World Wide Web, pp 1225–1226 Gupta M, Pathak A, Chakrabarti S (2008) Fast algorithms for top-k personalized PageRank queries. In: Proceedings of the 17th international conference on World Wide Web, pp 1225–1226
Zurück zum Zitat Haveliwala TH (2002) Topic-sensitive PageRank. In: Proceedings of the 11th international conference on World Wide Web. ACM, New York, pp 517–526 Haveliwala TH (2002) Topic-sensitive PageRank. In: Proceedings of the 11th international conference on World Wide Web. ACM, New York, pp 517–526
Zurück zum Zitat Jeh G, Widom J (2003) Scaling personalized web. In: Proceedings of the 12th international conference on World Wide Web. ACM, New York, pp 271–279 Jeh G, Widom J (2003) Scaling personalized web. In: Proceedings of the 12th international conference on World Wide Web. ACM, New York, pp 271–279
Zurück zum Zitat Katz L (1953) A new status index derived from sociometric analysis. Psychometrika 18:39–43CrossRefMATH Katz L (1953) A new status index derived from sociometric analysis. Psychometrika 18:39–43CrossRefMATH
Zurück zum Zitat Kamvar SD, Haveliwala T, Manning CD, Golub G (2003) Extrapolation methods for accelerating PageRank computations. In: Proceedings of the 12th international conference on World Wide Web. ACM, New York, pp 261–270 Kamvar SD, Haveliwala T, Manning CD, Golub G (2003) Extrapolation methods for accelerating PageRank computations. In: Proceedings of the 12th international conference on World Wide Web. ACM, New York, pp 261–270
Zurück zum Zitat Kim HJ, Candan KS, Sapino ML (2013) LR-PPR: locality-sensitive, re-use promoting, approximate personalized PageRank computation. In: Proceedings of the 22nd ACM international conference on information & knowledge management, pp 1801–1806 Kim HJ, Candan KS, Sapino ML (2013) LR-PPR: locality-sensitive, re-use promoting, approximate personalized PageRank computation. In: Proceedings of the 22nd ACM international conference on information & knowledge management, pp 1801–1806
Zurück zum Zitat Kim HN, El-Saddik A (2011) Personalized PageRank vectors for tag recommendations: inside FolkRank. In: Proceedings of the fifth ACM conference on recommender systems. ACM, New York, pp 45–52 Kim HN, El-Saddik A (2011) Personalized PageRank vectors for tag recommendations: inside FolkRank. In: Proceedings of the fifth ACM conference on recommender systems. ACM, New York, pp 45–52
Zurück zum Zitat Langville AN, Meyer CD (2004) Updating pagerank with iterative aggregation. In: Proceedings of the 13th international World Wide Web conference on alternate track papers & posters. ACM, New York, pp 392–393 Langville AN, Meyer CD (2004) Updating pagerank with iterative aggregation. In: Proceedings of the 13th international World Wide Web conference on alternate track papers & posters. ACM, New York, pp 392–393
Zurück zum Zitat Maehara T, Akiba T et al (2014) Computing personalized PageRank quickly by exploiting graph structures. In: Proceedings of the VLDB endowment, vol 7, pp 1023–1034 Maehara T, Akiba T et al (2014) Computing personalized PageRank quickly by exploiting graph structures. In: Proceedings of the VLDB endowment, vol 7, pp 1023–1034
Zurück zum Zitat Malewicz G, Austern MH, Bik AJC, Dehnert JC, Horn I, Leiser N, Czajkowski G (2010) Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD international conference on management of data. ACM, New York, pp 135–146 Malewicz G, Austern MH, Bik AJC, Dehnert JC, Horn I, Leiser N, Czajkowski G (2010) Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD international conference on management of data. ACM, New York, pp 135–146
Zurück zum Zitat Mei Q, Zhou D, Church K (2008) Query suggestion using hitting time. In: Proceedings of the 17th ACM conference on information and knowledge management. ACM, New York, pp 469–478 Mei Q, Zhou D, Church K (2008) Query suggestion using hitting time. In: Proceedings of the 17th ACM conference on information and knowledge management. ACM, New York, pp 469–478
Zurück zum Zitat Palmer C, Gibbons P, Faloutsos C (2002) Anf: a fast and scalable tool for data mining in massive graphs. In: Proceedings of the eighth ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 81–90 Palmer C, Gibbons P, Faloutsos C (2002) Anf: a fast and scalable tool for data mining in massive graphs. In: Proceedings of the eighth ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 81–90
Zurück zum Zitat Perozzi B, McCubbin C, Halbert JT (2014) Scalable graph clustering with parallel approximate PageRank. Soc Netw Anal Min 4:179–189CrossRef Perozzi B, McCubbin C, Halbert JT (2014) Scalable graph clustering with parallel approximate PageRank. Soc Netw Anal Min 4:179–189CrossRef
Zurück zum Zitat Sarkar P, Moore AW, Prakash A (2008) Fast incremental proximity search in large graphs. In: Proceedings of the 25th international conference on machine learning. ACM, pp 896–903 Sarkar P, Moore AW, Prakash A (2008) Fast incremental proximity search in large graphs. In: Proceedings of the 25th international conference on machine learning. ACM, pp 896–903
Zurück zum Zitat Sarma AD, Molla AR, Pandurangan G, Upfal E (2013) Fast distributed PageRank computation. In: Proceedings of 14th international conference on distributed computing and networking, pp 11–26 Sarma AD, Molla AR, Pandurangan G, Upfal E (2013) Fast distributed PageRank computation. In: Proceedings of 14th international conference on distributed computing and networking, pp 11–26
Zurück zum Zitat Tong H, Faloutsos C (2006) Center-piece subgraphs: problem definition and fast solutions. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining, pp 404–413 Tong H, Faloutsos C (2006) Center-piece subgraphs: problem definition and fast solutions. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining, pp 404–413
Zurück zum Zitat Tong H, Faloutsos C, Koren Y (2007) Fast direction-aware proximity for graph mining. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining, pp 747–756 Tong H, Faloutsos C, Koren Y (2007) Fast direction-aware proximity for graph mining. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining, pp 747–756
Zurück zum Zitat Tong H, Faloutsos C, Pan JY (2006) Fast random walk with restart and its applications. In: Proceedings of the sixth international conference on data mining, pp 613–622 Tong H, Faloutsos C, Pan JY (2006) Fast random walk with restart and its applications. In: Proceedings of the sixth international conference on data mining, pp 613–622
Zurück zum Zitat Wei F (2010) Tedi: efficient shortest path query answering on graphs. In: Proceedings of the 2010 ACM SIGMOD international conference on management of data. ACM, New York, pp 99–110 Wei F (2010) Tedi: efficient shortest path query answering on graphs. In: Proceedings of the 2010 ACM SIGMOD international conference on management of data. ACM, New York, pp 99–110
Zurück zum Zitat White DR, Borgatti SP (1994) Betweenness centrality measures for directed graphs. Soc Netw 16:335–346CrossRef White DR, Borgatti SP (1994) Betweenness centrality measures for directed graphs. Soc Netw 16:335–346CrossRef
Zurück zum Zitat Xiao Y, Wu W, Pei J, Wang W, He Z (2009) Efficiently indexing shortest paths by exploiting symmetry in graphs. In: Proceedings of the 12th international conference on extending database technology: advances in database technology. ACM, pp 493–504 Xiao Y, Wu W, Pei J, Wang W, He Z (2009) Efficiently indexing shortest paths by exploiting symmetry in graphs. In: Proceedings of the 12th international conference on extending database technology: advances in database technology. ACM, pp 493–504
Zurück zum Zitat Zhou L, Chen L, Ozsu MT (2009) Distance-join: pattern match query in a large graph. In: Proceedings of the VLDB endowment, vol 2, pp 886–897 Zhou L, Chen L, Ozsu MT (2009) Distance-join: pattern match query in a large graph. In: Proceedings of the VLDB endowment, vol 2, pp 886–897
Metadaten
Titel
Reducing seed noise in personalized PageRank
verfasst von
Shengyu Huang
Xinsheng Li
K. Selçuk Candan
Maria Luisa Sapino
Publikationsdatum
01.12.2016
Verlag
Springer Vienna
Erschienen in
Social Network Analysis and Mining / Ausgabe 1/2016
Print ISSN: 1869-5450
Elektronische ISSN: 1869-5469
DOI
https://doi.org/10.1007/s13278-015-0309-6

Weitere Artikel der Ausgabe 1/2016

Social Network Analysis and Mining 1/2016 Zur Ausgabe