nach oben

Social Network Analysis and Mining

Erschienen in:

01.12.2016 | Original Article

Reducing seed noise in personalized PageRank

verfasst von: Shengyu Huang, Xinsheng Li, K. Selçuk Candan, Maria Luisa Sapino

Erschienen in: Social Network Analysis and Mining | Ausgabe 1/2016

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

Network-based recommendation systems leverage the topology of the underlying graph and the current user context to rank objects in the database. Random walk-based techniques, such as PageRank, encode the structure of the graph in the form of a transition matrix of a stochastic process from which the significances of the nodes in the graph are inferred. Personalized PageRank (PPR) techniques complement this with a seed node set which serves as the personalization context. In this paper, we note (and experimentally show) that PPR algorithms that do not differentiate among the seed nodes may not properly rank nodes in situations where the seed set is incomplete and/or noisy. To tackle this problem, we propose alternative robust personalized PageRank (RPR) strategies, which are insensitive to noise in the set of seed nodes and in which the rankings are not overly biased towards the seed nodes. In particular, we show that novel teleportation-discounting and seed-set maximal PPR techniques help eliminate harmful bias of individual seed nodes and provide effective seed differentiation to lead to more accurate rankings. We also show that the proposed techniques lead to efficient implementations, where existing approximation algorithms and/or parallel implementations for computing the PPR scores can be easily leveraged. Moreover, the proposed formulations are reuse-promoting in the sense that, it is possible to divide the work relative to individual seed nodes and cache the intermediary results obtained during the computation, and especially in systems with large query throughputs, it may be possible to cluster queries based on the partial overlaps between the seed sets and reduce the overall robust PPR computation costs. Experiment results show that the proposed techniques are efficient and highly effective in improving recommendations and eliminating unwanted bias due to imperfections in the seed set.

Vorheriger Artikel Analysis and detection of labeled cyberbullying instances in Vine, a video-based social network

Nächster Artikel A penalty box approach for approximation betweenness and closeness centrality algorithms

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Otherwise the seed set would have been constructed differently.

Andersen R, Borgs C, Chayes J, Feige U, Flaxman A, Kalai A, Mirrokni V, Tennenholtz M (2008) Trust-based recommendation systems: an axiomatic approach. In: Proceedings of the 17th international conference on World Wide Web. ACM, New York, pp 199–208

Avrachenkov K, Litvak N, Nemirovsky D, Smirnova E, Sokol M (2011) Quick detection of top-k personalized PageRank lists. In: Algorithms and Models for the Web Graph. Springer, Berlin Heidelberg, pp 50–61

Bahmani B, Chakrabarti K, Xin D (2011) Fast personalized PageRank on MapReduce. In: Proceedings of the 2011 ACM SIGMOD international conference on management of data, pp 973–984

Bahmani B, Chowdhury A, Goel A (2010) Fast incremental and personalized PageRank. In: Proceedings of the VLDB Endowment, vol 4, pp 173–184

Balmin A, Hristidis V, Papakonstantinou Y (2004) ObjectRank: authority-based keyword search in databases. In: Proceedings of the Thirtieth international conference on very large data bases. Morgan Kaufman, San Francisco, pp 564–575

Berkhin P (2007) Bookmark-coloring approach to personalized pagerank computing. Int Math 3(1):41–62MathSciNetMATH

Boldi P, Rosa M, Vigna S (2011) HyperANF: approximating the neighbourhood function of very large graphs on a budget. In: Proceedings of the 20th international conference on World Wide Web. ACM, New York, pp 625–634

Borgs C, Brautbar M, Chayes J, Teng SH (2014) Multiscale matrix sampling and sublinear-time pagerank computation. Int Math 10(1–2):20–48MathSciNet

Borgatti MG, Jones C, Everett MG (1998) Network measures of social capital. Connections 21(2):27–36

Brin S, Page L (1998) The anatomy of a large-scale hypertextual Web search engine. Comput Netw ISDN Syst 30:107–117CrossRef

Buckley C, Voorhees EM (2004) Retrieval evaluation with incomplete information. In: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, New York, pp 25–32

Candan KS, Li WS (2000) Using random walks for mining web document associations. In: Proceedings of the fourth European conference on machine learning and principles and practice of knowledge discovery in databases, pp 294–305

Candan KS, Li WS (2002) Reasoning for Web document associations and its applications in site map construction. Data Knowl Eng 43(2):121–150CrossRefMATH

Chakrabarti S (2007) Dynamic personalized pagerank in entity-relation graphs. In: Proceedings of the 16th international conference on World Wide Web. ACM, New York, pp 571–580

Chen M, Liu J, Tang X (2008) Clustering via random walk hitting time on directed graphs. In: Proceedings of the 23rd national conference on Artificial intelligence, pp 616–621

Cohen E, Halperin E, Kaplan H, Zwick U (2003) Reachability and distance queries via 2-hop labels. SIAM J Comput 32(5):1338–1355MathSciNetCrossRefMATH

Csalogany K, Fogaras D, Rácz B, Sarlós T (2005) Towards scaling fully personalized PageRank: algorithms, lower bounds, and experiments. Int Math 2(3):333–358MathSciNetMATH

Davis TA (2006) Direct methods for sparse linear systems. SIAM, Philadephia, PA, pp 1–211CrossRefMATH

Foster KC, Muth SQ, Potterat JJ, Rothenberg RB (2001) A faster Katz status score algorithm. Comput Math Organ Theo 7(4):275–285CrossRef

Fouss F, Pirotte A, Renders JM, Saerens M (2007) Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation. IEEE Transact Knowl Data Eng 5:1041–4347

Fujiwara Y, Nakatsuji M, Onizuka M, Kitsuregawa M (2012) Fast and exact top-k search for random walk with restart. In: Proceedings of the VLDB Endowment, vol 5, pp 442–453

Guan Z, Bu J, Mei Q, Chen C, Wang C (2009) Personalized tag recommendation using graph-based ranking on multi-type interrelated objects. In: Proceedings of the 32nd international ACM SIGIR conference on research and development in information retrieval. ACM, New York, pp 540–547

Gupta M, Pathak A, Chakrabarti S (2008) Fast algorithms for top-k personalized PageRank queries. In: Proceedings of the 17th international conference on World Wide Web, pp 1225–1226

Haveliwala TH (2002) Topic-sensitive PageRank. In: Proceedings of the 11th international conference on World Wide Web. ACM, New York, pp 517–526

Jeh G, Widom J (2003) Scaling personalized web. In: Proceedings of the 12th international conference on World Wide Web. ACM, New York, pp 271–279

Katz L (1953) A new status index derived from sociometric analysis. Psychometrika 18:39–43CrossRefMATH

Kamvar SD, Haveliwala T, Manning CD, Golub G (2003) Extrapolation methods for accelerating PageRank computations. In: Proceedings of the 12th international conference on World Wide Web. ACM, New York, pp 261–270

Kim HJ, Candan KS, Sapino ML (2013) LR-PPR: locality-sensitive, re-use promoting, approximate personalized PageRank computation. In: Proceedings of the 22nd ACM international conference on information & knowledge management, pp 1801–1806

Kim HN, El-Saddik A (2011) Personalized PageRank vectors for tag recommendations: inside FolkRank. In: Proceedings of the fifth ACM conference on recommender systems. ACM, New York, pp 45–52

Langville AN, Meyer CD (2004) Updating pagerank with iterative aggregation. In: Proceedings of the 13th international World Wide Web conference on alternate track papers & posters. ACM, New York, pp 392–393

Maehara T, Akiba T et al (2014) Computing personalized PageRank quickly by exploiting graph structures. In: Proceedings of the VLDB endowment, vol 7, pp 1023–1034

Malewicz G, Austern MH, Bik AJC, Dehnert JC, Horn I, Leiser N, Czajkowski G (2010) Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD international conference on management of data. ACM, New York, pp 135–146

Mei Q, Zhou D, Church K (2008) Query suggestion using hitting time. In: Proceedings of the 17th ACM conference on information and knowledge management. ACM, New York, pp 469–478

Palmer C, Gibbons P, Faloutsos C (2002) Anf: a fast and scalable tool for data mining in massive graphs. In: Proceedings of the eighth ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 81–90

Piegorsch W, Casella GE (1990) Inverting a sum of matrices. SIAM Rev 32:470MathSciNetCrossRefMATH

Perozzi B, McCubbin C, Halbert JT (2014) Scalable graph clustering with parallel approximate PageRank. Soc Netw Anal Min 4:179–189CrossRef

Sarkar P, Moore AW, Prakash A (2008) Fast incremental proximity search in large graphs. In: Proceedings of the 25th international conference on machine learning. ACM, pp 896–903

Sarma AD, Molla AR, Pandurangan G, Upfal E (2013) Fast distributed PageRank computation. In: Proceedings of 14th international conference on distributed computing and networking, pp 11–26

Tong H, Faloutsos C (2006) Center-piece subgraphs: problem definition and fast solutions. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining, pp 404–413

Tong H, Faloutsos C, Koren Y (2007) Fast direction-aware proximity for graph mining. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining, pp 747–756

Tong H, Faloutsos C, Pan JY (2006) Fast random walk with restart and its applications. In: Proceedings of the sixth international conference on data mining, pp 613–622

Wei F (2010) Tedi: efficient shortest path query answering on graphs. In: Proceedings of the 2010 ACM SIGMOD international conference on management of data. ACM, New York, pp 99–110

White DR, Borgatti SP (1994) Betweenness centrality measures for directed graphs. Soc Netw 16:335–346CrossRef

Xiao Y, Wu W, Pei J, Wang W, He Z (2009) Efficiently indexing shortest paths by exploiting symmetry in graphs. In: Proceedings of the 12th international conference on extending database technology: advances in database technology. ACM, pp 493–504

Zhou L, Chen L, Ozsu MT (2009) Distance-join: pattern match query in a large graph. In: Proceedings of the VLDB endowment, vol 2, pp 886–897

Titel: Reducing seed noise in personalized PageRank
verfasst von: Shengyu Huang
Xinsheng Li
K. Selçuk Candan
Maria Luisa Sapino
Publikationsdatum: 01.12.2016
Verlag: Springer Vienna
Erschienen in: Social Network Analysis and Mining / Ausgabe 1/2016
Print ISSN: 1869-5450
Elektronische ISSN: 1869-5469
DOI: https://doi.org/10.1007/s13278-015-0309-6

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Wirtschaft"

Springer Professional "Technik"

Weitere Artikel der Ausgabe 1/2016

Alpha-anonymization techniques for privacy preservation in social networks

Reply trees in Twitter: data analysis and branching process models

Multi-source models for civil unrest forecasting

On evaluating the effectiveness of rating similarity-based trust

A new approach to role and position detection in networks

Characterization of cross-posting activity for professional users across Facebook, Twitter and Google+