research-article

HubPPR: effective indexing for approximate personalized pagerank

Authors:
Sibo Wang

Nanyang Technological University, Singapore

Nanyang Technological University, Singapore
View Profile

,
Youze Tang

Nanyang Technological University, Singapore

Nanyang Technological University, Singapore
View Profile

,
Xiaokui Xiao

Nanyang Technological University, Singapore

Nanyang Technological University, Singapore
View Profile

,
Yin Yang

Hamad Bin Khalifa University, Qatar

Hamad Bin Khalifa University, Qatar
View Profile

,
Zengxiang Li

Institute of High Performance Computing, A*STAR, Singapore

Institute of High Performance Computing, A*STAR, Singapore
View Profile

Proceedings of the VLDB Endowment Volume 10 Issue 3pp 205–216https://doi.org/10.14778/3021924.3021936

Published:01 November 2016Publication History

Proceedings of the VLDB Endowment

Abstract

Personalized PageRank (PPR) computation is a fundamental operation in web search, social networks, and graph analysis. Given a graph G, a source s, and a target t, the PPR query Π(s, t) returns the probability that a random walk on G starting from s terminates at t. Unlike global PageRank which can be effectively pre-computed and materialized, the PPR result depends on both the source and the target, rendering results materialization infeasible for large graphs. Existing indexing techniques have rather limited effectiveness; in fact, the current state-of-the-art solution, BiPPR, answers individual PPR queries without pre-computation or indexing, and yet it outperforms all previous index-based solutions.

Motivated by this, we propose HubPPR, an effective indexing scheme for PPR computation with controllable tradeoffs for accuracy, query time, and memory consumption. The main idea is to pre-compute and index auxiliary information for selected hub nodes that are often involved in PPR processing. Going one step further, we extend HubPPR to answer top-k PPR queries, which returns the k nodes with the highest PPR values with respect to a source s, among a given set T of target nodes. Extensive experiments demonstrate that compared to the current best solution BiPPR, HubPPR achieves up to 10x and 220x speedup for PPR and top-k PPR processing, respectively, with moderate memory consumption. Notably, with a single commodity server, HubPPR answers a top-k PPR query in seconds on graphs with billions of edges, with high accuracy and strong result quality guarantees.

References

https://sites.google.com/site/hubpprtr2016/.Google Scholar
http://snap.stanford.edu/data.Google Scholar
R. Andersen, C. Borgs, J. T. Chayes, J. E. Hopcroft, V. S. Mirrokni, and S. Teng. Local computation of pagerank contributions. In WAW, pages 150--165, 2007. Google ScholarDigital Library
R. Andersen, F. R. K. Chung, and K. J. Lang. Local graph partitioning using pagerank vectors. In FOCS, pages 475--486, 2006. Google ScholarDigital Library
K. Avrachenkov, N. Litvak, D. Nemirovsky, and N. Osipova. Monte carlo methods in pagerank computation: When one iteration is sufficient. SIAM J. Numerical Analysis, 45(2):890--904, 2007. Google ScholarDigital Library
K. Avrachenkov, N. Litvak, D. Nemirovsky, E. Smirnova, and M. Sokol. Quick detection of top-k personalized pagerank lists. In WAW, pages 50--61, 2011. Google ScholarDigital Library
L. Backstrom and J. Leskovec. Supervised random walks: predicting and recommending links in social networks. In WSDM, pages 635--644, 2011. Google ScholarDigital Library
B. Bahmani, K. Chakrabarti, and D. Xin. Fast personalized pagerank on mapreduce. In SIGMOD, pages 973--984, 2011. Google ScholarDigital Library
P. Berkhin. Survey: A survey on pagerank computing. Internet Mathematics, 2(1):73--120, 2005.Google ScholarCross Ref
P. Berkhin. Bookmark-coloring algorithm for personalized pagerank computing. Internet Mathematics, 3(1):41--62, 2006.Google ScholarCross Ref
S. Chakrabarti. Dynamic personalized pagerank in entity-relation graphs. In WWW, pages 571--580, 2007. Google ScholarDigital Library
F. R. K. Chung and L. Lu. Survey: Concentration inequalities and martingale inequalities: A survey. Internet Mathematics, 3(1):79--127, 2006.Google ScholarCross Ref
D. Fogaras, B. Rácz, K. Csalogány, and T. Sarlós. Towards scaling fully personalized pagerank: Algorithms, lower bounds, and experiments. Internet Mathematics, 2(3):333--358, 2005.Google ScholarCross Ref
Y. Fujiwara, M. Nakatsuji, H. Shiokawa, T. Mishima, and M. Onizuka. Efficient ad-hoc search for personalized pagerank. In SIGMOD, pages 445--456, 2013. Google ScholarDigital Library
Y. Fujiwara, M. Nakatsuji, T. Yamamuro, H. Shiokawa, and M. Onizuka. Efficient personalized pagerank with accuracy assurance. In KDD, pages 15--23, 2012. Google ScholarDigital Library
F. L. Gall. Powers of tensors and fast matrix multiplication. In ISSAC, pages 296--303, 2014. Google ScholarDigital Library
M. S. Gupta, A. Pathak, and S. Chakrabarti. Fast algorithms for topk personalized pagerank queries. In WWW, pages 1225--1226, 2008. Google ScholarDigital Library
P. Gupta, A. Goel, J. Lin, A. Sharma, D. Wang, and R. Zadeh. Wtf: The who to follow service at twitter. In WWW, pages 505--514, 2013. Google ScholarDigital Library
G. Iván and V. Grolmusz. When the web meets the cell: using personalized pagerank for analyzing protein interaction networks. Bioinformatics, 27(3):405--407, 2011. Google ScholarDigital Library
G. Jeh and J. Widom. Scaling personalized web search. In WWW, pages 271--279, 2003. Google ScholarDigital Library
P. Lofgren, S. Banerjee, and A. Goel. Personalized pagerank estimation and search: A bidirectional approach. In WSDM, pages 163--172, 2016. Google ScholarDigital Library
P. A. Lofgren, S. Banerjee, A. Goel, and C. Seshadhri. Fast-ppr: Scaling personalized pagerank estimation for large graphs. In KDD, pages 1436--1445, 2014. Google ScholarDigital Library
T. Maehara, T. Akiba, Y. Iwata, and K.-i. Kawarabayashi. Computing personalized pagerank quickly by exploiting graph structures. PVLDB, 7(12):1023--1034, 2014. Google ScholarDigital Library
R. Motwani and P. Raghavan. Randomized algorithms. Chapman & Hall/CRC, 2010. Google ScholarDigital Library
L. Page, S. Brin, R. Motwani, and T. Winograd. The pagerank citation ranking: bringing order to the web. 1999.Google Scholar
A. D. Sarma, A. R. Molla, G. Pandurangan, and E. Upfal. Fast distributed pagerank computation. In ICDCN, pages 11--26, 2013.Google Scholar
K. Shin, J. Jung, L. Sael, and U. Kang. BEAR: block elimination approach for random walk with restart on large graphs. In SIGMOD, pages 1571--1585, 2015. Google ScholarDigital Library
D. Williams. Probability with martingales. Cambridge university press, 1991.Google Scholar
F. Zhu, Y. Fang, K. C. Chang, and J. Ying. Incremental and accuracy-aware personalized pagerank through scheduled approximation. PVLDB, 6(6):481--492, 2013. Google ScholarDigital Library

Index Terms

HubPPR: effective indexing for approximate personalized pagerank
1. Information systems
  1. Information retrieval

Index terms have been assigned to the content through auto-classification.

Recommendations

Equivalence and minimization of conjunctive queries under combined semantics
ICDT '12: Proceedings of the 15th International Conference on Database Theory

The problems of query containment, equivalence, and minimization are fundamental problems in the context of query processing and optimization. In their classic work [2] published in 1977, Chandra and Merlin solved the three problems for the language of ...
Read More
Approximating expressive queries on graph-modeled data

We present GeX for the approximate matching of complex queries on graph-modeled data.GeX generalizes existing approaches and allows for querying any graph-based datasets.GeX query language supports queries ranging from keyword-based to complex ones.GeX ...
Read More
Scalable and efficient processing of top-k multiple-type integrated queries
Abstract
In this paper, we define a new class of queries, the top-k multiple-type integrated query (simply, top-k MULTI query). It deals with multiple data types and finds the information in the order of relevance between the query and the object. Various ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
Proceedings of the VLDB Endowment Volume 10, Issue 3
November 2016
216 pages
ISSN:2150-8097
Editors:
Peter Boncz
CWI
,
Ken Salem
University of Waterloo
Issue’s Table of Contents
Sponsors
In-Cooperation
Publisher
VLDB Endowment
Publication History
- Published: 1 November 2016
Published in pvldb Volume 10, Issue 3
Qualifiers
- research-article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 31
  Total Citations
  View Citations
- 202
  Total Downloads
- Downloads (Last 12 months)24
- Downloads (Last 6 weeks)4
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

HubPPR: effective indexing for approximate personalized pagerank

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Index Terms

Recommendations

Equivalence and minimization of conjunctive queries under combined semantics

Approximating expressive queries on graph-modeled data

Scalable and efficient processing of top-k multiple-type integrated queries

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

HubPPR: effective indexing for approximate personalized pagerank

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Index Terms

Recommendations

Equivalence and minimization of conjunctive queries under combined semantics

Approximating expressive queries on graph-modeled data

Scalable and efficient processing of top-k multiple-type integrated queries

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media