Abstract
Personalized PageRank (PPR) computation is a fundamental operation in web search, social networks, and graph analysis. Given a graph G, a source s, and a target t, the PPR query Π(s, t) returns the probability that a random walk on G starting from s terminates at t. Unlike global PageRank which can be effectively pre-computed and materialized, the PPR result depends on both the source and the target, rendering results materialization infeasible for large graphs. Existing indexing techniques have rather limited effectiveness; in fact, the current state-of-the-art solution, BiPPR, answers individual PPR queries without pre-computation or indexing, and yet it outperforms all previous index-based solutions.
Motivated by this, we propose HubPPR, an effective indexing scheme for PPR computation with controllable tradeoffs for accuracy, query time, and memory consumption. The main idea is to pre-compute and index auxiliary information for selected hub nodes that are often involved in PPR processing. Going one step further, we extend HubPPR to answer top-k PPR queries, which returns the k nodes with the highest PPR values with respect to a source s, among a given set T of target nodes. Extensive experiments demonstrate that compared to the current best solution BiPPR, HubPPR achieves up to 10x and 220x speedup for PPR and top-k PPR processing, respectively, with moderate memory consumption. Notably, with a single commodity server, HubPPR answers a top-k PPR query in seconds on graphs with billions of edges, with high accuracy and strong result quality guarantees.
- https://sites.google.com/site/hubpprtr2016/.Google Scholar
- http://snap.stanford.edu/data.Google Scholar
- R. Andersen, C. Borgs, J. T. Chayes, J. E. Hopcroft, V. S. Mirrokni, and S. Teng. Local computation of pagerank contributions. In WAW, pages 150--165, 2007. Google ScholarDigital Library
- R. Andersen, F. R. K. Chung, and K. J. Lang. Local graph partitioning using pagerank vectors. In FOCS, pages 475--486, 2006. Google ScholarDigital Library
- K. Avrachenkov, N. Litvak, D. Nemirovsky, and N. Osipova. Monte carlo methods in pagerank computation: When one iteration is sufficient. SIAM J. Numerical Analysis, 45(2):890--904, 2007. Google ScholarDigital Library
- K. Avrachenkov, N. Litvak, D. Nemirovsky, E. Smirnova, and M. Sokol. Quick detection of top-k personalized pagerank lists. In WAW, pages 50--61, 2011. Google ScholarDigital Library
- L. Backstrom and J. Leskovec. Supervised random walks: predicting and recommending links in social networks. In WSDM, pages 635--644, 2011. Google ScholarDigital Library
- B. Bahmani, K. Chakrabarti, and D. Xin. Fast personalized pagerank on mapreduce. In SIGMOD, pages 973--984, 2011. Google ScholarDigital Library
- P. Berkhin. Survey: A survey on pagerank computing. Internet Mathematics, 2(1):73--120, 2005.Google ScholarCross Ref
- P. Berkhin. Bookmark-coloring algorithm for personalized pagerank computing. Internet Mathematics, 3(1):41--62, 2006.Google ScholarCross Ref
- S. Chakrabarti. Dynamic personalized pagerank in entity-relation graphs. In WWW, pages 571--580, 2007. Google ScholarDigital Library
- F. R. K. Chung and L. Lu. Survey: Concentration inequalities and martingale inequalities: A survey. Internet Mathematics, 3(1):79--127, 2006.Google ScholarCross Ref
- D. Fogaras, B. Rácz, K. Csalogány, and T. Sarlós. Towards scaling fully personalized pagerank: Algorithms, lower bounds, and experiments. Internet Mathematics, 2(3):333--358, 2005.Google ScholarCross Ref
- Y. Fujiwara, M. Nakatsuji, H. Shiokawa, T. Mishima, and M. Onizuka. Efficient ad-hoc search for personalized pagerank. In SIGMOD, pages 445--456, 2013. Google ScholarDigital Library
- Y. Fujiwara, M. Nakatsuji, T. Yamamuro, H. Shiokawa, and M. Onizuka. Efficient personalized pagerank with accuracy assurance. In KDD, pages 15--23, 2012. Google ScholarDigital Library
- F. L. Gall. Powers of tensors and fast matrix multiplication. In ISSAC, pages 296--303, 2014. Google ScholarDigital Library
- M. S. Gupta, A. Pathak, and S. Chakrabarti. Fast algorithms for topk personalized pagerank queries. In WWW, pages 1225--1226, 2008. Google ScholarDigital Library
- P. Gupta, A. Goel, J. Lin, A. Sharma, D. Wang, and R. Zadeh. Wtf: The who to follow service at twitter. In WWW, pages 505--514, 2013. Google ScholarDigital Library
- G. Iván and V. Grolmusz. When the web meets the cell: using personalized pagerank for analyzing protein interaction networks. Bioinformatics, 27(3):405--407, 2011. Google ScholarDigital Library
- G. Jeh and J. Widom. Scaling personalized web search. In WWW, pages 271--279, 2003. Google ScholarDigital Library
- P. Lofgren, S. Banerjee, and A. Goel. Personalized pagerank estimation and search: A bidirectional approach. In WSDM, pages 163--172, 2016. Google ScholarDigital Library
- P. A. Lofgren, S. Banerjee, A. Goel, and C. Seshadhri. Fast-ppr: Scaling personalized pagerank estimation for large graphs. In KDD, pages 1436--1445, 2014. Google ScholarDigital Library
- T. Maehara, T. Akiba, Y. Iwata, and K.-i. Kawarabayashi. Computing personalized pagerank quickly by exploiting graph structures. PVLDB, 7(12):1023--1034, 2014. Google ScholarDigital Library
- R. Motwani and P. Raghavan. Randomized algorithms. Chapman & Hall/CRC, 2010. Google ScholarDigital Library
- L. Page, S. Brin, R. Motwani, and T. Winograd. The pagerank citation ranking: bringing order to the web. 1999.Google Scholar
- A. D. Sarma, A. R. Molla, G. Pandurangan, and E. Upfal. Fast distributed pagerank computation. In ICDCN, pages 11--26, 2013.Google Scholar
- K. Shin, J. Jung, L. Sael, and U. Kang. BEAR: block elimination approach for random walk with restart on large graphs. In SIGMOD, pages 1571--1585, 2015. Google ScholarDigital Library
- D. Williams. Probability with martingales. Cambridge university press, 1991.Google Scholar
- F. Zhu, Y. Fang, K. C. Chang, and J. Ying. Incremental and accuracy-aware personalized pagerank through scheduled approximation. PVLDB, 6(6):481--492, 2013. Google ScholarDigital Library
Index Terms
- HubPPR: effective indexing for approximate personalized pagerank
Recommendations
Equivalence and minimization of conjunctive queries under combined semantics
ICDT '12: Proceedings of the 15th International Conference on Database TheoryThe problems of query containment, equivalence, and minimization are fundamental problems in the context of query processing and optimization. In their classic work [2] published in 1977, Chandra and Merlin solved the three problems for the language of ...
Approximating expressive queries on graph-modeled data
We present GeX for the approximate matching of complex queries on graph-modeled data.GeX generalizes existing approaches and allows for querying any graph-based datasets.GeX query language supports queries ranging from keyword-based to complex ones.GeX ...
Scalable and efficient processing of top-k multiple-type integrated queries
AbstractIn this paper, we define a new class of queries, the top-k multiple-type integrated query (simply, top-k MULTI query). It deals with multiple data types and finds the information in the order of relevance between the query and the object. Various ...
Comments