ABSTRACT
Personalized PageRank (PPR) is a classic metric that measures the relevance of graph nodes with respect to a source node. Given a graph G, a source node s, and a parameter k, a top-k PPR query returns a set of k nodes with the highest PPR values with respect to s. This type of queries serves as an important building block for numerous applications in web search and social networks, such as Twitter's Who-To-Follow recommendation service. Existing techniques for top-k PPR, however, suffer from two major deficiencies. First, they either incur prohibitive space and time overheads on large graphs, or fail to provide any guarantee on the precision of top-k results (i.e., the results returned might miss a number of actual top-k answers). Second, most of them require significant pre-computation on the input graph G, which renders them unsuitable for graphs with frequent updates (e.g., Twitter's social graph).
To address the deficiencies of existing solutions, we propose PPR, an algorithm for top-k PPR queries that ensure at least ρ precision (i.e., at least ρ fraction of the actual top-k results are returned) with at least 1 - 1/n probability, where ρ ∈;n (0, 1] is a user-specified parameter and n is the number of nodes in G. In addition, PPR offers non-trivial guarantees on query time in terms of ρ, and it can easily handle dynamic graphs as it does not require any preprocessing. We experimentally evaluate PPR using a variety of benchmark datasets, and demonstrate that PPR outperforms the state-of-the-art solutions in terms of both efficiency and precision, even when we set ρ = 1 (i.e., when PPR returns the exact top-k results). Notably, on a billion-edge Twitter graph, PPR only requires 15 seconds to answer a top-500 PPR query with ρ = 1.
- https://spark-summit.org/2017/events/random-walks-on-large-scale-graphs-with-apache-spark/.Google Scholar
- http://compbio.case.edu/omics/software/chopper/index.html.Google Scholar
- https://github.com/YubaoWu/FLoS.Google Scholar
- https://datalab.snu.ac.kr/bepi/.Google Scholar
- http://snap.stanford.edu/data/index.html.Google Scholar
- http://law.di.unimi.it/datasets.php.Google Scholar
- Reid Andersen, Christian Borgs, Jennifer T. Chayes, John E. Hopcroft, Vahab S. Mirrokni, and Shang-Hua Teng. Local computation of pagerank contributions. In WAW, pages 150--165, 2007. Google ScholarDigital Library
- Reid Andersen, Fan R. K. Chung, and Kevin J. Lang. Local graph partitioning using pagerank vectors. In FOCS, pages 475--486, 2006. Google ScholarDigital Library
- Jean-Yves Audibert, Rémi Munos, and Csaba Szepesvári. Tuning bandit algorithms in stochastic environments. In ALT, volume 4754, pages 150--165. Springer, 2007. Google ScholarDigital Library
- Lars Backstrom and Jure Leskovec. Supervised random walks: predicting and recommending links in social networks. In WSDM, pages 635--644, 2011. Google ScholarDigital Library
- Bahman Bahmani, Kaushik Chakrabarti, and Dong Xin. Fast personalized pagerank on mapreduce. In SIGMOD, pages 973--984, 2011. Google ScholarDigital Library
- Bahman Bahmani, Abdur Chowdhury, and Ashish Goel. Fast incremental and personalized pagerank. VLDB, 4(3):173--184, 2010. Google ScholarDigital Library
- Soumen Chakrabarti. Dynamic personalized pagerank in entity-relation graphs. In WWW, pages 571--580, 2007. Google ScholarDigital Library
- Fan R. K. Chung and Lincoln Lu. Survey: Concentration inequalities and martingale inequalities: A survey. Internet Mathematics, 3(1):79--127, 2006.Google ScholarCross Ref
- Mustafa Coskun, Ananth Grama, and Mehmet Koyuturk. Efficient processing of network proximity queries via chebyshev acceleration. In KDD, pages 1515--1524, 2016. Google ScholarDigital Library
- Dániel Fogaras, Balázs Rácz, Károly Csalogány, and Tamás Sarlós. Towards scaling fully personalized pagerank: Algorithms, lower bounds, and experiments. Internet Mathematics, 2(3):333--358, 2005.Google ScholarCross Ref
- Yasuhiro Fujiwara, Makoto Nakatsuji, Makoto Onizuka, and Masaru Kitsuregawa. Fast and exact top-k search for random walk with restart. PVLDB, 5(5):442--453, 2012. Google ScholarDigital Library
- Yasuhiro Fujiwara, Makoto Nakatsuji, Hiroaki Shiokawa, Takeshi Mishima, and Makoto Onizuka. Efficient ad-hoc search for personalized pagerank. In SIGMOD, pages 445--456, 2013. Google ScholarDigital Library
- Yasuhiro Fujiwara, Makoto Nakatsuji, Hiroaki Shiokawa, Takeshi Mishima, and Makoto Onizuka. Fast and exact top-k algorithm for pagerank. In AAAI, 2013. Google ScholarDigital Library
- Yasuhiro Fujiwara, Makoto Nakatsuji, Takeshi Yamamuro, Hiroaki Shiokawa, and Makoto Onizuka. Efficient personalized pagerank with accuracy assurance. In KDD, pages 15--23, 2012. Google ScholarDigital Library
- Tao Guo, Xin Cao, Gao Cong, Jiaheng Lu, and Xuemin Lin. Distributed algorithms on exact personalized pagerank. In SIGMOD, pages 479--494, 2017. Google ScholarDigital Library
- Manish S. Gupta, Amit Pathak, and Soumen Chakrabarti. Fast algorithms for top-k personalized pagerank queries. In WWW, pages 1225--1226, 2008. Google ScholarDigital Library
- Pankaj Gupta, Ashish Goel, Jimmy Lin, Aneesh Sharma, Dong Wang, and Reza Zadeh. Wtf: The who to follow service at twitter. In WWW, pages 505--514, 2013. Google ScholarDigital Library
- Ali Hadian, Sadegh Nobari, Behrooz Minaei-Bidgoli, and Qiang Qu. Roll: Fast in-memory generation of gigantic scale-free networks. In SIGMOD, pages 1829--1842. ACM, 2016. Google ScholarDigital Library
- Kalervo J"arvelin and Jaana Kekäläinen. IR evaluation methods for retrieving highly relevant documents. In SIGIR, pages 41--48, 2000. Google ScholarDigital Library
- Glen Jeh and Jennifer Widom. Scaling personalized web search. In WWW, pages 271--279, 2003. Google ScholarDigital Library
- Jinhong Jung, Namyong Park, Sael Lee, and U Kang. Bepi: Fast and memory-efficient method for billion-scale random walk with restart. In SIGMOD, pages 789--804, 2017. Google ScholarDigital Library
- David C. Liu, Stephanie Rogers, Raymond Shiau, Dmitry Kislyuk, Kevin C. Ma, Zhigang Zhong, Jenny Liu, and Yushi Jing. Related pins at pinterest: The evolution of a real-world recommender system. In WWW (Companion), pages 583--592, 2017. Google ScholarDigital Library
- Peter Lofgren, Siddhartha Banerjee, and Ashish Goel. Bidirectional pagerank estimation: From average-case to worst-case. In WAW, pages 164--176, 2015. Google ScholarDigital Library
- Peter Lofgren, Siddhartha Banerjee, and Ashish Goel. Personalized pagerank estimation and search: A bidirectional approach. In WSDM, pages 163--172, 2016. Google ScholarDigital Library
- Peter A Lofgren, Siddhartha Banerjee, Ashish Goel, and C Seshadhri. Fast-ppr: Scaling personalized pagerank estimation for large graphs. In KDD, pages 1436--1445, 2014. Google ScholarDigital Library
- Takanori Maehara, Takuya Akiba, Yoichi Iwata, and Ken-ichi Kawarabayashi. Computing personalized pagerank quickly by exploiting graph structures. PVLDB, 7(12):1023--1034, 2014. Google ScholarDigital Library
- Naoto Ohsaka, Takanori Maehara, and Ken-ichi Kawarabayashi. Efficient pagerank tracking in evolving networks. In KDD, pages 875--884, 2015. Google ScholarDigital Library
- Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The pagerank citation ranking: bringing order to the web. 1999.Google Scholar
- CH Ren, Luyi Mo, CM Kao, CK Cheng, and DWL Cheung. Clude: An efficient algorithm for lu decomposition over a sequence of evolving graphs. In EDBT, 2014.Google Scholar
- Atish Das Sarma, Anisur Rahaman Molla, Gopal Pandurangan, and Eli Upfal. Fast distributed pagerank computation. Theoretical Computer Science, 561:113--121, 2015. Google ScholarDigital Library
- Kijung Shin, Jinhong Jung, Lee Sael, and U. Kang. BEAR: block elimination approach for random walk with restart on large graphs. In SIGMOD, pages 1571--1585, 2015. Google ScholarDigital Library
- Alastair J. Walker. New fast method for generating discrete random numbers with arbitrary frequency distributions. Electronics Letters, 10(8):127--128, 1974.Google ScholarCross Ref
- Sibo Wang, Youze Tang, Xiaokui Xiao, Yin Yang, and Zengxiang Li. Hubppr: Effective indexing for approximate personalized pagerank. PVLDB, 10(3):205--216, 2016. Google ScholarDigital Library
- Sibo Wang, Renchi Yang, Xiaokui Xiao, Zhewei Wei, and Yin Yang. FORA: simple and effective approximate single-source personalized pagerank. In KDD, pages 505--514, 2017. Google ScholarDigital Library
- Yubao Wu, Ruoming Jin, and Xiang Zhang. Fast and unified local search for random walk based k-nearest-neighbor query in large graphs. In SIGMOD 2014, pages 1139--1150, 2014. Google ScholarDigital Library
- Weiren Yu and Xuemin Lin. IRWR: incremental random walk with restart. In SIGIR, pages 1017--1020, 2013. Google ScholarDigital Library
- Weiren Yu and Julie A. McCann. Random walk with restart over dynamic graphs. In ICDM, pages 589--598, 2016.Google ScholarCross Ref
- Hongyang Zhang, Peter Lofgren, and Ashish Goel. Approximate personalized pagerank on dynamic graphs. In KDD, pages 1315--1324, 2016. Google ScholarDigital Library
- Fanwei Zhu, Yuan Fang, Kevin Chen-Chuan Chang, and Jing Ying. Incremental and accuracy-aware personalized pagerank through scheduled approximation. PVLDB, 6(6):481--492, 2013. Google ScholarDigital Library
Index Terms
- TopPPR: Top-k Personalized PageRank Queries with Precision Guarantees on Large Graphs
Recommendations
Supporting top-k join queries in relational databases
Ranking queries, also known as top-k queries, produce results that are ordered on some computed score. Typically, these queries involve joins, where users are usually interested only in the top-k join results. Top-k queries are dominant in many emerging ...
View usability and safety for the answering of top-k queries via materialized views
DOLAP '09: Proceedings of the ACM twelfth international workshop on Data warehousing and OLAPIn this paper, we investigate the problem of answering top-k queries via materialized views. We provide theoretical guarantees for the adequacy of a view to answer a top-k query, along with algorithmic techniques to compute the query via a view when ...
Parallel data access for multiway rank joins
ICWE'11: Proceedings of the 11th international conference on Web engineeringRank join operators perform a relational join among two or more relations, assign numeric scores to the join results based on the given scoring function and return K join results with the highest scores. The top-K join results are obtained by accessing ...
Comments