research-article

TopPPR: Top-k Personalized PageRank Queries with Precision Guarantees on Large Graphs

Authors:
Zhewei Wei

Renmin University of China, Beijing, China

Renmin University of China, Beijing, China
View Profile

,
Xiaodong He

Renmin University of China, Beijing, China

Renmin University of China, Beijing, China
View Profile

,
Xiaokui Xiao

National University of Singapore, Singapore, Singapore

National University of Singapore, Singapore, Singapore
View Profile

,
Sibo Wang

University of Queensland, Brisbane, Australia

University of Queensland, Brisbane, Australia
View Profile

,
Shuo Shang

King Abdullah University of Science and Technology, Thuwal, Saudi Arabia

King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
View Profile

,
Ji-Rong Wen

Renmin University of China, Beijing, China

Renmin University of China, Beijing, China
View Profile

SIGMOD '18: Proceedings of the 2018 International Conference on Management of DataMay 2018Pages 441–456https://doi.org/10.1145/3183713.3196920

Published:27 May 2018Publication History

SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data

Pages 441–456

ABSTRACT

Personalized PageRank (PPR) is a classic metric that measures the relevance of graph nodes with respect to a source node. Given a graph G, a source node s, and a parameter k, a top-k PPR query returns a set of k nodes with the highest PPR values with respect to s. This type of queries serves as an important building block for numerous applications in web search and social networks, such as Twitter's Who-To-Follow recommendation service. Existing techniques for top-k PPR, however, suffer from two major deficiencies. First, they either incur prohibitive space and time overheads on large graphs, or fail to provide any guarantee on the precision of top-k results (i.e., the results returned might miss a number of actual top-k answers). Second, most of them require significant pre-computation on the input graph G, which renders them unsuitable for graphs with frequent updates (e.g., Twitter's social graph).

To address the deficiencies of existing solutions, we propose PPR, an algorithm for top-k PPR queries that ensure at least ρ precision (i.e., at least ρ fraction of the actual top-k results are returned) with at least 1 - 1/n probability, where ρ ∈;n (0, 1] is a user-specified parameter and n is the number of nodes in G. In addition, PPR offers non-trivial guarantees on query time in terms of ρ, and it can easily handle dynamic graphs as it does not require any preprocessing. We experimentally evaluate PPR using a variety of benchmark datasets, and demonstrate that PPR outperforms the state-of-the-art solutions in terms of both efficiency and precision, even when we set ρ = 1 (i.e., when PPR returns the exact top-k results). Notably, on a billion-edge Twitter graph, PPR only requires 15 seconds to answer a top-500 PPR query with ρ = 1.

References

https://spark-summit.org/2017/events/random-walks-on-large-scale-graphs-with-apache-spark/.Google Scholar
http://compbio.case.edu/omics/software/chopper/index.html.Google Scholar
https://github.com/YubaoWu/FLoS.Google Scholar
https://datalab.snu.ac.kr/bepi/.Google Scholar
http://snap.stanford.edu/data/index.html.Google Scholar
http://law.di.unimi.it/datasets.php.Google Scholar
Reid Andersen, Christian Borgs, Jennifer T. Chayes, John E. Hopcroft, Vahab S. Mirrokni, and Shang-Hua Teng. Local computation of pagerank contributions. In WAW, pages 150--165, 2007. Google ScholarDigital Library
Reid Andersen, Fan R. K. Chung, and Kevin J. Lang. Local graph partitioning using pagerank vectors. In FOCS, pages 475--486, 2006. Google ScholarDigital Library
Jean-Yves Audibert, Rémi Munos, and Csaba Szepesvári. Tuning bandit algorithms in stochastic environments. In ALT, volume 4754, pages 150--165. Springer, 2007. Google ScholarDigital Library
Lars Backstrom and Jure Leskovec. Supervised random walks: predicting and recommending links in social networks. In WSDM, pages 635--644, 2011. Google ScholarDigital Library
Bahman Bahmani, Kaushik Chakrabarti, and Dong Xin. Fast personalized pagerank on mapreduce. In SIGMOD, pages 973--984, 2011. Google ScholarDigital Library
Bahman Bahmani, Abdur Chowdhury, and Ashish Goel. Fast incremental and personalized pagerank. VLDB, 4(3):173--184, 2010. Google ScholarDigital Library
Soumen Chakrabarti. Dynamic personalized pagerank in entity-relation graphs. In WWW, pages 571--580, 2007. Google ScholarDigital Library
Fan R. K. Chung and Lincoln Lu. Survey: Concentration inequalities and martingale inequalities: A survey. Internet Mathematics, 3(1):79--127, 2006.Google ScholarCross Ref
Mustafa Coskun, Ananth Grama, and Mehmet Koyuturk. Efficient processing of network proximity queries via chebyshev acceleration. In KDD, pages 1515--1524, 2016. Google ScholarDigital Library
Dániel Fogaras, Balázs Rácz, Károly Csalogány, and Tamás Sarlós. Towards scaling fully personalized pagerank: Algorithms, lower bounds, and experiments. Internet Mathematics, 2(3):333--358, 2005.Google ScholarCross Ref
Yasuhiro Fujiwara, Makoto Nakatsuji, Makoto Onizuka, and Masaru Kitsuregawa. Fast and exact top-k search for random walk with restart. PVLDB, 5(5):442--453, 2012. Google ScholarDigital Library
Yasuhiro Fujiwara, Makoto Nakatsuji, Hiroaki Shiokawa, Takeshi Mishima, and Makoto Onizuka. Efficient ad-hoc search for personalized pagerank. In SIGMOD, pages 445--456, 2013. Google ScholarDigital Library
Yasuhiro Fujiwara, Makoto Nakatsuji, Hiroaki Shiokawa, Takeshi Mishima, and Makoto Onizuka. Fast and exact top-k algorithm for pagerank. In AAAI, 2013. Google ScholarDigital Library
Yasuhiro Fujiwara, Makoto Nakatsuji, Takeshi Yamamuro, Hiroaki Shiokawa, and Makoto Onizuka. Efficient personalized pagerank with accuracy assurance. In KDD, pages 15--23, 2012. Google ScholarDigital Library
Tao Guo, Xin Cao, Gao Cong, Jiaheng Lu, and Xuemin Lin. Distributed algorithms on exact personalized pagerank. In SIGMOD, pages 479--494, 2017. Google ScholarDigital Library
Manish S. Gupta, Amit Pathak, and Soumen Chakrabarti. Fast algorithms for top-k personalized pagerank queries. In WWW, pages 1225--1226, 2008. Google ScholarDigital Library
Pankaj Gupta, Ashish Goel, Jimmy Lin, Aneesh Sharma, Dong Wang, and Reza Zadeh. Wtf: The who to follow service at twitter. In WWW, pages 505--514, 2013. Google ScholarDigital Library
Ali Hadian, Sadegh Nobari, Behrooz Minaei-Bidgoli, and Qiang Qu. Roll: Fast in-memory generation of gigantic scale-free networks. In SIGMOD, pages 1829--1842. ACM, 2016. Google ScholarDigital Library
Kalervo J"arvelin and Jaana Kekäläinen. IR evaluation methods for retrieving highly relevant documents. In SIGIR, pages 41--48, 2000. Google ScholarDigital Library
Glen Jeh and Jennifer Widom. Scaling personalized web search. In WWW, pages 271--279, 2003. Google ScholarDigital Library
Jinhong Jung, Namyong Park, Sael Lee, and U Kang. Bepi: Fast and memory-efficient method for billion-scale random walk with restart. In SIGMOD, pages 789--804, 2017. Google ScholarDigital Library
David C. Liu, Stephanie Rogers, Raymond Shiau, Dmitry Kislyuk, Kevin C. Ma, Zhigang Zhong, Jenny Liu, and Yushi Jing. Related pins at pinterest: The evolution of a real-world recommender system. In WWW (Companion), pages 583--592, 2017. Google ScholarDigital Library
Peter Lofgren, Siddhartha Banerjee, and Ashish Goel. Bidirectional pagerank estimation: From average-case to worst-case. In WAW, pages 164--176, 2015. Google ScholarDigital Library
Peter Lofgren, Siddhartha Banerjee, and Ashish Goel. Personalized pagerank estimation and search: A bidirectional approach. In WSDM, pages 163--172, 2016. Google ScholarDigital Library
Peter A Lofgren, Siddhartha Banerjee, Ashish Goel, and C Seshadhri. Fast-ppr: Scaling personalized pagerank estimation for large graphs. In KDD, pages 1436--1445, 2014. Google ScholarDigital Library
Takanori Maehara, Takuya Akiba, Yoichi Iwata, and Ken-ichi Kawarabayashi. Computing personalized pagerank quickly by exploiting graph structures. PVLDB, 7(12):1023--1034, 2014. Google ScholarDigital Library
Naoto Ohsaka, Takanori Maehara, and Ken-ichi Kawarabayashi. Efficient pagerank tracking in evolving networks. In KDD, pages 875--884, 2015. Google ScholarDigital Library
Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The pagerank citation ranking: bringing order to the web. 1999.Google Scholar
CH Ren, Luyi Mo, CM Kao, CK Cheng, and DWL Cheung. Clude: An efficient algorithm for lu decomposition over a sequence of evolving graphs. In EDBT, 2014.Google Scholar
Atish Das Sarma, Anisur Rahaman Molla, Gopal Pandurangan, and Eli Upfal. Fast distributed pagerank computation. Theoretical Computer Science, 561:113--121, 2015. Google ScholarDigital Library
Kijung Shin, Jinhong Jung, Lee Sael, and U. Kang. BEAR: block elimination approach for random walk with restart on large graphs. In SIGMOD, pages 1571--1585, 2015. Google ScholarDigital Library
Alastair J. Walker. New fast method for generating discrete random numbers with arbitrary frequency distributions. Electronics Letters, 10(8):127--128, 1974.Google ScholarCross Ref
Sibo Wang, Youze Tang, Xiaokui Xiao, Yin Yang, and Zengxiang Li. Hubppr: Effective indexing for approximate personalized pagerank. PVLDB, 10(3):205--216, 2016. Google ScholarDigital Library
Sibo Wang, Renchi Yang, Xiaokui Xiao, Zhewei Wei, and Yin Yang. FORA: simple and effective approximate single-source personalized pagerank. In KDD, pages 505--514, 2017. Google ScholarDigital Library
Yubao Wu, Ruoming Jin, and Xiang Zhang. Fast and unified local search for random walk based k-nearest-neighbor query in large graphs. In SIGMOD 2014, pages 1139--1150, 2014. Google ScholarDigital Library
Weiren Yu and Xuemin Lin. IRWR: incremental random walk with restart. In SIGIR, pages 1017--1020, 2013. Google ScholarDigital Library
Weiren Yu and Julie A. McCann. Random walk with restart over dynamic graphs. In ICDM, pages 589--598, 2016.Google ScholarCross Ref
Hongyang Zhang, Peter Lofgren, and Ashish Goel. Approximate personalized pagerank on dynamic graphs. In KDD, pages 1315--1324, 2016. Google ScholarDigital Library
Fanwei Zhu, Yuan Fang, Kevin Chen-Chuan Chang, and Jing Ying. Incremental and accuracy-aware personalized pagerank through scheduled approximation. PVLDB, 6(6):481--492, 2013. Google ScholarDigital Library

Index Terms

TopPPR: Top-k Personalized PageRank Queries with Precision Guarantees on Large Graphs
1. Information systems
  1. Information systems applications
    1. Data mining
2. Mathematics of computing
  1. Discrete mathematics
    1. Graph theory
      1. Graph algorithms

Recommendations

Supporting top-k join queries in relational databases

Ranking queries, also known as top-k queries, produce results that are ordered on some computed score. Typically, these queries involve joins, where users are usually interested only in the top-k join results. Top-k queries are dominant in many emerging ...
Read More
View usability and safety for the answering of top-k queries via materialized views
DOLAP '09: Proceedings of the ACM twelfth international workshop on Data warehousing and OLAP

In this paper, we investigate the problem of answering top-k queries via materialized views. We provide theoretical guarantees for the adequacy of a view to answer a top-k query, along with algorithmic techniques to compute the query via a view when ...
Read More
Parallel data access for multiway rank joins
ICWE'11: Proceedings of the 11th international conference on Web engineering

Rank join operators perform a relational join among two or more relations, assign numeric scores to the join results based on the given scoring function and return K join results with the highest scores. The top-K join results are obtained by accessing ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data
May 2018
1874 pages
ISBN:9781450347037
DOI:10.1145/3183713
General Chairs:
Gautam Das
University of Texas at Arlington, USA
,
Christopher Jermaine
Rice University, USA
,
Philip Bernstein
Microsoft Research, USA
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 27 May 2018
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
personalized pagerank
top-k queries
Qualifiers
- research-article
Conference

Acceptance Rates
SIGMOD '18 Paper Acceptance Rate90of461submissions,20%Overall Acceptance Rate785of4,003submissions,20%
More
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 53
  Total Citations
  View Citations
- 890
  Total Downloads
- Downloads (Last 12 months)40
- Downloads (Last 6 weeks)4
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

TopPPR: Top-k Personalized PageRank Queries with Precision Guarantees on Large Graphs

SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data

ABSTRACT

References

Cited By

Index Terms

Recommendations

Supporting top-k join queries in relational databases

View usability and safety for the answering of top-k queries via materialized views

Parallel data access for multiway rank joins

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

TopPPR: Top-k Personalized PageRank Queries with Precision Guarantees on Large Graphs

SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data

ABSTRACT

References

Cited By

Index Terms

Recommendations

Supporting top-k join queries in relational databases

View usability and safety for the answering of top-k queries via materialized views

Parallel data access for multiway rank joins

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media