research-article

Efficient Algorithms for Finding Approximate Heavy Hitters in Personalized PageRanks

Authors:
Sibo Wang

University of Queensland, Brisbane, Australia

University of Queensland, Brisbane, Australia
View Profile

,
Yufei Tao

Chinese University of Hong Kong, Hong Kong, Hong Kong

Chinese University of Hong Kong, Hong Kong, Hong Kong
View Profile

SIGMOD '18: Proceedings of the 2018 International Conference on Management of DataMay 2018Pages 1113–1127https://doi.org/10.1145/3183713.3196919

Published:27 May 2018Publication History

SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data

Pages 1113–1127

ABSTRACT

Given a directed graph G, a source node s, and a target node t, the personalized PageRank (PPR of t with respect to s is the probability that a random walk starting from s terminates at t. The average of the personalized PageRank score of t with respect to each source node υ∈ V is exactly the PageRank score π( t ) of node t , which denotes the overall importance of node t in the graph. A heavy hitter of node t is a node whose contribution to π( t ) is above a φ fraction, where φ is a value between 0 and 1. Finding heavy hitters has important applications in link spam detection, classification of web pages, and friend recommendations.

In this paper, we propose BLOG, an efficient framework for three types of heavy hitter queries: the pairwise approximate heavy hitter (AHH), the reverse AHH, and the multi-source reverse AHH queries. For pairwise AHH queries, our algorithm combines the Monte-Carlo approach and the backward propagation approach to reduce the cost of both methods, and incorporates new techniques to deal with high in-degree nodes. For reverse AHH and multi-source reverse AHH queries, our algorithm extends the ideas behind the pairwise AHH algorithm with a new "logarithmic bucketing'' technique to improve the query efficiency. Extensive experiments demonstrate that our BLOG is far more efficient than alternative solutions on the three queries.

References

Reid Andersen, Christian Borgs, Jennifer T. Chayes, John E. Hopcroft, Vahab S. Mirrokni, and Shang-Hua Teng. 2007. Local Computation of PageRank Contributions. In WAW. 150--165. Google ScholarDigital Library
Reid Andersen, Fan R. K. Chung, and Kevin J. Lang. 2006. Local Graph Partitioning using PageRank Vectors. In FOCS. 475--486. Google ScholarDigital Library
Konstantin Avrachenkov, Nelly Litvak, Danil Nemirovsky, and Natalia Osipova. 2007. Monte Carlo methods in PageRank computation: When one iteration is sufficient. SIAM J. Numerical Analysis Vol. 45, 2 (2007), 890--904. Google ScholarDigital Library
Lars Backstrom and Jure Leskovec. 2011. Supervised random walks: predicting and recommending links in social networks. In WSDM. 635--644. Google ScholarDigital Library
Bahman Bahmani, Kaushik Chakrabarti, and Dong Xin. 2011. Fast personalized PageRank on MapReduce. In SIGMOD. 973--984. Google ScholarDigital Library
András A. Benczúr, Károly Csalogány, Tamás Sarlós, and Máté Uher. 2005. SpamRank -- Fully Automatic Link Spam Detection. In AIRWeb. 25--38.Google Scholar
Pavel Berkhin. 2005. Survey: A Survey on PageRank Computing. Internet Mathematics Vol. 2, 1 (2005), 73--120.Google ScholarCross Ref
Pavel Berkhin. 2006. Bookmark-coloring algorithm for personalized pagerank computing. Internet Mathematics Vol. 3, 1 (2006), 41--62.Google ScholarCross Ref
Soumen Chakrabarti. 2007. Dynamic personalized pagerank in entity-relation graphs WWW. 571--580. Google ScholarDigital Library
Fan R. K. Chung and Lincoln Lu. 2006. Survey: Concentration Inequalities and Martingale Inequalities: A Survey. Internet Mathematics Vol. 3, 1 (2006), 79--127.Google ScholarCross Ref
Dániel Fogaras, Balázs Rácz, Károly Csalogány, and Tamás Sarlós. 2005. Towards scaling fully personalized pagerank: Algorithms, lower bounds, and experiments. Internet Mathematics Vol. 2, 3 (2005), 333--358.Google ScholarCross Ref
Yasuhiro Fujiwara, Makoto Nakatsuji, Takeshi Yamamuro, Hiroaki Shiokawa, and Makoto Onizuka. 2012. Efficient personalized pagerank with accuracy assurance KDD. 15--23. Google ScholarDigital Library
Tao Guo, Xin Cao, Gao Cong, Jiaheng Lu, and Xuemin Lin. 2017 a. Distributed Algorithms on Exact Personalized PageRank SIGMOD. 479--494. Google ScholarDigital Library
Wentian Guo, Yuchen Li, Mo Sha, and Kian-Lee Tan. 2017 b. Parallel Personalized Pagerank on Dynamic Graphs. PVLDB Vol. 11, 1 (2017), 93--106. Google ScholarDigital Library
Pankaj Gupta, Ashish Goel, Jimmy Lin, Aneesh Sharma, Dong Wang, and Reza Zadeh. 2013. Wtf: The who to follow service at twitter. In WWW. 505--514. Google ScholarDigital Library
Zoltán Gyöngyi, Pavel Berkhin, Hector Garcia-Molina, and Jan O. Pedersen. 2006. Link Spam Detection Based on Mass Estimation. In VLDB. 439--450. Google ScholarDigital Library
Zoltan Gyongyi, Hector Garcia-Molina, and Jan Pedersen. 2006. Web content categorization using link information. Technical Report. Stanford.Google Scholar
Glen Jeh and Jennifer Widom. 2003. Scaling personalized web search. In WWW. 271--279. Google ScholarDigital Library
Jinhong Jung, Namyong Park, Lee Sael, and U. Kang. 2017. BePI: Fast and Memory-Efficient Method for Billion-Scale Random Walk with Restart. In SIGMOD. 789--804. Google ScholarDigital Library
Jinhong Jung, Kijung Shin, Lee Sael, and U. Kang. 2016. Random Walk with Restart on Large Graphs Using Block Elimination. ACM Trans. Database Syst. Vol. 41, 2 (2016), 12:1--12:43. Google ScholarDigital Library
David Liben-Nowell and Jon M. Kleinberg. 2007. The link-prediction problem for social networks. JASIST Vol. 58, 7 (2007), 1019--1031. Google ScholarDigital Library
Peter Lofgren, Siddhartha Banerjee, and Ashish Goel. 2016. Personalized pagerank estimation and search: A bidirectional approach WSDM. 163--172. Google ScholarDigital Library
Peter Lofgren, Siddhartha Banerjee, Ashish Goel, and C Seshadhri. 2014. Fast-ppr: Scaling personalized pagerank estimation for large graphs KDD. 1436--1445. Google ScholarDigital Library
Takanori Maehara, Takuya Akiba, Yoichi Iwata, and Ken-ichi Kawarabayashi. 2014. Computing personalized PageRank quickly by exploiting graph structures. PVLDB Vol. 7, 12 (2014), 1023--1034. Google ScholarDigital Library
Naoto Ohsaka, Takanori Maehara, and Ken-ichi Kawarabayashi. 2015. Efficient PageRank Tracking in Evolving Networks. In SIGKDD 2015. 875--884. Google ScholarDigital Library
Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. 1999. The PageRank citation ranking: bringing order to the web. (1999).Google Scholar
Atish Das Sarma, Anisur Rahaman Molla, Gopal Pandurangan, and Eli Upfal. 2013. Fast Distributed PageRank Computation. In ICDCN. 11--26.Google Scholar
Kijung Shin, Jinhong Jung, Lee Sael, and U. Kang. 2015. BEAR: Block Elimination Approach for Random Walk with Restart on Large Graphs SIGMOD. 1571--1585. Google ScholarDigital Library
Sibo Wang, Youze Tang, Xiaokui Xiao, Yin Yang, and Zengxiang Li. 2016. HubPPR: Effective Indexing for Approximate Personalized PageRank. PVLDB Vol. 10, 3 (2016), 205--216. Google ScholarDigital Library
Sibo Wang, Renchi Yang, Xiaokui Xiao, Zhewei Wei, and Yin Yang. 2017. FORA: Simple and Effective Approximate Single-Source Personalized PageRank. In SIGKDD. 505--514. Google ScholarDigital Library
Hongyang Zhang, Peter Lofgren, and Ashish Goel. 2016. Approximate Personalized PageRank on Dynamic Graphs KDD. 1315--1324. Google ScholarDigital Library
Fanwei Zhu, Yuan Fang, Kevin Chen-Chuan Chang, and Jing Ying. 2013. Incremental and Accuracy-Aware Personalized PageRank through Scheduled Approximation. PVLDB Vol. 6, 6 (2013), 481--492. Google ScholarDigital Library

Index Terms

Efficient Algorithms for Finding Approximate Heavy Hitters in Personalized PageRanks
1. Mathematics of computing
  1. Discrete mathematics
    1. Graph theory
      1. Graph algorithms

Recommendations

Beating CountSketch for heavy hitters in insertion streams
STOC '16: Proceedings of the forty-eighth annual ACM symposium on Theory of Computing

Given a stream p₁, …, p_m of items from a universe U, which, without loss of generality we identify with the set of integers {1, 2, …, n}, we consider the problem of returning all ℓ₂-heavy hitters, i.e., those items j for which f_j ≥ є √F₂, where f_j is ...
Read More
Finding Subcube Heavy Hitters in Analytics Data Streams
WWW '18: Proceedings of the 2018 World Wide Web Conference

Modern data streams typically have high dimensionality. For example, digital analytics streams consist of user online activities (e.g., web browsing activity, commercial site activity, apps and social behavior, and response to ads). An important problem ...
Read More
Conditional heavy hitters: detecting interesting correlations in data streams

The notion of heavy hitters--items that make up a large fraction of the population--has been successfully used in a variety of applications across sensor and RFID monitoring, network data analysis, event mining, and more. Yet this notion often fails to ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data
May 2018
1874 pages
ISBN:9781450347037
DOI:10.1145/3183713
General Chairs:
Gautam Das
University of Texas at Arlington, USA
,
Christopher Jermaine
Rice University, USA
,
Philip Bernstein
Microsoft Research, USA
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 27 May 2018
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
approximate algorithms
heavy hitters
personalized pagerank
social recommendation
Qualifiers
- research-article
Conference

Acceptance Rates
SIGMOD '18 Paper Acceptance Rate90of461submissions,20%Overall Acceptance Rate785of4,003submissions,20%
More
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 16
  Total Citations
  View Citations
- 420
  Total Downloads
- Downloads (Last 12 months)9
- Downloads (Last 6 weeks)2
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Efficient Algorithms for Finding Approximate Heavy Hitters in Personalized PageRanks

SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data

ABSTRACT

References

Cited By

Index Terms

Recommendations

Beating CountSketch for heavy hitters in insertion streams

Finding Subcube Heavy Hitters in Analytics Data Streams

Conditional heavy hitters: detecting interesting correlations in data streams

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Efficient Algorithms for Finding Approximate Heavy Hitters in Personalized PageRanks

SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data

ABSTRACT

References

Cited By

Index Terms

Recommendations

Beating CountSketch for heavy hitters in insertion streams

Finding Subcube Heavy Hitters in Analytics Data Streams

Conditional heavy hitters: detecting interesting correlations in data streams

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media