skip to main content
10.1145/3183713.3196919acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Efficient Algorithms for Finding Approximate Heavy Hitters in Personalized PageRanks

Authors Info & Claims
Published:27 May 2018Publication History

ABSTRACT

Given a directed graph G, a source node s, and a target node t, the personalized PageRank (PPR of t with respect to s is the probability that a random walk starting from s terminates at t. The average of the personalized PageRank score of t with respect to each source node υ∈ V is exactly the PageRank score π( t ) of node t , which denotes the overall importance of node t in the graph. A heavy hitter of node t is a node whose contribution to π( t ) is above a φ fraction, where φ is a value between 0 and 1. Finding heavy hitters has important applications in link spam detection, classification of web pages, and friend recommendations.

In this paper, we propose BLOG, an efficient framework for three types of heavy hitter queries: the pairwise approximate heavy hitter (AHH), the reverse AHH, and the multi-source reverse AHH queries. For pairwise AHH queries, our algorithm combines the Monte-Carlo approach and the backward propagation approach to reduce the cost of both methods, and incorporates new techniques to deal with high in-degree nodes. For reverse AHH and multi-source reverse AHH queries, our algorithm extends the ideas behind the pairwise AHH algorithm with a new "logarithmic bucketing'' technique to improve the query efficiency. Extensive experiments demonstrate that our BLOG is far more efficient than alternative solutions on the three queries.

References

  1. Reid Andersen, Christian Borgs, Jennifer T. Chayes, John E. Hopcroft, Vahab S. Mirrokni, and Shang-Hua Teng. 2007. Local Computation of PageRank Contributions. In WAW. 150--165. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Reid Andersen, Fan R. K. Chung, and Kevin J. Lang. 2006. Local Graph Partitioning using PageRank Vectors. In FOCS. 475--486. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Konstantin Avrachenkov, Nelly Litvak, Danil Nemirovsky, and Natalia Osipova. 2007. Monte Carlo methods in PageRank computation: When one iteration is sufficient. SIAM J. Numerical Analysis Vol. 45, 2 (2007), 890--904. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Lars Backstrom and Jure Leskovec. 2011. Supervised random walks: predicting and recommending links in social networks. In WSDM. 635--644. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Bahman Bahmani, Kaushik Chakrabarti, and Dong Xin. 2011. Fast personalized PageRank on MapReduce. In SIGMOD. 973--984. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. András A. Benczúr, Károly Csalogány, Tamás Sarlós, and Máté Uher. 2005. SpamRank -- Fully Automatic Link Spam Detection. In AIRWeb. 25--38.Google ScholarGoogle Scholar
  7. Pavel Berkhin. 2005. Survey: A Survey on PageRank Computing. Internet Mathematics Vol. 2, 1 (2005), 73--120.Google ScholarGoogle ScholarCross RefCross Ref
  8. Pavel Berkhin. 2006. Bookmark-coloring algorithm for personalized pagerank computing. Internet Mathematics Vol. 3, 1 (2006), 41--62.Google ScholarGoogle ScholarCross RefCross Ref
  9. Soumen Chakrabarti. 2007. Dynamic personalized pagerank in entity-relation graphs WWW. 571--580. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Fan R. K. Chung and Lincoln Lu. 2006. Survey: Concentration Inequalities and Martingale Inequalities: A Survey. Internet Mathematics Vol. 3, 1 (2006), 79--127.Google ScholarGoogle ScholarCross RefCross Ref
  11. Dániel Fogaras, Balázs Rácz, Károly Csalogány, and Tamás Sarlós. 2005. Towards scaling fully personalized pagerank: Algorithms, lower bounds, and experiments. Internet Mathematics Vol. 2, 3 (2005), 333--358.Google ScholarGoogle ScholarCross RefCross Ref
  12. Yasuhiro Fujiwara, Makoto Nakatsuji, Takeshi Yamamuro, Hiroaki Shiokawa, and Makoto Onizuka. 2012. Efficient personalized pagerank with accuracy assurance KDD. 15--23. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Tao Guo, Xin Cao, Gao Cong, Jiaheng Lu, and Xuemin Lin. 2017 a. Distributed Algorithms on Exact Personalized PageRank SIGMOD. 479--494. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Wentian Guo, Yuchen Li, Mo Sha, and Kian-Lee Tan. 2017 b. Parallel Personalized Pagerank on Dynamic Graphs. PVLDB Vol. 11, 1 (2017), 93--106. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Pankaj Gupta, Ashish Goel, Jimmy Lin, Aneesh Sharma, Dong Wang, and Reza Zadeh. 2013. Wtf: The who to follow service at twitter. In WWW. 505--514. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Zoltán Gyöngyi, Pavel Berkhin, Hector Garcia-Molina, and Jan O. Pedersen. 2006. Link Spam Detection Based on Mass Estimation. In VLDB. 439--450. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Zoltan Gyongyi, Hector Garcia-Molina, and Jan Pedersen. 2006. Web content categorization using link information. Technical Report. Stanford.Google ScholarGoogle Scholar
  18. Glen Jeh and Jennifer Widom. 2003. Scaling personalized web search. In WWW. 271--279. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Jinhong Jung, Namyong Park, Lee Sael, and U. Kang. 2017. BePI: Fast and Memory-Efficient Method for Billion-Scale Random Walk with Restart. In SIGMOD. 789--804. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Jinhong Jung, Kijung Shin, Lee Sael, and U. Kang. 2016. Random Walk with Restart on Large Graphs Using Block Elimination. ACM Trans. Database Syst. Vol. 41, 2 (2016), 12:1--12:43. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. David Liben-Nowell and Jon M. Kleinberg. 2007. The link-prediction problem for social networks. JASIST Vol. 58, 7 (2007), 1019--1031. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Peter Lofgren, Siddhartha Banerjee, and Ashish Goel. 2016. Personalized pagerank estimation and search: A bidirectional approach WSDM. 163--172. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Peter Lofgren, Siddhartha Banerjee, Ashish Goel, and C Seshadhri. 2014. Fast-ppr: Scaling personalized pagerank estimation for large graphs KDD. 1436--1445. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Takanori Maehara, Takuya Akiba, Yoichi Iwata, and Ken-ichi Kawarabayashi. 2014. Computing personalized PageRank quickly by exploiting graph structures. PVLDB Vol. 7, 12 (2014), 1023--1034. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Naoto Ohsaka, Takanori Maehara, and Ken-ichi Kawarabayashi. 2015. Efficient PageRank Tracking in Evolving Networks. In SIGKDD 2015. 875--884. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. 1999. The PageRank citation ranking: bringing order to the web. (1999).Google ScholarGoogle Scholar
  27. Atish Das Sarma, Anisur Rahaman Molla, Gopal Pandurangan, and Eli Upfal. 2013. Fast Distributed PageRank Computation. In ICDCN. 11--26.Google ScholarGoogle Scholar
  28. Kijung Shin, Jinhong Jung, Lee Sael, and U. Kang. 2015. BEAR: Block Elimination Approach for Random Walk with Restart on Large Graphs SIGMOD. 1571--1585. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Sibo Wang, Youze Tang, Xiaokui Xiao, Yin Yang, and Zengxiang Li. 2016. HubPPR: Effective Indexing for Approximate Personalized PageRank. PVLDB Vol. 10, 3 (2016), 205--216. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Sibo Wang, Renchi Yang, Xiaokui Xiao, Zhewei Wei, and Yin Yang. 2017. FORA: Simple and Effective Approximate Single-Source Personalized PageRank. In SIGKDD. 505--514. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Hongyang Zhang, Peter Lofgren, and Ashish Goel. 2016. Approximate Personalized PageRank on Dynamic Graphs KDD. 1315--1324. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Fanwei Zhu, Yuan Fang, Kevin Chen-Chuan Chang, and Jing Ying. 2013. Incremental and Accuracy-Aware Personalized PageRank through Scheduled Approximation. PVLDB Vol. 6, 6 (2013), 481--492. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Efficient Algorithms for Finding Approximate Heavy Hitters in Personalized PageRanks

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data
      May 2018
      1874 pages
      ISBN:9781450347037
      DOI:10.1145/3183713

      Copyright © 2018 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 27 May 2018

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

      Acceptance Rates

      SIGMOD '18 Paper Acceptance Rate90of461submissions,20%Overall Acceptance Rate785of4,003submissions,20%

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader