skip to main content
10.1145/3299869.3319873acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

PRSim: Sublinear Time SimRank Computation on Large Power-Law Graphs

Authors Info & Claims
Published:25 June 2019Publication History

ABSTRACT

SimRank is a classic measure of the similarities of nodes in a graph. Given a node u in graph $G =(V, E)$, a \em single-source SimRank query returns the SimRank similarities $s(u, v)$ between node u and each node $v \in V$. This type of queries has numerous applications in web search and social networks analysis, such as link prediction, web mining, and spam detection. Existing methods for single-source SimRank queries, however, incur query cost at least linear to the number of nodes n, which renders them inapplicable for real-time and interactive analysis. This paper proposes \prsim, an algorithm that exploits the structure of graphs to efficiently answer single-source SimRank queries. \prsim uses an index of size $O(m)$, where m is the number of edges in the graph, and guarantees a query time that depends on the \em reverse PageRank distribution of the input graph. In particular, we prove that \prsim runs in sub-linear time if the degree distribution of the input graph follows the power-law distribution, a property possessed by many real-world graphs. Based on the theoretical analysis, we show that the empirical query time of all existing SimRank algorithms also depends on the reverse PageRank distribution of the graph. Finally, we present the first experimental study that evaluates the absolute errors of various SimRank algorithms on large graphs, and we show that \prsim outperforms the state of the art in terms of query time, accuracy, index size, and scalability.

References

  1. http://snap.stanford.edu/data/index.html.Google ScholarGoogle Scholar
  2. http://law.di.unimi.it/datasets.php.Google ScholarGoogle Scholar
  3. Rodrigo Aldecoa, Chiara Orsini, and Dmitri Krioukov. Hyperbolic graph generator. Computer Physics Communications, 196:492--496, 2015.Google ScholarGoogle ScholarCross RefCross Ref
  4. Ioannis Antonellis, Hector Garcia Molina, and Chi Chao Chang. SimrankGoogle ScholarGoogle Scholar
  5. : query rewriting through link analysis of the click graph. PVLDB, 1(1):408--421, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Bahman Bahmani, Abdur Chowdhury, and Ashish Goel. Fast incremental and personalized pagerank. VLDB, 4(3):173--184, 2010.Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Mansurul Bhuiyan and Mohammad Al Hasan. Representing graphs as bag of vertices and partitions for graph classification. Data Science and Engineering, 3(2):150--165, 2018.Google ScholarGoogle ScholarCross RefCross Ref
  8. Bé la Bollobá s, Christian Borgs, Jennifer T. Chayes, and Oliver Riordan. Directed scale-free graphs. In SODA, pages 132--139, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Pawel Brach, Marek Cygan, Jakub Lkacki, and Piotr Sankowski. Algorithmic complexity of power law networks. In SODA, pages 1306--1325. Society for Industrial and Applied Mathematics, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Moses Charikar, Kevin Chen, and Martin Farach-Colton. Finding frequent items in data streams. In ICALP, pages 693--703. Springer, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Fan R. K. Chung and Lincoln Lu. Concentration inequalities and martingale inequalities: A survey. Internet Mathematics, 3(1):79--127, 2006.Google ScholarGoogle ScholarCross RefCross Ref
  12. Dá niel Fogaras and Balá zs Rá cz. Scaling link-based similarity search. In WWW, pages 641--650, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Dá niel Fogaras, Balá zs Rá cz, Ká roly Csalogá ny, and Tamá s Sarló s. Towards scaling fully personalized pagerank: Algorithms, lower bounds, and experiments. Internet Mathematics, 2(3):333--358, 2005.Google ScholarGoogle ScholarCross RefCross Ref
  14. Yuichiro Fujiwara, Makoto Nakatsuji, Hiroaki Shiokawa, and Makoto Onizuka. Efficient search algorithm for simrank. In ICDE, pages 589--600, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Guoming He, Haijun Feng, Cuiping Li, and Hong Chen. Parallel simrank computation on large graphs with iterative aggregation. In KDD, pages 543--552, 2010.Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Glen Jeh and Jennifer Widom. Simrank: a measure of structural-context similarity. In SIGKDD, pages 538--543, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Minhao Jiang, Ada Wai-Chee Fu, and Raymond Chi-Wing Wong. Reads: a random walk approach for efficient and accurate dynamic simrank. PPVLDB, 10(9):937--948, 2017. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Ruoming Jin, Victor E Lee, and Hui Hong. Axiomatic ranking of network role similarity. In KDD, pages 922--930, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Mitsuru Kusumoto, Takanori Maehara, and Ken-ichi Kawarabayashi. Scalable similarity search for simrank. In SIGMOD, pages 325--336, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. JooYoung Lee and Rustam Tukhvatov. Evaluations of similarity measures on vk for link prediction. Data Science and Engineering, 3(3):277--289, 2018.Google ScholarGoogle ScholarCross RefCross Ref
  21. Pei Lee, Laks V. S. Lakshmanan, and Jeffrey Xu Yu. On top-k structural similarity search. In ICDE, pages 774--785, 2012.Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Cuiping Li, Jiawei Han, Guoming He, Xin Jin, Yizhou Sun, Yintao Yu, and Tianyi Wu. Fast computation of simrank for static and dynamic information networks. In EDBT, pages 465--476, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Zhenguo Li, Yixiang Fang, Qin Liu, Jiefeng Cheng, Reynold Cheng, and John Lui. Walking in the cloud: Parallel simrank at scale. PVLDB, 9(1):24--35, 2015.Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. David Liben-Nowell and Jon M. Kleinberg. The link-prediction problem for social networks. JASIST, 58(7):1019--1031, 2007. Google ScholarGoogle ScholarCross RefCross Ref
  25. Yu Liu, Jiaheng Lu, Hua Yang, Xiaokui Xiao, and Zhewei Wei. Towards maximum independent sets on massive graphs. VLDB, 8(13):2122--2133, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Yu Liu, Bolong Zheng, Xiaodong He, Zhewei Wei, Xiaokui Xiao, Kai Zheng, and Jiaheng Lu. Probesim: scalable single-source and top-k simrank computations on dynamic graphs. PVLDB, 11(1):14--26, 2017. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Dmitry Lizorkin, Pavel Velikhov, Maxim N. Grinev, and Denis Turdakov. Accuracy estimate and optimization techniques for simrank computation. VLDB J., 19(1):45--66, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Peter Lofgren, Siddhartha Banerjee, and Ashish Goel. Personalized pagerank estimation and search: A bidirectional approach. In WSDM, pages 163--172, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Takanori Maehara, Mitsuru Kusumoto, and Ken-ichi Kawarabayashi. Efficient simrank computation via linearization. CoRR, abs/1411.7228, 2014.Google ScholarGoogle Scholar
  30. Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The pagerank citation ranking: bringing order to the web. 1999.Google ScholarGoogle Scholar
  31. Yingxia Shao, Bin Cui, Lei Chen, Mingming Liu, and Xing Xie. An efficient similarity search framework for simrank over large dynamic graphs. PVLDB, 8(8):838--849, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Nikita Spirin and Jiawei Han. Survey on web spam detection: principles and algorithms. SIGKDD Explorations, 13(2):50--64, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Boyu Tian and Xiaokui Xiao. SLING: A near-optimal index structure for simrank. In SIGMOD, pages 1859--1874, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Sibo Wang and Yufei Tao. Efficient algorithms for finding approximate heavy hitters in personalized pageranks. In SIGMOD, pages 1113--1127. ACM, 2018. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Yue Wang, Xiang Lian, and Lei Chen. Efficient simrank tracking in dynamic graphs. In ICDE, pages 545--556. IEEE, 2018.Google ScholarGoogle ScholarCross RefCross Ref
  36. Zhewei Wei, Xiaodong He, Xiaokui Xiao, Sibo Wang, Shuo Shang, and Ji-Rong Wen. Topppr: top-k personalized pagerank queries with precision guarantees on large graphs. In SIGMOD, pages 441--456. ACM, 2018. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Wensi Xi, Edward A Fox, Weiguo Fan, Benyu Zhang, Zheng Chen, Jun Yan, and Dong Zhuang. Simfusion: measuring similarity using unified relationship matrix. In SIGIR, pages 130--137. ACM, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Qi Ye, Changlei Zhu, Gang Li, Zhimin Liu, and Feng Wang. Using node identifiers and community prior for graph-based classification. Data Science and Engineering, 3(1):68--83, 2018.Google ScholarGoogle ScholarCross RefCross Ref
  39. Weiren Yu, Xuemin Lin, and Wenjie Zhang. Fast incremental simrank on link-evolving graphs. In ICDE, pages 304--315, 2014.Google ScholarGoogle ScholarCross RefCross Ref
  40. Weiren Yu, Xuemin Lin, Wenjie Zhang, Lijun Chang, and Jian Pei. More is simpler: Effectively and efficiently assessing node-pair similarities based on hyperlinks. PVLDB, 7(1):13--24, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Weiren Yu and Julie McCann. Gauging correct relative rankings for similarity search. In CIKM, pages 1791--1794, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Weiren Yu and Julie A. McCann. Efficient partial-pairs simrank search for large networks. PVLDB, 8(5):569--580, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Weiren Yu and Julie Ann McCann. High quality graph-based similarity search. In SIGIR, pages 83--92, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Weiren Yu, Wenjie Zhang, Xuemin Lin, Qing Zhang, and Jiajin Le. A space and time efficient algorithm for simrank computation. World Wide Web, 15(3):327--353, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Hongyang Zhang, Peter Lofgren, and Ashish Goel. Approximate personalized pagerank on dynamic graphs. In KDD, pages 1315--1324, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Jing Zhang, Jie Tang, Cong Ma, Hanghang Tong, Yu Jing, and Juanzi Li. Panther: Fast top-k similarity search on large networks. In SIGKDD, pages 1445--1454. ACM, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Zhipeng Zhang, Yingxia Shao, Bin Cui, and Ce Zhang. An experimental evaluation of simrank-based similarity search algorithms. PVLDB, 10(5):601--612, 2017. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. Peixiang Zhao, Jiawei Han, and Yizhou Sun. P-rank: a comprehensive structural similarity measure over information networks. In CIKM, pages 553--562. ACM, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. Peixiang Zhao, Jiawei Han, and Yizhou Sun. P-rank: a comprehensive structural similarity measure over information networks. In CIKM, pages 553--562, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. PRSim: Sublinear Time SimRank Computation on Large Power-Law Graphs

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        SIGMOD '19: Proceedings of the 2019 International Conference on Management of Data
        June 2019
        2106 pages
        ISBN:9781450356435
        DOI:10.1145/3299869

        Copyright © 2019 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 25 June 2019

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article

        Acceptance Rates

        SIGMOD '19 Paper Acceptance Rate88of430submissions,20%Overall Acceptance Rate785of4,003submissions,20%

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader