research-article

PRSim: Sublinear Time SimRank Computation on Large Power-Law Graphs

Authors:
Zhewei Wei

Renmin University of China, Beijing, China

Renmin University of China, Beijing, China
View Profile

,
Xiaodong He

4Paradigm Inc., Beijing, China

4Paradigm Inc., Beijing, China
View Profile

,
Xiaokui Xiao

National University of Singapore, Singapore, Singapore

National University of Singapore, Singapore, Singapore
View Profile

,
Sibo Wang

Chinese University of Hong Kong, Hong Kong, Hong Kong

Chinese University of Hong Kong, Hong Kong, Hong Kong
View Profile

,
Yu Liu

Peking University, Beijing, China

Peking University, Beijing, China
View Profile

,
Xiaoyong Du

Renmin University of China, Beijing, China

Renmin University of China, Beijing, China
View Profile

,
Ji-Rong Wen

Renmin University of China, Beijing, China

Renmin University of China, Beijing, China
View Profile

SIGMOD '19: Proceedings of the 2019 International Conference on Management of DataJune 2019Pages 1042–1059https://doi.org/10.1145/3299869.3319873

Published:25 June 2019Publication History

SIGMOD '19: Proceedings of the 2019 International Conference on Management of Data

Pages 1042–1059

ABSTRACT

SimRank is a classic measure of the similarities of nodes in a graph. Given a node u in graph $G =(V, E)$, a \em single-source SimRank query returns the SimRank similarities $s(u, v)$ between node u and each node $v \in V$. This type of queries has numerous applications in web search and social networks analysis, such as link prediction, web mining, and spam detection. Existing methods for single-source SimRank queries, however, incur query cost at least linear to the number of nodes n, which renders them inapplicable for real-time and interactive analysis. This paper proposes \prsim, an algorithm that exploits the structure of graphs to efficiently answer single-source SimRank queries. \prsim uses an index of size $O(m)$, where m is the number of edges in the graph, and guarantees a query time that depends on the \em reverse PageRank distribution of the input graph. In particular, we prove that \prsim runs in sub-linear time if the degree distribution of the input graph follows the power-law distribution, a property possessed by many real-world graphs. Based on the theoretical analysis, we show that the empirical query time of all existing SimRank algorithms also depends on the reverse PageRank distribution of the graph. Finally, we present the first experimental study that evaluates the absolute errors of various SimRank algorithms on large graphs, and we show that \prsim outperforms the state of the art in terms of query time, accuracy, index size, and scalability.

References

http://snap.stanford.edu/data/index.html.Google Scholar
http://law.di.unimi.it/datasets.php.Google Scholar
Rodrigo Aldecoa, Chiara Orsini, and Dmitri Krioukov. Hyperbolic graph generator. Computer Physics Communications, 196:492--496, 2015.Google ScholarCross Ref
Ioannis Antonellis, Hector Garcia Molina, and Chi Chao Chang. SimrankGoogle Scholar
: query rewriting through link analysis of the click graph. PVLDB, 1(1):408--421, 2008. Google ScholarDigital Library
Bahman Bahmani, Abdur Chowdhury, and Ashish Goel. Fast incremental and personalized pagerank. VLDB, 4(3):173--184, 2010.Google ScholarDigital Library
Mansurul Bhuiyan and Mohammad Al Hasan. Representing graphs as bag of vertices and partitions for graph classification. Data Science and Engineering, 3(2):150--165, 2018.Google ScholarCross Ref
Bé la Bollobá s, Christian Borgs, Jennifer T. Chayes, and Oliver Riordan. Directed scale-free graphs. In SODA, pages 132--139, 2003. Google ScholarDigital Library
Pawel Brach, Marek Cygan, Jakub Lkacki, and Piotr Sankowski. Algorithmic complexity of power law networks. In SODA, pages 1306--1325. Society for Industrial and Applied Mathematics, 2016. Google ScholarDigital Library
Moses Charikar, Kevin Chen, and Martin Farach-Colton. Finding frequent items in data streams. In ICALP, pages 693--703. Springer, 2002. Google ScholarDigital Library
Fan R. K. Chung and Lincoln Lu. Concentration inequalities and martingale inequalities: A survey. Internet Mathematics, 3(1):79--127, 2006.Google ScholarCross Ref
Dá niel Fogaras and Balá zs Rá cz. Scaling link-based similarity search. In WWW, pages 641--650, 2005. Google ScholarDigital Library
Dá niel Fogaras, Balá zs Rá cz, Ká roly Csalogá ny, and Tamá s Sarló s. Towards scaling fully personalized pagerank: Algorithms, lower bounds, and experiments. Internet Mathematics, 2(3):333--358, 2005.Google ScholarCross Ref
Yuichiro Fujiwara, Makoto Nakatsuji, Hiroaki Shiokawa, and Makoto Onizuka. Efficient search algorithm for simrank. In ICDE, pages 589--600, 2013. Google ScholarDigital Library
Guoming He, Haijun Feng, Cuiping Li, and Hong Chen. Parallel simrank computation on large graphs with iterative aggregation. In KDD, pages 543--552, 2010.Google ScholarDigital Library
Glen Jeh and Jennifer Widom. Simrank: a measure of structural-context similarity. In SIGKDD, pages 538--543, 2002. Google ScholarDigital Library
Minhao Jiang, Ada Wai-Chee Fu, and Raymond Chi-Wing Wong. Reads: a random walk approach for efficient and accurate dynamic simrank. PPVLDB, 10(9):937--948, 2017. Google ScholarDigital Library
Ruoming Jin, Victor E Lee, and Hui Hong. Axiomatic ranking of network role similarity. In KDD, pages 922--930, 2011. Google ScholarDigital Library
Mitsuru Kusumoto, Takanori Maehara, and Ken-ichi Kawarabayashi. Scalable similarity search for simrank. In SIGMOD, pages 325--336, 2014. Google ScholarDigital Library
JooYoung Lee and Rustam Tukhvatov. Evaluations of similarity measures on vk for link prediction. Data Science and Engineering, 3(3):277--289, 2018.Google ScholarCross Ref
Pei Lee, Laks V. S. Lakshmanan, and Jeffrey Xu Yu. On top-k structural similarity search. In ICDE, pages 774--785, 2012.Google ScholarDigital Library
Cuiping Li, Jiawei Han, Guoming He, Xin Jin, Yizhou Sun, Yintao Yu, and Tianyi Wu. Fast computation of simrank for static and dynamic information networks. In EDBT, pages 465--476, 2010. Google ScholarDigital Library
Zhenguo Li, Yixiang Fang, Qin Liu, Jiefeng Cheng, Reynold Cheng, and John Lui. Walking in the cloud: Parallel simrank at scale. PVLDB, 9(1):24--35, 2015.Google ScholarDigital Library
David Liben-Nowell and Jon M. Kleinberg. The link-prediction problem for social networks. JASIST, 58(7):1019--1031, 2007. Google ScholarCross Ref
Yu Liu, Jiaheng Lu, Hua Yang, Xiaokui Xiao, and Zhewei Wei. Towards maximum independent sets on massive graphs. VLDB, 8(13):2122--2133, 2015. Google ScholarDigital Library
Yu Liu, Bolong Zheng, Xiaodong He, Zhewei Wei, Xiaokui Xiao, Kai Zheng, and Jiaheng Lu. Probesim: scalable single-source and top-k simrank computations on dynamic graphs. PVLDB, 11(1):14--26, 2017. Google ScholarDigital Library
Dmitry Lizorkin, Pavel Velikhov, Maxim N. Grinev, and Denis Turdakov. Accuracy estimate and optimization techniques for simrank computation. VLDB J., 19(1):45--66, 2010. Google ScholarDigital Library
Peter Lofgren, Siddhartha Banerjee, and Ashish Goel. Personalized pagerank estimation and search: A bidirectional approach. In WSDM, pages 163--172, 2016. Google ScholarDigital Library
Takanori Maehara, Mitsuru Kusumoto, and Ken-ichi Kawarabayashi. Efficient simrank computation via linearization. CoRR, abs/1411.7228, 2014.Google Scholar
Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The pagerank citation ranking: bringing order to the web. 1999.Google Scholar
Yingxia Shao, Bin Cui, Lei Chen, Mingming Liu, and Xing Xie. An efficient similarity search framework for simrank over large dynamic graphs. PVLDB, 8(8):838--849, 2015. Google ScholarDigital Library
Nikita Spirin and Jiawei Han. Survey on web spam detection: principles and algorithms. SIGKDD Explorations, 13(2):50--64, 2011. Google ScholarDigital Library
Boyu Tian and Xiaokui Xiao. SLING: A near-optimal index structure for simrank. In SIGMOD, pages 1859--1874, 2016. Google ScholarDigital Library
Sibo Wang and Yufei Tao. Efficient algorithms for finding approximate heavy hitters in personalized pageranks. In SIGMOD, pages 1113--1127. ACM, 2018. Google ScholarDigital Library
Yue Wang, Xiang Lian, and Lei Chen. Efficient simrank tracking in dynamic graphs. In ICDE, pages 545--556. IEEE, 2018.Google ScholarCross Ref
Zhewei Wei, Xiaodong He, Xiaokui Xiao, Sibo Wang, Shuo Shang, and Ji-Rong Wen. Topppr: top-k personalized pagerank queries with precision guarantees on large graphs. In SIGMOD, pages 441--456. ACM, 2018. Google ScholarDigital Library
Wensi Xi, Edward A Fox, Weiguo Fan, Benyu Zhang, Zheng Chen, Jun Yan, and Dong Zhuang. Simfusion: measuring similarity using unified relationship matrix. In SIGIR, pages 130--137. ACM, 2005. Google ScholarDigital Library
Qi Ye, Changlei Zhu, Gang Li, Zhimin Liu, and Feng Wang. Using node identifiers and community prior for graph-based classification. Data Science and Engineering, 3(1):68--83, 2018.Google ScholarCross Ref
Weiren Yu, Xuemin Lin, and Wenjie Zhang. Fast incremental simrank on link-evolving graphs. In ICDE, pages 304--315, 2014.Google ScholarCross Ref
Weiren Yu, Xuemin Lin, Wenjie Zhang, Lijun Chang, and Jian Pei. More is simpler: Effectively and efficiently assessing node-pair similarities based on hyperlinks. PVLDB, 7(1):13--24, 2013. Google ScholarDigital Library
Weiren Yu and Julie McCann. Gauging correct relative rankings for similarity search. In CIKM, pages 1791--1794, 2015. Google ScholarDigital Library
Weiren Yu and Julie A. McCann. Efficient partial-pairs simrank search for large networks. PVLDB, 8(5):569--580, 2015. Google ScholarDigital Library
Weiren Yu and Julie Ann McCann. High quality graph-based similarity search. In SIGIR, pages 83--92, 2015. Google ScholarDigital Library
Weiren Yu, Wenjie Zhang, Xuemin Lin, Qing Zhang, and Jiajin Le. A space and time efficient algorithm for simrank computation. World Wide Web, 15(3):327--353, 2012. Google ScholarDigital Library
Hongyang Zhang, Peter Lofgren, and Ashish Goel. Approximate personalized pagerank on dynamic graphs. In KDD, pages 1315--1324, 2016. Google ScholarDigital Library
Jing Zhang, Jie Tang, Cong Ma, Hanghang Tong, Yu Jing, and Juanzi Li. Panther: Fast top-k similarity search on large networks. In SIGKDD, pages 1445--1454. ACM, 2015. Google ScholarDigital Library
Zhipeng Zhang, Yingxia Shao, Bin Cui, and Ce Zhang. An experimental evaluation of simrank-based similarity search algorithms. PVLDB, 10(5):601--612, 2017. Google ScholarDigital Library
Peixiang Zhao, Jiawei Han, and Yizhou Sun. P-rank: a comprehensive structural similarity measure over information networks. In CIKM, pages 553--562. ACM, 2009. Google ScholarDigital Library
Peixiang Zhao, Jiawei Han, and Yizhou Sun. P-rank: a comprehensive structural similarity measure over information networks. In CIKM, pages 553--562, 2009. Google ScholarDigital Library

Index Terms

PRSim: Sublinear Time SimRank Computation on Large Power-Law Graphs
1. Information systems
  1. Information systems applications
    1. Data mining
2. Mathematics of computing
  1. Discrete mathematics
    1. Graph theory
      1. Graph algorithms

Recommendations

P-Simrank: Extending Simrank to Scale-Free Bipartite Networks
WWW '20: Proceedings of The Web Conference 2020

The measure of similarity between nodes in a graph is a useful tool in many areas of computer science. SimRank, proposed by Jeh and Widom [7], is a classic measure of similarities of nodes in graph that has both theoretical and intuitive properties and ...
Read More
Accuracy estimation of link-based similarity measures and its application

Link-based similarity measures play a significant role in many graph based applications. Consequently, measuring node similarity in a graph is a fundamental problem of graph datamining. Personalized pagerank (PPR) and simrank (SR) have emerged as the ...
Read More
ASCOS++: An Asymmetric Similarity Measure for Weighted Networks to Address the Problem of SimRank

In this article, we explore the relationships among digital objects in terms of their similarity based on vertex similarity measures. We argue that SimRank—a famous similarity measure—and its families, such as P-Rank and SimRank++, fail to capture ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SIGMOD '19: Proceedings of the 2019 International Conference on Management of Data
June 2019
2106 pages
ISBN:9781450356435
DOI:10.1145/3299869
General Chairs:
Peter Boncz
CWI & Vrije Universiteit Amsterdam, The Netherlands
,
Stefan Manegold
CWI & Universiteit Leiden, The Netherlands
,
Program Chairs:
Anastasia Ailamaki
EPFL, Switzerland
,
Amol Deshpande
University of Maryland, USA
,
Tim Kraska
MIT, USA
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 25 June 2019
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
personalized pagerank
power-law graphs
simrank
Qualifiers
- research-article
Conference

Acceptance Rates
SIGMOD '19 Paper Acceptance Rate88of430submissions,20%Overall Acceptance Rate785of4,003submissions,20%
More
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 31
  Total Citations
  View Citations
- 408
  Total Downloads
- Downloads (Last 12 months)23
- Downloads (Last 6 weeks)5
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

PRSim: Sublinear Time SimRank Computation on Large Power-Law Graphs

SIGMOD '19: Proceedings of the 2019 International Conference on Management of Data

ABSTRACT

References

Cited By

Index Terms

Recommendations

P-Simrank: Extending Simrank to Scale-Free Bipartite Networks

Accuracy estimation of link-based similarity measures and its application

ASCOS++: An Asymmetric Similarity Measure for Weighted Networks to Address the Problem of SimRank