ABSTRACT
SimRank is a classic measure of the similarities of nodes in a graph. Given a node u in graph $G =(V, E)$, a \em single-source SimRank query returns the SimRank similarities $s(u, v)$ between node u and each node $v \in V$. This type of queries has numerous applications in web search and social networks analysis, such as link prediction, web mining, and spam detection. Existing methods for single-source SimRank queries, however, incur query cost at least linear to the number of nodes n, which renders them inapplicable for real-time and interactive analysis. This paper proposes \prsim, an algorithm that exploits the structure of graphs to efficiently answer single-source SimRank queries. \prsim uses an index of size $O(m)$, where m is the number of edges in the graph, and guarantees a query time that depends on the \em reverse PageRank distribution of the input graph. In particular, we prove that \prsim runs in sub-linear time if the degree distribution of the input graph follows the power-law distribution, a property possessed by many real-world graphs. Based on the theoretical analysis, we show that the empirical query time of all existing SimRank algorithms also depends on the reverse PageRank distribution of the graph. Finally, we present the first experimental study that evaluates the absolute errors of various SimRank algorithms on large graphs, and we show that \prsim outperforms the state of the art in terms of query time, accuracy, index size, and scalability.
- http://snap.stanford.edu/data/index.html.Google Scholar
- http://law.di.unimi.it/datasets.php.Google Scholar
- Rodrigo Aldecoa, Chiara Orsini, and Dmitri Krioukov. Hyperbolic graph generator. Computer Physics Communications, 196:492--496, 2015.Google ScholarCross Ref
- Ioannis Antonellis, Hector Garcia Molina, and Chi Chao Chang. SimrankGoogle Scholar
- : query rewriting through link analysis of the click graph. PVLDB, 1(1):408--421, 2008. Google ScholarDigital Library
- Bahman Bahmani, Abdur Chowdhury, and Ashish Goel. Fast incremental and personalized pagerank. VLDB, 4(3):173--184, 2010.Google ScholarDigital Library
- Mansurul Bhuiyan and Mohammad Al Hasan. Representing graphs as bag of vertices and partitions for graph classification. Data Science and Engineering, 3(2):150--165, 2018.Google ScholarCross Ref
- Bé la Bollobá s, Christian Borgs, Jennifer T. Chayes, and Oliver Riordan. Directed scale-free graphs. In SODA, pages 132--139, 2003. Google ScholarDigital Library
- Pawel Brach, Marek Cygan, Jakub Lkacki, and Piotr Sankowski. Algorithmic complexity of power law networks. In SODA, pages 1306--1325. Society for Industrial and Applied Mathematics, 2016. Google ScholarDigital Library
- Moses Charikar, Kevin Chen, and Martin Farach-Colton. Finding frequent items in data streams. In ICALP, pages 693--703. Springer, 2002. Google ScholarDigital Library
- Fan R. K. Chung and Lincoln Lu. Concentration inequalities and martingale inequalities: A survey. Internet Mathematics, 3(1):79--127, 2006.Google ScholarCross Ref
- Dá niel Fogaras and Balá zs Rá cz. Scaling link-based similarity search. In WWW, pages 641--650, 2005. Google ScholarDigital Library
- Dá niel Fogaras, Balá zs Rá cz, Ká roly Csalogá ny, and Tamá s Sarló s. Towards scaling fully personalized pagerank: Algorithms, lower bounds, and experiments. Internet Mathematics, 2(3):333--358, 2005.Google ScholarCross Ref
- Yuichiro Fujiwara, Makoto Nakatsuji, Hiroaki Shiokawa, and Makoto Onizuka. Efficient search algorithm for simrank. In ICDE, pages 589--600, 2013. Google ScholarDigital Library
- Guoming He, Haijun Feng, Cuiping Li, and Hong Chen. Parallel simrank computation on large graphs with iterative aggregation. In KDD, pages 543--552, 2010.Google ScholarDigital Library
- Glen Jeh and Jennifer Widom. Simrank: a measure of structural-context similarity. In SIGKDD, pages 538--543, 2002. Google ScholarDigital Library
- Minhao Jiang, Ada Wai-Chee Fu, and Raymond Chi-Wing Wong. Reads: a random walk approach for efficient and accurate dynamic simrank. PPVLDB, 10(9):937--948, 2017. Google ScholarDigital Library
- Ruoming Jin, Victor E Lee, and Hui Hong. Axiomatic ranking of network role similarity. In KDD, pages 922--930, 2011. Google ScholarDigital Library
- Mitsuru Kusumoto, Takanori Maehara, and Ken-ichi Kawarabayashi. Scalable similarity search for simrank. In SIGMOD, pages 325--336, 2014. Google ScholarDigital Library
- JooYoung Lee and Rustam Tukhvatov. Evaluations of similarity measures on vk for link prediction. Data Science and Engineering, 3(3):277--289, 2018.Google ScholarCross Ref
- Pei Lee, Laks V. S. Lakshmanan, and Jeffrey Xu Yu. On top-k structural similarity search. In ICDE, pages 774--785, 2012.Google ScholarDigital Library
- Cuiping Li, Jiawei Han, Guoming He, Xin Jin, Yizhou Sun, Yintao Yu, and Tianyi Wu. Fast computation of simrank for static and dynamic information networks. In EDBT, pages 465--476, 2010. Google ScholarDigital Library
- Zhenguo Li, Yixiang Fang, Qin Liu, Jiefeng Cheng, Reynold Cheng, and John Lui. Walking in the cloud: Parallel simrank at scale. PVLDB, 9(1):24--35, 2015.Google ScholarDigital Library
- David Liben-Nowell and Jon M. Kleinberg. The link-prediction problem for social networks. JASIST, 58(7):1019--1031, 2007. Google ScholarCross Ref
- Yu Liu, Jiaheng Lu, Hua Yang, Xiaokui Xiao, and Zhewei Wei. Towards maximum independent sets on massive graphs. VLDB, 8(13):2122--2133, 2015. Google ScholarDigital Library
- Yu Liu, Bolong Zheng, Xiaodong He, Zhewei Wei, Xiaokui Xiao, Kai Zheng, and Jiaheng Lu. Probesim: scalable single-source and top-k simrank computations on dynamic graphs. PVLDB, 11(1):14--26, 2017. Google ScholarDigital Library
- Dmitry Lizorkin, Pavel Velikhov, Maxim N. Grinev, and Denis Turdakov. Accuracy estimate and optimization techniques for simrank computation. VLDB J., 19(1):45--66, 2010. Google ScholarDigital Library
- Peter Lofgren, Siddhartha Banerjee, and Ashish Goel. Personalized pagerank estimation and search: A bidirectional approach. In WSDM, pages 163--172, 2016. Google ScholarDigital Library
- Takanori Maehara, Mitsuru Kusumoto, and Ken-ichi Kawarabayashi. Efficient simrank computation via linearization. CoRR, abs/1411.7228, 2014.Google Scholar
- Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The pagerank citation ranking: bringing order to the web. 1999.Google Scholar
- Yingxia Shao, Bin Cui, Lei Chen, Mingming Liu, and Xing Xie. An efficient similarity search framework for simrank over large dynamic graphs. PVLDB, 8(8):838--849, 2015. Google ScholarDigital Library
- Nikita Spirin and Jiawei Han. Survey on web spam detection: principles and algorithms. SIGKDD Explorations, 13(2):50--64, 2011. Google ScholarDigital Library
- Boyu Tian and Xiaokui Xiao. SLING: A near-optimal index structure for simrank. In SIGMOD, pages 1859--1874, 2016. Google ScholarDigital Library
- Sibo Wang and Yufei Tao. Efficient algorithms for finding approximate heavy hitters in personalized pageranks. In SIGMOD, pages 1113--1127. ACM, 2018. Google ScholarDigital Library
- Yue Wang, Xiang Lian, and Lei Chen. Efficient simrank tracking in dynamic graphs. In ICDE, pages 545--556. IEEE, 2018.Google ScholarCross Ref
- Zhewei Wei, Xiaodong He, Xiaokui Xiao, Sibo Wang, Shuo Shang, and Ji-Rong Wen. Topppr: top-k personalized pagerank queries with precision guarantees on large graphs. In SIGMOD, pages 441--456. ACM, 2018. Google ScholarDigital Library
- Wensi Xi, Edward A Fox, Weiguo Fan, Benyu Zhang, Zheng Chen, Jun Yan, and Dong Zhuang. Simfusion: measuring similarity using unified relationship matrix. In SIGIR, pages 130--137. ACM, 2005. Google ScholarDigital Library
- Qi Ye, Changlei Zhu, Gang Li, Zhimin Liu, and Feng Wang. Using node identifiers and community prior for graph-based classification. Data Science and Engineering, 3(1):68--83, 2018.Google ScholarCross Ref
- Weiren Yu, Xuemin Lin, and Wenjie Zhang. Fast incremental simrank on link-evolving graphs. In ICDE, pages 304--315, 2014.Google ScholarCross Ref
- Weiren Yu, Xuemin Lin, Wenjie Zhang, Lijun Chang, and Jian Pei. More is simpler: Effectively and efficiently assessing node-pair similarities based on hyperlinks. PVLDB, 7(1):13--24, 2013. Google ScholarDigital Library
- Weiren Yu and Julie McCann. Gauging correct relative rankings for similarity search. In CIKM, pages 1791--1794, 2015. Google ScholarDigital Library
- Weiren Yu and Julie A. McCann. Efficient partial-pairs simrank search for large networks. PVLDB, 8(5):569--580, 2015. Google ScholarDigital Library
- Weiren Yu and Julie Ann McCann. High quality graph-based similarity search. In SIGIR, pages 83--92, 2015. Google ScholarDigital Library
- Weiren Yu, Wenjie Zhang, Xuemin Lin, Qing Zhang, and Jiajin Le. A space and time efficient algorithm for simrank computation. World Wide Web, 15(3):327--353, 2012. Google ScholarDigital Library
- Hongyang Zhang, Peter Lofgren, and Ashish Goel. Approximate personalized pagerank on dynamic graphs. In KDD, pages 1315--1324, 2016. Google ScholarDigital Library
- Jing Zhang, Jie Tang, Cong Ma, Hanghang Tong, Yu Jing, and Juanzi Li. Panther: Fast top-k similarity search on large networks. In SIGKDD, pages 1445--1454. ACM, 2015. Google ScholarDigital Library
- Zhipeng Zhang, Yingxia Shao, Bin Cui, and Ce Zhang. An experimental evaluation of simrank-based similarity search algorithms. PVLDB, 10(5):601--612, 2017. Google ScholarDigital Library
- Peixiang Zhao, Jiawei Han, and Yizhou Sun. P-rank: a comprehensive structural similarity measure over information networks. In CIKM, pages 553--562. ACM, 2009. Google ScholarDigital Library
- Peixiang Zhao, Jiawei Han, and Yizhou Sun. P-rank: a comprehensive structural similarity measure over information networks. In CIKM, pages 553--562, 2009. Google ScholarDigital Library
Index Terms
- PRSim: Sublinear Time SimRank Computation on Large Power-Law Graphs
Recommendations
P-Simrank: Extending Simrank to Scale-Free Bipartite Networks
WWW '20: Proceedings of The Web Conference 2020The measure of similarity between nodes in a graph is a useful tool in many areas of computer science. SimRank, proposed by Jeh and Widom [7], is a classic measure of similarities of nodes in graph that has both theoretical and intuitive properties and ...
Accuracy estimation of link-based similarity measures and its application
Link-based similarity measures play a significant role in many graph based applications. Consequently, measuring node similarity in a graph is a fundamental problem of graph datamining. Personalized pagerank (PPR) and simrank (SR) have emerged as the ...
ASCOS++: An Asymmetric Similarity Measure for Weighted Networks to Address the Problem of SimRank
In this article, we explore the relationships among digital objects in terms of their similarity based on vertex similarity measures. We argue that SimRank—a famous similarity measure—and its families, such as P-Rank and SimRank++, fail to capture ...
Comments