ABSTRACT
P-Rank is a simple and captivating link-based similarity measure that extends SimRank by exploiting both in- and out-links for similarity computation. However, the existing work of P-Rank computation is expensive in terms of time and space cost and cannot efficiently support similarity computation in large information networks. For tackling this problem, in this paper, we propose an optimization technique for fast P-Rank computation in information networks by adopting the spiritual of partial sums. We write P-Rank equation based on partial sums and further approximate this equation by setting a threshold for ignoring the small similarity scores during iterative similarity computation. An optimized similarity computation algorithm is developed, which reduces the computation cost by skipping the similarity scores smaller than the give threshold during accumulation operations. And the accuracy loss estimation under the threshold is given through extensive mathematical analysis. Extensive experiments demonstrate the effectiveness and efficiency of our proposed approach through comparing with the straightforward P-Rank computation algorithm.
- R. Amsler. 1972. Application of citation-based automatic classification. Technical report, The University of Texas at Austin Linguistics Research Center.Google Scholar
- Ioannis Antonellis, Hector Garcia-Molina, and Chi-Chao Chang. 2008. Simrank++: query rewriting through link analysis of the click graph. PVLDB 1, 1 (2008), 408--421. Google ScholarDigital Library
- Dániel Fogaras and Balázs Rácz. 2005. Scaling link-based similarity search. In WWW. 641--650. Google ScholarDigital Library
- Kalervo Järvelin and Jaana Kekäläinen. 2002. Cumulated gain-based evaluation of IR techniques. ACM Trans. Inf. Syst. 20, 4 (2002). Google ScholarDigital Library
- Glen Jeh and Jennifer Widom. 2002. SimRank: a measure of structural-context similarity. In KDD. 538--543. Google ScholarDigital Library
- M. M. Kessler. 1963. Bibliographic coupling between scientific papers. American Documentation 14 (1963), 10--25.Google ScholarCross Ref
- Cuiping Li, Jiawei Han, Guoming He, Xin Jin, Yizhou Sun, Yintao Yu, and Tianyi Wu. 2010. Fast computation of SimRank for static and dynamic information networks. In EDBT. 465--476. Google ScholarDigital Library
- Lina Li, Cuiping Li, Hong Chen, and Xiaoyong Du. 2013. MapReduce-Based SimRank Computation and Its Application in Social Recommender System. In IEEE International Congress on Big Data, BigData Congress 2013, June 27 2013--July 2, 2013. 133--140. Google ScholarDigital Library
- Ruiqi Li, Xiang Zhao, Haichuan Shang, Yifan Chen, and Weidong Xiao. 2017. Fast top-k similarity join for SimRank. Inf. Sci. 381 (2017), 1--19. Google ScholarDigital Library
- Zhenjiang Lin, Irwin King, and Michael R. Lyu. 2006. PageSim: A Novel Link-Based Similarity Measure for the World Wide Web. In WI. 687--693. Google ScholarDigital Library
- Dmitry Lizorkin, Pavel Velikhov, Maxim N. Grinev, and Denis Turdakov. 2008. Accuracy estimate and optimization techniques for SimRank computation. PVLDB 1, 1 (2008), 422--433. Google ScholarDigital Library
- Jia-Yu Pan, Hyung-Jeong Yang, Christos Faloutsos, and Pinar Duygulu. 2004. Automatic multimedia cross-modal correlation discovery. In KDD. 653--658. Google ScholarDigital Library
- Henry G. Small. 1973. Co-citation in the scientific literature: A new measure of the relationship betweenf two documents. Journal of the American Society for Information Scienc 24, 4 (1973), 265--269.Google ScholarCross Ref
- Wensi Xi, Edward A. Fox, Weiguo Fan, Benyu Zhang, Zheng Chen, Jun Yan, and Dong Zhuang. 2005. SimFusion: measuring similarity using unified relationship matrix. In SIGIR. 130--137. Google ScholarDigital Library
- Xiaoxin Yin, Jiawei Han, and Philip S. Yu. 2006. LinkClus: Efficient Clustering via Heterogeneous Semantic Links. In VLDB. 427--438. Google ScholarDigital Library
- Mingxi Zhang, Hao Hu, Zhenying He, Liping Gao, and Liujie Sun. 2015. Efficient link-based similarity search in web networks. Expert Syst. Appl. 42, 22 (2015), 8868--8880. Google ScholarDigital Library
- Mingxi Zhang, Hao Hu, Zhenying He, and Wei Wang. 2015. Top-k similarity search in heterogeneous information networks with x-star network schema. Expert Syst. Appl. 42, 2 (2015), 699--712. Google ScholarDigital Library
- Peixiang Zhao, Jiawei Han, and Yizhou Sun. 2009. P-Rank: a comprehensive structural similarity measure over information networks. In Proceedings of the 18th ACM Conference on Information and Knowledge Management, CIKM 2009, Hong Kong, China, November 2--6, 2009. 553--562. Google ScholarDigital Library
Recommendations
P-Rank: a comprehensive structural similarity measure over information networks
CIKM '09: Proceedings of the 18th ACM conference on Information and knowledge managementWith the ubiquity of information networks and their broad applications, the issue of similarity computation between entities of an information network arises and draws extensive research interests. However, to effectively and comprehensively measure "...
Fast computation of SimRank for static and dynamic information networks
EDBT '10: Proceedings of the 13th International Conference on Extending Database TechnologyInformation networks are ubiquitous in many applications and analysis on such networks has attracted significant attention in the academic communities. One of the most important aspects of information network analysis is to measure similarity between ...
Top-k similarity search in heterogeneous information networks with x-star network schema
The efficiency improvement is evident for similarity computation.The effectiveness of returned result is good for similarity search.The pruning algorithm is presented for supporting fast online query processing.The accuracy loss of pruning algorithm can ...
Comments