research-article

Partial sums-based P-Rank computation in information networks

Authors:
Jinhua Wang

Fudan University, Shanghai, China

Fudan University, Shanghai, China
View Profile

,
Mingxi Zhang

University of Shanghai for Science and Technology, Shanghai, China

University of Shanghai for Science and Technology, Shanghai, China
View Profile

,
Zhenying He

Fudan University, Shanghai, China

Fudan University, Shanghai, China
View Profile

,
Wei Wang

Fudan University, Shanghai, China

Fudan University, Shanghai, China
View Profile

WI '17: Proceedings of the International Conference on Web IntelligenceAugust 2017Pages 1122–1130https://doi.org/10.1145/3106426.3109447

Published:23 August 2017Publication History

WI '17: Proceedings of the International Conference on Web Intelligence

Pages 1122–1130

ABSTRACT

P-Rank is a simple and captivating link-based similarity measure that extends SimRank by exploiting both in- and out-links for similarity computation. However, the existing work of P-Rank computation is expensive in terms of time and space cost and cannot efficiently support similarity computation in large information networks. For tackling this problem, in this paper, we propose an optimization technique for fast P-Rank computation in information networks by adopting the spiritual of partial sums. We write P-Rank equation based on partial sums and further approximate this equation by setting a threshold for ignoring the small similarity scores during iterative similarity computation. An optimized similarity computation algorithm is developed, which reduces the computation cost by skipping the similarity scores smaller than the give threshold during accumulation operations. And the accuracy loss estimation under the threshold is given through extensive mathematical analysis. Extensive experiments demonstrate the effectiveness and efficiency of our proposed approach through comparing with the straightforward P-Rank computation algorithm.

References

R. Amsler. 1972. Application of citation-based automatic classification. Technical report, The University of Texas at Austin Linguistics Research Center.Google Scholar
Ioannis Antonellis, Hector Garcia-Molina, and Chi-Chao Chang. 2008. Simrank++: query rewriting through link analysis of the click graph. PVLDB 1, 1 (2008), 408--421. Google ScholarDigital Library
Dániel Fogaras and Balázs Rácz. 2005. Scaling link-based similarity search. In WWW. 641--650. Google ScholarDigital Library
Kalervo Järvelin and Jaana Kekäläinen. 2002. Cumulated gain-based evaluation of IR techniques. ACM Trans. Inf. Syst. 20, 4 (2002). Google ScholarDigital Library
Glen Jeh and Jennifer Widom. 2002. SimRank: a measure of structural-context similarity. In KDD. 538--543. Google ScholarDigital Library
M. M. Kessler. 1963. Bibliographic coupling between scientific papers. American Documentation 14 (1963), 10--25.Google ScholarCross Ref
Cuiping Li, Jiawei Han, Guoming He, Xin Jin, Yizhou Sun, Yintao Yu, and Tianyi Wu. 2010. Fast computation of SimRank for static and dynamic information networks. In EDBT. 465--476. Google ScholarDigital Library
Lina Li, Cuiping Li, Hong Chen, and Xiaoyong Du. 2013. MapReduce-Based SimRank Computation and Its Application in Social Recommender System. In IEEE International Congress on Big Data, BigData Congress 2013, June 27 2013--July 2, 2013. 133--140. Google ScholarDigital Library
Ruiqi Li, Xiang Zhao, Haichuan Shang, Yifan Chen, and Weidong Xiao. 2017. Fast top-k similarity join for SimRank. Inf. Sci. 381 (2017), 1--19. Google ScholarDigital Library
Zhenjiang Lin, Irwin King, and Michael R. Lyu. 2006. PageSim: A Novel Link-Based Similarity Measure for the World Wide Web. In WI. 687--693. Google ScholarDigital Library
Dmitry Lizorkin, Pavel Velikhov, Maxim N. Grinev, and Denis Turdakov. 2008. Accuracy estimate and optimization techniques for SimRank computation. PVLDB 1, 1 (2008), 422--433. Google ScholarDigital Library
Jia-Yu Pan, Hyung-Jeong Yang, Christos Faloutsos, and Pinar Duygulu. 2004. Automatic multimedia cross-modal correlation discovery. In KDD. 653--658. Google ScholarDigital Library
Henry G. Small. 1973. Co-citation in the scientific literature: A new measure of the relationship betweenf two documents. Journal of the American Society for Information Scienc 24, 4 (1973), 265--269.Google ScholarCross Ref
Wensi Xi, Edward A. Fox, Weiguo Fan, Benyu Zhang, Zheng Chen, Jun Yan, and Dong Zhuang. 2005. SimFusion: measuring similarity using unified relationship matrix. In SIGIR. 130--137. Google ScholarDigital Library
Xiaoxin Yin, Jiawei Han, and Philip S. Yu. 2006. LinkClus: Efficient Clustering via Heterogeneous Semantic Links. In VLDB. 427--438. Google ScholarDigital Library
Mingxi Zhang, Hao Hu, Zhenying He, Liping Gao, and Liujie Sun. 2015. Efficient link-based similarity search in web networks. Expert Syst. Appl. 42, 22 (2015), 8868--8880. Google ScholarDigital Library
Mingxi Zhang, Hao Hu, Zhenying He, and Wei Wang. 2015. Top-k similarity search in heterogeneous information networks with x-star network schema. Expert Syst. Appl. 42, 2 (2015), 699--712. Google ScholarDigital Library
Peixiang Zhao, Jiawei Han, and Yizhou Sun. 2009. P-Rank: a comprehensive structural similarity measure over information networks. In Proceedings of the 18th ACM Conference on Information and Knowledge Management, CIKM 2009, Hong Kong, China, November 2--6, 2009. 553--562. Google ScholarDigital Library

Recommendations

P-Rank: a comprehensive structural similarity measure over information networks
CIKM '09: Proceedings of the 18th ACM conference on Information and knowledge management

With the ubiquity of information networks and their broad applications, the issue of similarity computation between entities of an information network arises and draws extensive research interests. However, to effectively and comprehensively measure "...
Read More
Fast computation of SimRank for static and dynamic information networks
EDBT '10: Proceedings of the 13th International Conference on Extending Database Technology

Information networks are ubiquitous in many applications and analysis on such networks has attracted significant attention in the academic communities. One of the most important aspects of information network analysis is to measure similarity between ...
Read More
Top-k similarity search in heterogeneous information networks with x-star network schema

The efficiency improvement is evident for similarity computation.The effectiveness of returned result is good for similarity search.The pruning algorithm is presented for supporting fast online query processing.The accuracy loss of pruning algorithm can ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
WI '17: Proceedings of the International Conference on Web Intelligence
August 2017
1284 pages
ISBN:9781450349512
DOI:10.1145/3106426
Conference Chair:
Amit Sheth
Wright State University GuoROLE@GENERAL CHAIR
,
General Chairs:
Axel Ngonga
Leipzig University, Germany
,
yin Wang
Chongqing University of Posts and Telecommunications, China
,
Elizabeth Chang
The University of New South Wales, Australia
,
Dominik Ślęzak
Infobright Inc. & University of Warsaw, Poland
,
Bogdan Franczyk
Leipzig University, Germany
,
Program Chairs:
Rainer Alt
Leipzig University, Germany
,
Xiaohui Tao
University of Southern Queensland, Australia
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 23 August 2017
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
P-Rank
information network
partial sums
similarity computation
Qualifiers
- research-article
Conference

Acceptance Rates
WI '17 Paper Acceptance Rate118of178submissions,66%Overall Acceptance Rate118of178submissions,66%
More
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 0
  Total Citations
  View Citations
- 54
  Total Downloads
- Downloads (Last 12 months)3
- Downloads (Last 6 weeks)0
Other Metrics
View Author Metrics
Cited By
This publication has not been cited yet

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Partial sums-based P-Rank computation in information networks

WI '17: Proceedings of the International Conference on Web Intelligence

ABSTRACT

References

Cited By

Recommendations

P-Rank: a comprehensive structural similarity measure over information networks

Fast computation of SimRank for static and dynamic information networks

Top-k similarity search in heterogeneous information networks with x-star network schema