ABSTRACT
SimFusion has become a captivating measure of similarity between objects in a web graph. It is iteratively distilled from the notion that "the similarity between two objects is reinforced by the similarity of their related objects". The existing SimFusion model usually exploits the Unified Relationship Matrix (URM) to represent latent relationships among heterogeneous data, and adopts an iterative paradigm for SimFusion computation. However, due to the row normalization of URM, the traditional SimFusion model may produce the trivial solution; worse still, the iterative computation of SimFusion may not ensure the global convergence of the solution. This paper studies the revision of this model, providing a full treatment from complexity to algorithms. (1) We propose SimFusion+ based on a notion of the Unified Adjacency Matrix (UAM), a modification of the URM, to prevent the trivial solution and the divergence issue of SimFusion. (2) We show that for any vertex-pair, SimFusion+ can be performed in O(1) time and O(n) space with an O(km)-time precomputation done only once, as opposed to the O(kn3) time and O(n2) space of its traditional counterpart, where n, m, and k denote the number of vertices, edges, and iterations respectively. (3) We also devise an incremental algorithm for further improving the computation of SimFusion+ when networks are dynamically updated, with performance guarantees for similarity estimation. We experimentally verify that these algorithms scale well, and the revised notion of SimFusion is able to converge to a non-trivial solution, and allows us to identify more sensible structure information in large real-world networks.
- W. Xi, E. A. Fox, W. Fan, B. Zhang, Z. Chen, J. Yan, and D. Zhuang, "SimFusion: Measuring similarity using unified relationship matrix," in SIGIR, pp. 130--137, 2005. Google ScholarDigital Library
- L. Page, S. Brin, R. Motwani, and T. Winograd, "The PageRank citation ranking: Bringing order to the web," technical report, Stanford InfoLab, November 1999.Google Scholar
- G. Jeh and J. Widom, "SimRank: A measure of structural-context similarity," in KDD, pp. 538--543, 2002. Google ScholarDigital Library
- P. Zhao, J. Han, and Y. Sun, "P-Rank: A comprehensive structural similarity measure over information networks," in CIKM, pp. 553--562, 2009. Google ScholarDigital Library
- H. Small, "Co-citation in the scientific literature: A new measure of the relationship between two documents," J. Am. Soc. Inf. Sci., vol. 24, no. 4, pp. 265--269, 1973.Google ScholarCross Ref
- B. Jarneving, "Bibliographic coupling and its application to research-front and other core documents," J. Informetrics, vol. 1, no. 4, pp. 287--307, 2007.Google ScholarCross Ref
- W. Xi, B. Zhang, Z. Chen, Y. Lu, S. Yan, W. Ma, and E. A. Fox, "Link Fusion: A unified link analysis framework for multi-type interrelated data objects," in WWW, pp. 319--327, 2004. Google ScholarDigital Library
- D. Lizorkin, P. Velikhov, M. N. Grinev, and D. Turdakov, "Accuracy estimate and optimization techniques for SimRank computation," VLDB J., vol. 19, no. 1, pp. 45--66, 2010. Google ScholarDigital Library
- C. Li, J. Han, G. He, X. Jin, Y. Sun, Y. Yu, and T. Wu, "Fast computation of SimRank for static and dynamic information networks," in EDBT, pp. 465--476, 2010. Google ScholarDigital Library
- G. He, H. Feng, C. Li, and H. Chen, "Parallel SimRank computation on large graphs with iterative aggregation," in KDD, pp. 543--552, 2010. Google ScholarDigital Library
- W. Yu, W. Zhang, X. Lin, Q. Zhang, and J. Le, "A space and time efficient algorithm for SimRank computation," World Wide Web, vol. 15, no. 3, pp. 327--353, 2012. Google ScholarDigital Library
- G. Xue, H. Zeng, Z. Chen, Y. Yu, W. Ma, W. Xi, and E. A. Fox, "MRSSA: An iterative algorithm for similarity spreading over interrelated objects," in CIKM, pp. 240--241, 2004. Google ScholarDigital Library
- Y. Cai, M. Zhang, C. H. Q. Ding, and S. Chakravarthy, "Closed form solution of similarity algorithms," in SIGIR, pp. 709--710, 2010. Google ScholarDigital Library
- G. Williams, Linear Algebra with Applications. Jones and Bartlett Publishers, 2007. Google ScholarDigital Library
- A. J. Laub, Matrix Analysis for Scientists and Engineers. SIAM: Society for Industrial and Applied Mathematics, 2004. Google ScholarDigital Library
- R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge University Press, February 1990. Google ScholarDigital Library
- Y. Saad, Iterative Methods for Sparse Linear Systems, Second Edition. Society for Industrial and Applied Mathematics, 2 ed., April 2003. Google ScholarDigital Library
Index Terms
- SimFusion+: extending simfusion towards efficient estimation on large and dynamic networks
Recommendations
SimFusion: measuring similarity using unified relationship matrix
SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrievalIn this paper we use a Unified Relationship Matrix (URM) to represent a set of heterogeneous data objects (e.g., web pages, queries) and their interrelationships (e.g., hyperlinks, user click-through sequences). We claim that iterative computations over ...
Partial sums-based P-Rank computation in information networks
WI '17: Proceedings of the International Conference on Web IntelligenceP-Rank is a simple and captivating link-based similarity measure that extends SimRank by exploiting both in- and out-links for similarity computation. However, the existing work of P-Rank computation is expensive in terms of time and space cost and ...
Discovery of time profiled temporal patterns
ICEMIS '19: Proceedings of the 5th International Conference on Engineering and MISFinding temporal association patterns from temporal dataset is addressed in a wider perspective in the existing literature. Discovering time profiled temporal patterns is addressed in our previous research works which includes proposing new support ...
Comments