skip to main content
10.1145/2348283.2348334acmconferencesArticle/Chapter ViewAbstractPublication PagesirConference Proceedingsconference-collections
research-article

SimFusion+: extending simfusion towards efficient estimation on large and dynamic networks

Authors Info & Claims
Published:12 August 2012Publication History

ABSTRACT

SimFusion has become a captivating measure of similarity between objects in a web graph. It is iteratively distilled from the notion that "the similarity between two objects is reinforced by the similarity of their related objects". The existing SimFusion model usually exploits the Unified Relationship Matrix (URM) to represent latent relationships among heterogeneous data, and adopts an iterative paradigm for SimFusion computation. However, due to the row normalization of URM, the traditional SimFusion model may produce the trivial solution; worse still, the iterative computation of SimFusion may not ensure the global convergence of the solution. This paper studies the revision of this model, providing a full treatment from complexity to algorithms. (1) We propose SimFusion+ based on a notion of the Unified Adjacency Matrix (UAM), a modification of the URM, to prevent the trivial solution and the divergence issue of SimFusion. (2) We show that for any vertex-pair, SimFusion+ can be performed in O(1) time and O(n) space with an O(km)-time precomputation done only once, as opposed to the O(kn3) time and O(n2) space of its traditional counterpart, where n, m, and k denote the number of vertices, edges, and iterations respectively. (3) We also devise an incremental algorithm for further improving the computation of SimFusion+ when networks are dynamically updated, with performance guarantees for similarity estimation. We experimentally verify that these algorithms scale well, and the revised notion of SimFusion is able to converge to a non-trivial solution, and allows us to identify more sensible structure information in large real-world networks.

References

  1. W. Xi, E. A. Fox, W. Fan, B. Zhang, Z. Chen, J. Yan, and D. Zhuang, "SimFusion: Measuring similarity using unified relationship matrix," in SIGIR, pp. 130--137, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. L. Page, S. Brin, R. Motwani, and T. Winograd, "The PageRank citation ranking: Bringing order to the web," technical report, Stanford InfoLab, November 1999.Google ScholarGoogle Scholar
  3. G. Jeh and J. Widom, "SimRank: A measure of structural-context similarity," in KDD, pp. 538--543, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. P. Zhao, J. Han, and Y. Sun, "P-Rank: A comprehensive structural similarity measure over information networks," in CIKM, pp. 553--562, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. H. Small, "Co-citation in the scientific literature: A new measure of the relationship between two documents," J. Am. Soc. Inf. Sci., vol. 24, no. 4, pp. 265--269, 1973.Google ScholarGoogle ScholarCross RefCross Ref
  6. B. Jarneving, "Bibliographic coupling and its application to research-front and other core documents," J. Informetrics, vol. 1, no. 4, pp. 287--307, 2007.Google ScholarGoogle ScholarCross RefCross Ref
  7. W. Xi, B. Zhang, Z. Chen, Y. Lu, S. Yan, W. Ma, and E. A. Fox, "Link Fusion: A unified link analysis framework for multi-type interrelated data objects," in WWW, pp. 319--327, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. D. Lizorkin, P. Velikhov, M. N. Grinev, and D. Turdakov, "Accuracy estimate and optimization techniques for SimRank computation," VLDB J., vol. 19, no. 1, pp. 45--66, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. C. Li, J. Han, G. He, X. Jin, Y. Sun, Y. Yu, and T. Wu, "Fast computation of SimRank for static and dynamic information networks," in EDBT, pp. 465--476, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. G. He, H. Feng, C. Li, and H. Chen, "Parallel SimRank computation on large graphs with iterative aggregation," in KDD, pp. 543--552, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. W. Yu, W. Zhang, X. Lin, Q. Zhang, and J. Le, "A space and time efficient algorithm for SimRank computation," World Wide Web, vol. 15, no. 3, pp. 327--353, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. G. Xue, H. Zeng, Z. Chen, Y. Yu, W. Ma, W. Xi, and E. A. Fox, "MRSSA: An iterative algorithm for similarity spreading over interrelated objects," in CIKM, pp. 240--241, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Y. Cai, M. Zhang, C. H. Q. Ding, and S. Chakravarthy, "Closed form solution of similarity algorithms," in SIGIR, pp. 709--710, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. G. Williams, Linear Algebra with Applications. Jones and Bartlett Publishers, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. A. J. Laub, Matrix Analysis for Scientists and Engineers. SIAM: Society for Industrial and Applied Mathematics, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge University Press, February 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Y. Saad, Iterative Methods for Sparse Linear Systems, Second Edition. Society for Industrial and Applied Mathematics, 2 ed., April 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. SimFusion+: extending simfusion towards efficient estimation on large and dynamic networks

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Conferences
          SIGIR '12: Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
          August 2012
          1236 pages
          ISBN:9781450314725
          DOI:10.1145/2348283

          Copyright © 2012 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 12 August 2012

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article

          Acceptance Rates

          Overall Acceptance Rate792of3,983submissions,20%

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader