Parallel SimRank computation on large graphs with iterative aggregation

Authors:
Guoming He (Renmin University of China, Beijing, China)
Haijun Feng (Renmin University of China, Beijing, China)
Cuiping Li (Renmin University of China, Beijing, China)
Hong Chen (Renmin University of China, Beijing, China)

KDD '10: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, July 2010, Pages 543–552, https://doi.org/10.1145/1835804.1835874

Published: 25 July 2010

ABSTRACT

Graph-based analysis has attracted considerable interest in recent years, and measuring the similarity between nodes in a graph is one of its most important tasks. SimRank is a simple and influential measure of this kind, based on a solid graph-theoretical model. However, existing methods for SimRank computation suffer from two limitations: 1) the computing cost can be very high in practice; and 2) they can only be applied to static graphs. In this paper, we exploit the inherent parallelism and high memory bandwidth of graphics processing units (GPUs) to accelerate the computation of SimRank on large graphs. Furthermore, based on the observation that SimRank is essentially a first-order Markov chain, we propose to use iterative aggregation techniques for uncoupling Markov chains to compute SimRank scores in parallel for large graphs. The iterative aggregation method can also be applied to dynamic graphs, where it handles not only link updates but also node updates. Extensive experiments on synthetic and real data sets verify that the proposed methods are efficient and effective.
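For readers unfamiliar with the measure, the following is a minimal sketch of the basic SimRank recursion (Jeh and Widom, KDD '02), in which two nodes are similar if their in-neighbors are similar. It is not the GPU or iterative aggregation method proposed in this paper. The function name `simrank`, the decay factor `c = 0.8`, the iteration count, and the toy graph are all assumptions chosen for illustration.

```python
from itertools import product


def simrank(in_neighbors, c=0.8, iterations=10):
    """Naive all-pairs SimRank by fixed-point iteration.

    in_neighbors maps each node to the list of nodes that link to it.
    c is the decay factor (0.8 is an illustrative assumption).
    """
    nodes = list(in_neighbors)
    # Start from s(a, a) = 1 and s(a, b) = 0 for a != b.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, nodes)}
    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, nodes):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            in_a, in_b = in_neighbors[a], in_neighbors[b]
            if not in_a or not in_b:
                # A node with no in-neighbors is similar only to itself.
                new_sim[(a, b)] = 0.0
                continue
            # Average similarity over all pairs of in-neighbors, then decay.
            total = sum(sim[(i, j)] for i in in_a for j in in_b)
            new_sim[(a, b)] = c * total / (len(in_a) * len(in_b))
        sim = new_sim
    return sim


# Hypothetical toy graph: "univ" links to both professors,
# and each professor links to one student.
graph = {
    "univ": [],
    "prof_a": ["univ"],
    "prof_b": ["univ"],
    "student_a": ["prof_a"],
    "student_b": ["prof_b"],
}
scores = simrank(graph)
```

On this toy graph the two professors score c = 0.8 and the two students score c² = 0.64. The nested loops visit every node pair in every round, which illustrates why the computing cost becomes high on large graphs.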

Supplemental Material

kdd2010_he_psrclgia_01.mov (MOV, 85.3 MB)

Index Terms

Information systems

Published in
KDD '10: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
July 2010
1240 pages
ISBN: 9781450300551
DOI: 10.1145/1835804
General Chairs: Bharat Rao (Siemens), Balaji Krishnapuram (Siemens)
Program Chairs: Andrew Tomkins (Google Inc.), Qiang Yang (Hong Kong University of Science and Technology)
Copyright © 2010 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
gpu
graph
iterative aggregation
parallel
simrank
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
Article Metrics
- Total Citations: 84
- Total Downloads: 989
- Downloads (last 12 months): 10
- Downloads (last 6 weeks): 3