Abstract
In this article, we explore the relationships among digital objects in terms of their similarity based on vertex similarity measures. We argue that SimRank—a famous similarity measure—and its families, such as P-Rank and SimRank++, fail to capture similar node pairs in certain conditions, especially when two nodes can only reach each other through paths of odd lengths. We present new similarity measures ASCOS and ASCOS++ to address the problem. ASCOS outputs a more complete similarity score than SimRank and SimRank’s families. ASCOS++ enriches ASCOS to include edge weight into the measure, giving all edges and network weights an opportunity to make their contribution. We show that both ASCOS++ and ASCOS can be reformulated and applied on a distributed environment for parallel contribution. Experimental results show that ASCOS++ reports a better score than SimRank and several famous similarity measures. Finally, we re-examine previous use cases of SimRank, and explain appropriate and inappropriate use cases. We suggest future SimRank users following the rules proposed here before naïvely applying it. We also discuss the relationship between ASCOS++ and PageRank.
- Evrim Acar, Daniel M. Dunlavy, and Tamara G. Kolda. 2009. Link prediction on evolving data using matrix and tensor factorizations. In IEEE International Conference on Data Mining Workshops. IEEE, Washington DC, USA, 262--269. Google ScholarDigital Library
- Lada A. Adamic and Eytan Adar. 2003. Friends and neighbors on the web. Social Networks 25, 3, 211--230.Google ScholarCross Ref
- Alekh Agarwal and Soumen Chakrabarti. 2007. Learning random walks to rank nodes in graphs. In Proceedings of the 24th International Conference on Machine learning. ACM, New York, NY, USA, 9--16. Google ScholarDigital Library
- Charu Aggarwal, Yan Xie, and Philip S. Yu. 2012. On dynamic link inference in heterogeneous networks. In Proceedings of the 12th SIAM International Conference on Data Mining. SIAM, Anaheim, CA, USA, 415--426.Google Scholar
- Mohammad Al Hasan and Mohammed J. Zaki. 2011. A survey of link prediction in social networks. Social Network Data Analytics, 243--275.Google Scholar
- Réka Albert and Albert-László Barabási. 2002. Statistical mechanics of complex networks. Reviews of Modern Physics 74, 1, 47--97.Google ScholarCross Ref
- Ioannis Antonellis, Hector Garcia Molina, and Chi Chao Chang. 2008. Simrank++: Query rewriting through link analysis of the click graph. Proceedings of the VLDB Endowment 1, 1, 408--421. Google ScholarDigital Library
- Lars Backstrom and Jure Leskovec. 2011. Supervised random walks: Predicting and recommending links in social networks. In Proceedings of the 4th ACM International Conference on Web Search and Data Mining. ACM, Hong Kong, China, 635--644. Google ScholarDigital Library
- Shenghua Bao, Guirong Xue, Xiaoyuan Wu, Yong Yu, Ben Fei, and Zhong Su. 2007. Optimizing web search using social annotations. In Proceedings of the 16th International Conference on World Wide Web. ACM, Banff, Alberta, Canada, 501--510. Google ScholarDigital Library
- Albert-László Barabási and Réka Albert. 1999. Emergence of scaling in random networks. Science 286, 5439, 509--512.Google Scholar
- Sergey Brin and Lawrence Page. 1998. The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems 30, 1, 107--117. Google ScholarDigital Library
- Rajmonda Sulo Caceres, Tanya Berger-Wolf, and Robert Grossman. 2011. Temporal scale of processes in dynamic networks. In IEEE International Conference on Data Mining Workshops. IEEE, Vancouver, Canada, 925--932. Google ScholarDigital Library
- Yuanzhe Cai, Miao Zhang, Chris Ding, and Sharma Chakravarthy. 2010. Closed form solution of similarity algorithms. In Proceedings of the 33rd International SIGIR Conference on Research and Development in Information Retrieval. ACM, Geneva, Switzerland, 709--710. Google ScholarDigital Library
- Hung-Hsuan Chen, Yan-Bin Ciou, and Shou-De Lin. 2012. Information propagation game: A tool to acquire humanplaying data for multiplayer influence maximization on social networks. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, Beijing, China, 1524--1527. Google ScholarDigital Library
- Hung-Hsuan Chen and C. Lee Giles. 2013. ASCOS: An asymmetric network structure context similarity measure. In Advances in Social Networks Analysis and Mining (ASONAM), 2013 IEEE/ACM International Conference on. IEEE, Niagara Falls, Canada, 442--449. Google ScholarDigital Library
- Hung-Hsuan Chen, Liang Gou, Xiaolong Zhang, and C. Lee Giles. 2011. CollabSeer: A search engine for collaboration discovery. In Proceeding of the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries. ACM, Ottawa, Canada, 231--240. Google ScholarDigital Library
- Hung-Hsuan Chen, Liang Gou, Xiaolong Zhang, and C. Lee Giles. 2012a. Discovering missing links in networks using vertex similarity measures. In The 27th ACM Symposium on Applied Computing. ACM, Riva del Garda (Trento), Italy, 138--143. Google ScholarDigital Library
- Hung-Hsuan Chen, Liang Gou, Xioalong Zhang, and C. Lee Giles. 2012b. Predicting recent links in FOAF networks. In Social Computing, Behavioral-Cultural Modeling and Prediction. Springer, College Park, MD, USA, 156--163. Google ScholarDigital Library
- Hung-Hsuan Chen, Liang Gou, Xiaolong Luke Zhang, and C. Lee Giles. 2013a. Towards the discovery of diseases related by genes using vertex similarity measures. In Healthcare Informatics (ICHI), 2013 IEEE International Conference on. IEEE, Philadelphia, PA, USA, 505--510. Google ScholarDigital Library
- Hung-Hsuan Chen, David J. Miller, and C. Lee Giles. 2013b. The predictive value of young and old links in a social network. In Proceedings of the ACM SIGMOD Workshop on Databases and Social Networks. ACM, New York, NY, USA, 43--48. Google ScholarDigital Library
- Fan Chung. 2007. The heat kernel as the pagerank of a graph. Proc. Natl. Acad. Sci. USA 104, 50, 19735--19740.Google ScholarCross Ref
- Fan Chung. 2009. A local graph partitioning algorithm using heat kernel pagerank. Internet Mathematics 6, 3, 315--330.Google ScholarCross Ref
- Adele Cutler and Leo Breiman. 1994. Archetypal analysis. Technometrics 36, 4, 338--347.Google ScholarCross Ref
- Yuxiao Dong, Qing Ke, Bai Wang, and Bin Wu. 2011. Link prediction based on local information. In Advances in Social Networks Analysis and Mining (ASONAM), 2011 International Conference on. IEEE, Kaohsiung, Taiwan, 382--386. Google ScholarDigital Library
- Leo A. Goodman. 1961. Snowball sampling. The Annals of Mathematical Statistics 32, 1, 148--170.Google ScholarCross Ref
- Guoming He, Haijun Feng, Cuiping Li, and Hong Chen. 2010. Parallel simrank computation on large graphs with iterative aggregation. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, Washington, DC, USA, 543--552. Google ScholarDigital Library
- Keith Henderson, Brian Gallagher, Tina Eliassi-Rad, Hanghang Tong, Sugato Basu, Leman Akoglu, Danai Koutra, Christos Faloutsos, and Lei Li. 2012. Rolx: Structural role extraction & mining in large graphs. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, Beijing, China, 1231--1239. Google ScholarDigital Library
- Glen Jeh and Jennifer Widom. 2002. SimRank: A measure of structural-context similarity. In Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, Edmonton, Alberta, Canada, 538--543. Google ScholarDigital Library
- Leo Katz. 1953. A new status index derived from sociometric analysis. Psychometrika 18, 1, 39--43.Google ScholarCross Ref
- David Kempe, Jon Kleinberg, and Éva Tardos. 2003. Maximizing the spread of influence through a social network. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, Washington, DC, USA, 137--146. Google ScholarDigital Library
- Kyle Kloster and David F. Gleich. 2014. Heat kernel based community detection. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’14). ACM, New York, NY, USA, 1386--1395. Google ScholarDigital Library
- Danai Koutra, Joshua T. Vogelstein, and Christos Faloutsos. 2013. DeltaCon: A principled massive-graph similarity function. In Proceedings of the SIAM International Conference on Data Mining. SIAM, Austin, Texas, 162--170.Google ScholarCross Ref
- Elizabeth A. Leicht, Petter Holme, and Mark E. J. Newman. 2006. Vertex similarity in networks. Physical Review E 73, 2, 26120.Google ScholarCross Ref
- Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. 2005. Graphs over time: Densification laws, shrinking diameters and possible explanations. In Proceedings of the 8th ACM International Conference on Knowledge Discovery and Data Mining. ACM, Chicago, IL, USA, 177--187. Google ScholarDigital Library
- Jure Leskovec, Andreas Krause, Carlos Guestrin, Christos Faloutsos, Jeanne VanBriesen, and Natalie Glance. 2007. Cost-effective outbreak detection in networks. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, San Jose, CA, USA, 420--429. Google ScholarDigital Library
- Cuiping Li, Jiawei Han, Guoming He, Xin Jin, Yizhou Sun, Yintao Yu, and Tianyi Wu. 2010a. Fast computation of simrank for static and dynamic information networks. In Proceedings of the 13th International Conference on Extending Database Technology. ACM, Lausanne, Switzerland, 465--476. Google ScholarDigital Library
- Pei Li, Hongyan Liu, Jeffrey Xu Yu, Jun He, and Xiaoyong Du. 2010b. Fast single-pair simrank computation. In Proceedings of the SIAM International Conference on Data Mining. SIAM, Columbus, Ohio, USA, 571--582.Google ScholarCross Ref
- David Liben-Nowell and Jon Kleinberg. 2007. The link-prediction problem for social networks. Journal of the American Society for Information Science and Technology 58, 7, 1019--1031. Google ScholarDigital Library
- Ryan Lichtnwalter and Nitesh V. Chawla. 2012. Link prediction: Fair and effective evaluation. In Advances in Social Networks Analysis and Mining (ASONAM), 2012 IEEE/ACM International Conference on. IEEE, Istanbul, Turkey, 376--383. Google ScholarDigital Library
- Linyuan Lü and Tao Zhou. 2011. Link prediction in complex networks: A survey. Physica A: Statistical Mechanics and its Applications 390, 6, 1150--1170.Google Scholar
- Rashid Mehmood and Jon Crowcroft. 2005. Parallel iterative solution method for large sparse linear equation systems. Computer Laboratory: University of Cambridge.Google Scholar
- Douglas L. Nelson, Cathy L. McEvoy, and Thomas A. Schreiber. 2004. The university of south florida free association, rhyme, and word fragment norms. Behavior Research Methods, Instruments, & Computers 36, 3, 402--407.Google ScholarCross Ref
- Mark E. J. Newman. 2003. The structure and function of complex networks. SIAM REVIEW 45, 2, 167--256.Google ScholarDigital Library
- Pang-Ning Tan, Michael Steinbach, and Vipin Kumar. 2006. Introduction to Data Mining. Pearson Addison Wesley.Google ScholarDigital Library
- Erzsébet Ravasz, Anna Lisa Somera, Dale A. Mongru, Zoltán N. Oltvai, and A.-L. Barabási. 2002. Hierarchical organization of modularity in metabolic networks. Science 297, 5586, 1551--1555.Google Scholar
- Yousef Saad. 2003. Iterative Methods for Sparse Linear Systems. SIAM. Google ScholarDigital Library
- Gerard Salton. 1989. Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. Addison-Wesley. Google ScholarDigital Library
- Charalampos E. Tsourakakis. 2014. Towards quantifying vertex similarity in networks. Internet Mathematics 10, 3--4, 263--286.Google ScholarCross Ref
- Amos Tversky. 1977. Features of similarity. Psychological Review 84, 4, 327--352.Google ScholarCross Ref
- Wensi Xi, Edward A. Fox, Weiguo Fan, Benyu Zhang, Zheng Chen, Jun Yan, and Dong Zhuang. 2005. SimFusion: Measuring similarity using unified relationship matrix. In Proceedings of the 28th Annual International SIGIR Conference on Research and Development in Information Retrieval. ACM, Salvador, Brazil, 130--137. Google ScholarDigital Library
- Weiren Yu, Xuemin Lin, Wenjie Zhang, Ying Zhang, and Jiajin Le. 2012. SimFusion+: Extending simfusion towards efficient estimation on large and dynamic networks. In Proceedings of the 35th International SIGIR Conference on Research and Development in Information Retrieval. ACM, Portland, OR, USA, 365--374. Google ScholarDigital Library
- Reza Zafarani, Mohammad Ali Abbasi, and Huan Liu. 2014. Social Media Mining: An Introduction. Cambridge University Press. Google ScholarDigital Library
- Jun Zhang, Chaokun Wang, Philip S. Yu, and Jianmin Wang. 2013. Learning latent friendship propagation networks with interest awareness for link prediction. In Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’13). ACM, Dublin, Ireland, 63--72. Google ScholarDigital Library
- Peixiang Zhao, Jiawei Han, and Yizhou Sun. 2009. P-Rank: A comprehensive structural similarity measure over information networks. In Proceeding of the 18th ACM Conference on Information and Knowledge Management. ACM, Hong Kong, China, 553--562. Google ScholarDigital Library
- Tao Zhou, Linyuan Lü, and Yi-Cheng Zhang. 2009. Predicting missing links via local information. The European Physical Journal B-Condensed Matter and Complex Systems 71, 4, 623--630.Google ScholarCross Ref
Index Terms
- ASCOS++: An Asymmetric Similarity Measure for Weighted Networks to Address the Problem of SimRank
Recommendations
PageSim: a novel link-based measure of web page aimilarity
WWW '06: Proceedings of the 15th international conference on World Wide WebTo find similar web pages to a query page on the Web, this paper introduces a novel link-based similarity measure, called PageSim. Contrast to SimRank, a recursive refinement of cocitation, PageSim can measure similarity between any two web pages, ...
ASCOS: an asymmetric network structure COntext similarity measure
ASONAM '13: Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and MiningDiscovering similar objects in a social network has many interesting issues. Here, we present ASCOS, an Asymmetric Structure COntext Similarity measure that captures the similarity scores among any pairs of nodes in a network. The definition of ASCOS is ...
Capturing missing edges in social networks using vertex similarity
K-CAP '11: Proceedings of the sixth international conference on Knowledge captureWe introduce the graph vertex similarity measure, Relation Strength Similarity (RSS), that utilizes a network's topology to discover and capture similar vertices. The RSS has the advantage that it is asymmetric; can be used in a weighted network; and ...
Comments