Skip to main content
Top
Published in: The VLDB Journal 6/2021

05-06-2021 | Regular Paper

ExactSim: benchmarking single-source SimRank algorithms with high-precision ground truths

Authors: Hanzhi Wang, Zhewei Wei, Yu Liu, Ye Yuan, Xiaoyong Du, Ji-Rong Wen

Published in: The VLDB Journal | Issue 6/2021

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

SimRank is a popular measurement for evaluating the node-to-node similarities based on the graph topology. In recent years, single-source and top-k SimRank queries have received increasing attention due to their applications in web mining, social network analysis, and spam detection. However, a fundamental obstacle in studying SimRank has been the lack of ground truths. The only exact algorithm, Power Method, is computationally infeasible on graphs with more than \(10^6\) nodes. Consequently, no existing work has evaluated the actual accuracy of various single-source and top-k SimRank algorithms on large real-world graphs. In this paper, we present ExactSim, the first algorithm that computes the exact single-source and top-k SimRank results on large graphs. This algorithm produces ground truths with precision up to 7 decimal places with high probability. With the ground truths computed by ExactSim, we present the first experimental study of the accuracy/cost trade-offs of existing approximate SimRank algorithms on large real-world graphs and synthetic graphs. Finally, we use the ground truths to exploit various properties of SimRank distributions on large graphs.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Aldecoa, Rodrigo, Orsini, Chiara, Krioukov, Dmitri: Hyperbolic graph generator. Computer Phys. Commun. 196, 492–496 (2015)CrossRef Aldecoa, Rodrigo, Orsini, Chiara, Krioukov, Dmitri: Hyperbolic graph generator. Computer Phys. Commun. 196, 492–496 (2015)CrossRef
2.
go back to reference Andersen, Reid., Chung, Fan R. K., Lang, Kevin J.: Local graph partitioning using pagerank vectors. In FOCS, pp. 475–486, (2006) Andersen, Reid., Chung, Fan R. K., Lang, Kevin J.: Local graph partitioning using pagerank vectors. In FOCS, pp. 475–486, (2006)
3.
go back to reference Antonellis, Ioannis, Molina, Hector Garcia, Chang, Chi Chao: Simrank++: query rewriting through link analysis of the click graph. PVLDB 1(1), 408–421 (2008) Antonellis, Ioannis, Molina, Hector Garcia, Chang, Chi Chao: Simrank++: query rewriting through link analysis of the click graph. PVLDB 1(1), 408–421 (2008)
4.
go back to reference Bahmani, Bahman, Chowdhury, Abdur, Goel, Ashish: Fast incremental and personalized pagerank. VLDB 4(3), 173–184 (2010) Bahmani, Bahman, Chowdhury, Abdur, Goel, Ashish: Fast incremental and personalized pagerank. VLDB 4(3), 173–184 (2010)
5.
go back to reference Chung, Fan R.K., Lu, Lincoln: Concentration inequalities and martingale inequalities: a survey. Internet Math. 3(1), 79–127 (2006)MathSciNetCrossRef Chung, Fan R.K., Lu, Lincoln: Concentration inequalities and martingale inequalities: a survey. Internet Math. 3(1), 79–127 (2006)MathSciNetCrossRef
6.
go back to reference Fogaras, Daniel., Racz, Balazs.: Scaling link-based similarity search. In: WWW, pp. 641–650, (2005) Fogaras, Daniel., Racz, Balazs.: Scaling link-based similarity search. In: WWW, pp. 641–650, (2005)
7.
go back to reference Fogaras, Dániel, Rácz, Balázs, Csalogány, Károly, Sarlós, Tamás: Towards scaling fully personalized pagerank: algorithms, lower bounds, and experiments. Internet Math. 2(3), 333–358 (2005)MathSciNetCrossRef Fogaras, Dániel, Rácz, Balázs, Csalogány, Károly, Sarlós, Tamás: Towards scaling fully personalized pagerank: algorithms, lower bounds, and experiments. Internet Math. 2(3), 333–358 (2005)MathSciNetCrossRef
8.
go back to reference Fujiwara, Yuichiro., Nakatsuji, Makoto., Shiokawa, Hiroaki., Onizuka, Makoto.: Efficient search algorithm for simrank. In: ICDE, pp. 589–600, (2013) Fujiwara, Yuichiro., Nakatsuji, Makoto., Shiokawa, Hiroaki., Onizuka, Makoto.: Efficient search algorithm for simrank. In: ICDE, pp. 589–600, (2013)
9.
go back to reference He, Guoming., Feng, Haijun., Li, Cuiping., Chen, Hong.: Parallel simrank computation on large graphs with iterative aggregation. In: KDD, pp. 543–552, (2010) He, Guoming., Feng, Haijun., Li, Cuiping., Chen, Hong.: Parallel simrank computation on large graphs with iterative aggregation. In: KDD, pp. 543–552, (2010)
10.
go back to reference Jeh, G., Widom, J.: Simrank: a measure of structural-context similarity. In: SIGKDD, pp. 538–543, (2002) Jeh, G., Widom, J.: Simrank: a measure of structural-context similarity. In: SIGKDD, pp. 538–543, (2002)
11.
go back to reference Jiang, M., Fu, A.W.C., Wong, R.C.W.: Reads: a random walk approach for efficient and accurate dynamic simrank. PPVLDB 10(9), 937–948 (2017) Jiang, M., Fu, A.W.C., Wong, R.C.W.: Reads: a random walk approach for efficient and accurate dynamic simrank. PPVLDB 10(9), 937–948 (2017)
12.
go back to reference Krioukov, Dmitri, Papadopoulos, Fragkiskos, Kitsak, Maksim, Vahdat, Amin, Boguná, Marián: Hyperbolic geometry of complex networks. Phys. Rev. E 82(3), 036106 (2010)MathSciNetCrossRef Krioukov, Dmitri, Papadopoulos, Fragkiskos, Kitsak, Maksim, Vahdat, Amin, Boguná, Marián: Hyperbolic geometry of complex networks. Phys. Rev. E 82(3), 036106 (2010)MathSciNetCrossRef
13.
go back to reference Kusumoto, M., Maehara, T., Kawarabayashi, K-I.: Scalable similarity search for simrank. In: SIGMOD, pp. 325–336, (2014) Kusumoto, M., Maehara, T., Kawarabayashi, K-I.: Scalable similarity search for simrank. In: SIGMOD, pp. 325–336, (2014)
14.
go back to reference Lee, P., Lakshmanan, LVS., Yu, JX.: On top-k structural similarity search. In: ICDE, pp. 774–785, (2012) Lee, P., Lakshmanan, LVS., Yu, JX.: On top-k structural similarity search. In: ICDE, pp. 774–785, (2012)
15.
go back to reference Leskovec, J, Chakrabarti, D, Kleinberg, J, Faloutsos, C, Ghahramani, Z: Kronecker graphs: an approach to modeling networks. J. Mach. Learn. Res. 11(2), (2010) Leskovec, J, Chakrabarti, D, Kleinberg, J, Faloutsos, C, Ghahramani, Z: Kronecker graphs: an approach to modeling networks. J. Mach. Learn. Res. 11(2), (2010)
16.
go back to reference Li, C., Han, J., He, G., Jin, X., Sun, Y., Yu, Y., Wu, T.: Fast computation of simrank for static and dynamic information networks. In: EDBT, pp. 465–476, (2010) Li, C., Han, J., He, G., Jin, X., Sun, Y., Yu, Y., Wu, T.: Fast computation of simrank for static and dynamic information networks. In: EDBT, pp. 465–476, (2010)
17.
go back to reference Li, L., Li, C., Chen, H., Du, X.: Mapreduce-based simrank computation and its application in social recommender system. In: 2013 IEEE International Congress on Big Data, pp. 133–140. IEEE, (2013) Li, L., Li, C., Chen, H., Du, X.: Mapreduce-based simrank computation and its application in social recommender system. In: 2013 IEEE International Congress on Big Data, pp. 133–140. IEEE, (2013)
18.
go back to reference Li, Zhenguo, Fang, Yixiang, Liu, Qin, Cheng, Jiefeng, Cheng, Reynold, Lui, John: Walking in the cloud: parallel simrank at scale. PVLDB 9(1), 24–35 (2015) Li, Zhenguo, Fang, Yixiang, Liu, Qin, Cheng, Jiefeng, Cheng, Reynold, Lui, John: Walking in the cloud: parallel simrank at scale. PVLDB 9(1), 24–35 (2015)
19.
go back to reference Lin, Zhenjiang, Lyu, Michael R., King, Irwin: Matchsim: a novel similarity measure based on maximum neighborhood matching. KAIS 32(1), 141–166 (2012) Lin, Zhenjiang, Lyu, Michael R., King, Irwin: Matchsim: a novel similarity measure based on maximum neighborhood matching. KAIS 32(1), 141–166 (2012)
20.
go back to reference Litvak, N., Scheinhardt, W.R.W., Volkovich, Y.: In-degree and pagerank: why do they follow similar power laws? Internet Math. 4(2–3), 175–198 (2007)MathSciNetCrossRef Litvak, N., Scheinhardt, W.R.W., Volkovich, Y.: In-degree and pagerank: why do they follow similar power laws? Internet Math. 4(2–3), 175–198 (2007)MathSciNetCrossRef
21.
go back to reference Liu, Y., Zheng, B., He, X., Wei, Z., Xiao, X., Zheng, K., Jiaheng, L.: Probesim: scalable single-source and top-k simrank computations on dynamic graphs. PVLDB 11(1), 14–26 (2017) Liu, Y., Zheng, B., He, X., Wei, Z., Xiao, X., Zheng, K., Jiaheng, L.: Probesim: scalable single-source and top-k simrank computations on dynamic graphs. PVLDB 11(1), 14–26 (2017)
22.
go back to reference Lizorkin, D., Velikhov, P., Grinev, M., Turdakov, D.: Accuracy estimate and optimization techniques for simrank computation. VLDB J. 19(1), 45–66 (2010)CrossRef Lizorkin, D., Velikhov, P., Grinev, M., Turdakov, D.: Accuracy estimate and optimization techniques for simrank computation. VLDB J. 19(1), 45–66 (2010)CrossRef
23.
go back to reference Lizorkin, D., Velikhov, P., Grinev, M.N., Turdakov, D.: Accuracy estimate and optimization techniques for simrank computation. VLDB J. 19(1), 45–66 (2010)CrossRef Lizorkin, D., Velikhov, P., Grinev, M.N., Turdakov, D.: Accuracy estimate and optimization techniques for simrank computation. VLDB J. 19(1), 45–66 (2010)CrossRef
24.
go back to reference Lü, Linyuan, Zhou, Tao: Link prediction in complex networks: a survey. Phys. A: Stat. Mech. Appl. 390(6), 1150–1170 (2011)CrossRef Lü, Linyuan, Zhou, Tao: Link prediction in complex networks: a survey. Phys. A: Stat. Mech. Appl. 390(6), 1150–1170 (2011)CrossRef
25.
go back to reference Luo, X., Gao, J., Zhou, C., Yu, J. X.: Uniwalk: Unidirectional random walk based scalable simrank computation over large graph. In: 2017 IEEE 33rd International Conference on Data Engineering (ICDE), pp. 325–336, (2017) Luo, X., Gao, J., Zhou, C., Yu, J. X.: Uniwalk: Unidirectional random walk based scalable simrank computation over large graph. In: 2017 IEEE 33rd International Conference on Data Engineering (ICDE), pp. 325–336, (2017)
26.
go back to reference Maehara, T., Kusumoto, M., Kawarabayashi, K.: Efficient simrank computation via linearization. CoRR, abs/1411.7228, (2014) Maehara, T., Kusumoto, M., Kawarabayashi, K.: Efficient simrank computation via linearization. CoRR, abs/1411.7228, (2014)
27.
go back to reference Maehara, T., Kusumoto, M., Kawarabayashi, K.: Scalable simrank join algorithm. In: ICDE, pp. 603–614, (2015) Maehara, T., Kusumoto, M., Kawarabayashi, K.: Scalable simrank join algorithm. In: ICDE, pp. 603–614, (2015)
28.
go back to reference Page, L., Brin, S., Motwani, R., Winograd, T.: The pagerank citation ranking: bringing order to the web. (1999) Page, L., Brin, S., Motwani, R., Winograd, T.: The pagerank citation ranking: bringing order to the web. (1999)
29.
go back to reference Shao, Y., Cui, B., Chen, L., Liu, M., Xie, X.: An efficient similarity search framework for simrank over large dynamic graphs. PVLDB 8(8), 838–849 (2015) Shao, Y., Cui, B., Chen, L., Liu, M., Xie, X.: An efficient similarity search framework for simrank over large dynamic graphs. PVLDB 8(8), 838–849 (2015)
30.
go back to reference Tao, W., Minghe, Y., Li, G.: Efficient top-k simrank-based similarity join. PVLDB 8(3), 317–328 (2014) Tao, W., Minghe, Y., Li, G.: Efficient top-k simrank-based similarity join. PVLDB 8(3), 317–328 (2014)
31.
go back to reference Tian, B., Xiao, X.: SLING: a near-optimal index structure for simrank. In: SIGMOD, pp. 1859–1874, (2016) Tian, B., Xiao, X.: SLING: a near-optimal index structure for simrank. In: SIGMOD, pp. 1859–1874, (2016)
32.
go back to reference Tsitsulin, A., Mottin, D., Karras, P., Müller, E.: Verse: Versatile graph embeddings from similarity measures. In: WWW, pp. 539–548. International World Wide Web Conferences Steering Committee, (2018) Tsitsulin, A., Mottin, D., Karras, P., Müller, E.: Verse: Versatile graph embeddings from similarity measures. In: WWW, pp. 539–548. International World Wide Web Conferences Steering Committee, (2018)
33.
go back to reference Wang, H., Wei, Z., Yuan, Y., Du, X., Wen, J.: Exact single-source simrank computation on large graphs. In: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, pp. 653–663, (2020) Wang, H., Wei, Z., Yuan, Y., Du, X., Wen, J.: Exact single-source simrank computation on large graphs. In: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, pp. 653–663, (2020)
34.
go back to reference Wang, Y, Che, Y, Lian, X, Chen, L, Luo, Q: Fast and accurate simrank computation via forward local push and its parallelization. In: IEEE Transactions on Knowledge and Data Engineering (2020) Wang, Y, Che, Y, Lian, X, Chen, L, Luo, Q: Fast and accurate simrank computation via forward local push and its parallelization. In: IEEE Transactions on Knowledge and Data Engineering (2020)
35.
go back to reference Wang, Y., Chen, L., Che, Y., Luo, Q.: Accelerating pairwise simrank estimation over static and dynamic graphs. VLDB J. 28(1), 99–122 (2019)CrossRef Wang, Y., Chen, L., Che, Y., Luo, Q.: Accelerating pairwise simrank estimation over static and dynamic graphs. VLDB J. 28(1), 99–122 (2019)CrossRef
36.
go back to reference Wei, Z., He, X., Xiao, X., Wang, S., Liu, Y., Du, X., Wen, J.: Prsim: sublinear time simrank computation on large power-law graphs. In: SIGMOD, pp. 1042–1059. ACM, (2019) Wei, Z., He, X., Xiao, X., Wang, S., Liu, Y., Du, X., Wen, J.: Prsim: sublinear time simrank computation on large power-law graphs. In: SIGMOD, pp. 1042–1059. ACM, (2019)
37.
go back to reference Xi, W., Fox, EA., Fan, W., Zhang, B., Chen, Z., Yan, J., Zhuang, D.: Simfusion: measuring similarity using unified relationship matrix. In: SIGIR, pp. 130–137. ACM, (2005) Xi, W., Fox, EA., Fan, W., Zhang, B., Chen, Z., Yan, J., Zhuang, D.: Simfusion: measuring similarity using unified relationship matrix. In: SIGIR, pp. 130–137. ACM, (2005)
38.
go back to reference Yu, W., Lin, X., Zhang, W.: Fast incremental simrank on link-evolving graphs. In: ICDE, pp. 304–315, (2014) Yu, W., Lin, X., Zhang, W.: Fast incremental simrank on link-evolving graphs. In: ICDE, pp. 304–315, (2014)
39.
go back to reference Weiren, Y., Lin, X., Zhang, W., Chang, L., Pei, J.: More is simpler: effectively and efficiently assessing node-pair similarities based on hyperlinks. PVLDB 7(1), 13–24 (2013) Weiren, Y., Lin, X., Zhang, W., Chang, L., Pei, J.: More is simpler: effectively and efficiently assessing node-pair similarities based on hyperlinks. PVLDB 7(1), 13–24 (2013)
40.
go back to reference Yu, W., McCann, J.: Gauging correct relative rankings for similarity search. In: CIKM, pp. 1791–1794, (2015) Yu, W., McCann, J.: Gauging correct relative rankings for similarity search. In: CIKM, pp. 1791–1794, (2015)
41.
go back to reference Weiren, Y., McCann, J.A.: Efficient partial-pairs simrank search for large networks. PVLDB 8(5), 569–580 (2015) Weiren, Y., McCann, J.A.: Efficient partial-pairs simrank search for large networks. PVLDB 8(5), 569–580 (2015)
42.
go back to reference Yu, W., McCann, J.A.: Efficient partial-pairs simrank search on large networks. Proc. VLDB Endow. 8(5), 569–580 (2015)CrossRef Yu, W., McCann, J.A.: Efficient partial-pairs simrank search on large networks. Proc. VLDB Endow. 8(5), 569–580 (2015)CrossRef
43.
go back to reference Yu, W., McCann, JA.: High quality graph-based similarity search. In: SIGIR, pp. 83–92, (2015) Yu, W., McCann, JA.: High quality graph-based similarity search. In: SIGIR, pp. 83–92, (2015)
44.
go back to reference Weiren, Y., Zhang, W., Lin, X., Zhang, Q., Le, J.: A space and time efficient algorithm for simrank computation. World Wide Web 15(3), 327–353 (2012)CrossRef Weiren, Y., Zhang, W., Lin, X., Zhang, Q., Le, J.: A space and time efficient algorithm for simrank computation. World Wide Web 15(3), 327–353 (2012)CrossRef
45.
go back to reference Zhang, J., Tang, J., Ma, C., Tong, H., Jing, Y., Li, J.: Panther: Fast top-k similarity search on large networks. In: SIGKDD, pp. 1445–1454. ACM, (2015) Zhang, J., Tang, J., Ma, C., Tong, H., Jing, Y., Li, J.: Panther: Fast top-k similarity search on large networks. In: SIGKDD, pp. 1445–1454. ACM, (2015)
46.
go back to reference Zhao, P., Han, J., Sun, Y.: P-rank: a comprehensive structural similarity measure over information networks. In: CIKM, pp. 553–562. ACM, (2009) Zhao, P., Han, J., Sun, Y.: P-rank: a comprehensive structural similarity measure over information networks. In: CIKM, pp. 553–562. ACM, (2009)
47.
go back to reference Zhao, P., Han, J., Sun, Y.: P-rank: a comprehensive structural similarity measure over information networks. In: CIKM, pp. 553–562, (2009) Zhao, P., Han, J., Sun, Y.: P-rank: a comprehensive structural similarity measure over information networks. In: CIKM, pp. 553–562, (2009)
48.
go back to reference Zheng, W., Zou, L., Feng, Y., Chen, L., Zhao, D.: Efficient simrank-based similarity join over large graphs. PVLDB 6(7), 493–504 (2013) Zheng, W., Zou, L., Feng, Y., Chen, L., Zhao, D.: Efficient simrank-based similarity join over large graphs. PVLDB 6(7), 493–504 (2013)
Metadata
Title
ExactSim: benchmarking single-source SimRank algorithms with high-precision ground truths
Authors
Hanzhi Wang
Zhewei Wei
Yu Liu
Ye Yuan
Xiaoyong Du
Ji-Rong Wen
Publication date
05-06-2021
Publisher
Springer Berlin Heidelberg
Published in
The VLDB Journal / Issue 6/2021
Print ISSN: 1066-8888
Electronic ISSN: 0949-877X
DOI
https://doi.org/10.1007/s00778-021-00672-7

Other articles of this Issue 6/2021

The VLDB Journal 6/2021 Go to the issue

Premium Partner