Skip to main content

2018 | OriginalPaper | Buchkapitel

A Parallel Method for All-Pair SimRank Similarity Computation

verfasst von : Xuan Huang, Xingkun Gao, Jie Tang, Gangshan Wu

Erschienen in: Algorithms and Architectures for Parallel Processing

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

How to measure SimRank similarity of all-pair vertices in a graph is a very important research topic which has a wide range of applications in many fields. However, computation of SimRank is costly in both time and space, making traditional computing methods failing to handle graph data of ever-growing size.
This paper proposes a parallel multi-level solution for all-pair SimRank similarity computing on large graphs. We partition the objective graph first with the idea of modularity maximization and get a collapsed graph based on the blocks. Then we compute the similarities between verteices inside a block as well as the similarities between the blocks. In the end, we integrate these two types of similarities and calculate the approximate SimRank simlarities between all vertex pairs. The method is implemented on Spark platform and it makes an improvement on time efficiency while maintaining the effectiveness compared to SimRank.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Antonellis, I., Garcia-Molina, H., Chang, C.: SimRank++: query rewriting through link analysis of the click graph. PVLDB 1(1), 408–421 (2008) Antonellis, I., Garcia-Molina, H., Chang, C.: SimRank++: query rewriting through link analysis of the click graph. PVLDB 1(1), 408–421 (2008)
2.
Zurück zum Zitat Baeza-Yates, R., Ribeiro-Neto, B., et al.: Modern Information Retrieval, vol. 463. ACM Press, New York (1999) Baeza-Yates, R., Ribeiro-Neto, B., et al.: Modern Information Retrieval, vol. 463. ACM Press, New York (1999)
3.
Zurück zum Zitat Bhattacharya, I., Getoor, L.: Entity resolution in graphs. In: Mining Graph Data, p. 311 (2006) Bhattacharya, I., Getoor, L.: Entity resolution in graphs. In: Mining Graph Data, p. 311 (2006)
4.
5.
Zurück zum Zitat Cao, L., Cho, B., Kim, H.D., Li, Z., Tsai, M.H., Gupta, I.: Delta-SimRank computing on MapReduce. In: Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, pp. 28–35. ACM (2012) Cao, L., Cho, B., Kim, H.D., Li, Z., Tsai, M.H., Gupta, I.: Delta-SimRank computing on MapReduce. In: Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, pp. 28–35. ACM (2012)
6.
Zurück zum Zitat Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. In: OSDI, pp. 137–150. USENIX Association (2004) Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. In: OSDI, pp. 137–150. USENIX Association (2004)
7.
Zurück zum Zitat Dean, J., Henzinger, M.R.: Finding related pages in the world wide web. Comput. Netw. 31(11), 1467–1479 (1999)CrossRef Dean, J., Henzinger, M.R.: Finding related pages in the world wide web. Comput. Netw. 31(11), 1467–1479 (1999)CrossRef
8.
Zurück zum Zitat Dice, L.R.: Measures of the amount of ecologic association between species. Ecology 26(3), 297–302 (1945)CrossRef Dice, L.R.: Measures of the amount of ecologic association between species. Ecology 26(3), 297–302 (1945)CrossRef
9.
Zurück zum Zitat Fiduccia, C.M., Mattheyses, R.M.: A linear-time heuristic for improving network partitions. Papers on Twenty-Five Years of Electronic Design Automation, pp. 241–247. ACM (1988) Fiduccia, C.M., Mattheyses, R.M.: A linear-time heuristic for improving network partitions. Papers on Twenty-Five Years of Electronic Design Automation, pp. 241–247. ACM (1988)
10.
Zurück zum Zitat Fouss, F., Pirotte, A., Renders, J.M., Saerens, M.: Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation. IEEE Trans. Knowl. Data Eng. 19(3), 355–369 (2007)CrossRef Fouss, F., Pirotte, A., Renders, J.M., Saerens, M.: Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation. IEEE Trans. Knowl. Data Eng. 19(3), 355–369 (2007)CrossRef
11.
Zurück zum Zitat He, G., Feng, H., Li, C., Chen, H.: Parallel SimRank computation on large graphs with iterative aggregation. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 543–552. ACM (2010) He, G., Feng, H., Li, C., Chen, H.: Parallel SimRank computation on large graphs with iterative aggregation. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 543–552. ACM (2010)
12.
Zurück zum Zitat Hendrickson, B., Leland, R.W.: An improved spectral graph partitioning algorithm for mapping parallel computations. SIAM J. Sci. Comput. 16(2), 452–469 (1995)MathSciNetCrossRef Hendrickson, B., Leland, R.W.: An improved spectral graph partitioning algorithm for mapping parallel computations. SIAM J. Sci. Comput. 16(2), 452–469 (1995)MathSciNetCrossRef
13.
Zurück zum Zitat Jaccard, P.: Etude comparative de la distribution florale dans uneportion des Alpes et du Jura. Impr. Corbaz (1901) Jaccard, P.: Etude comparative de la distribution florale dans uneportion des Alpes et du Jura. Impr. Corbaz (1901)
14.
Zurück zum Zitat Jeh, G., Widom, J.: SimRank: a measure of structural-context similarity. In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 538–543. ACM (2002) Jeh, G., Widom, J.: SimRank: a measure of structural-context similarity. In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 538–543. ACM (2002)
15.
Zurück zum Zitat Johnson, D.S., Aragon, C.R., McGeoch, L.A., Schevon, C.: Optimization by simulated annealing: an experimental evaluation; part I, graph partitioning. Oper. Res. 37(6), 865–892 (1989)CrossRef Johnson, D.S., Aragon, C.R., McGeoch, L.A., Schevon, C.: Optimization by simulated annealing: an experimental evaluation; part I, graph partitioning. Oper. Res. 37(6), 865–892 (1989)CrossRef
16.
Zurück zum Zitat Kamvar, S., Haveliwala, T., Manning, C., Golub, G.: Exploiting the block structure of the web for computing PageRank. Technical report 2003-17, Stanford InfoLab (2003) Kamvar, S., Haveliwala, T., Manning, C., Golub, G.: Exploiting the block structure of the web for computing PageRank. Technical report 2003-17, Stanford InfoLab (2003)
17.
Zurück zum Zitat Karypis, G., Kumar, V.: METIS - unstructured graph partitioning and sparse matrix ordering system, version 2.0. Technical report (1995) Karypis, G., Kumar, V.: METIS - unstructured graph partitioning and sparse matrix ordering system, version 2.0. Technical report (1995)
18.
Zurück zum Zitat Kaufman, L., Rousseeuw, P.: Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, Hoboken (1990)CrossRef Kaufman, L., Rousseeuw, P.: Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, Hoboken (1990)CrossRef
19.
Zurück zum Zitat Kernighan, B.W., Lin, S.: An efficient heuristic procedure for partitioning graphs. Bell Syst. Tech. J. 49(2), 291–307 (1970)CrossRef Kernighan, B.W., Lin, S.: An efficient heuristic procedure for partitioning graphs. Bell Syst. Tech. J. 49(2), 291–307 (1970)CrossRef
20.
Zurück zum Zitat Kusumoto, M., Maehara, T., Kawarabayashi, K.I.: Scalable similarity search for SimRank. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 325–336. ACM (2014) Kusumoto, M., Maehara, T., Kawarabayashi, K.I.: Scalable similarity search for SimRank. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 325–336. ACM (2014)
21.
Zurück zum Zitat Li, C., et al.: Fast computation of SimRank for static and dynamic information networks. In: Proceedings of the 13th International Conference on Extending Database Technology, pp. 465–476. ACM (2010) Li, C., et al.: Fast computation of SimRank for static and dynamic information networks. In: Proceedings of the 13th International Conference on Extending Database Technology, pp. 465–476. ACM (2010)
22.
Zurück zum Zitat Li, L., Li, C., Chen, H., Du, X.: MapReduce-based SimRank computation and its application in social recommender system. In: BigData Congress, pp. 133–140. IEEE Computer Society (2013) Li, L., Li, C., Chen, H., Du, X.: MapReduce-based SimRank computation and its application in social recommender system. In: BigData Congress, pp. 133–140. IEEE Computer Society (2013)
23.
Zurück zum Zitat Li, Z., Fang, Y., Liu, Q., Cheng, J., Cheng, R., Lui, J.C.S.: Walking in the cloud: parallel SimRank at scale. PVLDB 9(1), 24–35 (2015) Li, Z., Fang, Y., Liu, Q., Cheng, J., Cheng, R., Lui, J.C.S.: Walking in the cloud: parallel SimRank at scale. PVLDB 9(1), 24–35 (2015)
24.
Zurück zum Zitat Lizorkin, D., Velikhov, P., Grinev, M., Turdakov, D.: Accuracy estimate and optimization techniques for SimRank computation. Proc. VLDB Endow. 1(1), 422–433 (2008)CrossRef Lizorkin, D., Velikhov, P., Grinev, M., Turdakov, D.: Accuracy estimate and optimization techniques for SimRank computation. Proc. VLDB Endow. 1(1), 422–433 (2008)CrossRef
25.
Zurück zum Zitat Maehara, T., Kusumoto, M., Kawarabayashi, K.: Efficient SimRank computation via linearization. CoRR abs/1411.7228 (2014) Maehara, T., Kusumoto, M., Kawarabayashi, K.: Efficient SimRank computation via linearization. CoRR abs/1411.7228 (2014)
26.
Zurück zum Zitat Newman, M.E.: Modularity and community structure in networks. Proc. Nat. Acad. Sci. 103(23), 8577–8582 (2006)CrossRef Newman, M.E.: Modularity and community structure in networks. Proc. Nat. Acad. Sci. 103(23), 8577–8582 (2006)CrossRef
27.
Zurück zum Zitat Page, L., Brin, S., Motwani, R., Winograd, T.: The PageRank citation ranking: bringing order to the web (1999) Page, L., Brin, S., Motwani, R., Winograd, T.: The PageRank citation ranking: bringing order to the web (1999)
28.
Zurück zum Zitat Rothe, S., Schütze, H.: CoSimRank: a flexible & efficient graph-theoretic similarity measure. In: ACL (1), pp. 1392–1402. The Association for Computer Linguistics (2014) Rothe, S., Schütze, H.: CoSimRank: a flexible & efficient graph-theoretic similarity measure. In: ACL (1), pp. 1392–1402. The Association for Computer Linguistics (2014)
29.
Zurück zum Zitat Yu, W., Lin, X., Zhang, W.: Towards efficient SimRank computation on large networks. In: 2013 IEEE 29th International Conference on Data Engineering (ICDE), pp. 601–612. IEEE (2013) Yu, W., Lin, X., Zhang, W.: Towards efficient SimRank computation on large networks. In: 2013 IEEE 29th International Conference on Data Engineering (ICDE), pp. 601–612. IEEE (2013)
30.
Zurück zum Zitat Yu, W., Zhang, W., Lin, X., Zhang, Q., Le, J.: A space and time efficient algorithm for SimRank computation. World Wide Web 15(3), 327–353 (2012)CrossRef Yu, W., Zhang, W., Lin, X., Zhang, Q., Le, J.: A space and time efficient algorithm for SimRank computation. World Wide Web 15(3), 327–353 (2012)CrossRef
31.
Zurück zum Zitat Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., Stoica, I.: Spark: cluster computing with working sets. In: HotCloud. USENIX Association (2010) Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., Stoica, I.: Spark: cluster computing with working sets. In: HotCloud. USENIX Association (2010)
32.
Zurück zum Zitat Zhao, P., Han, J., Sun, Y.: P-rank: a comprehensive structural similarity measure over information networks. In: CIKM, pp. 553–562. ACM (2009) Zhao, P., Han, J., Sun, Y.: P-rank: a comprehensive structural similarity measure over information networks. In: CIKM, pp. 553–562. ACM (2009)
33.
Zurück zum Zitat Zhu, R., Zou, Z., Li, J.: SimRank computation on uncertain graphs. In: ICDE, pp. 565–576. IEEE Computer Society (2016) Zhu, R., Zou, Z., Li, J.: SimRank computation on uncertain graphs. In: ICDE, pp. 565–576. IEEE Computer Society (2016)
Metadaten
Titel
A Parallel Method for All-Pair SimRank Similarity Computation
verfasst von
Xuan Huang
Xingkun Gao
Jie Tang
Gangshan Wu
Copyright-Jahr
2018
DOI
https://doi.org/10.1007/978-3-030-05051-1_41