Skip to main content
Erschienen in: World Wide Web 6/2020

18.06.2020

Towards distributed node similarity search on graphs

verfasst von: Tianming Zhang, Yunjun Gao, Baihua Zheng, Lu Chen, Shiting Wen, Wei Guo

Erschienen in: World Wide Web | Ausgabe 6/2020

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Node similarity search on graphs has wide applications in recommendation, link prediction, to name just a few. However, existing studies are insufficient due to two reasons: (i) the scale of the real-world graph is growing rapidly, and (ii) vertices are always associated with complex attributes. In this paper, we propose an efficiently distributed framework to support node similarity search on massive graphs, which considers both graph structure correlation and node attribute similarity in metric spaces. The framework consists of preprocessing stage and query stage. In the preprocessing stage, a parallel KD-tree construction (KDC) algorithm is developed to form a newly defined graph so-called hybrid graph, in order to integrate node attribute similarity into the original graph. To equally divide graph vertices into subsets, KDC adopts the KD-tree partitioning after the pivot mapping. In addition, two metric pruning rules and an optimized allocation strategy are presented to reduce communication and computation costs. In the query stage, based on the formed hybrid graph, we develop similarity search methods using random walk with restart (RWR) to measure node similarity. To boost efficiency, we derive tight bounds to rapidly shrink the search region. Extensive experiments with three real massive graphs are conducted to verify the effectiveness, efficiency, and scalability of our proposed techniques.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Fußnoten
1
As suggested in [13], restart probability c is empirically set as 0.5.
 
2
 
Literatur
1.
Zurück zum Zitat Batarfi, O., Shawi, R.E., Fayoumi, A.G., Nouri, R., Beheshti, S., Barnawi, A., Sakr, S.: Large scale graph processing systems: Survey and an experimental evaluation. Clust. Comput. 18(3), 1189–1213 (2015)CrossRef Batarfi, O., Shawi, R.E., Fayoumi, A.G., Nouri, R., Beheshti, S., Barnawi, A., Sakr, S.: Large scale graph processing systems: Survey and an experimental evaluation. Clust. Comput. 18(3), 1189–1213 (2015)CrossRef
2.
Zurück zum Zitat Batko, M., Kohoutková, P., Novak, D.: Cophir image collection under the microscope. In: SISAP , pp 47–54 (2009) Batko, M., Kohoutková, P., Novak, D.: Cophir image collection under the microscope. In: SISAP , pp 47–54 (2009)
3.
Zurück zum Zitat Boutet, A., Kermarrec, A., Mittal, N., Taïani, F.: Being prepared in a sparse world: The case of K NN graph construction. In: ICDE, pp. 241–252 (2016) Boutet, A., Kermarrec, A., Mittal, N., Taïani, F.: Being prepared in a sparse world: The case of K NN graph construction. In: ICDE, pp. 241–252 (2016)
4.
Zurück zum Zitat Chen, L., Gao, Y., Li, X., Jensen, C.S., Chen, G.: Efficient metric indexing for similarity search. In: ICDE, pp. 591–602 (2015) Chen, L., Gao, Y., Li, X., Jensen, C.S., Chen, G.: Efficient metric indexing for similarity search. In: ICDE, pp. 591–602 (2015)
5.
Zurück zum Zitat Chen, L., Gao, Y., Chen, G., Zhang, H.: Metric all-k-nearest-neighbor search. IEEE Trans. Knowl. Data Eng. 28(1), 98–112 (2016)CrossRef Chen, L., Gao, Y., Chen, G., Zhang, H.: Metric all-k-nearest-neighbor search. IEEE Trans. Knowl. Data Eng. 28(1), 98–112 (2016)CrossRef
6.
Zurück zum Zitat Chen, G., Yang, K., Chen, L., Gao, Y., Zheng, B., Chen, C.: Metric similarity joins using mapreduce. IEEE Trans. Knowl. Data Eng. 29(3), 656–669 (2017)CrossRef Chen, G., Yang, K., Chen, L., Gao, Y., Zheng, B., Chen, C.: Metric similarity joins using mapreduce. IEEE Trans. Knowl. Data Eng. 29(3), 656–669 (2017)CrossRef
7.
Zurück zum Zitat Cheng, H., Zhou, Y., Yu, J.X.: Clustering large attributed graphs: A balance between structural and attribute similarities. TKDD 5(2), 12:1–12:33 (2011)MathSciNetCrossRef Cheng, H., Zhou, Y., Yu, J.X.: Clustering large attributed graphs: A balance between structural and attribute similarities. TKDD 5(2), 12:1–12:33 (2011)MathSciNetCrossRef
8.
Zurück zum Zitat Cohen, S., Kimelfeld, B., Koutrika, G.: A survey on proximity measures for social networks. In: Search Computing - Broadening Web Search, pp. 191–206 (2012) Cohen, S., Kimelfeld, B., Koutrika, G.: A survey on proximity measures for social networks. In: Search Computing - Broadening Web Search, pp. 191–206 (2012)
9.
Zurück zum Zitat Dean, J., Ghemawat, S.: Mapreduce: Simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)CrossRef Dean, J., Ghemawat, S.: Mapreduce: Simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)CrossRef
10.
Zurück zum Zitat Dong, W., Charikar, M., Li, K.: Efficient k-nearest neighbor graph construction for generic similarity measures. In: WWW, pp. 577–586 (2011) Dong, W., Charikar, M., Li, K.: Efficient k-nearest neighbor graph construction for generic similarity measures. In: WWW, pp. 577–586 (2011)
11.
Zurück zum Zitat Dong, Y., Zhang, J., Tang, J., Chawla, N.V., Wang, B.: Coupledlp: Link prediction in coupled networks. In: SIGKDD, pp. 199–208 (2015) Dong, Y., Zhang, J., Tang, J., Chawla, N.V., Wang, B.: Coupledlp: Link prediction in coupled networks. In: SIGKDD, pp. 199–208 (2015)
12.
Zurück zum Zitat Fujiwara, Y., Nakatsuji, M., Onizuka, M., Kitsuregawa, M.: Fast and exact top-k search for random walk with restart. PVLDB 5(5), 442–453 (2012) Fujiwara, Y., Nakatsuji, M., Onizuka, M., Kitsuregawa, M.: Fast and exact top-k search for random walk with restart. PVLDB 5(5), 442–453 (2012)
13.
Zurück zum Zitat Fujiwara, Y., Nakatsuji, M., Shiokawa, H., Mishima, T., Onizuka, M.: Efficient ad-hoc search for personalized pagerank. In: SIGMOD, pp. 445–456 (2013) Fujiwara, Y., Nakatsuji, M., Shiokawa, H., Mishima, T., Onizuka, M.: Efficient ad-hoc search for personalized pagerank. In: SIGMOD, pp. 445–456 (2013)
14.
Zurück zum Zitat Gonzalez, J.E., Xin, R.S., Dave, A., Crankshaw, D., Franklin, M.J., Stoica, I.: GraphX: Graph processing in a distributed dataflow framework. In: OSDI, pp. 599–613 (2014) Gonzalez, J.E., Xin, R.S., Dave, A., Crankshaw, D., Franklin, M.J., Stoica, I.: GraphX: Graph processing in a distributed dataflow framework. In: OSDI, pp. 599–613 (2014)
15.
Zurück zum Zitat Jeh, G., Widom, J.: Simrank: A measure of structural-context similarity. In: SIGKDD, pp. 538–543 (2002) Jeh, G., Widom, J.: Simrank: A measure of structural-context similarity. In: SIGKDD, pp. 538–543 (2002)
16.
Zurück zum Zitat Khemmarat, S., Gao, L.: Fast top-k path-based relevance query on massive graphs. IEEE Trans. Knowl. Data Eng. 28(5), 1189–1202 (2016)CrossRef Khemmarat, S., Gao, L.: Fast top-k path-based relevance query on massive graphs. IEEE Trans. Knowl. Data Eng. 28(5), 1189–1202 (2016)CrossRef
17.
Zurück zum Zitat Low, Y., Gonzalez, J., Kyrola, A., Bickson, D., Guestrin, C., Hellerstein, J.M.: Distributed graphlab: A framework for machine learning in the cloud. PVLDB 5(8), 716–727 (2012) Low, Y., Gonzalez, J., Kyrola, A., Bickson, D., Guestrin, C., Hellerstein, J.M.: Distributed graphlab: A framework for machine learning in the cloud. PVLDB 5(8), 716–727 (2012)
18.
Zurück zum Zitat Ma, H., Zhu, J., Lyu, M.R., King, I.: Bridging the semantic gap between image contents and tags. IEEE Trans. Multimedia 12(5), 462–473 (2010)CrossRef Ma, H., Zhu, J., Lyu, M.R., King, I.: Bridging the semantic gap between image contents and tags. IEEE Trans. Multimedia 12(5), 462–473 (2010)CrossRef
19.
Zurück zum Zitat Maehara, T., Akiba, T., Iwata, Y., Kawarabayashi, K.: Computing personalized pagerank quickly by exploiting graph structures. PVLDB 7(12), 1023–1034 (2014) Maehara, T., Akiba, T., Iwata, Y., Kawarabayashi, K.: Computing personalized pagerank quickly by exploiting graph structures. PVLDB 7(12), 1023–1034 (2014)
20.
Zurück zum Zitat Malewicz, G., Austern, M.H., Bik, A.J.C., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: A system for large-scale graph processing. In: SIGMOD, pp. 135–146 (2010) Malewicz, G., Austern, M.H., Bik, A.J.C., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: A system for large-scale graph processing. In: SIGMOD, pp. 135–146 (2010)
21.
Zurück zum Zitat Meng, F., Rui, X., Wang, Z., Xing, Y., Cao, L.: Coupled node similarity learning for community detection in attributed networks. Entropy 20(6), 471 (2018)CrossRef Meng, F., Rui, X., Wang, Z., Xing, Y., Cao, L.: Coupled node similarity learning for community detection in attributed networks. Entropy 20(6), 471 (2018)CrossRef
22.
Zurück zum Zitat Pan, J., Yang, H., Faloutsos, C., Duygulu, P.: Automatic multimedia cross-modal correlation discovery. In: SIGKDD, pp. 653–658 (2004) Pan, J., Yang, H., Faloutsos, C., Duygulu, P.: Automatic multimedia cross-modal correlation discovery. In: SIGKDD, pp. 653–658 (2004)
23.
Zurück zum Zitat Plaku, E., Kavraki, L.E.: Distributed computation of the k nn graph for large high-dimensional point sets. J. Parallel Distrib. Comput. 67(3), 346–359 (2007)CrossRef Plaku, E., Kavraki, L.E.: Distributed computation of the k nn graph for large high-dimensional point sets. J. Parallel Distrib. Comput. 67(3), 346–359 (2007)CrossRef
24.
Zurück zum Zitat Sarkar, P., Moore, A.W.: Fast nearest-neighbor search in disk-resident graphs. In: SIGKDD, pp. 513–522 (2010) Sarkar, P., Moore, A.W.: Fast nearest-neighbor search in disk-resident graphs. In: SIGKDD, pp. 513–522 (2010)
25.
Zurück zum Zitat Sarkar, P., Moore, A.W.: A tractable approach to finding closest truncated-commute-time neighbors in large graphs. arXiv:1206.5259(2012) Sarkar, P., Moore, A.W.: A tractable approach to finding closest truncated-commute-time neighbors in large graphs. arXiv:1206.​5259(2012)
26.
Zurück zum Zitat Shao, B., Wang, H., Li, Y.: Trinity: A distributed graph engine on a memory cloud. In: SIGMOD, pp. 505–516 (2013) Shao, B., Wang, H., Li, Y.: Trinity: A distributed graph engine on a memory cloud. In: SIGMOD, pp. 505–516 (2013)
27.
Zurück zum Zitat Shin, K., Jung, J., Sael, L., Kang, U.: Bear: Block elimination approach for random walk with restart on large graphs. In: SIGMOD, pp. 1571–1585 (2015) Shin, K., Jung, J., Sael, L., Kang, U.: Bear: Block elimination approach for random walk with restart on large graphs. In: SIGMOD, pp. 1571–1585 (2015)
28.
Zurück zum Zitat Tian, Y., Balmin, A., Corsten, S.A., Tatikonda, S., McPherson, J.: From “think like a vertex” to “think like a graph”. PVLDB 7(3), 193–204 (2013) Tian, Y., Balmin, A., Corsten, S.A., Tatikonda, S., McPherson, J.: From “think like a vertex” to “think like a graph”. PVLDB 7(3), 193–204 (2013)
29.
Zurück zum Zitat Trad, M.R., Joly, A., Boujemaa, N.: Distributed k NN-graph approximation via hashing. In: ICMR, p. 43 (2012) Trad, M.R., Joly, A., Boujemaa, N.: Distributed k NN-graph approximation via hashing. In: ICMR, p. 43 (2012)
30.
Zurück zum Zitat Wu, Y., Jin, R., Zhang, X.: Fast and unified local search for random walk based k-nearest-neighbor query in large graphs. In: SIGMOD, pp. 1139–1150 (2014) Wu, Y., Jin, R., Zhang, X.: Fast and unified local search for random walk based k-nearest-neighbor query in large graphs. In: SIGMOD, pp. 1139–1150 (2014)
31.
Zurück zum Zitat Xu, G., Fu, B., Gu, Y.: Point-of-interest recommendations via a supervised random walk algorithm. IEEE Intell. Syst. 31(1), 15–23 (2016)CrossRef Xu, G., Fu, B., Gu, Y.: Point-of-interest recommendations via a supervised random walk algorithm. IEEE Intell. Syst. 31(1), 15–23 (2016)CrossRef
32.
Zurück zum Zitat Yang, D., Zhang, D., Qu, B.: Participatory cultural mapping based on collective behavior data in location-based social networks. ACM TIST 7(3), 30 (2016) Yang, D., Zhang, D., Qu, B.: Participatory cultural mapping based on collective behavior data in location-based social networks. ACM TIST 7(3), 30 (2016)
33.
Zurück zum Zitat Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauly, M., Franklin, M.J., Shenker, S., Stoica, I.: Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In: NSDI, pp. 15–28 (2012) Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauly, M., Franklin, M.J., Shenker, S., Stoica, I.: Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In: NSDI, pp. 15–28 (2012)
34.
Zurück zum Zitat Zhang, C., Shou, L., Chen, K., Chen, G., Bei, Y.: Evaluating geo-social influence in location-based social networks. In: CIKM, pp. 1442–1451 (2012) Zhang, C., Shou, L., Chen, K., Chen, G., Bei, Y.: Evaluating geo-social influence in location-based social networks. In: CIKM, pp. 1442–1451 (2012)
35.
Zurück zum Zitat Zhang, Q., Li, M., Deng, Y., Mahadevan, S.: Measure the similarity of nodes in the complex networks. arXiv:1502.00780 (2015) Zhang, Q., Li, M., Deng, Y., Mahadevan, S.: Measure the similarity of nodes in the complex networks. arXiv:1502.​00780 (2015)
36.
Zurück zum Zitat Zhang, Y., Huang, K., Geng, G., Liu, C.: Fast k NN graph construction with locality sensitive hashing. In: PKDD, pp. 660–674 (2013) Zhang, Y., Huang, K., Geng, G., Liu, C.: Fast k NN graph construction with locality sensitive hashing. In: PKDD, pp. 660–674 (2013)
37.
Zurück zum Zitat Zhou, Y., Cheng, H., Yu, J.X.: Graph clustering based on structural/attribute similarities. PVLDB 2(1), 718–729 (2009) Zhou, Y., Cheng, H., Yu, J.X.: Graph clustering based on structural/attribute similarities. PVLDB 2(1), 718–729 (2009)
Metadaten
Titel
Towards distributed node similarity search on graphs
verfasst von
Tianming Zhang
Yunjun Gao
Baihua Zheng
Lu Chen
Shiting Wen
Wei Guo
Publikationsdatum
18.06.2020
Verlag
Springer US
Erschienen in
World Wide Web / Ausgabe 6/2020
Print ISSN: 1386-145X
Elektronische ISSN: 1573-1413
DOI
https://doi.org/10.1007/s11280-020-00819-6

Weitere Artikel der Ausgabe 6/2020

World Wide Web 6/2020 Zur Ausgabe