nach oben

World Wide Web

Erschienen in:

18.06.2020

Towards distributed node similarity search on graphs

verfasst von: Tianming Zhang, Yunjun Gao, Baihua Zheng, Lu Chen, Shiting Wen, Wei Guo

Erschienen in: World Wide Web | Ausgabe 6/2020

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

Node similarity search on graphs has wide applications in recommendation, link prediction, to name just a few. However, existing studies are insufficient due to two reasons: (i) the scale of the real-world graph is growing rapidly, and (ii) vertices are always associated with complex attributes. In this paper, we propose an efficiently distributed framework to support node similarity search on massive graphs, which considers both graph structure correlation and node attribute similarity in metric spaces. The framework consists of preprocessing stage and query stage. In the preprocessing stage, a parallel KD-tree construction (KDC) algorithm is developed to form a newly defined graph so-called hybrid graph, in order to integrate node attribute similarity into the original graph. To equally divide graph vertices into subsets, KDC adopts the KD-tree partitioning after the pivot mapping. In addition, two metric pruning rules and an optimized allocation strategy are presented to reduce communication and computation costs. In the query stage, based on the formed hybrid graph, we develop similarity search methods using random walk with restart (RWR) to measure node similarity. To boost efficiency, we derive tight bounds to rapidly shrink the search region. Extensive experiments with three real massive graphs are conducted to verify the effectiveness, efficiency, and scalability of our proposed techniques.

Vorheriger Artikel A Pseudo-document-based Topical N-grams model for short texts

Nächster Artikel Extracting diverse-shapelets for early classification on time series

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

As suggested in [13], restart probability c is empirically set as 0.5.

Giraph is available at http://giraph.apache.org/.

Batarfi, O., Shawi, R.E., Fayoumi, A.G., Nouri, R., Beheshti, S., Barnawi, A., Sakr, S.: Large scale graph processing systems: Survey and an experimental evaluation. Clust. Comput. 18(3), 1189–1213 (2015)CrossRef

Batko, M., Kohoutková, P., Novak, D.: Cophir image collection under the microscope. In: SISAP , pp 47–54 (2009)

Boutet, A., Kermarrec, A., Mittal, N., Taïani, F.: Being prepared in a sparse world: The case of K NN graph construction. In: ICDE, pp. 241–252 (2016)

Chen, L., Gao, Y., Li, X., Jensen, C.S., Chen, G.: Efficient metric indexing for similarity search. In: ICDE, pp. 591–602 (2015)

Chen, L., Gao, Y., Chen, G., Zhang, H.: Metric all-k-nearest-neighbor search. IEEE Trans. Knowl. Data Eng. 28(1), 98–112 (2016)CrossRef

Chen, G., Yang, K., Chen, L., Gao, Y., Zheng, B., Chen, C.: Metric similarity joins using mapreduce. IEEE Trans. Knowl. Data Eng. 29(3), 656–669 (2017)CrossRef

Cheng, H., Zhou, Y., Yu, J.X.: Clustering large attributed graphs: A balance between structural and attribute similarities. TKDD 5(2), 12:1–12:33 (2011)MathSciNetCrossRef

Cohen, S., Kimelfeld, B., Koutrika, G.: A survey on proximity measures for social networks. In: Search Computing - Broadening Web Search, pp. 191–206 (2012)

Dean, J., Ghemawat, S.: Mapreduce: Simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)CrossRef

10.

Dong, W., Charikar, M., Li, K.: Efficient k-nearest neighbor graph construction for generic similarity measures. In: WWW, pp. 577–586 (2011)

11.

Dong, Y., Zhang, J., Tang, J., Chawla, N.V., Wang, B.: Coupledlp: Link prediction in coupled networks. In: SIGKDD, pp. 199–208 (2015)

12.

Fujiwara, Y., Nakatsuji, M., Onizuka, M., Kitsuregawa, M.: Fast and exact top-k search for random walk with restart. PVLDB 5(5), 442–453 (2012)

13.

Fujiwara, Y., Nakatsuji, M., Shiokawa, H., Mishima, T., Onizuka, M.: Efficient ad-hoc search for personalized pagerank. In: SIGMOD, pp. 445–456 (2013)

14.

Gonzalez, J.E., Xin, R.S., Dave, A., Crankshaw, D., Franklin, M.J., Stoica, I.: GraphX: Graph processing in a distributed dataflow framework. In: OSDI, pp. 599–613 (2014)

15.

Jeh, G., Widom, J.: Simrank: A measure of structural-context similarity. In: SIGKDD, pp. 538–543 (2002)

16.

Khemmarat, S., Gao, L.: Fast top-k path-based relevance query on massive graphs. IEEE Trans. Knowl. Data Eng. 28(5), 1189–1202 (2016)CrossRef

17.

Low, Y., Gonzalez, J., Kyrola, A., Bickson, D., Guestrin, C., Hellerstein, J.M.: Distributed graphlab: A framework for machine learning in the cloud. PVLDB 5(8), 716–727 (2012)

18.

Ma, H., Zhu, J., Lyu, M.R., King, I.: Bridging the semantic gap between image contents and tags. IEEE Trans. Multimedia 12(5), 462–473 (2010)CrossRef

19.

Maehara, T., Akiba, T., Iwata, Y., Kawarabayashi, K.: Computing personalized pagerank quickly by exploiting graph structures. PVLDB 7(12), 1023–1034 (2014)

20.

Malewicz, G., Austern, M.H., Bik, A.J.C., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: A system for large-scale graph processing. In: SIGMOD, pp. 135–146 (2010)

21.

Meng, F., Rui, X., Wang, Z., Xing, Y., Cao, L.: Coupled node similarity learning for community detection in attributed networks. Entropy 20(6), 471 (2018)CrossRef

22.

Pan, J., Yang, H., Faloutsos, C., Duygulu, P.: Automatic multimedia cross-modal correlation discovery. In: SIGKDD, pp. 653–658 (2004)

23.

Plaku, E., Kavraki, L.E.: Distributed computation of the k nn graph for large high-dimensional point sets. J. Parallel Distrib. Comput. 67(3), 346–359 (2007)CrossRef

24.

Sarkar, P., Moore, A.W.: Fast nearest-neighbor search in disk-resident graphs. In: SIGKDD, pp. 513–522 (2010)

25.

Sarkar, P., Moore, A.W.: A tractable approach to finding closest truncated-commute-time neighbors in large graphs. arXiv:1206.5259(2012)

26.

Shao, B., Wang, H., Li, Y.: Trinity: A distributed graph engine on a memory cloud. In: SIGMOD, pp. 505–516 (2013)

27.

Shin, K., Jung, J., Sael, L., Kang, U.: Bear: Block elimination approach for random walk with restart on large graphs. In: SIGMOD, pp. 1571–1585 (2015)

28.

Tian, Y., Balmin, A., Corsten, S.A., Tatikonda, S., McPherson, J.: From “think like a vertex” to “think like a graph”. PVLDB 7(3), 193–204 (2013)

29.

Trad, M.R., Joly, A., Boujemaa, N.: Distributed k NN-graph approximation via hashing. In: ICMR, p. 43 (2012)

30.

Wu, Y., Jin, R., Zhang, X.: Fast and unified local search for random walk based k-nearest-neighbor query in large graphs. In: SIGMOD, pp. 1139–1150 (2014)

31.

Xu, G., Fu, B., Gu, Y.: Point-of-interest recommendations via a supervised random walk algorithm. IEEE Intell. Syst. 31(1), 15–23 (2016)CrossRef

32.

Yang, D., Zhang, D., Qu, B.: Participatory cultural mapping based on collective behavior data in location-based social networks. ACM TIST 7(3), 30 (2016)

33.

Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauly, M., Franklin, M.J., Shenker, S., Stoica, I.: Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In: NSDI, pp. 15–28 (2012)

34.

Zhang, C., Shou, L., Chen, K., Chen, G., Bei, Y.: Evaluating geo-social influence in location-based social networks. In: CIKM, pp. 1442–1451 (2012)

35.

Zhang, Q., Li, M., Deng, Y., Mahadevan, S.: Measure the similarity of nodes in the complex networks. arXiv:1502.00780 (2015)

36.

Zhang, Y., Huang, K., Geng, G., Liu, C.: Fast k NN graph construction with locality sensitive hashing. In: PKDD, pp. 660–674 (2013)

37.

Zhou, Y., Cheng, H., Yu, J.X.: Graph clustering based on structural/attribute similarities. PVLDB 2(1), 718–729 (2009)

Titel: Towards distributed node similarity search on graphs
verfasst von: Tianming Zhang
Yunjun Gao
Baihua Zheng
Lu Chen
Shiting Wen
Wei Guo
Publikationsdatum: 18.06.2020
Verlag: Springer US
Erschienen in: World Wide Web / Ausgabe 6/2020
Print ISSN: 1386-145X
Elektronische ISSN: 1573-1413
DOI: https://doi.org/10.1007/s11280-020-00819-6

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Weitere Artikel der Ausgabe 6/2020

Topic representation model based on microblogging behavior analysis

A Pseudo-document-based Topical N-grams model for short texts

Improving biterm topic model with word embeddings

A general strategy for researches on Chinese “的(de)” structure based on neural network

Extracting diverse-shapelets for early classification on time series

Hybrid graph convolutional networks with multi-head attention for location recommendation