Skip to main content
Erschienen in: The VLDB Journal 3/2015

01.06.2015 | Regular Paper

Efficient distributed subgraph similarity matching

verfasst von: Ye Yuan, Guoren Wang, Jeffery Yu Xu, Lei Chen

Erschienen in: The VLDB Journal | Ausgabe 3/2015

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Given a query graph \(q\) and a data graph \(G\), subgraph similarity matching is to retrieve all matches of \(q\) in \(G\) with the number of missing edges bounded by a given threshold \(\epsilon \). Many works have been conducted to study the problem of subgraph similarity matching due to its ability to handle applications involved with noisy or erroneous graph data. In practice, a data graph can be extremely large, e.g., a web-scale graph containing hundreds of millions of vertices and billions of edges. The state-of-the-art approaches employ centralized algorithms to process the subgraph similarity queries, and thus, they are infeasible for such a large graph due to the limited computational power and storage space of a centralized server. To address this problem, in this paper, we investigate subgraph similarity matching for a web-scale graph deployed in a distributed environment. We propose distributed algorithms and optimization techniques that exploit the properties of subgraph similarity matching, so that we can well utilize the parallel computing power and lower the communication cost among the distributed data centers for query processing. Specifically, we first relax and decompose \(q\) into a minimum number of sub-queries. Next, we send each sub-query to conduct the exact matching in parallel. Finally, we schedule and join the exact matches to obtain final query answers. Moreover, our workload-balance strategy further speeds up the query processing. Our experimental results demonstrate the feasibility of our proposed approach in performing subgraph similarity matching over web-scale graph data.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
We assume \(q(v)>0\) in the following of this paper.
 
Literatur
1.
Zurück zum Zitat Afrati, F.N., Fotakis, D., Ullman, J.D.: Enumerating subgraph instances using map-reduce. In: ICDE (2013) Afrati, F.N., Fotakis, D., Ullman, J.D.: Enumerating subgraph instances using map-reduce. In: ICDE (2013)
2.
4.
Zurück zum Zitat Chakrabarti, D., Zhan, Y., Faloutsos, C.: R-mat: a recursive model for graph mining. In: SDM, vol. 4, pp. 442–446. SIAM (2004) Chakrabarti, D., Zhan, Y., Faloutsos, C.: R-mat: a recursive model for graph mining. In: SDM, vol. 4, pp. 442–446. SIAM (2004)
5.
Zurück zum Zitat Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms. MIT Press, Cambridge (2001)MATH Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms. MIT Press, Cambridge (2001)MATH
6.
Zurück zum Zitat Dean, J., Ghemawat, S.: Mapreduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)CrossRef Dean, J., Ghemawat, S.: Mapreduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)CrossRef
7.
Zurück zum Zitat Fan, W., Wang, X., Wu, Y.: Performance guarantees for distributed reachability queries. In: VLDB, pp. 1304–1316 (2012) Fan, W., Wang, X., Wu, Y.: Performance guarantees for distributed reachability queries. In: VLDB, pp. 1304–1316 (2012)
8.
Zurück zum Zitat Gao, X., Xiao, B., Tao, D., Li, X.: A survey of graph edit distance. Pattern Anal. Appl. 13(1), 113–129 (2010)CrossRefMathSciNet Gao, X., Xiao, B., Tao, D., Li, X.: A survey of graph edit distance. Pattern Anal. Appl. 13(1), 113–129 (2010)CrossRefMathSciNet
9.
Zurück zum Zitat Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, San Francisco (1979)MATH Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, San Francisco (1979)MATH
10.
Zurück zum Zitat Hochbaum, D. (ed.) Approximation Algorithms for NP-Hard Problems. PWS (1997) Hochbaum, D. (ed.) Approximation Algorithms for NP-Hard Problems. PWS (1997)
15.
Zurück zum Zitat Kang, U., Tsourakakis, C.E.: Pegasus: a peta-scale graph mining system implementation and observations. In: ICDM (2009) Kang, U., Tsourakakis, C.E.: Pegasus: a peta-scale graph mining system implementation and observations. In: ICDM (2009)
16.
Zurück zum Zitat Kwak, H., Lee, C., Park, H., Moon, S.B.: What is twitter, a social network or a news media? In: WWW, pp. 591–600 (2010) Kwak, H., Lee, C., Park, H., Moon, S.B.: What is twitter, a social network or a news media? In: WWW, pp. 591–600 (2010)
17.
Zurück zum Zitat Ma, S., Cao, Y., Huai, J., Wo, T.: Distributed graph pattern matching. In: WWW, pp. 949–958. ACM (2012) Ma, S., Cao, Y., Huai, J., Wo, T.: Distributed graph pattern matching. In: WWW, pp. 949–958. ACM (2012)
18.
Zurück zum Zitat Malewicz, G., Austern, M.H., Bik, A.J., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: a system for large-scale graph processing. In: SIGMOD, pp. 135–146. ACM (2010) Malewicz, G., Austern, M.H., Bik, A.J., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: a system for large-scale graph processing. In: SIGMOD, pp. 135–146. ACM (2010)
19.
Zurück zum Zitat Papadias, D., Tao, Y., Fu, G., Seeger, B.: An optimal and progressive algorithm for skyline queries. In: SIGMOD (2003) Papadias, D., Tao, Y., Fu, G., Seeger, B.: An optimal and progressive algorithm for skyline queries. In: SIGMOD (2003)
20.
Zurück zum Zitat Plantenga, T.: Inexact subgraph isomorphism in mapreduce. J. Parallel Distrib. Comput. 73(2), 164–175 (2013)CrossRef Plantenga, T.: Inexact subgraph isomorphism in mapreduce. J. Parallel Distrib. Comput. 73(2), 164–175 (2013)CrossRef
21.
Zurück zum Zitat Shang, Z., Yu, J.X.: Catch the wind: graph workload balancing on cloud. In: ICDE, pp.553–564 (2013) Shang, Z., Yu, J.X.: Catch the wind: graph workload balancing on cloud. In: ICDE, pp.553–564 (2013)
22.
Zurück zum Zitat Shang, H., Zhu, K., Lin, X., Zhang, Y., Ichise, R.: Similarity search on supergraph containment. In: Proceedings of ICDE, pp. 637–648 (2010) Shang, H., Zhu, K., Lin, X., Zhang, Y., Ichise, R.: Similarity search on supergraph containment. In: Proceedings of ICDE, pp. 637–648 (2010)
23.
Zurück zum Zitat Shvachko, K., Kuang, H., Radia, S., Chansler, R.: The hadoop distributed file system. In: MSST, pp. 1–10. IEEE (2010) Shvachko, K., Kuang, H., Radia, S., Chansler, R.: The hadoop distributed file system. In: MSST, pp. 1–10. IEEE (2010)
24.
Zurück zum Zitat Srivatsa, M., Kawadia, V., Yang, S.: Distributed graph query processing in dynamic networks. In: Pervasive Computing and Communications Workshops (PERCOM Workshops), 2012 IEEE International Conference on, pp. 20–25. IEEE (2012) Srivatsa, M., Kawadia, V., Yang, S.: Distributed graph query processing in dynamic networks. In: Pervasive Computing and Communications Workshops (PERCOM Workshops), 2012 IEEE International Conference on, pp. 20–25. IEEE (2012)
25.
Zurück zum Zitat Stanton, I., Kliot, G.: Streaming graph partitioning for large distributed graphs. In: KDD, pp. 1222–1230. ACM (2012) Stanton, I., Kliot, G.: Streaming graph partitioning for large distributed graphs. In: KDD, pp. 1222–1230. ACM (2012)
26.
Zurück zum Zitat Sun, Z., Wang, H., Shao, B., Wang, H., Li, J.: Efficient subgraph matching on billion node graphs. In: VLDB (2012) Sun, Z., Wang, H., Shao, B., Wang, H., Li, J.: Efficient subgraph matching on billion node graphs. In: VLDB (2012)
27.
Zurück zum Zitat Ozsu, M.T., Valduriez, P.: Principles of Distributed Database Systems. Springer, Berlin (2011) Ozsu, M.T., Valduriez, P.: Principles of Distributed Database Systems. Springer, Berlin (2011)
28.
Zurück zum Zitat Yan, X., Yu, P.S., Han, J.: Substructure similarity search in graph databases. In: Proceedings of SIGMOD, pp. 766–777 (2005) Yan, X., Yu, P.S., Han, J.: Substructure similarity search in graph databases. In: Proceedings of SIGMOD, pp. 766–777 (2005)
29.
Zurück zum Zitat Yang, S., Yan, X., Zong, B., Khan, A.: Towards effective partition management for large graphs. In: SIGMOD, pp. 517–528 (2012) Yang, S., Yan, X., Zong, B., Khan, A.: Towards effective partition management for large graphs. In: SIGMOD, pp. 517–528 (2012)
30.
Zurück zum Zitat Yuan, Y., Wang, G., Chen, L., Wang, H.: Efficient subgraph similarity search on large probabilistic graph databases. In: Proceedings of VLDB, pp. 800–811 (2012) Yuan, Y., Wang, G., Chen, L., Wang, H.: Efficient subgraph similarity search on large probabilistic graph databases. In: Proceedings of VLDB, pp. 800–811 (2012)
31.
Zurück zum Zitat Yuan, Y., Wang, G., Chen, L., Wang, H.: Graph similarity search on large uncertain graph databases. VLDB J. pp. 1–26 (2014) Yuan, Y., Wang, G., Chen, L., Wang, H.: Graph similarity search on large uncertain graph databases. VLDB J. pp. 1–26 (2014)
32.
Zurück zum Zitat Yuan, Y., Wang, G., Wang, H., Chen, L.: Efficient subgraph search over large uncertain graphs. In: Proceedings of VLDB, pp. 876–886 (2011) Yuan, Y., Wang, G., Wang, H., Chen, L.: Efficient subgraph search over large uncertain graphs. In: Proceedings of VLDB, pp. 876–886 (2011)
33.
Zurück zum Zitat Yuan, Y., Wang, G., Chen, L., Wang, H.: Efficient keyword search on uncertain graph data. TKDE 25(12), 2767–2779 (2013) Yuan, Y., Wang, G., Chen, L., Wang, H.: Efficient keyword search on uncertain graph data. TKDE 25(12), 2767–2779 (2013)
34.
Zurück zum Zitat Zeng, Z., Tung, A.K.H., Wang, J., Zhou, L., Feng, J.: Comparing stars: on approximating graph edit distance. In: VLDB (2009) Zeng, Z., Tung, A.K.H., Wang, J., Zhou, L., Feng, J.: Comparing stars: on approximating graph edit distance. In: VLDB (2009)
35.
Zurück zum Zitat Zeng, K., Yang, J., Wang, H., Shao, B., Wang, Z.: A distributed graph engine for web scale rdf data. In: VLDB (2013) Zeng, K., Yang, J., Wang, H., Shao, B., Wang, Z.: A distributed graph engine for web scale rdf data. In: VLDB (2013)
36.
Zurück zum Zitat Zhang, S., Yang, J., Jin, W.: Sapper: subgraph indexing and approximate matching in large graphs. In: VLDB (2010) Zhang, S., Yang, J., Jin, W.: Sapper: subgraph indexing and approximate matching in large graphs. In: VLDB (2010)
37.
Zurück zum Zitat Zhao, P., Han, J.: On graph query optimization in large networks. Proc. VLDB Endow. 3(1–2), 340–351 (2010)CrossRef Zhao, P., Han, J.: On graph query optimization in large networks. Proc. VLDB Endow. 3(1–2), 340–351 (2010)CrossRef
38.
Zurück zum Zitat Zhu, G., Lin, X., Zhu, K., Zhang, W., Yu, J.X.: Treespan: efficiently computing similarity all-matching. In: SIGMOD (2012) Zhu, G., Lin, X., Zhu, K., Zhang, W., Yu, J.X.: Treespan: efficiently computing similarity all-matching. In: SIGMOD (2012)
Metadaten
Titel
Efficient distributed subgraph similarity matching
verfasst von
Ye Yuan
Guoren Wang
Jeffery Yu Xu
Lei Chen
Publikationsdatum
01.06.2015
Verlag
Springer Berlin Heidelberg
Erschienen in
The VLDB Journal / Ausgabe 3/2015
Print ISSN: 1066-8888
Elektronische ISSN: 0949-877X
DOI
https://doi.org/10.1007/s00778-015-0381-6

Weitere Artikel der Ausgabe 3/2015

The VLDB Journal 3/2015 Zur Ausgabe

Premium Partner