Skip to main content
Erschienen in: The VLDB Journal 3/2017

21.02.2017 | Regular Paper

Scalable subgraph enumeration in MapReduce: a cost-oriented approach

verfasst von: Longbin Lai, Lu Qin, Xuemin Lin, Lijun Chang

Erschienen in: The VLDB Journal | Ausgabe 3/2017

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Subgraph enumeration, which aims to find all the subgraphs of a large data graph that are isomorphic to a given pattern graph, is a fundamental graph problem with a wide range of applications. However, existing sequential algorithms for subgraph enumeration fall short in handling large graphs due to the involvement of computationally intensive subgraph isomorphism operations. Thus, some recent researches focus on solving the problem using MapReduce. Nevertheless, exiting MapReduce approaches are not scalable to handle very large graphs since they either produce a huge number of partial results or consume a large amount of memory. Motivated by this, in this paper, we propose a new algorithm \(\mathsf {Twin}\) \(\mathsf {Twig}\) \(\mathsf {Join}\) based on a left-deep-join framework in MapReduce, in which the basic join unit is a \(\mathsf {Twin}\) \(\mathsf {Twig}\) (an edge or two incident edges of a node). We show that in the Erdös–Rényi random graph model, \(\mathsf {Twin}\) \(\mathsf {Twig}\) \(\mathsf {Join}\) is instance optimal in the left-deep-join framework under reasonable assumptions, and we devise an algorithm to compute the optimal join plan. We further discuss how our approach can be adapted to handle the power-law random graph model. Three optimization strategies are explored to improve our algorithm. Ultimately, by aggregating equivalent nodes into a compressed node, we construct the compressed graph, upon which the subgraph enumeration is further improved. We conduct extensive performance studies in several real graphs, one of which contains billions of edges. Our approach significantly outperforms existing solutions in all tests.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Anhänge
Nur mit Berechtigung zugänglich
Fußnoten
1
Note that we apply a threshold \(\sqrt{M}\) in line 2 and line 13, and we will discuss the threshold later. Let’s first assume that there is no threshold.
 
2
Lines 26–28 deal with the node with degree larger than the threshold (line 2,13).
 
3
The case is much easier when p is an edge, we hence focus on the two-edge \(\mathsf {Twin}\) \(\mathsf {Twig}\) in the following.
 
Literatur
1.
Zurück zum Zitat Afrati, F.N., Fotakis, D., Ullman, J.D.: Enumerating subgraph instances using map-reduce. In: Proceedings of the ICDE’13 (2013) Afrati, F.N., Fotakis, D., Ullman, J.D.: Enumerating subgraph instances using map-reduce. In: Proceedings of the ICDE’13 (2013)
2.
Zurück zum Zitat Ahmed, N.K., Neville, J., Rossi, R.A., Duffield, N., Willke, T.L.: Graphlet Decomposition: Framework, Algorithms, and Applications. ArXiv e-prints (2015) Ahmed, N.K., Neville, J., Rossi, R.A., Duffield, N., Willke, T.L.: Graphlet Decomposition: Framework, Algorithms, and Applications. ArXiv e-prints (2015)
3.
Zurück zum Zitat Aiello, W., Chung, F., Lu, L.: A random graph model for massive graphs. In: Proceedings of the STOC’00 (2000) Aiello, W., Chung, F., Lu, L.: A random graph model for massive graphs. In: Proceedings of the STOC’00 (2000)
4.
Zurück zum Zitat Alon, N., Dao, P., Hajirasouliha, I., Hormozdiari, F., Sahinalp, S.C.: Biomolecular network motif counting and discovery by color coding. In: Proceedings of the ISMB’08 (2008) Alon, N., Dao, P., Hajirasouliha, I., Hormozdiari, F., Sahinalp, S.C.: Biomolecular network motif counting and discovery by color coding. In: Proceedings of the ISMB’08 (2008)
5.
Zurück zum Zitat Bhuiyan, M.A., Hasan, M.A.: An iterative mapreduce based frequent subgraph mining algorithm. TKDE 27(3), 608–620 (2015) Bhuiyan, M.A., Hasan, M.A.: An iterative mapreduce based frequent subgraph mining algorithm. TKDE 27(3), 608–620 (2015)
6.
Zurück zum Zitat Bloom, B.H.: Space/time trade-offs in hash coding with allowable errors. Commun. ACM 13(7), 422–426 (1970)CrossRefMATH Bloom, B.H.: Space/time trade-offs in hash coding with allowable errors. Commun. ACM 13(7), 422–426 (1970)CrossRefMATH
8.
Zurück zum Zitat Chung, F.R.K., Lu, L., Vu, V.H.: The spectra of random graphs with given expected degrees. Internet Math. 1(3), 6313–6318 (2003)MathSciNetMATH Chung, F.R.K., Lu, L., Vu, V.H.: The spectra of random graphs with given expected degrees. Internet Math. 1(3), 6313–6318 (2003)MathSciNetMATH
9.
10.
Zurück zum Zitat Dean, J., Ghemawat, S.: Mapreduce: simplified data processing on large clusters. In: Proceedings of the OSDI’04 (2004) Dean, J., Ghemawat, S.: Mapreduce: simplified data processing on large clusters. In: Proceedings of the OSDI’04 (2004)
11.
Zurück zum Zitat Erdos, P., Renyi, A.: On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci. 38(4), 343–347 (1960)MathSciNetMATH Erdos, P., Renyi, A.: On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci. 38(4), 343–347 (1960)MathSciNetMATH
12.
Zurück zum Zitat Fan, W., Li, J., Ma, S., Tang, N., Wu, Y., Wu, Y.: Graph pattern matching: from intractable to polynomial time. PVLDB 3(1), 264–275 (2010) Fan, W., Li, J., Ma, S., Tang, N., Wu, Y., Wu, Y.: Graph pattern matching: from intractable to polynomial time. PVLDB 3(1), 264–275 (2010)
13.
Zurück zum Zitat Gonen, M., Ron, D., Shavitt, Y.: Counting stars and other small subgraphs in sublinear time. In: Proceedings of the SODA’10 (2010) Gonen, M., Ron, D., Shavitt, Y.: Counting stars and other small subgraphs in sublinear time. In: Proceedings of the SODA’10 (2010)
14.
Zurück zum Zitat Grochow, J.A., Kellis, M.: Network motif discovery using subgraph enumeration and symmetry-breaking. In: Proceedings of the RECOMB’07 (2007) Grochow, J.A., Kellis, M.: Network motif discovery using subgraph enumeration and symmetry-breaking. In: Proceedings of the RECOMB’07 (2007)
15.
Zurück zum Zitat Gonzalez, J., Low, Y., Gu, H., Bickson, D., Guestrin, C.: Powergraph: distributed graph-parallel computation on natural graphs. In: Proceedings of the OSDI’12 (2012) Gonzalez, J., Low, Y., Gu, H., Bickson, D., Guestrin, C.: Powergraph: distributed graph-parallel computation on natural graphs. In: Proceedings of the OSDI’12 (2012)
16.
Zurück zum Zitat Han, W.S., Lee, J., Lee, J.H.: Turboiso: Towards ultrafast and robust subgraph isomorphism search in large graph databases. In: Proceedings of the SIGMOD’13 (2013) Han, W.S., Lee, J., Lee, J.H.: Turboiso: Towards ultrafast and robust subgraph isomorphism search in large graph databases. In: Proceedings of the SIGMOD’13 (2013)
17.
Zurück zum Zitat He, H., Singh, A.K.: Graphs-at-a-time: query language and access methods for graph databases. In: Proceedings of the SIGMOD’08 (2008) He, H., Singh, A.K.: Graphs-at-a-time: query language and access methods for graph databases. In: Proceedings of the SIGMOD’08 (2008)
18.
Zurück zum Zitat Kairam, S.R., Wang, D.J., Leskovec, J.: The life and death of online groups: predicting group growth and longevity. In: Proceedings of the WSDM’12 (2012) Kairam, S.R., Wang, D.J., Leskovec, J.: The life and death of online groups: predicting group growth and longevity. In: Proceedings of the WSDM’12 (2012)
19.
Zurück zum Zitat Khan, A., Wu, Y., Aggarwal, C.C., Yan, X.: Nema: Fast graph search with label similarity. PVLDB 6(3), 181–190 (2013) Khan, A., Wu, Y., Aggarwal, C.C., Yan, X.: Nema: Fast graph search with label similarity. PVLDB 6(3), 181–190 (2013)
20.
Zurück zum Zitat Lai, L., Qin, L., Lin, X., Chang, L.: Scalable subgraph enumeration in mapreduce. Proc. VLDB Endow. 8(10), 974–985 (2015) Lai, L., Qin, L., Lin, X., Chang, L.: Scalable subgraph enumeration in mapreduce. Proc. VLDB Endow. 8(10), 974–985 (2015)
21.
Zurück zum Zitat Lee, J., Han, W.S., Kasperovics, R., Lee, J.H.: An in-depth comparison of subgraph isomorphism algorithms in graph databases. PVLDB 6(2), 133–144 (2012) Lee, J., Han, W.S., Kasperovics, R., Lee, J.H.: An in-depth comparison of subgraph isomorphism algorithms in graph databases. PVLDB 6(2), 133–144 (2012)
22.
Zurück zum Zitat Leskovec, J., Singh, A., Kleinberg, J.: Patterns of influence in a recommendation network. In: Proceedings of the PAKDD’06 (2006) Leskovec, J., Singh, A., Kleinberg, J.: Patterns of influence in a recommendation network. In: Proceedings of the PAKDD’06 (2006)
23.
Zurück zum Zitat Lin, W., Xiao, X., Gabriel, G.: Large-scale frequent subgraph mining in mapreduce. In: ICDE, pp. 844–855 (2014) Lin, W., Xiao, X., Gabriel, G.: Large-scale frequent subgraph mining in mapreduce. In: ICDE, pp. 844–855 (2014)
24.
Zurück zum Zitat Ma, S., Cao, Y., Huai, J., Wo, T.: Distributed graph pattern matching. In: WWW (2012) Ma, S., Cao, Y., Huai, J., Wo, T.: Distributed graph pattern matching. In: WWW (2012)
25.
Zurück zum Zitat Milenkovic, T., Przulj, N.: Uncovering biological network function via graphlet degree signatures. Cancer Inf. 6, 257–273 (2008) Milenkovic, T., Przulj, N.: Uncovering biological network function via graphlet degree signatures. Cancer Inf. 6, 257–273 (2008)
26.
Zurück zum Zitat Milo, R., Shen-Orr, S., Itzkovitz, S., Kashtan, N., Chklovskii, D., Alon, U.: Network motifs: simple building blocks of complex networks. Science 298(5594), 824–827 (2002)CrossRef Milo, R., Shen-Orr, S., Itzkovitz, S., Kashtan, N., Chklovskii, D., Alon, U.: Network motifs: simple building blocks of complex networks. Science 298(5594), 824–827 (2002)CrossRef
27.
Zurück zum Zitat Plantenga, T.: Inexact subgraph isomorphism in mapreduce. J. Parallel Distrib. Comput. 73(2), 164–175 (2013)CrossRef Plantenga, T.: Inexact subgraph isomorphism in mapreduce. J. Parallel Distrib. Comput. 73(2), 164–175 (2013)CrossRef
28.
Zurück zum Zitat Przulj, N.: Biological network comparison using graphlet degree distribution. Bioinformatics 23(2), 177–183 (2007)CrossRef Przulj, N.: Biological network comparison using graphlet degree distribution. Bioinformatics 23(2), 177–183 (2007)CrossRef
29.
Zurück zum Zitat Rahman, M., Bhuiyan, M.A., Hasan, M.A.: Graft: an efficient graphlet counting method for large graph analysis. TKDE 26(10), 2466–2478 (2014) Rahman, M., Bhuiyan, M.A., Hasan, M.A.: Graft: an efficient graphlet counting method for large graph analysis. TKDE 26(10), 2466–2478 (2014)
30.
Zurück zum Zitat Ren, X., Wang, J.: Exploiting vertex relationships in speeding up subgraph isomorphism over large graphs. Proc. VLDB Endow. 8(5), 617–628 (2015)CrossRef Ren, X., Wang, J.: Exploiting vertex relationships in speeding up subgraph isomorphism over large graphs. Proc. VLDB Endow. 8(5), 617–628 (2015)CrossRef
31.
Zurück zum Zitat Rücker, G., Rücker, C.: Substructure, subgraph, and walk counts as measures of the complexity of graphs and molecules. J. Chem. Info. Comput. Sci. 41(6), 1457–1462 (2001)CrossRefMATH Rücker, G., Rücker, C.: Substructure, subgraph, and walk counts as measures of the complexity of graphs and molecules. J. Chem. Info. Comput. Sci. 41(6), 1457–1462 (2001)CrossRefMATH
32.
Zurück zum Zitat Shervashidze, N., Vishwanathan, S., Petri, T., Mehlhorn, K., Borgwardt, K.: Efficient graphlet kernels for large graph comparison. In: AISTATS (2009) Shervashidze, N., Vishwanathan, S., Petri, T., Mehlhorn, K., Borgwardt, K.: Efficient graphlet kernels for large graph comparison. In: AISTATS (2009)
33.
Zurück zum Zitat Shang, H., Zhang, Y., Lin, X., Yu, J.X.: Taming verification hardness: an efficient algorithm for testing subgraph isomorphism. PVLDB 1(1), 364–375 (2008) Shang, H., Zhang, Y., Lin, X., Yu, J.X.: Taming verification hardness: an efficient algorithm for testing subgraph isomorphism. PVLDB 1(1), 364–375 (2008)
34.
Zurück zum Zitat Steinbrunn, M., Moerkotte, G., Kemper, A.: Optimizing Join Orders. Tech. rep. (1993) Steinbrunn, M., Moerkotte, G., Kemper, A.: Optimizing Join Orders. Tech. rep. (1993)
35.
Zurück zum Zitat Sun, Z., Wang, H., Wang, H., Shao, B., Li, J.: Efficient subgraph matching on billion node graphs. PVLDB 5(9), 788–789 (2012) Sun, Z., Wang, H., Wang, H., Shao, B., Li, J.: Efficient subgraph matching on billion node graphs. PVLDB 5(9), 788–789 (2012)
36.
Zurück zum Zitat Suri, S., Vassilvitskii, S.: Counting triangles and the curse of the last reducer. In: Proceedings of the WWW’11 (2011) Suri, S., Vassilvitskii, S.: Counting triangles and the curse of the last reducer. In: Proceedings of the WWW’11 (2011)
37.
Zurück zum Zitat Tsourakakis, C.E., Kang, U., Miller, G.L., Faloutsos, C.: Doulion: Counting triangles in massive graphs with a coin. In: Proceedings of the KDD’09 (2009) Tsourakakis, C.E., Kang, U., Miller, G.L., Faloutsos, C.: Doulion: Counting triangles in massive graphs with a coin. In: Proceedings of the KDD’09 (2009)
38.
Zurück zum Zitat Viger, F., Latapy, M.: Efficient and simple generation of random simple connected graphs with prescribed degree sequence. In: COCOON’05, pp. 440–449 (2005) Viger, F., Latapy, M.: Efficient and simple generation of random simple connected graphs with prescribed degree sequence. In: COCOON’05, pp. 440–449 (2005)
39.
Zurück zum Zitat Wang, J., Cheng, J.: Truss decomposition in massive networks. PVLDB 5(9), 812–823 (2012) Wang, J., Cheng, J.: Truss decomposition in massive networks. PVLDB 5(9), 812–823 (2012)
40.
Zurück zum Zitat Watts, D., Strogatz, S.: Collective dynamics of ’small-world’ networks. Nature 6684(393), 440–442 (1998)CrossRef Watts, D., Strogatz, S.: Collective dynamics of ’small-world’ networks. Nature 6684(393), 440–442 (1998)CrossRef
41.
Zurück zum Zitat Zhao, P., Han, J.: On graph query optimization in large networks. PVLDB 3(1–2), 340–451 (2010) Zhao, P., Han, J.: On graph query optimization in large networks. PVLDB 3(1–2), 340–451 (2010)
42.
Zurück zum Zitat Zhao, Z., Khan, M., Kumar, V.S.A., Marathe, M.V.: Subgraph enumeration in large social contact networks using parallel color coding and streaming. In: Proceedings of the ICPP’10 (2010) Zhao, Z., Khan, M., Kumar, V.S.A., Marathe, M.V.: Subgraph enumeration in large social contact networks using parallel color coding and streaming. In: Proceedings of the ICPP’10 (2010)
Metadaten
Titel
Scalable subgraph enumeration in MapReduce: a cost-oriented approach
verfasst von
Longbin Lai
Lu Qin
Xuemin Lin
Lijun Chang
Publikationsdatum
21.02.2017
Verlag
Springer Berlin Heidelberg
Erschienen in
The VLDB Journal / Ausgabe 3/2017
Print ISSN: 1066-8888
Elektronische ISSN: 0949-877X
DOI
https://doi.org/10.1007/s00778-017-0459-4

Weitere Artikel der Ausgabe 3/2017

The VLDB Journal 3/2017 Zur Ausgabe