Skip to main content
Top
Published in: The VLDB Journal 3/2017

21-02-2017 | Regular Paper

Scalable subgraph enumeration in MapReduce: a cost-oriented approach

Authors: Longbin Lai, Lu Qin, Xuemin Lin, Lijun Chang

Published in: The VLDB Journal | Issue 3/2017

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Subgraph enumeration, which aims to find all the subgraphs of a large data graph that are isomorphic to a given pattern graph, is a fundamental graph problem with a wide range of applications. However, existing sequential algorithms for subgraph enumeration fall short in handling large graphs due to the involvement of computationally intensive subgraph isomorphism operations. Thus, some recent researches focus on solving the problem using MapReduce. Nevertheless, exiting MapReduce approaches are not scalable to handle very large graphs since they either produce a huge number of partial results or consume a large amount of memory. Motivated by this, in this paper, we propose a new algorithm \(\mathsf {Twin}\) \(\mathsf {Twig}\) \(\mathsf {Join}\) based on a left-deep-join framework in MapReduce, in which the basic join unit is a \(\mathsf {Twin}\) \(\mathsf {Twig}\) (an edge or two incident edges of a node). We show that in the Erdös–Rényi random graph model, \(\mathsf {Twin}\) \(\mathsf {Twig}\) \(\mathsf {Join}\) is instance optimal in the left-deep-join framework under reasonable assumptions, and we devise an algorithm to compute the optimal join plan. We further discuss how our approach can be adapted to handle the power-law random graph model. Three optimization strategies are explored to improve our algorithm. Ultimately, by aggregating equivalent nodes into a compressed node, we construct the compressed graph, upon which the subgraph enumeration is further improved. We conduct extensive performance studies in several real graphs, one of which contains billions of edges. Our approach significantly outperforms existing solutions in all tests.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Appendix
Available only for authorised users
Footnotes
1
Note that we apply a threshold \(\sqrt{M}\) in line 2 and line 13, and we will discuss the threshold later. Let’s first assume that there is no threshold.
 
2
Lines 26–28 deal with the node with degree larger than the threshold (line 2,13).
 
3
The case is much easier when p is an edge, we hence focus on the two-edge \(\mathsf {Twin}\) \(\mathsf {Twig}\) in the following.
 
Literature
1.
go back to reference Afrati, F.N., Fotakis, D., Ullman, J.D.: Enumerating subgraph instances using map-reduce. In: Proceedings of the ICDE’13 (2013) Afrati, F.N., Fotakis, D., Ullman, J.D.: Enumerating subgraph instances using map-reduce. In: Proceedings of the ICDE’13 (2013)
2.
go back to reference Ahmed, N.K., Neville, J., Rossi, R.A., Duffield, N., Willke, T.L.: Graphlet Decomposition: Framework, Algorithms, and Applications. ArXiv e-prints (2015) Ahmed, N.K., Neville, J., Rossi, R.A., Duffield, N., Willke, T.L.: Graphlet Decomposition: Framework, Algorithms, and Applications. ArXiv e-prints (2015)
3.
go back to reference Aiello, W., Chung, F., Lu, L.: A random graph model for massive graphs. In: Proceedings of the STOC’00 (2000) Aiello, W., Chung, F., Lu, L.: A random graph model for massive graphs. In: Proceedings of the STOC’00 (2000)
4.
go back to reference Alon, N., Dao, P., Hajirasouliha, I., Hormozdiari, F., Sahinalp, S.C.: Biomolecular network motif counting and discovery by color coding. In: Proceedings of the ISMB’08 (2008) Alon, N., Dao, P., Hajirasouliha, I., Hormozdiari, F., Sahinalp, S.C.: Biomolecular network motif counting and discovery by color coding. In: Proceedings of the ISMB’08 (2008)
5.
go back to reference Bhuiyan, M.A., Hasan, M.A.: An iterative mapreduce based frequent subgraph mining algorithm. TKDE 27(3), 608–620 (2015) Bhuiyan, M.A., Hasan, M.A.: An iterative mapreduce based frequent subgraph mining algorithm. TKDE 27(3), 608–620 (2015)
6.
go back to reference Bloom, B.H.: Space/time trade-offs in hash coding with allowable errors. Commun. ACM 13(7), 422–426 (1970)CrossRefMATH Bloom, B.H.: Space/time trade-offs in hash coding with allowable errors. Commun. ACM 13(7), 422–426 (1970)CrossRefMATH
8.
go back to reference Chung, F.R.K., Lu, L., Vu, V.H.: The spectra of random graphs with given expected degrees. Internet Math. 1(3), 6313–6318 (2003)MathSciNetMATH Chung, F.R.K., Lu, L., Vu, V.H.: The spectra of random graphs with given expected degrees. Internet Math. 1(3), 6313–6318 (2003)MathSciNetMATH
10.
go back to reference Dean, J., Ghemawat, S.: Mapreduce: simplified data processing on large clusters. In: Proceedings of the OSDI’04 (2004) Dean, J., Ghemawat, S.: Mapreduce: simplified data processing on large clusters. In: Proceedings of the OSDI’04 (2004)
11.
go back to reference Erdos, P., Renyi, A.: On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci. 38(4), 343–347 (1960)MathSciNetMATH Erdos, P., Renyi, A.: On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci. 38(4), 343–347 (1960)MathSciNetMATH
12.
go back to reference Fan, W., Li, J., Ma, S., Tang, N., Wu, Y., Wu, Y.: Graph pattern matching: from intractable to polynomial time. PVLDB 3(1), 264–275 (2010) Fan, W., Li, J., Ma, S., Tang, N., Wu, Y., Wu, Y.: Graph pattern matching: from intractable to polynomial time. PVLDB 3(1), 264–275 (2010)
13.
go back to reference Gonen, M., Ron, D., Shavitt, Y.: Counting stars and other small subgraphs in sublinear time. In: Proceedings of the SODA’10 (2010) Gonen, M., Ron, D., Shavitt, Y.: Counting stars and other small subgraphs in sublinear time. In: Proceedings of the SODA’10 (2010)
14.
go back to reference Grochow, J.A., Kellis, M.: Network motif discovery using subgraph enumeration and symmetry-breaking. In: Proceedings of the RECOMB’07 (2007) Grochow, J.A., Kellis, M.: Network motif discovery using subgraph enumeration and symmetry-breaking. In: Proceedings of the RECOMB’07 (2007)
15.
go back to reference Gonzalez, J., Low, Y., Gu, H., Bickson, D., Guestrin, C.: Powergraph: distributed graph-parallel computation on natural graphs. In: Proceedings of the OSDI’12 (2012) Gonzalez, J., Low, Y., Gu, H., Bickson, D., Guestrin, C.: Powergraph: distributed graph-parallel computation on natural graphs. In: Proceedings of the OSDI’12 (2012)
16.
go back to reference Han, W.S., Lee, J., Lee, J.H.: Turboiso: Towards ultrafast and robust subgraph isomorphism search in large graph databases. In: Proceedings of the SIGMOD’13 (2013) Han, W.S., Lee, J., Lee, J.H.: Turboiso: Towards ultrafast and robust subgraph isomorphism search in large graph databases. In: Proceedings of the SIGMOD’13 (2013)
17.
go back to reference He, H., Singh, A.K.: Graphs-at-a-time: query language and access methods for graph databases. In: Proceedings of the SIGMOD’08 (2008) He, H., Singh, A.K.: Graphs-at-a-time: query language and access methods for graph databases. In: Proceedings of the SIGMOD’08 (2008)
18.
go back to reference Kairam, S.R., Wang, D.J., Leskovec, J.: The life and death of online groups: predicting group growth and longevity. In: Proceedings of the WSDM’12 (2012) Kairam, S.R., Wang, D.J., Leskovec, J.: The life and death of online groups: predicting group growth and longevity. In: Proceedings of the WSDM’12 (2012)
19.
go back to reference Khan, A., Wu, Y., Aggarwal, C.C., Yan, X.: Nema: Fast graph search with label similarity. PVLDB 6(3), 181–190 (2013) Khan, A., Wu, Y., Aggarwal, C.C., Yan, X.: Nema: Fast graph search with label similarity. PVLDB 6(3), 181–190 (2013)
20.
go back to reference Lai, L., Qin, L., Lin, X., Chang, L.: Scalable subgraph enumeration in mapreduce. Proc. VLDB Endow. 8(10), 974–985 (2015) Lai, L., Qin, L., Lin, X., Chang, L.: Scalable subgraph enumeration in mapreduce. Proc. VLDB Endow. 8(10), 974–985 (2015)
21.
go back to reference Lee, J., Han, W.S., Kasperovics, R., Lee, J.H.: An in-depth comparison of subgraph isomorphism algorithms in graph databases. PVLDB 6(2), 133–144 (2012) Lee, J., Han, W.S., Kasperovics, R., Lee, J.H.: An in-depth comparison of subgraph isomorphism algorithms in graph databases. PVLDB 6(2), 133–144 (2012)
22.
go back to reference Leskovec, J., Singh, A., Kleinberg, J.: Patterns of influence in a recommendation network. In: Proceedings of the PAKDD’06 (2006) Leskovec, J., Singh, A., Kleinberg, J.: Patterns of influence in a recommendation network. In: Proceedings of the PAKDD’06 (2006)
23.
go back to reference Lin, W., Xiao, X., Gabriel, G.: Large-scale frequent subgraph mining in mapreduce. In: ICDE, pp. 844–855 (2014) Lin, W., Xiao, X., Gabriel, G.: Large-scale frequent subgraph mining in mapreduce. In: ICDE, pp. 844–855 (2014)
24.
go back to reference Ma, S., Cao, Y., Huai, J., Wo, T.: Distributed graph pattern matching. In: WWW (2012) Ma, S., Cao, Y., Huai, J., Wo, T.: Distributed graph pattern matching. In: WWW (2012)
25.
go back to reference Milenkovic, T., Przulj, N.: Uncovering biological network function via graphlet degree signatures. Cancer Inf. 6, 257–273 (2008) Milenkovic, T., Przulj, N.: Uncovering biological network function via graphlet degree signatures. Cancer Inf. 6, 257–273 (2008)
26.
go back to reference Milo, R., Shen-Orr, S., Itzkovitz, S., Kashtan, N., Chklovskii, D., Alon, U.: Network motifs: simple building blocks of complex networks. Science 298(5594), 824–827 (2002)CrossRef Milo, R., Shen-Orr, S., Itzkovitz, S., Kashtan, N., Chklovskii, D., Alon, U.: Network motifs: simple building blocks of complex networks. Science 298(5594), 824–827 (2002)CrossRef
27.
go back to reference Plantenga, T.: Inexact subgraph isomorphism in mapreduce. J. Parallel Distrib. Comput. 73(2), 164–175 (2013)CrossRef Plantenga, T.: Inexact subgraph isomorphism in mapreduce. J. Parallel Distrib. Comput. 73(2), 164–175 (2013)CrossRef
28.
go back to reference Przulj, N.: Biological network comparison using graphlet degree distribution. Bioinformatics 23(2), 177–183 (2007)CrossRef Przulj, N.: Biological network comparison using graphlet degree distribution. Bioinformatics 23(2), 177–183 (2007)CrossRef
29.
go back to reference Rahman, M., Bhuiyan, M.A., Hasan, M.A.: Graft: an efficient graphlet counting method for large graph analysis. TKDE 26(10), 2466–2478 (2014) Rahman, M., Bhuiyan, M.A., Hasan, M.A.: Graft: an efficient graphlet counting method for large graph analysis. TKDE 26(10), 2466–2478 (2014)
30.
go back to reference Ren, X., Wang, J.: Exploiting vertex relationships in speeding up subgraph isomorphism over large graphs. Proc. VLDB Endow. 8(5), 617–628 (2015)CrossRef Ren, X., Wang, J.: Exploiting vertex relationships in speeding up subgraph isomorphism over large graphs. Proc. VLDB Endow. 8(5), 617–628 (2015)CrossRef
31.
go back to reference Rücker, G., Rücker, C.: Substructure, subgraph, and walk counts as measures of the complexity of graphs and molecules. J. Chem. Info. Comput. Sci. 41(6), 1457–1462 (2001)CrossRefMATH Rücker, G., Rücker, C.: Substructure, subgraph, and walk counts as measures of the complexity of graphs and molecules. J. Chem. Info. Comput. Sci. 41(6), 1457–1462 (2001)CrossRefMATH
32.
go back to reference Shervashidze, N., Vishwanathan, S., Petri, T., Mehlhorn, K., Borgwardt, K.: Efficient graphlet kernels for large graph comparison. In: AISTATS (2009) Shervashidze, N., Vishwanathan, S., Petri, T., Mehlhorn, K., Borgwardt, K.: Efficient graphlet kernels for large graph comparison. In: AISTATS (2009)
33.
go back to reference Shang, H., Zhang, Y., Lin, X., Yu, J.X.: Taming verification hardness: an efficient algorithm for testing subgraph isomorphism. PVLDB 1(1), 364–375 (2008) Shang, H., Zhang, Y., Lin, X., Yu, J.X.: Taming verification hardness: an efficient algorithm for testing subgraph isomorphism. PVLDB 1(1), 364–375 (2008)
34.
go back to reference Steinbrunn, M., Moerkotte, G., Kemper, A.: Optimizing Join Orders. Tech. rep. (1993) Steinbrunn, M., Moerkotte, G., Kemper, A.: Optimizing Join Orders. Tech. rep. (1993)
35.
go back to reference Sun, Z., Wang, H., Wang, H., Shao, B., Li, J.: Efficient subgraph matching on billion node graphs. PVLDB 5(9), 788–789 (2012) Sun, Z., Wang, H., Wang, H., Shao, B., Li, J.: Efficient subgraph matching on billion node graphs. PVLDB 5(9), 788–789 (2012)
36.
go back to reference Suri, S., Vassilvitskii, S.: Counting triangles and the curse of the last reducer. In: Proceedings of the WWW’11 (2011) Suri, S., Vassilvitskii, S.: Counting triangles and the curse of the last reducer. In: Proceedings of the WWW’11 (2011)
37.
go back to reference Tsourakakis, C.E., Kang, U., Miller, G.L., Faloutsos, C.: Doulion: Counting triangles in massive graphs with a coin. In: Proceedings of the KDD’09 (2009) Tsourakakis, C.E., Kang, U., Miller, G.L., Faloutsos, C.: Doulion: Counting triangles in massive graphs with a coin. In: Proceedings of the KDD’09 (2009)
38.
go back to reference Viger, F., Latapy, M.: Efficient and simple generation of random simple connected graphs with prescribed degree sequence. In: COCOON’05, pp. 440–449 (2005) Viger, F., Latapy, M.: Efficient and simple generation of random simple connected graphs with prescribed degree sequence. In: COCOON’05, pp. 440–449 (2005)
39.
go back to reference Wang, J., Cheng, J.: Truss decomposition in massive networks. PVLDB 5(9), 812–823 (2012) Wang, J., Cheng, J.: Truss decomposition in massive networks. PVLDB 5(9), 812–823 (2012)
40.
go back to reference Watts, D., Strogatz, S.: Collective dynamics of ’small-world’ networks. Nature 6684(393), 440–442 (1998)CrossRef Watts, D., Strogatz, S.: Collective dynamics of ’small-world’ networks. Nature 6684(393), 440–442 (1998)CrossRef
41.
go back to reference Zhao, P., Han, J.: On graph query optimization in large networks. PVLDB 3(1–2), 340–451 (2010) Zhao, P., Han, J.: On graph query optimization in large networks. PVLDB 3(1–2), 340–451 (2010)
42.
go back to reference Zhao, Z., Khan, M., Kumar, V.S.A., Marathe, M.V.: Subgraph enumeration in large social contact networks using parallel color coding and streaming. In: Proceedings of the ICPP’10 (2010) Zhao, Z., Khan, M., Kumar, V.S.A., Marathe, M.V.: Subgraph enumeration in large social contact networks using parallel color coding and streaming. In: Proceedings of the ICPP’10 (2010)
Metadata
Title
Scalable subgraph enumeration in MapReduce: a cost-oriented approach
Authors
Longbin Lai
Lu Qin
Xuemin Lin
Lijun Chang
Publication date
21-02-2017
Publisher
Springer Berlin Heidelberg
Published in
The VLDB Journal / Issue 3/2017
Print ISSN: 1066-8888
Electronic ISSN: 0949-877X
DOI
https://doi.org/10.1007/s00778-017-0459-4

Other articles of this Issue 3/2017

The VLDB Journal 3/2017 Go to the issue

Premium Partner