Skip to main content
Top
Published in: The VLDB Journal 3/2017

17-01-2017 | Regular Paper

Path-based holistic detection plan for multiple patterns in distributed graph frameworks

Authors: Jun Gao, Yuqiong Liu, Chang Zhou, Jeffrey Xu Yu

Published in: The VLDB Journal | Issue 3/2017

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Multiple pattern detection is needed in applications like disease analysis over gene networks, bug detection in program flow networks. This paper takes pattern detection to investigate the evaluation and optimization of multiple jobs in existing distributed graph processing frameworks. The evaluation plan for multiple pattern detection should be parallelizable and can capture and reuse the shared parts among pattern queries easily. In this paper, we design a path-based holistic plan for multiple pattern queries. Specifically, (1) we design a path-based edge-covered plan for an individual pattern. The paths in the plan can be easily captured and reused among different queries. Additionally, the evaluation plan is fully parallelizable, in which each data vertex performs necessary join operations independently during exploring graph. (2) We extend the individual plan to a holistic evaluation plan for multiple queries, whose results are equivalent to those of individual queries. The plan reduces the overall cost by finding frequent paths among queries and reusing the shared part in the holistic plan. (3) We devise various optimization strategies over the holistic plan. The experimental studies, conducted on Giraph, illustrate the high effectiveness of our holistic approaches.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference McCune, R., Weninger, T., Madey, G.: Thinking like a vertex: a survey of vertex-centric frameworks for large-scale distributed graph processing. ACM Comput. Surv. 48(2), 25 (2015)CrossRef McCune, R., Weninger, T., Madey, G.: Thinking like a vertex: a survey of vertex-centric frameworks for large-scale distributed graph processing. ACM Comput. Surv. 48(2), 25 (2015)CrossRef
2.
go back to reference Cheng, J., Yu, J., Ding, B., Yu, P., Wang, H.: Fast graph pattern matching. In: ICDE, pp. 913–922 (2008) Cheng, J., Yu, J., Ding, B., Yu, P., Wang, H.: Fast graph pattern matching. In: ICDE, pp. 913–922 (2008)
3.
go back to reference Lee, J., Han, W., Kasperovics, R., Lee, J.: An in-depth comparison of subgraph isomorphism algorithms in graph databases. PVLDB 6(2), 133–144 (2012) Lee, J., Han, W., Kasperovics, R., Lee, J.: An in-depth comparison of subgraph isomorphism algorithms in graph databases. PVLDB 6(2), 133–144 (2012)
5.
go back to reference Gao, J., Zhou, C., Yu, J.: Towards continuous pattern detection over evolving large graph with snapshot isolation. VLDB J. 25(2), 269–290 (2016) Gao, J., Zhou, C., Yu, J.: Towards continuous pattern detection over evolving large graph with snapshot isolation. VLDB J. 25(2), 269–290 (2016)
6.
go back to reference Gao, J., Zhou, C., Zhou, J., Yu, J.: Continuous pattern detection over billion-edge graph using distributed framework. In: ICDE, pp. 556–567 (2014) Gao, J., Zhou, C., Zhou, J., Yu, J.: Continuous pattern detection over billion-edge graph using distributed framework. In: ICDE, pp. 556–567 (2014)
7.
go back to reference Sun, P.: The human drug-disease-gene network. Inf. Sci. 306, 70–80 (2015)CrossRef Sun, P.: The human drug-disease-gene network. Inf. Sci. 306, 70–80 (2015)CrossRef
8.
go back to reference Nguyen, T., Nguyen, H., Pham, N., AI-Kofahi, J., Nguyen, T.: Graph-based mining of multiple object usage patterns. In: SIGSOFT, pp. 383–392 (2009) Nguyen, T., Nguyen, H., Pham, N., AI-Kofahi, J., Nguyen, T.: Graph-based mining of multiple object usage patterns. In: SIGSOFT, pp. 383–392 (2009)
10.
go back to reference Malewicz, G., Austern, M., Bik, A., Dehnert, J., Horn, I., Leiser, N., Czajkowski, G.: Pregel: a system for large-scale graph processing. In: SIGMOD, pp. 135–146 (2010) Malewicz, G., Austern, M., Bik, A., Dehnert, J., Horn, I., Leiser, N., Czajkowski, G.: Pregel: a system for large-scale graph processing. In: SIGMOD, pp. 135–146 (2010)
11.
go back to reference Salihoglu, S., Widom, J.: Optimizing graph algorithms on pregel-like systems. PVLDB 7, 577–588 (2014) Salihoglu, S., Widom, J.: Optimizing graph algorithms on pregel-like systems. PVLDB 7, 577–588 (2014)
12.
go back to reference Wang, G., Chan, C.: Multiquery optimization in mapreduce framework. PVLDB 7(3), 145–156 (2013) Wang, G., Chan, C.: Multiquery optimization in mapreduce framework. PVLDB 7(3), 145–156 (2013)
13.
go back to reference Elghandour, I., Aboulnaga, A.: Restore: reusing results of mapreduce jobs. PVLDB 5(6), 586–597 (2012) Elghandour, I., Aboulnaga, A.: Restore: reusing results of mapreduce jobs. PVLDB 5(6), 586–597 (2012)
14.
go back to reference Nykiel, T., Potamias, M., Mishra, C., Kollios, G., Koudas, N.: Mrshare: sharing across multiple queries in MapReduce. PVLDB 3(1), 137–150 (2010)MATH Nykiel, T., Potamias, M., Mishra, C., Kollios, G., Koudas, N.: Mrshare: sharing across multiple queries in MapReduce. PVLDB 3(1), 137–150 (2010)MATH
15.
go back to reference Shao, B., Wang, H., Li, Y.: Trinity: a distributed graph engine on a memory cloud. In: SIGMOD, pp. 505–516 (2013) Shao, B., Wang, H., Li, Y.: Trinity: a distributed graph engine on a memory cloud. In: SIGMOD, pp. 505–516 (2013)
16.
go back to reference Sun, Z., Wang, H., Wang, H., Shao, B., Li, J.: Efficient subgraph matching on billion node graphs. PVLDB 5(9), 788–799 (2012) Sun, Z., Wang, H., Wang, H., Shao, B., Li, J.: Efficient subgraph matching on billion node graphs. PVLDB 5(9), 788–799 (2012)
17.
go back to reference Roy, P., Seshadri, S., Sudarshan, S., Bhobe, S.: Efficient and extensible algorithms for multi query optimization. In: SIGMOD, pp. 249–260 (2000) Roy, P., Seshadri, S., Sudarshan, S., Bhobe, S.: Efficient and extensible algorithms for multi query optimization. In: SIGMOD, pp. 249–260 (2000)
18.
go back to reference Sellis, T., Ghosh, S.: On the multiple-query optimization problem. IEEE Trans. Knowl. Data Eng. 2(2), 262–266 (1990)CrossRef Sellis, T., Ghosh, S.: On the multiple-query optimization problem. IEEE Trans. Knowl. Data Eng. 2(2), 262–266 (1990)CrossRef
19.
go back to reference Diao, Y., Franklin, M.J.: High-performance XML filtering: an overview of YFilter. IEEE Data Eng. Bull. (DEBU) 26(1), 41–48 (2003) Diao, Y., Franklin, M.J.: High-performance XML filtering: an overview of YFilter. IEEE Data Eng. Bull. (DEBU) 26(1), 41–48 (2003)
20.
go back to reference Diao, Y., Rizvi, S., Franklin, M.J.: Towards an internet-scale XML dissemination service. In: VLDB, pp. 612–623 (2002) Diao, Y., Rizvi, S., Franklin, M.J.: Towards an internet-scale XML dissemination service. In: VLDB, pp. 612–623 (2002)
21.
go back to reference Le, W., Kementsietsidis, A., Duan, S., Li, F.: Scalable multi-query optimization for SPARQL. In: ICDE, pp. 666–677 (2012) Le, W., Kementsietsidis, A., Duan, S., Li, F.: Scalable multi-query optimization for SPARQL. In: ICDE, pp. 666–677 (2012)
22.
go back to reference Low, Y., Gonzalez, J., Kyrola, A., Bickson, D., Guestrin, C., Hellerstein, J.: Distributed graphlab: a framework for machine learning in the cloud. PVLDB 5(8), 716–727 (2012) Low, Y., Gonzalez, J., Kyrola, A., Bickson, D., Guestrin, C., Hellerstein, J.: Distributed graphlab: a framework for machine learning in the cloud. PVLDB 5(8), 716–727 (2012)
23.
go back to reference Haewoon, K., Changhyun, L., Hosung, P., Sue, M.: What is Twitter, a social network or a news media? In: WWW, pp. 591–600 (2010) Haewoon, K., Changhyun, L., Hosung, P., Sue, M.: What is Twitter, a social network or a news media? In: WWW, pp. 591–600 (2010)
24.
go back to reference Huang, J., Abadi, D., Ren, K.: Scalable SPARQL querying of large RDF graphs. PVLDB 4(11), 1123–1134 (2011) Huang, J., Abadi, D., Ren, K.: Scalable SPARQL querying of large RDF graphs. PVLDB 4(11), 1123–1134 (2011)
25.
go back to reference He, H., Singh, A.: Graphs-at-a-time: query language and access methods for graph databases. In SIGMOD, pp. 405–418 (2008) He, H., Singh, A.: Graphs-at-a-time: query language and access methods for graph databases. In SIGMOD, pp. 405–418 (2008)
Metadata
Title
Path-based holistic detection plan for multiple patterns in distributed graph frameworks
Authors
Jun Gao
Yuqiong Liu
Chang Zhou
Jeffrey Xu Yu
Publication date
17-01-2017
Publisher
Springer Berlin Heidelberg
Published in
The VLDB Journal / Issue 3/2017
Print ISSN: 1066-8888
Electronic ISSN: 0949-877X
DOI
https://doi.org/10.1007/s00778-016-0452-3

Other articles of this Issue 3/2017

The VLDB Journal 3/2017 Go to the issue

Premium Partner