Skip to main content
Top
Published in: Pattern Analysis and Applications 3/2020

01-08-2019 | Industrial and commercial application

PathQuery Pregel: high-performance graph query with bulk synchronous processing

Authors: Bogdan Arsintescu, Shardul Deo, Warren Harris

Published in: Pattern Analysis and Applications | Issue 3/2020

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

High-performance graph query systems are a scalable way to mine information in Knowledge Graphs, especially when the queries benefit from a high-level expressive query language. This paper presents techniques to algorithmically compile queries expressed in a high-level language (e.g., Datalog) into a directed acyclic graph query plan and details how these queries can be run on a Pregel graph vertex-centric compute system. Our solution, called PathQuery Pregel, creates plans for any conjunctive or disjunctive queries with aggregation and negation; we describe how the query execution extracts graph results optimally while avoiding many join operations where parallel map execution is permitted. We provide details of how we scaled this system out to execute large set of queries in parallel over the Google Knowledge Graph, a graph of 70B edges, or facts; we describe our production experience with PathQuery Pregel.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
3.
go back to reference Baeza-Yates RA, Ribeiro-Neto BA (1999) Modern information retrieval. ACM Press, Addison-Wesley, Boston Baeza-Yates RA, Ribeiro-Neto BA (1999) Modern information retrieval. ACM Press, Addison-Wesley, Boston
5.
go back to reference Bollacker KD, Evans C, Paritosh P, Sturge T, Taylor J (2008) Freebase: a collaboratively created graph database for structuring human knowledge. In: Wang JT (ed) Proceedings of the ACM SIGMOD international conference on management of data, SIGMOD 2008, Vancouver, BC, Canada, June 10–12, 2008. ACM, pp 1247–1250. https://doi.org/10.1145/1376616.1376746 Bollacker KD, Evans C, Paritosh P, Sturge T, Taylor J (2008) Freebase: a collaboratively created graph database for structuring human knowledge. In: Wang JT (ed) Proceedings of the ACM SIGMOD international conference on management of data, SIGMOD 2008, Vancouver, BC, Canada, June 10–12, 2008. ACM, pp 1247–1250. https://​doi.​org/​10.​1145/​1376616.​1376746
8.
go back to reference Chavarría-Miranda DG, Castellana VG, Morari A, Haglin D, Feo J (2016) Graql: a query language for high-performance attributed graph databases. In: 2016 IEEE international parallel and distributed processing symposium workshops, IPDPS workshops 2016, Chicago, IL, USA, May 23–27, 2016. IEEE Computer Society, pp 1453–1462. https://doi.org/10.1109/IPDPSW.2016.216 Chavarría-Miranda DG, Castellana VG, Morari A, Haglin D, Feo J (2016) Graql: a query language for high-performance attributed graph databases. In: 2016 IEEE international parallel and distributed processing symposium workshops, IPDPS workshops 2016, Chicago, IL, USA, May 23–27, 2016. IEEE Computer Society, pp 1453–1462. https://​doi.​org/​10.​1109/​IPDPSW.​2016.​216
22.
go back to reference Hillis WD (1989) The connection machine. MIT Press, Cambridge Hillis WD (1989) The connection machine. MIT Press, Cambridge
23.
go back to reference Hong S, Salihoglu S, Widom J, Olukotun K (2014) Simplifying scalable graph processing with a domain-specific language. In: Kaeli DR, Moseley T (eds) 12th Annual IEEE/ACM international symposium on code generation and optimization, CGO 2014, Orlando, FL, USA, February 15–19, 2014. ACM, p 208. https://doi.org/10.1145/2544137.2544162 Hong S, Salihoglu S, Widom J, Olukotun K (2014) Simplifying scalable graph processing with a domain-specific language. In: Kaeli DR, Moseley T (eds) 12th Annual IEEE/ACM international symposium on code generation and optimization, CGO 2014, Orlando, FL, USA, February 15–19, 2014. ACM, p 208. https://​doi.​org/​10.​1145/​2544137.​2544162
27.
go back to reference Lipsett R, Schaefer CF, Ussery C (1989) VHDL: hardware description and design. Kluwer, DordrechtCrossRef Lipsett R, Schaefer CF, Ussery C (1989) VHDL: hardware description and design. Kluwer, DordrechtCrossRef
28.
go back to reference Low Y, Gonzalez J, Kyrola A, Bickson D, Guestrin C, Hellerstein JM (2012) Distributed graphlab: a framework for machine learning in the cloud. PVLDB 5(8):716–727 Low Y, Gonzalez J, Kyrola A, Bickson D, Guestrin C, Hellerstein JM (2012) Distributed graphlab: a framework for machine learning in the cloud. PVLDB 5(8):716–727
29.
go back to reference Malewicz G, Austern MH, Bik AJC, Dehnert JC, Horn I, Leiser N, Czajkowski G (2010) Pregel: a system for large-scale graph processing. In: Elmagarmid AK, Agrawal D (eds) Proceedings of the ACM SIGMOD international conference on management of data, SIGMOD 2010, Indianapolis, Indiana, USA, June 6–10, 2010. ACM, pp 135–146. https://doi.org/10.1145/1807167.1807184 Malewicz G, Austern MH, Bik AJC, Dehnert JC, Horn I, Leiser N, Czajkowski G (2010) Pregel: a system for large-scale graph processing. In: Elmagarmid AK, Agrawal D (eds) Proceedings of the ACM SIGMOD international conference on management of data, SIGMOD 2010, Indianapolis, Indiana, USA, June 6–10, 2010. ACM, pp 135–146. https://​doi.​org/​10.​1145/​1807167.​1807184
31.
go back to reference Moustafa WE, Papavasileiou V, Yocum K, Deutsch A (2016) Datalography: scaling datalog graph analytics on graph processing systems. In: Joshi J, Karypis G, Liu L, Hu X, Ak R, Xia Y, Xu W, Sato A, Rachuri S, Ungar LH, Yu PS, Govindaraju R, Suzumura T (eds) 2016 IEEE international conference on Big Data, BigData 2016, Washington DC, USA, December 5–8, 2016. IEEE, pp 56–65. https://doi.org/10.1109/BigData.2016.7840589 Moustafa WE, Papavasileiou V, Yocum K, Deutsch A (2016) Datalography: scaling datalog graph analytics on graph processing systems. In: Joshi J, Karypis G, Liu L, Hu X, Ak R, Xia Y, Xu W, Sato A,  Rachuri S, Ungar LH, Yu PS, Govindaraju R, Suzumura T (eds) 2016 IEEE international conference on Big Data, BigData 2016, Washington DC, USA, December 5–8, 2016. IEEE, pp 56–65. https://​doi.​org/​10.​1109/​BigData.​2016.​7840589
33.
go back to reference Rohloff K, Schantz RE (2010) High-performance, massively scalable distributed systems using the mapreduce software framework: the SHARD triple-store. In: Tilevich E, Eugster P (eds) SPLASH workshop on programming support innovations for emerging distributed applications (PSI EtA-\(\Psi\)Theta 2010), October 17, 2010, Reno/Tahoe, Nevada, USA. ACM, p 4. https://doi.org/10.1145/1940747.1940751 Rohloff K, Schantz RE (2010) High-performance, massively scalable distributed systems using the mapreduce software framework: the SHARD triple-store. In: Tilevich E, Eugster P (eds) SPLASH workshop on programming support innovations for emerging distributed applications (PSI EtA-\(\Psi\)Theta 2010), October 17, 2010, Reno/Tahoe, Nevada, USA. ACM, p 4. https://​doi.​org/​10.​1145/​1940747.​1940751
34.
go back to reference Seo J, Park J, Shin J, Lam MS (2013) Distributed socialite: a datalog-based language for large-scale graph analysis. PVLDB 6(14):1906–1917 Seo J, Park J, Shin J, Lam MS (2013) Distributed socialite: a datalog-based language for large-scale graph analysis. PVLDB 6(14):1906–1917
36.
go back to reference Steele LS Jr, Hillis WD (1986) Connection machine LISP: fine-grained parallel symbolic processing. In: LISP and functional programming, pp 279–297 Steele LS Jr, Hillis WD (1986) Connection machine LISP: fine-grained parallel symbolic processing. In: LISP and functional programming, pp 279–297
39.
go back to reference Wang J, Balazinska M, Halperin D (2015) Asynchronous and fault-tolerant recursive datalog evaluation in shared-nothing engines. PVLDB 8(12):1542–1553 Wang J, Balazinska M, Halperin D (2015) Asynchronous and fault-tolerant recursive datalog evaluation in shared-nothing engines. PVLDB 8(12):1542–1553
40.
go back to reference Yan D, Cheng J, Lu Y, Ng W (2015) Effective techniques for message reduction and load balancing in distributed graph computation. In: Gangemi A, Leonardi S, Panconesi A (eds) Proceedings of the 24th international conference on World Wide Web, WWW 2015, Florence, Italy, May 18–22, 2015. ACM, pp 1307–1317. https://doi.org/10.1145/2736277.2741096 Yan D, Cheng J, Lu Y, Ng W (2015) Effective techniques for message reduction and load balancing in distributed graph computation. In: Gangemi A, Leonardi S, Panconesi A (eds) Proceedings of the 24th international conference on World Wide Web, WWW 2015, Florence, Italy, May 18–22, 2015. ACM, pp 1307–1317. https://​doi.​org/​10.​1145/​2736277.​2741096
41.
go back to reference Yan D, Cheng J, Xing K, Lu Y, Ng W, Bu Y (2014) Pregel algorithms for graph connectivity problems with performance guarantees. PVLDB 7(14):1821–1832 Yan D, Cheng J, Xing K, Lu Y, Ng W, Bu Y (2014) Pregel algorithms for graph connectivity problems with performance guarantees. PVLDB 7(14):1821–1832
43.
go back to reference Zhang Q, Yan D, Cheng J (2016) Quegel: a general-purpose system for querying big graphs. In: Özcan F, Koutrika G, Madden S (eds) Proceedings of the 2016 international conference on management of data, SIGMOD conference 2016, San Francisco, CA, USA, June 26–July 01, 2016. ACM, pp 2189–2192. https://doi.org/10.1145/2882903.2899398 Zhang Q, Yan D, Cheng J (2016) Quegel: a general-purpose system for querying big graphs. In: Özcan F, Koutrika G, Madden S (eds) Proceedings of the 2016 international conference on management of data, SIGMOD conference 2016, San Francisco, CA, USA, June 26–July 01, 2016. ACM, pp 2189–2192. https://​doi.​org/​10.​1145/​2882903.​2899398
Metadata
Title
PathQuery Pregel: high-performance graph query with bulk synchronous processing
Authors
Bogdan Arsintescu
Shardul Deo
Warren Harris
Publication date
01-08-2019
Publisher
Springer London
Published in
Pattern Analysis and Applications / Issue 3/2020
Print ISSN: 1433-7541
Electronic ISSN: 1433-755X
DOI
https://doi.org/10.1007/s10044-019-00841-z

Other articles of this Issue 3/2020

Pattern Analysis and Applications 3/2020 Go to the issue

Premium Partner