survey

Foundations of Modern Query Languages for Graph Databases

Authors:
Renzo Angles

Universidad de Talca 8 Center for Semantic Web Research, Chile

Universidad de Talca 8 Center for Semantic Web Research, Chile
View Profile

,
Marcelo Arenas

Pontificia Universidad Católica de Chile 8 Center for Semantic Web Research, Santiago, Chile

Pontificia Universidad Católica de Chile 8 Center for Semantic Web Research, Santiago, Chile
View Profile

,
Pablo Barceló

DCC, Universidad de Chile 8 Center for Semantic Web Research, Chile

DCC, Universidad de Chile 8 Center for Semantic Web Research, Chile

0000-0003-2293-2653
View Profile

,
Aidan Hogan

DCC, Universidad de Chile 8 Center for Semantic Web Research, Chile

DCC, Universidad de Chile 8 Center for Semantic Web Research, Chile
View Profile

,
Juan Reutter

Pontificia Universidad Católica de Chile 8 Center for Semantic Web Research, Santiago, Chile

Pontificia Universidad Católica de Chile 8 Center for Semantic Web Research, Santiago, Chile
View Profile

,
Domagoj Vrgoč

Pontificia Universidad Católica de Chile 8 Center for Semantic Web Research, Santiago, Chile

Pontificia Universidad Católica de Chile 8 Center for Semantic Web Research, Santiago, Chile
View Profile

Authors Info & Claims

ACM Computing Surveys Volume 50 Issue 5Article No.: 68pp 1–40https://doi.org/10.1145/3104031

Published:26 September 2017Publication History

ACM Computing Surveys

Abstract

We survey foundational features underlying modern graph query languages. We first discuss two popular graph data models: edge-labelled graphs, where nodes are connected by directed, labelled edges, and property graphs, where nodes and edges can further have attributes. Next we discuss the two most fundamental graph querying functionalities: graph patterns and navigational expressions. We start with graph patterns, in which a graph-structured query is matched against the data. Thereafter, we discuss navigational expressions, in which patterns can be matched recursively against the graph to navigate paths of arbitrary length; we give an overview of what kinds of expressions have been proposed and how they can be combined with graph patterns. We also discuss several semantics under which queries using the previous features can be evaluated, what effects the selection of features and semantics has on complexity, and offer examples of such features in three modern languages that are used to query graphs: SPARQL, Cypher, and Gremlin. We conclude by discussing the importance of formalisation for graph query languages; a summary of what is known about SPARQL, Cypher, and Gremlin in terms of expressivity and complexity; and an outline of possible future directions for the area.

Supplemental Material

Available for Download

zip

angles.zip (213.6 KB)

Supplemental movie, appendix, image and software files for, Foundations of Modern Query Languages for Graph Databases

References

Serge Abiteboul, Richard Hull, and Victor Vianu. 1995. Foundations of Databases. Addison-Wesley.Google ScholarDigital Library
Charu C. Aggarwal and Haixun Wang. 2010. Managing and Mining Graph Data. Springer. Google ScholarCross Ref
Faisal Alkhateeb, Jean-François Baget, and Jérôme Euzenat. 2009. Extending SPARQL with regular expression patterns (for querying RDF). J. Web Sem. 7, 2 (2009), 57--73. Google ScholarDigital Library
Renzo Angles. 2012. A comparison of current graph database models. In Proceedings of the ICDE Workshop on Graph Data Management (GDM’12). Google ScholarDigital Library
Renzo Angles and Claudio Gutierrez. 2008. The expressive power of SPARQL. In Proceedings of the Conference on the Semantic Web (ISWC’08). 114--129. Google ScholarDigital Library
Renzo Angles and Claudio Gutiérrez. 2008. Survey of graph database models. ACM Comp. Surv. 40, 1 (2008).Google Scholar
Renzo Angles and Claudio Gutierrez. 2016. The multiset semantics of SPARQL patterns. In Proceedings of the Conference on the Semantic Web (ISWC’16). Springer, 20--36. Google ScholarDigital Library
Apache TinkerPop. 2017. TinkerPop3 Documentation v.3.2.5. Retrieved from http://tinkerpop.apache.org/docs/current/reference/ (June 2017).Google Scholar
Marcelo Arenas, Sebastián Conca, and Jorge Pérez. 2012. Counting beyond a Yottabyte, or how SPARQL 1.1 property paths will prevent adoption of the standard. In Proceedings of the Conference on the World Wide Web (WWW’12). 629--638.Google ScholarDigital Library
Guillaume Bagan, Angela Bonifati, and Benoît Groz. 2013. A trichotomy for regular simple path queries on graphs. In Proceedings of the Conference on Principles of Database Systems (PODS’13). 261--272.Google ScholarDigital Library
Pablo Barceló. 2013. Querying graph databases. In Proceedings of the Conference on Principles of Database Systems (PODS’13). 175--188.Google Scholar
Pablo Barceló, Leonid Libkin, Anthony Widjaja Lin, and Peter T. Wood. 2012a. Expressive languages for path queries over graph-structured data. ACM Trans. Database Syst. 37, 4 (2012), 31.Google ScholarDigital Library
Pablo Barceló, Leonid Libkin, and Juan L. Reutter. 2014. Querying regular graph patterns. J. ACM 61, 1 (2014), 8:1--8:54.Google ScholarDigital Library
Pablo Barceló and Pablo Muñoz. 2014. Graph logics with rational relations: The role of word combinatorics. In Proceedings of the Conference on Logic in Computer Science (LICS’14). 12:1--12:10.Google ScholarDigital Library
Pablo Barceló, Jorge Pérez, and Juan L. Reutter. 2012b. Relative expressiveness of nested regular expressions. In Proceedings of the Alberto Mendelzon Workshop (AMW’12). 180--195.Google Scholar
Christopher L. Barrett, Riko Jacob, and Madhav V. Marathe. 2000. Formal-language-constrained path problems. SIAM J. Comput. 30, 3 (2000), 809--837. Google ScholarDigital Library
Barry Bishop, Atanas Kiryakov, Damyan Ognyanoff, Ivan Peikov, Zdravko Tashev, and Ruslan Velkov. 2011. OWLIM: A family of scalable semantic repositories. Semant. Web J. 2, 1 (2011), 33--42.Google ScholarDigital Library
Peter Buneman. 1997. Semistructured data. In Proceedings of the Conference on Principles of Database Systems (PODS’97). 117--121. Google ScholarDigital Library
Peter Buneman, Susan B. Davidson, Gerd G. Hillebrand, and Dan Suciu. 1996. A query language and optimization techniques for unstructured data. In Proceeedings of the SIGMOD International Conference on Management of Data (SIGMOD’96). 505--516. Google ScholarDigital Library
Horst Bunke. 2000. Graph matching: Theoretical foundations, algorithms, and applications. In Proceedings of the Conference on Vision Interface. 82--88.Google Scholar
Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, and Moshe Y. Vardi. 2000. Containment of conjunctive regular path queries with inverse. In Proceedings of the Conference on Knowledge Representation 8 Reasoning (KR’00). 176--185.Google Scholar
Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, and Moshe Y. Vardi. 2002. Rewriting of regular expressions and regular path queries. J. Comput. Syst. Sci. 64, 3 (2002), 443--465. Google ScholarDigital Library
Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, and Moshe Y. Vardi. 2003. Reasoning on regular path queries. SIGMOD Rec. 32, 4 (2003), 83--92. Google ScholarDigital Library
Chandra Chekuri and Anand Rajaraman. 2000. Conjunctive query containment revisited. Theor. Comput. Sci. 239, 2 (2000), 211--229. Google ScholarDigital Library
Jiefeng Cheng, Jeffrey Xu Yu, Bolin Ding, Philip S. Yu, and Haixun Wang. 2008. Fast graph pattern matching. In Proceedings of the International Conference on Data Engineering (ICDE’08). 913--922. Google ScholarDigital Library
Mariano P. Consens and Alberto O. Mendelzon. 1990. GraphLog: A visual formalism for real life recursion. In Proceedings of the Conference on Principles of Database Systems (PODS’90). 404--416. Google ScholarDigital Library
Isabel F. Cruz, Alberto O. Mendelzon, and Peter T. Wood. 1987. A graphical query language supporting recursion. In Proceedings of the SIGMOD International Conference on Management of Data (SIGMOD’87). 323--330. Google ScholarDigital Library
Víctor Dalmau, Phokion G. Kolaitis, and Moshe Y. Vardi. 2002. Constraint satisfaction, bounded treewidth, and finite-variable logics. In Proceedings of the Conference on Constraint Programming (CP’02). 310--326.Google Scholar
DataStax. 2015. Titan Documentation. Retrieved from http://s3.thinkaurelius.com/docs/titan/1.0.0/ (2015).Google Scholar
Orri Erling. 2012. Virtuoso, a Hybrid RDBMS/graph column store. IEEE Data Eng. Bull. 35, 1 (2012), 3--8.Google Scholar
Wenfei Fan. 2012. Graph pattern matching revised for social network analysis. In Proceedings of the International Conference on Database Theory (ICDT’12). 8--21. Google ScholarDigital Library
Wenfei Fan, Jianzhong Li, Shuai Ma, Hongzhi Wang, and Yinghui Wu. 2010. Graph homomorphism revisited for graph matching. Proc. Very Large Data Bases 3, 1 (2010), 1161--1172. Google ScholarDigital Library
Diego Figueira and Leonid Libkin. 2015. Path logics for querying graphs: Combining expressiveness and efficiency. In Proceedings of the Conference on Logic in Computer Science (LICS’15). 329--340.Google ScholarDigital Library
Valeria Fionda, Giuseppe Pirrò, and Mariano P Consens. 2015. Extended property paths: Writing more SPARQL queries in a succinct way. In Proceedings of the AAAI Conference on Artificial Intelligence.Google ScholarCross Ref
Daniela Florescu, Alon Y. Levy, and Dan Suciu. 1998. Query containment for conjunctive queries with regular expressions. In Proceedings of the Conference on Principles of Database Systems (PODS’98). 139--148. Google ScholarDigital Library
César A. Galindo-Legaria and Arnon Rosenthal. 1997. Outerjoin simplification and reordering for query optimization. ACM Trans. Database Syst. 22, 1 (1997), 43--73.Google ScholarDigital Library
Brian Gallagher. 2006. Matching structure and semantics: A survey on graph-based pattern matching. In Proceedings of the AAAI Fall Symposium. 43--53.Google Scholar
Georg Gottlob, Gianluigi Greco, Nicola Leone, and Francesco Scarcello. 2016. Hypertree decompositions: Questions and answers. In Proceedings of the Conference on Principles of Database Systems (PODS’16). 57--74.Google ScholarDigital Library
Minyang Han and Khuzaima Daudjee. 2015. Giraph unchained: Barrierless asynchronous parallel execution in pregel-like graph processing systems. Proc. Very Large Data Bases 8, 9 (2015), 950--961. Google ScholarDigital Library
Steve Harris, Nicholas Lamb, and Nigel Shadbolt. 2009. 4store: The design and implementation of a clustered RDF store. In Proceedings of the Conference on Scalable Semantic Web Knowledge Base Systems (SWSS’09). 94--109.Google Scholar
Steve Harris and Andy Seaborne. 2013. SPARQL 1.1 Query Language. W3C Recommendation (2013). Retrieved from http://www.w3.org/TR/sparql11-query/.Google Scholar
Pavol Hell and Jaroslav Nesetril. 2004. Graphs and Homomorphisms. Oxford University Press. Google ScholarCross Ref
Daniel Hernández, Aidan Hogan, and Markus Krötzsch. 2015. Reifying RDF: What works well with wikidata? In Proceedings of the Conference on Scalable Semantic Web Knowledge Base Systems (SWSS’15). 32--47.Google Scholar
David A. Holland, Uri Jacob Braun, Diana Maclean, Kiran-Kumar Muniswamy-Reddy, and Margo I. Seltzer.2008. Choosing a data model and query language for provenance. In Proceedings of the 2nd International Provenance and Annotation Workshop (IPAW’08).Google Scholar
Sungpack Hong, Hassan Chafi, Edic Sedlar, and Kunle Olukotun. 2012. Green-marl: A DSL for easy and efficient graph analysis. In ACM SIGARCH Comput. Arch. News, Vol. 40. ACM, 349--362. Google ScholarDigital Library
John E. Hopcroft, Rajeev Motwani, and Jeffrey D. Ullman. 2003. Introduction to Automata Theory, Languages, and Computation—International Edition (2nd ed). Addison-Wesley.Google ScholarDigital Library
Alekh Jindal and Samuel Madden. 2014. GRAPHiQL: A graph intuitive query language for relational databases. In Big Data. IEEE, 441--450. Google ScholarCross Ref
Graham Klyne, Jeremy J. Carroll, and Brian McBride. 2014. RDF 1.1 Concepts and Abstract Syntax. W3C Recommendation (25 Feb. 2014). Retrieved from https://www.w3.org/TR/rdf11-concepts/.Google Scholar
Egor V. Kostylev, Juan L. Reutter, Miguel Romero, and Domagoj Vrgoč. 2015. SPARQL with property paths. In Proceedings of the Conference on the Semantic Web (ISWC’15). 3--18.Google ScholarDigital Library
LDBC. 2015. LDBC Task Force: Property Graphs Data Model. Retrieved from http://www.ldbcouncil.org (2015).Google Scholar
Leonid Libkin, Wim Martens, and Domagoj Vrgoč. 2016. Querying graphs with data. J. ACM 63, 2 (2016), 14.Google ScholarDigital Library
Leonid Libkin and Domagoj Vrgoč. 2012. Regular path queries on graphs with data. In Proceedings of the International Conference on Database Theory (ICDT’12). 74--85.Google ScholarDigital Library
Lorenzo Livi and Antonello Rizzi. 2013. The graph matching problem. Pattern Anal. Appl. 16, 3 (2013), 253--283. Google ScholarDigital Library
Katja Losemann and Wim Martens. 2013. The complexity of regular expressions and property paths in SPARQL. ACM Trans. Database Syst. 38, 4 (2013), 24. Google ScholarDigital Library
Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. 2010. Pregel: A system for large-scale graph processing. In Proceedings of the SIGMOD International Conference on Management of Data (SIGMOD’10). ACM, 135--146. Google ScholarDigital Library
Akiyoshi Matono, Toshiyuki Amagasa, Masatoshi Yoshikawa, and Shunsuke Uemura. 2003. An efficient pathway search using an indexing scheme for RDF. Genome Informat. Ser. (2003), 374--375.Google Scholar
Alberto O. Mendelzon and Peter T. Wood. 1989. Finding regular simple paths in graph databases. In Proceedings of the Conference on Very Large Data Bases (VLDB’89). 185--193.Google Scholar
Ron Milo, Shai Shen-Orr, Shalev Itzkovitz, Nadav Kashtan, Dmitri Chklovskii, and Uri Alon. 2002. Network motifs: Simple building blocks of complex networks. Science 298, 5594 (2002), 824--827. Google ScholarCross Ref
Hiroyuki Ogata, Wataru Fujibuchi, Susumu Goto, and Minoru Kanehisa. 2000. A heuristic graph comparison algorithm and its application to detect functionally related enzyme clusters. Nucl. Acids Res. 28, 20 (2000), 4021--4028. Google ScholarCross Ref
Marcus Paradies, Wolfgang Lehner, and Christof Bornhövd. 2015. GRAPHITE: An extensible graph traversal framework for relational database management systems. In Proceedings of the Conference on Scientific and Statistical Database Management (SSDBM’15). 29:1--29:12.Google ScholarDigital Library
Task Force: Property Paths. 2009. Use cases in Property Paths Task Force. Retrieved from http://www.w3.org/2009/sparql/wiki/TaskForce:PropertyPaths#Use_Cases.Google Scholar
Jorge Pérez, Marcelo Arenas, and Claudio Gutierrez. 2009. Semantics and complexity of SPARQL. ACM Trans. Database Syst. 34, 3 (2009), 1--45.Google ScholarDigital Library
Jorge Pérez, Marcelo Arenas, and Claudio Gutierrez. 2010. nSPARQL: A navigational language for RDF. J. Web Sem. 8, 4 (2010), 255--270.Google ScholarDigital Library
Eric Prud’hommeaux and Andy Seaborne. 2008. SPARQL Query Language for RDF. W3C Recommendation (2008). Retrieved from http://www.w3.org/TR/rdf-sparql-query/.Google Scholar
Juan L. Reutter, Miguel Romero, and Moshe Y. Vardi. 2015. Regular queries on graph databases. In Proceedings of the International Conference on Database Theory (ICDT’15). 177--194.Google Scholar
Patrick Reynolds. 2015. Oracle of Bacon. Retrieved from http://www.oracleofbacon.org/ (2015).Google Scholar
Kaspar Riesen, Xiaoyi Jiang, and Horst Bunke. 2010. Exact and inexact graph matching: Methodology and applications. In Proceedings of the Conference on Managing and Mining Graph Data. 217--247. Google ScholarCross Ref
Ian Robinson, Jim Webber, and Emil Eifrem. 2013. Graph Databases (first ed.). O’Reilly Media.Google Scholar
Michael Schmidt, Michael Meier, and Georg Lausen. 2010. Foundations of SPARQL query optimization. In Proceedings of the International Conference on Database Theory (ICDT’10). 4--33. Google ScholarDigital Library
Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S Yu, and Tianyi Wu. 2011. Pathsim: Meta path-based top-k similarity search in heterogeneous information networks. Proc. Very Large Data Bases 4, 11 (2011), 992--1003.Google ScholarDigital Library
Claudio Tesoriero. 2013. Getting Started with OrientDB. Packt Publishing Ltd.Google Scholar
The Neo4j Team. 2016. The Neo4j Manual v3.0. Retrieved from http://neo4j.com/docs/stable/.Google Scholar
Bryan B. Thompson, Mike Personick, and Martyn Cutcher. 2014. The bigdata® RDF graph database. In Proceedings of the Conference on Linked Data Management. 193--237. Google ScholarCross Ref
Julian R. Ullmann. 1976. An algorithm for subgraph isomorphism. J. ACM 23, 1 (1976), 31--42. Google ScholarDigital Library
Leslie G. Valiant. 1979. The complexity of enumeration and reliability problems. SIAM J. Comput. 8, 3 (1979), 410--421. Google ScholarDigital Library
Oskar van Rest, Sungpack Hong, Jinha Kim, Xuming Meng, and Hassan Chafi. 2016. PGQL: A property graph query language. In Proceedings of the Conference on Graph Data Management Experiences and Systems (GRADES’16). Google ScholarDigital Library
Moshe Y. Vardi. 1982. The complexity of relational query languages (extended abstract). In Proceedings of the Symposium on Theory of Computing (STOC’82). 137--146.Google ScholarDigital Library
Kevin Wilkinson, Craig Sayers, Harumi A. Kuno, Dave Reynolds, and Luping Ding. 2003. Supporting scalable, persistent semantic web applications. IEEE Data Eng. Bull. 26, 4 (2003), 33--39.Google Scholar
Peter T. Wood. 2012. Query languages for graph databases. SIGMOD Rec. 41, 1 (2012), 50--60. Google ScholarDigital Library
Reynold S Xin, Joseph E Gonzalez, Michael J Franklin, and Ion Stoica. 2013. Graphx: A resilient distributed graph system on spark. In Proceedings of the Conference on Graph Data Management Experiences and Systems (GRADES’13). ACM, 2.Google ScholarDigital Library
Junchi Yan, Xu-Cheng Yin, Weiyao Lin, Cheng Deng, Hongyuan Zha, and Xiaokang Yang. 2016. A short survey of recent advances in graph matching. In Proceedings of the Conference on Multimedia Retrieval (ICMR’16). 167--174. Google ScholarDigital Library
Jeffrey Xu Yu and Jiefeng Cheng. 2010. Graph reachability queries: A survey. In Managing and Mining Graph Data. Advances in Database Systems, Vol. 40. Springer, 181--215. Google ScholarCross Ref
Lei Zou, Lei Chen, and M. Tamer Özsu. 2009. DistanceJoin: Pattern match query in a large graph database. Proc. Very Large Data Bases 2, 1 (2009), 886--897.Google ScholarDigital Library

Index Terms

Foundations of Modern Query Languages for Graph Databases
1. Information systems
  1. Data management systems
    1. Query languages
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Database query languages (principles)

Recommendations

Querying graph patterns
PODS '11: Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems

Graph data appears in a variety of application domains, and many uses of it, such as querying, matching, and transforming data, naturally result in incompletely specified graph data, i.e., graph patterns. While queries need to be posed against such data,...
Read More
Querying Regular Graph Patterns

Graph data appears in a variety of application domains, and many uses of it, such as querying, matching, and transforming data, naturally result in incompletely specified graph data, that is, graph patterns. While queries need to be posed against such ...
Read More
Schema Mappings for Data Graphs
PODS '17: Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems

Schema mappings are a fundamental concept in data integration and exchange, and they have been thoroughly studied in different data models. For graph data, however, mappings have been studied in a restricted context that, unlike real-life graph ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
ACM Computing Surveys Volume 50, Issue 5
September 2018
573 pages
ISSN:0360-0300
EISSN:1557-7341
DOI:10.1145/3145473
Editor:
Sartaj Sahni
Department of Computer and Information Science and Engineering / University of Florida / Gainesville, FL 32611
Issue’s Table of Contents
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 26 September 2017
- Revised: 1 June 2017
- Accepted: 1 June 2017
- Received: 1 October 2016
Published in csur Volume 50, Issue 5

Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
Property graphs
aggregation
graph databases
graph patterns
navigation
query languages
Qualifiers
- survey
- Research
- Refereed
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 196
  Total Citations
  View Citations
- 4,489
  Total Downloads
- Downloads (Last 12 months)567
- Downloads (Last 6 weeks)70
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Foundations of Modern Query Languages for Graph Databases

ACM Computing Surveys

Abstract

Supplemental Material

Available for Download

References

Cited By

Index Terms

Recommendations

Querying graph patterns

Querying Regular Graph Patterns

Schema Mappings for Data Graphs