Abstract
We survey foundational features underlying modern graph query languages. We first discuss two popular graph data models: edge-labelled graphs, where nodes are connected by directed, labelled edges, and property graphs, where nodes and edges can further have attributes. Next we discuss the two most fundamental graph querying functionalities: graph patterns and navigational expressions. We start with graph patterns, in which a graph-structured query is matched against the data. Thereafter, we discuss navigational expressions, in which patterns can be matched recursively against the graph to navigate paths of arbitrary length; we give an overview of what kinds of expressions have been proposed and how they can be combined with graph patterns. We also discuss several semantics under which queries using the previous features can be evaluated, what effects the selection of features and semantics has on complexity, and offer examples of such features in three modern languages that are used to query graphs: SPARQL, Cypher, and Gremlin. We conclude by discussing the importance of formalisation for graph query languages; a summary of what is known about SPARQL, Cypher, and Gremlin in terms of expressivity and complexity; and an outline of possible future directions for the area.
Supplemental Material
Available for Download
Supplemental movie, appendix, image and software files for, Foundations of Modern Query Languages for Graph Databases
- Serge Abiteboul, Richard Hull, and Victor Vianu. 1995. Foundations of Databases. Addison-Wesley.Google ScholarDigital Library
- Charu C. Aggarwal and Haixun Wang. 2010. Managing and Mining Graph Data. Springer. Google ScholarCross Ref
- Faisal Alkhateeb, Jean-François Baget, and Jérôme Euzenat. 2009. Extending SPARQL with regular expression patterns (for querying RDF). J. Web Sem. 7, 2 (2009), 57--73. Google ScholarDigital Library
- Renzo Angles. 2012. A comparison of current graph database models. In Proceedings of the ICDE Workshop on Graph Data Management (GDM’12). Google ScholarDigital Library
- Renzo Angles and Claudio Gutierrez. 2008. The expressive power of SPARQL. In Proceedings of the Conference on the Semantic Web (ISWC’08). 114--129. Google ScholarDigital Library
- Renzo Angles and Claudio Gutiérrez. 2008. Survey of graph database models. ACM Comp. Surv. 40, 1 (2008).Google Scholar
- Renzo Angles and Claudio Gutierrez. 2016. The multiset semantics of SPARQL patterns. In Proceedings of the Conference on the Semantic Web (ISWC’16). Springer, 20--36. Google ScholarDigital Library
- Apache TinkerPop. 2017. TinkerPop3 Documentation v.3.2.5. Retrieved from http://tinkerpop.apache.org/docs/current/reference/ (June 2017).Google Scholar
- Marcelo Arenas, Sebastián Conca, and Jorge Pérez. 2012. Counting beyond a Yottabyte, or how SPARQL 1.1 property paths will prevent adoption of the standard. In Proceedings of the Conference on the World Wide Web (WWW’12). 629--638.Google ScholarDigital Library
- Guillaume Bagan, Angela Bonifati, and Benoît Groz. 2013. A trichotomy for regular simple path queries on graphs. In Proceedings of the Conference on Principles of Database Systems (PODS’13). 261--272.Google ScholarDigital Library
- Pablo Barceló. 2013. Querying graph databases. In Proceedings of the Conference on Principles of Database Systems (PODS’13). 175--188.Google Scholar
- Pablo Barceló, Leonid Libkin, Anthony Widjaja Lin, and Peter T. Wood. 2012a. Expressive languages for path queries over graph-structured data. ACM Trans. Database Syst. 37, 4 (2012), 31.Google ScholarDigital Library
- Pablo Barceló, Leonid Libkin, and Juan L. Reutter. 2014. Querying regular graph patterns. J. ACM 61, 1 (2014), 8:1--8:54.Google ScholarDigital Library
- Pablo Barceló and Pablo Muñoz. 2014. Graph logics with rational relations: The role of word combinatorics. In Proceedings of the Conference on Logic in Computer Science (LICS’14). 12:1--12:10.Google ScholarDigital Library
- Pablo Barceló, Jorge Pérez, and Juan L. Reutter. 2012b. Relative expressiveness of nested regular expressions. In Proceedings of the Alberto Mendelzon Workshop (AMW’12). 180--195.Google Scholar
- Christopher L. Barrett, Riko Jacob, and Madhav V. Marathe. 2000. Formal-language-constrained path problems. SIAM J. Comput. 30, 3 (2000), 809--837. Google ScholarDigital Library
- Barry Bishop, Atanas Kiryakov, Damyan Ognyanoff, Ivan Peikov, Zdravko Tashev, and Ruslan Velkov. 2011. OWLIM: A family of scalable semantic repositories. Semant. Web J. 2, 1 (2011), 33--42.Google ScholarDigital Library
- Peter Buneman. 1997. Semistructured data. In Proceedings of the Conference on Principles of Database Systems (PODS’97). 117--121. Google ScholarDigital Library
- Peter Buneman, Susan B. Davidson, Gerd G. Hillebrand, and Dan Suciu. 1996. A query language and optimization techniques for unstructured data. In Proceeedings of the SIGMOD International Conference on Management of Data (SIGMOD’96). 505--516. Google ScholarDigital Library
- Horst Bunke. 2000. Graph matching: Theoretical foundations, algorithms, and applications. In Proceedings of the Conference on Vision Interface. 82--88.Google Scholar
- Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, and Moshe Y. Vardi. 2000. Containment of conjunctive regular path queries with inverse. In Proceedings of the Conference on Knowledge Representation 8 Reasoning (KR’00). 176--185.Google Scholar
- Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, and Moshe Y. Vardi. 2002. Rewriting of regular expressions and regular path queries. J. Comput. Syst. Sci. 64, 3 (2002), 443--465. Google ScholarDigital Library
- Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, and Moshe Y. Vardi. 2003. Reasoning on regular path queries. SIGMOD Rec. 32, 4 (2003), 83--92. Google ScholarDigital Library
- Chandra Chekuri and Anand Rajaraman. 2000. Conjunctive query containment revisited. Theor. Comput. Sci. 239, 2 (2000), 211--229. Google ScholarDigital Library
- Jiefeng Cheng, Jeffrey Xu Yu, Bolin Ding, Philip S. Yu, and Haixun Wang. 2008. Fast graph pattern matching. In Proceedings of the International Conference on Data Engineering (ICDE’08). 913--922. Google ScholarDigital Library
- Mariano P. Consens and Alberto O. Mendelzon. 1990. GraphLog: A visual formalism for real life recursion. In Proceedings of the Conference on Principles of Database Systems (PODS’90). 404--416. Google ScholarDigital Library
- Isabel F. Cruz, Alberto O. Mendelzon, and Peter T. Wood. 1987. A graphical query language supporting recursion. In Proceedings of the SIGMOD International Conference on Management of Data (SIGMOD’87). 323--330. Google ScholarDigital Library
- Víctor Dalmau, Phokion G. Kolaitis, and Moshe Y. Vardi. 2002. Constraint satisfaction, bounded treewidth, and finite-variable logics. In Proceedings of the Conference on Constraint Programming (CP’02). 310--326.Google Scholar
- DataStax. 2015. Titan Documentation. Retrieved from http://s3.thinkaurelius.com/docs/titan/1.0.0/ (2015).Google Scholar
- Orri Erling. 2012. Virtuoso, a Hybrid RDBMS/graph column store. IEEE Data Eng. Bull. 35, 1 (2012), 3--8.Google Scholar
- Wenfei Fan. 2012. Graph pattern matching revised for social network analysis. In Proceedings of the International Conference on Database Theory (ICDT’12). 8--21. Google ScholarDigital Library
- Wenfei Fan, Jianzhong Li, Shuai Ma, Hongzhi Wang, and Yinghui Wu. 2010. Graph homomorphism revisited for graph matching. Proc. Very Large Data Bases 3, 1 (2010), 1161--1172. Google ScholarDigital Library
- Diego Figueira and Leonid Libkin. 2015. Path logics for querying graphs: Combining expressiveness and efficiency. In Proceedings of the Conference on Logic in Computer Science (LICS’15). 329--340.Google ScholarDigital Library
- Valeria Fionda, Giuseppe Pirrò, and Mariano P Consens. 2015. Extended property paths: Writing more SPARQL queries in a succinct way. In Proceedings of the AAAI Conference on Artificial Intelligence.Google ScholarCross Ref
- Daniela Florescu, Alon Y. Levy, and Dan Suciu. 1998. Query containment for conjunctive queries with regular expressions. In Proceedings of the Conference on Principles of Database Systems (PODS’98). 139--148. Google ScholarDigital Library
- César A. Galindo-Legaria and Arnon Rosenthal. 1997. Outerjoin simplification and reordering for query optimization. ACM Trans. Database Syst. 22, 1 (1997), 43--73.Google ScholarDigital Library
- Brian Gallagher. 2006. Matching structure and semantics: A survey on graph-based pattern matching. In Proceedings of the AAAI Fall Symposium. 43--53.Google Scholar
- Georg Gottlob, Gianluigi Greco, Nicola Leone, and Francesco Scarcello. 2016. Hypertree decompositions: Questions and answers. In Proceedings of the Conference on Principles of Database Systems (PODS’16). 57--74.Google ScholarDigital Library
- Minyang Han and Khuzaima Daudjee. 2015. Giraph unchained: Barrierless asynchronous parallel execution in pregel-like graph processing systems. Proc. Very Large Data Bases 8, 9 (2015), 950--961. Google ScholarDigital Library
- Steve Harris, Nicholas Lamb, and Nigel Shadbolt. 2009. 4store: The design and implementation of a clustered RDF store. In Proceedings of the Conference on Scalable Semantic Web Knowledge Base Systems (SWSS’09). 94--109.Google Scholar
- Steve Harris and Andy Seaborne. 2013. SPARQL 1.1 Query Language. W3C Recommendation (2013). Retrieved from http://www.w3.org/TR/sparql11-query/.Google Scholar
- Pavol Hell and Jaroslav Nesetril. 2004. Graphs and Homomorphisms. Oxford University Press. Google ScholarCross Ref
- Daniel Hernández, Aidan Hogan, and Markus Krötzsch. 2015. Reifying RDF: What works well with wikidata? In Proceedings of the Conference on Scalable Semantic Web Knowledge Base Systems (SWSS’15). 32--47.Google Scholar
- David A. Holland, Uri Jacob Braun, Diana Maclean, Kiran-Kumar Muniswamy-Reddy, and Margo I. Seltzer.2008. Choosing a data model and query language for provenance. In Proceedings of the 2nd International Provenance and Annotation Workshop (IPAW’08).Google Scholar
- Sungpack Hong, Hassan Chafi, Edic Sedlar, and Kunle Olukotun. 2012. Green-marl: A DSL for easy and efficient graph analysis. In ACM SIGARCH Comput. Arch. News, Vol. 40. ACM, 349--362. Google ScholarDigital Library
- John E. Hopcroft, Rajeev Motwani, and Jeffrey D. Ullman. 2003. Introduction to Automata Theory, Languages, and Computation—International Edition (2nd ed). Addison-Wesley.Google ScholarDigital Library
- Alekh Jindal and Samuel Madden. 2014. GRAPHiQL: A graph intuitive query language for relational databases. In Big Data. IEEE, 441--450. Google ScholarCross Ref
- Graham Klyne, Jeremy J. Carroll, and Brian McBride. 2014. RDF 1.1 Concepts and Abstract Syntax. W3C Recommendation (25 Feb. 2014). Retrieved from https://www.w3.org/TR/rdf11-concepts/.Google Scholar
- Egor V. Kostylev, Juan L. Reutter, Miguel Romero, and Domagoj Vrgoč. 2015. SPARQL with property paths. In Proceedings of the Conference on the Semantic Web (ISWC’15). 3--18.Google ScholarDigital Library
- LDBC. 2015. LDBC Task Force: Property Graphs Data Model. Retrieved from http://www.ldbcouncil.org (2015).Google Scholar
- Leonid Libkin, Wim Martens, and Domagoj Vrgoč. 2016. Querying graphs with data. J. ACM 63, 2 (2016), 14.Google ScholarDigital Library
- Leonid Libkin and Domagoj Vrgoč. 2012. Regular path queries on graphs with data. In Proceedings of the International Conference on Database Theory (ICDT’12). 74--85.Google ScholarDigital Library
- Lorenzo Livi and Antonello Rizzi. 2013. The graph matching problem. Pattern Anal. Appl. 16, 3 (2013), 253--283. Google ScholarDigital Library
- Katja Losemann and Wim Martens. 2013. The complexity of regular expressions and property paths in SPARQL. ACM Trans. Database Syst. 38, 4 (2013), 24. Google ScholarDigital Library
- Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. 2010. Pregel: A system for large-scale graph processing. In Proceedings of the SIGMOD International Conference on Management of Data (SIGMOD’10). ACM, 135--146. Google ScholarDigital Library
- Akiyoshi Matono, Toshiyuki Amagasa, Masatoshi Yoshikawa, and Shunsuke Uemura. 2003. An efficient pathway search using an indexing scheme for RDF. Genome Informat. Ser. (2003), 374--375.Google Scholar
- Alberto O. Mendelzon and Peter T. Wood. 1989. Finding regular simple paths in graph databases. In Proceedings of the Conference on Very Large Data Bases (VLDB’89). 185--193.Google Scholar
- Ron Milo, Shai Shen-Orr, Shalev Itzkovitz, Nadav Kashtan, Dmitri Chklovskii, and Uri Alon. 2002. Network motifs: Simple building blocks of complex networks. Science 298, 5594 (2002), 824--827. Google ScholarCross Ref
- Hiroyuki Ogata, Wataru Fujibuchi, Susumu Goto, and Minoru Kanehisa. 2000. A heuristic graph comparison algorithm and its application to detect functionally related enzyme clusters. Nucl. Acids Res. 28, 20 (2000), 4021--4028. Google ScholarCross Ref
- Marcus Paradies, Wolfgang Lehner, and Christof Bornhövd. 2015. GRAPHITE: An extensible graph traversal framework for relational database management systems. In Proceedings of the Conference on Scientific and Statistical Database Management (SSDBM’15). 29:1--29:12.Google ScholarDigital Library
- Task Force: Property Paths. 2009. Use cases in Property Paths Task Force. Retrieved from http://www.w3.org/2009/sparql/wiki/TaskForce:PropertyPaths#Use_Cases.Google Scholar
- Jorge Pérez, Marcelo Arenas, and Claudio Gutierrez. 2009. Semantics and complexity of SPARQL. ACM Trans. Database Syst. 34, 3 (2009), 1--45.Google ScholarDigital Library
- Jorge Pérez, Marcelo Arenas, and Claudio Gutierrez. 2010. nSPARQL: A navigational language for RDF. J. Web Sem. 8, 4 (2010), 255--270.Google ScholarDigital Library
- Eric Prud’hommeaux and Andy Seaborne. 2008. SPARQL Query Language for RDF. W3C Recommendation (2008). Retrieved from http://www.w3.org/TR/rdf-sparql-query/.Google Scholar
- Juan L. Reutter, Miguel Romero, and Moshe Y. Vardi. 2015. Regular queries on graph databases. In Proceedings of the International Conference on Database Theory (ICDT’15). 177--194.Google Scholar
- Patrick Reynolds. 2015. Oracle of Bacon. Retrieved from http://www.oracleofbacon.org/ (2015).Google Scholar
- Kaspar Riesen, Xiaoyi Jiang, and Horst Bunke. 2010. Exact and inexact graph matching: Methodology and applications. In Proceedings of the Conference on Managing and Mining Graph Data. 217--247. Google ScholarCross Ref
- Ian Robinson, Jim Webber, and Emil Eifrem. 2013. Graph Databases (first ed.). O’Reilly Media.Google Scholar
- Michael Schmidt, Michael Meier, and Georg Lausen. 2010. Foundations of SPARQL query optimization. In Proceedings of the International Conference on Database Theory (ICDT’10). 4--33. Google ScholarDigital Library
- Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S Yu, and Tianyi Wu. 2011. Pathsim: Meta path-based top-k similarity search in heterogeneous information networks. Proc. Very Large Data Bases 4, 11 (2011), 992--1003.Google ScholarDigital Library
- Claudio Tesoriero. 2013. Getting Started with OrientDB. Packt Publishing Ltd.Google Scholar
- The Neo4j Team. 2016. The Neo4j Manual v3.0. Retrieved from http://neo4j.com/docs/stable/.Google Scholar
- Bryan B. Thompson, Mike Personick, and Martyn Cutcher. 2014. The bigdata® RDF graph database. In Proceedings of the Conference on Linked Data Management. 193--237. Google ScholarCross Ref
- Julian R. Ullmann. 1976. An algorithm for subgraph isomorphism. J. ACM 23, 1 (1976), 31--42. Google ScholarDigital Library
- Leslie G. Valiant. 1979. The complexity of enumeration and reliability problems. SIAM J. Comput. 8, 3 (1979), 410--421. Google ScholarDigital Library
- Oskar van Rest, Sungpack Hong, Jinha Kim, Xuming Meng, and Hassan Chafi. 2016. PGQL: A property graph query language. In Proceedings of the Conference on Graph Data Management Experiences and Systems (GRADES’16). Google ScholarDigital Library
- Moshe Y. Vardi. 1982. The complexity of relational query languages (extended abstract). In Proceedings of the Symposium on Theory of Computing (STOC’82). 137--146.Google ScholarDigital Library
- Kevin Wilkinson, Craig Sayers, Harumi A. Kuno, Dave Reynolds, and Luping Ding. 2003. Supporting scalable, persistent semantic web applications. IEEE Data Eng. Bull. 26, 4 (2003), 33--39.Google Scholar
- Peter T. Wood. 2012. Query languages for graph databases. SIGMOD Rec. 41, 1 (2012), 50--60. Google ScholarDigital Library
- Reynold S Xin, Joseph E Gonzalez, Michael J Franklin, and Ion Stoica. 2013. Graphx: A resilient distributed graph system on spark. In Proceedings of the Conference on Graph Data Management Experiences and Systems (GRADES’13). ACM, 2.Google ScholarDigital Library
- Junchi Yan, Xu-Cheng Yin, Weiyao Lin, Cheng Deng, Hongyuan Zha, and Xiaokang Yang. 2016. A short survey of recent advances in graph matching. In Proceedings of the Conference on Multimedia Retrieval (ICMR’16). 167--174. Google ScholarDigital Library
- Jeffrey Xu Yu and Jiefeng Cheng. 2010. Graph reachability queries: A survey. In Managing and Mining Graph Data. Advances in Database Systems, Vol. 40. Springer, 181--215. Google ScholarCross Ref
- Lei Zou, Lei Chen, and M. Tamer Özsu. 2009. DistanceJoin: Pattern match query in a large graph database. Proc. Very Large Data Bases 2, 1 (2009), 886--897.Google ScholarDigital Library
Index Terms
- Foundations of Modern Query Languages for Graph Databases
Recommendations
Querying graph patterns
PODS '11: Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systemsGraph data appears in a variety of application domains, and many uses of it, such as querying, matching, and transforming data, naturally result in incompletely specified graph data, i.e., graph patterns. While queries need to be posed against such data,...
Querying Regular Graph Patterns
Graph data appears in a variety of application domains, and many uses of it, such as querying, matching, and transforming data, naturally result in incompletely specified graph data, that is, graph patterns. While queries need to be posed against such ...
Schema Mappings for Data Graphs
PODS '17: Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database SystemsSchema mappings are a fundamental concept in data integration and exchange, and they have been thoroughly studied in different data models. For graph data, however, mappings have been studied in a restricted context that, unlike real-life graph ...
Comments