skip to main content
survey

Foundations of Modern Query Languages for Graph Databases

Authors Info & Claims
Published:26 September 2017Publication History
Skip Abstract Section

Abstract

We survey foundational features underlying modern graph query languages. We first discuss two popular graph data models: edge-labelled graphs, where nodes are connected by directed, labelled edges, and property graphs, where nodes and edges can further have attributes. Next we discuss the two most fundamental graph querying functionalities: graph patterns and navigational expressions. We start with graph patterns, in which a graph-structured query is matched against the data. Thereafter, we discuss navigational expressions, in which patterns can be matched recursively against the graph to navigate paths of arbitrary length; we give an overview of what kinds of expressions have been proposed and how they can be combined with graph patterns. We also discuss several semantics under which queries using the previous features can be evaluated, what effects the selection of features and semantics has on complexity, and offer examples of such features in three modern languages that are used to query graphs: SPARQL, Cypher, and Gremlin. We conclude by discussing the importance of formalisation for graph query languages; a summary of what is known about SPARQL, Cypher, and Gremlin in terms of expressivity and complexity; and an outline of possible future directions for the area.

Skip Supplemental Material Section

Supplemental Material

References

  1. Serge Abiteboul, Richard Hull, and Victor Vianu. 1995. Foundations of Databases. Addison-Wesley.Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Charu C. Aggarwal and Haixun Wang. 2010. Managing and Mining Graph Data. Springer. Google ScholarGoogle ScholarCross RefCross Ref
  3. Faisal Alkhateeb, Jean-François Baget, and Jérôme Euzenat. 2009. Extending SPARQL with regular expression patterns (for querying RDF). J. Web Sem. 7, 2 (2009), 57--73. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Renzo Angles. 2012. A comparison of current graph database models. In Proceedings of the ICDE Workshop on Graph Data Management (GDM’12). Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Renzo Angles and Claudio Gutierrez. 2008. The expressive power of SPARQL. In Proceedings of the Conference on the Semantic Web (ISWC’08). 114--129. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Renzo Angles and Claudio Gutiérrez. 2008. Survey of graph database models. ACM Comp. Surv. 40, 1 (2008).Google ScholarGoogle Scholar
  7. Renzo Angles and Claudio Gutierrez. 2016. The multiset semantics of SPARQL patterns. In Proceedings of the Conference on the Semantic Web (ISWC’16). Springer, 20--36. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Apache TinkerPop. 2017. TinkerPop3 Documentation v.3.2.5. Retrieved from http://tinkerpop.apache.org/docs/current/reference/ (June 2017).Google ScholarGoogle Scholar
  9. Marcelo Arenas, Sebastián Conca, and Jorge Pérez. 2012. Counting beyond a Yottabyte, or how SPARQL 1.1 property paths will prevent adoption of the standard. In Proceedings of the Conference on the World Wide Web (WWW’12). 629--638.Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Guillaume Bagan, Angela Bonifati, and Benoît Groz. 2013. A trichotomy for regular simple path queries on graphs. In Proceedings of the Conference on Principles of Database Systems (PODS’13). 261--272.Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Pablo Barceló. 2013. Querying graph databases. In Proceedings of the Conference on Principles of Database Systems (PODS’13). 175--188.Google ScholarGoogle Scholar
  12. Pablo Barceló, Leonid Libkin, Anthony Widjaja Lin, and Peter T. Wood. 2012a. Expressive languages for path queries over graph-structured data. ACM Trans. Database Syst. 37, 4 (2012), 31.Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Pablo Barceló, Leonid Libkin, and Juan L. Reutter. 2014. Querying regular graph patterns. J. ACM 61, 1 (2014), 8:1--8:54.Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Pablo Barceló and Pablo Muñoz. 2014. Graph logics with rational relations: The role of word combinatorics. In Proceedings of the Conference on Logic in Computer Science (LICS’14). 12:1--12:10.Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Pablo Barceló, Jorge Pérez, and Juan L. Reutter. 2012b. Relative expressiveness of nested regular expressions. In Proceedings of the Alberto Mendelzon Workshop (AMW’12). 180--195.Google ScholarGoogle Scholar
  16. Christopher L. Barrett, Riko Jacob, and Madhav V. Marathe. 2000. Formal-language-constrained path problems. SIAM J. Comput. 30, 3 (2000), 809--837. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Barry Bishop, Atanas Kiryakov, Damyan Ognyanoff, Ivan Peikov, Zdravko Tashev, and Ruslan Velkov. 2011. OWLIM: A family of scalable semantic repositories. Semant. Web J. 2, 1 (2011), 33--42.Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Peter Buneman. 1997. Semistructured data. In Proceedings of the Conference on Principles of Database Systems (PODS’97). 117--121. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Peter Buneman, Susan B. Davidson, Gerd G. Hillebrand, and Dan Suciu. 1996. A query language and optimization techniques for unstructured data. In Proceeedings of the SIGMOD International Conference on Management of Data (SIGMOD’96). 505--516. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Horst Bunke. 2000. Graph matching: Theoretical foundations, algorithms, and applications. In Proceedings of the Conference on Vision Interface. 82--88.Google ScholarGoogle Scholar
  21. Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, and Moshe Y. Vardi. 2000. Containment of conjunctive regular path queries with inverse. In Proceedings of the Conference on Knowledge Representation 8 Reasoning (KR’00). 176--185.Google ScholarGoogle Scholar
  22. Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, and Moshe Y. Vardi. 2002. Rewriting of regular expressions and regular path queries. J. Comput. Syst. Sci. 64, 3 (2002), 443--465. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, and Moshe Y. Vardi. 2003. Reasoning on regular path queries. SIGMOD Rec. 32, 4 (2003), 83--92. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Chandra Chekuri and Anand Rajaraman. 2000. Conjunctive query containment revisited. Theor. Comput. Sci. 239, 2 (2000), 211--229. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Jiefeng Cheng, Jeffrey Xu Yu, Bolin Ding, Philip S. Yu, and Haixun Wang. 2008. Fast graph pattern matching. In Proceedings of the International Conference on Data Engineering (ICDE’08). 913--922. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Mariano P. Consens and Alberto O. Mendelzon. 1990. GraphLog: A visual formalism for real life recursion. In Proceedings of the Conference on Principles of Database Systems (PODS’90). 404--416. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Isabel F. Cruz, Alberto O. Mendelzon, and Peter T. Wood. 1987. A graphical query language supporting recursion. In Proceedings of the SIGMOD International Conference on Management of Data (SIGMOD’87). 323--330. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Víctor Dalmau, Phokion G. Kolaitis, and Moshe Y. Vardi. 2002. Constraint satisfaction, bounded treewidth, and finite-variable logics. In Proceedings of the Conference on Constraint Programming (CP’02). 310--326.Google ScholarGoogle Scholar
  29. DataStax. 2015. Titan Documentation. Retrieved from http://s3.thinkaurelius.com/docs/titan/1.0.0/ (2015).Google ScholarGoogle Scholar
  30. Orri Erling. 2012. Virtuoso, a Hybrid RDBMS/graph column store. IEEE Data Eng. Bull. 35, 1 (2012), 3--8.Google ScholarGoogle Scholar
  31. Wenfei Fan. 2012. Graph pattern matching revised for social network analysis. In Proceedings of the International Conference on Database Theory (ICDT’12). 8--21. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Wenfei Fan, Jianzhong Li, Shuai Ma, Hongzhi Wang, and Yinghui Wu. 2010. Graph homomorphism revisited for graph matching. Proc. Very Large Data Bases 3, 1 (2010), 1161--1172. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Diego Figueira and Leonid Libkin. 2015. Path logics for querying graphs: Combining expressiveness and efficiency. In Proceedings of the Conference on Logic in Computer Science (LICS’15). 329--340.Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Valeria Fionda, Giuseppe Pirrò, and Mariano P Consens. 2015. Extended property paths: Writing more SPARQL queries in a succinct way. In Proceedings of the AAAI Conference on Artificial Intelligence.Google ScholarGoogle ScholarCross RefCross Ref
  35. Daniela Florescu, Alon Y. Levy, and Dan Suciu. 1998. Query containment for conjunctive queries with regular expressions. In Proceedings of the Conference on Principles of Database Systems (PODS’98). 139--148. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. César A. Galindo-Legaria and Arnon Rosenthal. 1997. Outerjoin simplification and reordering for query optimization. ACM Trans. Database Syst. 22, 1 (1997), 43--73.Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Brian Gallagher. 2006. Matching structure and semantics: A survey on graph-based pattern matching. In Proceedings of the AAAI Fall Symposium. 43--53.Google ScholarGoogle Scholar
  38. Georg Gottlob, Gianluigi Greco, Nicola Leone, and Francesco Scarcello. 2016. Hypertree decompositions: Questions and answers. In Proceedings of the Conference on Principles of Database Systems (PODS’16). 57--74.Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Minyang Han and Khuzaima Daudjee. 2015. Giraph unchained: Barrierless asynchronous parallel execution in pregel-like graph processing systems. Proc. Very Large Data Bases 8, 9 (2015), 950--961. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Steve Harris, Nicholas Lamb, and Nigel Shadbolt. 2009. 4store: The design and implementation of a clustered RDF store. In Proceedings of the Conference on Scalable Semantic Web Knowledge Base Systems (SWSS’09). 94--109.Google ScholarGoogle Scholar
  41. Steve Harris and Andy Seaborne. 2013. SPARQL 1.1 Query Language. W3C Recommendation (2013). Retrieved from http://www.w3.org/TR/sparql11-query/.Google ScholarGoogle Scholar
  42. Pavol Hell and Jaroslav Nesetril. 2004. Graphs and Homomorphisms. Oxford University Press. Google ScholarGoogle ScholarCross RefCross Ref
  43. Daniel Hernández, Aidan Hogan, and Markus Krötzsch. 2015. Reifying RDF: What works well with wikidata? In Proceedings of the Conference on Scalable Semantic Web Knowledge Base Systems (SWSS’15). 32--47.Google ScholarGoogle Scholar
  44. David A. Holland, Uri Jacob Braun, Diana Maclean, Kiran-Kumar Muniswamy-Reddy, and Margo I. Seltzer.2008. Choosing a data model and query language for provenance. In Proceedings of the 2nd International Provenance and Annotation Workshop (IPAW’08).Google ScholarGoogle Scholar
  45. Sungpack Hong, Hassan Chafi, Edic Sedlar, and Kunle Olukotun. 2012. Green-marl: A DSL for easy and efficient graph analysis. In ACM SIGARCH Comput. Arch. News, Vol. 40. ACM, 349--362. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. John E. Hopcroft, Rajeev Motwani, and Jeffrey D. Ullman. 2003. Introduction to Automata Theory, Languages, and Computation—International Edition (2nd ed). Addison-Wesley.Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Alekh Jindal and Samuel Madden. 2014. GRAPHiQL: A graph intuitive query language for relational databases. In Big Data. IEEE, 441--450. Google ScholarGoogle ScholarCross RefCross Ref
  48. Graham Klyne, Jeremy J. Carroll, and Brian McBride. 2014. RDF 1.1 Concepts and Abstract Syntax. W3C Recommendation (25 Feb. 2014). Retrieved from https://www.w3.org/TR/rdf11-concepts/.Google ScholarGoogle Scholar
  49. Egor V. Kostylev, Juan L. Reutter, Miguel Romero, and Domagoj Vrgoč. 2015. SPARQL with property paths. In Proceedings of the Conference on the Semantic Web (ISWC’15). 3--18.Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. LDBC. 2015. LDBC Task Force: Property Graphs Data Model. Retrieved from http://www.ldbcouncil.org (2015).Google ScholarGoogle Scholar
  51. Leonid Libkin, Wim Martens, and Domagoj Vrgoč. 2016. Querying graphs with data. J. ACM 63, 2 (2016), 14.Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. Leonid Libkin and Domagoj Vrgoč. 2012. Regular path queries on graphs with data. In Proceedings of the International Conference on Database Theory (ICDT’12). 74--85.Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. Lorenzo Livi and Antonello Rizzi. 2013. The graph matching problem. Pattern Anal. Appl. 16, 3 (2013), 253--283. Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. Katja Losemann and Wim Martens. 2013. The complexity of regular expressions and property paths in SPARQL. ACM Trans. Database Syst. 38, 4 (2013), 24. Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. 2010. Pregel: A system for large-scale graph processing. In Proceedings of the SIGMOD International Conference on Management of Data (SIGMOD’10). ACM, 135--146. Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. Akiyoshi Matono, Toshiyuki Amagasa, Masatoshi Yoshikawa, and Shunsuke Uemura. 2003. An efficient pathway search using an indexing scheme for RDF. Genome Informat. Ser. (2003), 374--375.Google ScholarGoogle Scholar
  57. Alberto O. Mendelzon and Peter T. Wood. 1989. Finding regular simple paths in graph databases. In Proceedings of the Conference on Very Large Data Bases (VLDB’89). 185--193.Google ScholarGoogle Scholar
  58. Ron Milo, Shai Shen-Orr, Shalev Itzkovitz, Nadav Kashtan, Dmitri Chklovskii, and Uri Alon. 2002. Network motifs: Simple building blocks of complex networks. Science 298, 5594 (2002), 824--827. Google ScholarGoogle ScholarCross RefCross Ref
  59. Hiroyuki Ogata, Wataru Fujibuchi, Susumu Goto, and Minoru Kanehisa. 2000. A heuristic graph comparison algorithm and its application to detect functionally related enzyme clusters. Nucl. Acids Res. 28, 20 (2000), 4021--4028. Google ScholarGoogle ScholarCross RefCross Ref
  60. Marcus Paradies, Wolfgang Lehner, and Christof Bornhövd. 2015. GRAPHITE: An extensible graph traversal framework for relational database management systems. In Proceedings of the Conference on Scientific and Statistical Database Management (SSDBM’15). 29:1--29:12.Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. Task Force: Property Paths. 2009. Use cases in Property Paths Task Force. Retrieved from http://www.w3.org/2009/sparql/wiki/TaskForce:PropertyPaths#Use_Cases.Google ScholarGoogle Scholar
  62. Jorge Pérez, Marcelo Arenas, and Claudio Gutierrez. 2009. Semantics and complexity of SPARQL. ACM Trans. Database Syst. 34, 3 (2009), 1--45.Google ScholarGoogle ScholarDigital LibraryDigital Library
  63. Jorge Pérez, Marcelo Arenas, and Claudio Gutierrez. 2010. nSPARQL: A navigational language for RDF. J. Web Sem. 8, 4 (2010), 255--270.Google ScholarGoogle ScholarDigital LibraryDigital Library
  64. Eric Prud’hommeaux and Andy Seaborne. 2008. SPARQL Query Language for RDF. W3C Recommendation (2008). Retrieved from http://www.w3.org/TR/rdf-sparql-query/.Google ScholarGoogle Scholar
  65. Juan L. Reutter, Miguel Romero, and Moshe Y. Vardi. 2015. Regular queries on graph databases. In Proceedings of the International Conference on Database Theory (ICDT’15). 177--194.Google ScholarGoogle Scholar
  66. Patrick Reynolds. 2015. Oracle of Bacon. Retrieved from http://www.oracleofbacon.org/ (2015).Google ScholarGoogle Scholar
  67. Kaspar Riesen, Xiaoyi Jiang, and Horst Bunke. 2010. Exact and inexact graph matching: Methodology and applications. In Proceedings of the Conference on Managing and Mining Graph Data. 217--247. Google ScholarGoogle ScholarCross RefCross Ref
  68. Ian Robinson, Jim Webber, and Emil Eifrem. 2013. Graph Databases (first ed.). O’Reilly Media.Google ScholarGoogle Scholar
  69. Michael Schmidt, Michael Meier, and Georg Lausen. 2010. Foundations of SPARQL query optimization. In Proceedings of the International Conference on Database Theory (ICDT’10). 4--33. Google ScholarGoogle ScholarDigital LibraryDigital Library
  70. Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S Yu, and Tianyi Wu. 2011. Pathsim: Meta path-based top-k similarity search in heterogeneous information networks. Proc. Very Large Data Bases 4, 11 (2011), 992--1003.Google ScholarGoogle ScholarDigital LibraryDigital Library
  71. Claudio Tesoriero. 2013. Getting Started with OrientDB. Packt Publishing Ltd.Google ScholarGoogle Scholar
  72. The Neo4j Team. 2016. The Neo4j Manual v3.0. Retrieved from http://neo4j.com/docs/stable/.Google ScholarGoogle Scholar
  73. Bryan B. Thompson, Mike Personick, and Martyn Cutcher. 2014. The bigdata® RDF graph database. In Proceedings of the Conference on Linked Data Management. 193--237. Google ScholarGoogle ScholarCross RefCross Ref
  74. Julian R. Ullmann. 1976. An algorithm for subgraph isomorphism. J. ACM 23, 1 (1976), 31--42. Google ScholarGoogle ScholarDigital LibraryDigital Library
  75. Leslie G. Valiant. 1979. The complexity of enumeration and reliability problems. SIAM J. Comput. 8, 3 (1979), 410--421. Google ScholarGoogle ScholarDigital LibraryDigital Library
  76. Oskar van Rest, Sungpack Hong, Jinha Kim, Xuming Meng, and Hassan Chafi. 2016. PGQL: A property graph query language. In Proceedings of the Conference on Graph Data Management Experiences and Systems (GRADES’16). Google ScholarGoogle ScholarDigital LibraryDigital Library
  77. Moshe Y. Vardi. 1982. The complexity of relational query languages (extended abstract). In Proceedings of the Symposium on Theory of Computing (STOC’82). 137--146.Google ScholarGoogle ScholarDigital LibraryDigital Library
  78. Kevin Wilkinson, Craig Sayers, Harumi A. Kuno, Dave Reynolds, and Luping Ding. 2003. Supporting scalable, persistent semantic web applications. IEEE Data Eng. Bull. 26, 4 (2003), 33--39.Google ScholarGoogle Scholar
  79. Peter T. Wood. 2012. Query languages for graph databases. SIGMOD Rec. 41, 1 (2012), 50--60. Google ScholarGoogle ScholarDigital LibraryDigital Library
  80. Reynold S Xin, Joseph E Gonzalez, Michael J Franklin, and Ion Stoica. 2013. Graphx: A resilient distributed graph system on spark. In Proceedings of the Conference on Graph Data Management Experiences and Systems (GRADES’13). ACM, 2.Google ScholarGoogle ScholarDigital LibraryDigital Library
  81. Junchi Yan, Xu-Cheng Yin, Weiyao Lin, Cheng Deng, Hongyuan Zha, and Xiaokang Yang. 2016. A short survey of recent advances in graph matching. In Proceedings of the Conference on Multimedia Retrieval (ICMR’16). 167--174. Google ScholarGoogle ScholarDigital LibraryDigital Library
  82. Jeffrey Xu Yu and Jiefeng Cheng. 2010. Graph reachability queries: A survey. In Managing and Mining Graph Data. Advances in Database Systems, Vol. 40. Springer, 181--215. Google ScholarGoogle ScholarCross RefCross Ref
  83. Lei Zou, Lei Chen, and M. Tamer Özsu. 2009. DistanceJoin: Pattern match query in a large graph database. Proc. Very Large Data Bases 2, 1 (2009), 886--897.Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Foundations of Modern Query Languages for Graph Databases

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image ACM Computing Surveys
        ACM Computing Surveys  Volume 50, Issue 5
        September 2018
        573 pages
        ISSN:0360-0300
        EISSN:1557-7341
        DOI:10.1145/3145473
        • Editor:
        • Sartaj Sahni
        Issue’s Table of Contents

        Copyright © 2017 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 26 September 2017
        • Revised: 1 June 2017
        • Accepted: 1 June 2017
        • Received: 1 October 2016
        Published in csur Volume 50, Issue 5

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • survey
        • Research
        • Refereed

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader