Abstract
For many problems arising in the setting of graph querying (such as finding semantic associations in RDF graphs, exact and approximate pattern matching, sequence alignment, etc.), the power of standard languages such as the widely studied conjunctive regular path queries (CRPQs) is insufficient in at least two ways. First, they cannot output paths and second, more crucially, they cannot express relationships among paths.
We thus propose a class of extended CRPQs, called ECRPQs, which add regular relations on tuples of paths, and allow path variables in the heads of queries. We provide several examples of their usefulness in querying graph structured data, and study their properties. We analyze query evaluation and representation of tuples of paths in the output by means of automata. We present a detailed analysis of data and combined complexity of queries, and consider restrictions that lower the complexity of ECRPQs to that of relational conjunctive queries. We study the containment problem, and look at further extensions with first-order features, and with nonregular relations that add arithmetic constraints on the lengths of paths and numbers of occurrences of labels.
Supplemental Material
Available for Download
The proof is given in an electronic appendix, available online in the ACM Digital Library.
- Abiteboul, S., Quass, D., McHugh, J., Widom, J., and Wiener, J. 1997. The LOREL query language for semistructured data. Int. J. Digital Libraries 1, 1, 68--88.Google ScholarCross Ref
- Abiteboul, S., Buneman, P., and Suciu, D. 1999. Data on the Web: From Relations to Semistructured Data and XML. Morgan Kauffman. Google ScholarDigital Library
- Abulla, P., Jonnson, B., Nilsson, M., and Saksena, M. 2003. A survey of regular model checking. In Proceedings of the 5th International Conference on Concurrence Theory. 35--48.Google Scholar
- Alkhateeb, F., Baget, J.-F., and Euzenat, J. 2008. Constrained regular expressions in SPARQL. In Proceedings of the International Conference on Semantic Web & Web Services. 91--99.Google Scholar
- Alkhateeb, F., Baget, J.-F., and Euzenat, J. 2009. Extending SPARQL with regular expression patterns (for querying RDF). J. Web Semantics 7, 2, 57--73. Google ScholarDigital Library
- Anyanwu, K. and Sheth, A. 2003. ρ-queries: Enabling querying for semantic associations on the semantic web. In Proceedings of the 12th International World Wide Web Conference. 690--699. Google ScholarDigital Library
- Anyanwu, K., Maduko, A., and Sheth, A. P. 2007. SPARQ2L: Towards support for subgraph extraction queries in RDF databases. In Proceedings of the 16th International World Wide Web Conference. 797--806. Google ScholarDigital Library
- Barceló, P., Hurtado, C. A., Libkin, L., and Wood, P. T. 2010. Expressive languages for path queries over graph-structured data. In Proceedings of the 29th ACM Symposium on Principles of Database Systems. 3--14. Google ScholarDigital Library
- Barceló, P., Pérez, J., and Reutter, J. 2012. Relative expressiveness of nested regular expressions. In Proceedings of the 6th Alberto Mendelzon Workshop on the Foundations of Data Management and the Web. 180--195.Google Scholar
- Barrett, C., Jacob, R., and Marathe, M. 2000. Formal-language-constrained path problems. SIAM J. Comput. 30, 3, 809--837. Google ScholarDigital Library
- Benedikt, M., Libkin, L., Schwentick, T., and Segoufin, L. 2003. Definable relations and first-order query languages over strings. J. ACM 50, 5, 694--751. Google ScholarDigital Library
- Berstel, J. 1979. Transductions and Context-Free Languages. B. G. Teubner.Google Scholar
- Blumensath, A. and Grädel, E. 2000. Automatic structures. In Proceedings of the 15th Annual IEEE Symposium on Logic in Computer Science. 51--62. Google ScholarDigital Library
- Bruyère, V., Hansel, G., Michaux, C., and Villemaire, R. 1994. Logic and p-recognizable sets of integers. Bull. Belgian Math. Soc. 1, 191--238.Google ScholarCross Ref
- Calvanese, D., De Giacomo, G., Lenzerini, M., and Vardi, M. 2000. Containment of conjunctive regular path queries with inverse. In Proceedings of the 7th International Conference on Principles of Knowledge Representation and Reasoning. 176--185.Google Scholar
- Calvanese, D., De Giacomo, G., Lenzerini, M., and Vardi, M. 2002. Rewriting of regular expressions and regular path queries. J. Comput. Syst. Sci. 64, 3, 443--465.Google ScholarDigital Library
- Chrobak, M. 1986. Finite automata and unary languages. Theor. Comput. Sci. 47, 2, 149--158. Google ScholarCross Ref
- Consens, M. and Mendelzon, A. 1990. GraphLog: A visual formalism for real life recursion. In Proceedings of the 9th ACM Symposium on Principles of Database Systems. 404--416. Google ScholarDigital Library
- Deutsch, A. and Tannen, V. 2001. Optimization properties for classes of conjunctive regular path queries. In Proceedings of the 8th International Workshop on Database Programming Languages. 21--39. Google ScholarDigital Library
- Elgot, C. and Mezei, J. 1965. On relations defined by generalized finite automata. IBM J. Resear. Dev. 9, 1, 47--68. Google ScholarDigital Library
- Florescu, D., Levy, A., and Suciu, D. 1998. Query containment for conjunctive queries with regular expressions. In Proceedings of the 17th ACM Symposium on Principles of Database Systems. 139--148. Google ScholarDigital Library
- Freydenberger, D. and Reidenbach, D. 2010. Bad news on decision problems for patterns. Inf. Comput. 208, 1, 83--96. Google ScholarDigital Library
- Freydenberger, D. and Schweikardt, N. 2011. Expressiveness and static analysis of extended conjunctive regular path queries. In Proceedings of the 5th Alberto Mendelzon International Workshop on Foundations of Data Management.Google Scholar
- Frougny, C. and Sakarovitch, J. 1991. Rational relations with bounded delay. In Proceedings of the 8th Annual Symposium on Theoretical Aspects of Computer Science. 50--63. Google ScholarDigital Library
- Grahne, G. and Thomo, A. 2004. Query answering and containment for regular path queries under distortions. In Proceedings of the 3rd International Symposium on the Foundations of Information and Knowledge Systems. 98--115.Google Scholar
- Gusfield, D. 1997. Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge University Press. Google ScholarDigital Library
- Holland, D., Braun, U., Maclean, D., Muniswamy-Reddy, K., and Seltzer, M. 2008. Choosing a data model and query language for provenance. In Proceedings of the 2nd International Provenance and Annotation Workshop.Google Scholar
- Ibarra, H., Su, J., Dang, Z., Bultan, T., and Kemmerer, R. 2002. Counter machines and verification problems. Theor. Comput. Sci. 289, 1, 165--189. Google ScholarDigital Library
- Kanza, Y. and Sagiv, Y. 2001. Flexible queries over semistructured data. In Proceedings of the 20th ACM Symposium on Principles of Database Systems. 40--51. Google ScholarDigital Library
- Kochut, K. and Janik, M. 2007. SPARQLeR: Extended SPARQL for semantic association discovery. In Proceedings of the 4th European Semantic Web Conference. 145--159. Google ScholarDigital Library
- Kozen, D. 1977. Lower bounds for natural proof systems. In Proceedings of the 18th Annual Symposium on Foundations of Computer Science. 254--266. Google ScholarDigital Library
- Lee, W., Raschid, L., Srinivasan, P., Shah, N., Rubin, D., and Noy, N. 2007. Using annotations from controlled vocabularies to find meaningful associations. In Proceedings of the 4th International Workshop on Data Integration in the Life Sciences. 247--263. Google ScholarDigital Library
- Lehmann, J., Schüpell, J., and Auer, S. 2007. Discovering unknown connections---the DBpedia relationship finder. In Proceedings of the 1st SABRE Conference on Social Semantic Web. 99--110.Google Scholar
- Lenstra, H. 1983. Integer programming in a fixed number of variables. Math. Oper. Resear. 8, 4, 538--548.Google ScholarDigital Library
- Mendelzon, A. and Wood, P. 1995. Finding regular simple paths in graph databases. SIAM J. Comput. 24, 6, 1235--1258. Google ScholarDigital Library
- Milo, T. and Suciu, D. 1999. Index structures for path expressions. In Proceedings of the 7th International Conference on Database Theory. 277--295. Google ScholarDigital Library
- Papadimitriou, C. 1981. On the complexity of integer programming. J. ACM 28, 4, 765--768. Google ScholarDigital Library
- Pérez, J., Arenas, M., and Gutierrez, C. 2010. nSPARQL: A navigational language for RDF. J. Web Semantics 8, 4, 255--270. Google ScholarDigital Library
- Robertson, E. L. 1974. Structure of complexity in the weak monadic second-order theories of the natural numbers. In Conference Record of 6th Annual ACM Symposium on Theory of Computing. 161--171. Google ScholarDigital Library
- Scarpellini, B. 1984. Complexity of subcases of Presburger arithmetic. Trans. AMS 284, 203--218.Google ScholarCross Ref
- Sheth, A., Aleman-Meza, A. B., et al. 2005. Semantic association identification and knowledge discovery for national security applications. J. Datab. Manage. 16, 1, 33--53.Google ScholarCross Ref
- Stockmeyer, L. J. 1974. The complexity of decision problems in automata theory and logic. Ph.D. thesis, Massachusetts Institute of Technology.Google Scholar
- To, A. 2009. Unary finite automata vs. arithmetic progressions. Inf. Process. Lett. 109, 17, 1010--1014. Google ScholarDigital Library
- To, A. 2010. Model checking infinite-state systems: Generic and specific approaches. Ph.D. thesis, LFCS, School of Informatics, University of Edinburgh.Google Scholar
- Verma, K., Seidl, H., and Schwentick, T. 2005. On the complexity of equational Horn clauses. In Proceedings of the 20th International Conference on Automated Deduction (CADE). 337--352. Google ScholarDigital Library
- Weikum, G., Kasneci, G., Ramanath, M., and Suchanek, F. 2009. Database and information-retrieval methods for knowledge discovery. Commun. ACM 52, 4, 56--64. Google ScholarDigital Library
Index Terms
- Expressive Languages for Path Queries over Graph-Structured Data
Recommendations
Expressive languages for path queries over graph-structured data
PODS '10: Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systemsFor many problems arising in the setting of graph querying (such as finding semantic associations in RDF graphs, exact and approximate pattern matching, sequence alignment, etc.), the power of standard languages such as the widely studied conjunctive ...
Expressiveness and static analysis of extended conjunctive regular path queries
We study the expressiveness and the complexity of static analysis of extended conjunctive regular path queries (ECRPQs), introduced by Barcelo et al. (2010) [3]. ECRPQs are an extension of conjunctive regular path queries (CRPQs), a well-studied ...
View-based query processing: On the relationship between rewriting, answering and losslessness
As a result of the extensive research in view-based query processing, three notions have been identified as fundamental, namely rewriting, answering, and losslessness. Answering amounts to computing the tuples satisfying the query in all databases ...
Comments