ABSTRACT
Query containment is a fundamental operation used to expedite query processing in view materialisation and query caching techniques. Since query containment has been shown to be NP-complete for arbitrary conjunctive queries on RDF graphs, we introduce a simpler form of conjunctive queries that we name f-graph queries. We first show that containment checking for f-graph queries can be solved in polynomial time. Based on this observation, we propose a novel indexing structure, named mv-index, that allows for fast containment checking between a single f-graph query and an arbitrary number of stored queries. Search is performed in polynomial time in the combined size of the query and the index. We then show how our algorithms and structures can be extended for arbitrary conjunctive queries on RDF graphs by introducing f-graph witnesses, i.e., f-graph representatives of conjunctive queries. F-graph witnesses have the following interesting property, a conjunctive query for RDF graphs is contained in another query only if its corresponding f-graph witness is also contained in it. The latter allows to use our indexing structure for the general case of conjunctive query containment. This translates in practice to microseconds or less for the containment test against hundreds of thousands of queries that are indexed within our structure.
- D. J. Abadi, A. Marcus, S. R. Madden, and K. Hollenbach. Sw-store: a vertically partitioned dbms for semantic web data management. VLDB J., 18(2):385--406, 2009. Google ScholarDigital Library
- F. N. Afrati, M. Damigos, and M. Gergatsoulis. Query containment under bag and bag-set semantics. Information Processing Letters, 110(10):360--369, 2010.Google ScholarDigital Library
- S. Agrawal, S. Chaudhuri, and V. R. Narasayya. Automated selection of materialized views and indexes in sql databases. In VLDB, volume 2000, pages 496--505, 2000. Google ScholarDigital Library
- G. Alucc, O. Hartig, M. T. Özsu, and K. Daudjee. Diversified stress testing of rdf data management systems. In ISWC, pages 197--212, 2014. Google ScholarDigital Library
- R. Angles and C. Gutierrez. The expressive power of sparql. ISWC, pages 114--129, 2008. Google ScholarDigital Library
- M. Arenas, B. C. Grau, E. Kharlamov, S. Marciuska, and D. Zheleznyakov. Faceted search over rdf-based knowledge graphs. J. Web Sem., 37--38:55--74, 2016. Google ScholarDigital Library
- M. Atre, V. Chaoji, M. J. Zaki, and J. A. Hendler. Matrix bit loaded: a scalable lightweight join query processor for rdf data. In WWW, pages 41--50. ACM, 2010.Google Scholar
- S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, and Z. Ives. Dbpedia: A nucleus for a web of open data. ISWC/ASWC, pages 722--735, 2007. Google ScholarDigital Library
- C. Bizer and A. Schultz. The berlin sparql benchmark. International Journal on Semantic Web and Information Systems (IJSWIS), 5(2):1--24, 2009.Google ScholarCross Ref
- D. Bleco and Y. Kotidis. Graph analytics on massive collections of small graphs. In EDBT, pages 523--534. Citeseer, 2014.Google Scholar
- D. Bleco and Y. Kotidis. Using entropy metrics for pruning very large graph cubes. Information Systems, 81:49--62, 2019.Google ScholarDigital Library
- A. Bonifati, W. Martens, and T. Timm. An analytical study of large sparql query logs. PVLDB, 11(2):149--161, 2017. Google ScholarDigital Library
- S. Borgwardt, T. Mailis, R. Pe naloza, and A.-Y. Turhan. Answering fuzzy conjunctive queries over finitely valued fuzzy ontologies. Journal on Data Semantics, 5(2):55--75, 2016.Google ScholarCross Ref
- D. Brickley and R. V. Guha. Rdf vocabulary description language 1.0: Rdf schema. https://www.w3.org/TR/rdf-schema/, 2004.Google Scholar
- J. Broekstra, A. Kampman, and F. Van Harmelen. Sesame: A generic architecture for storing and querying rdf and rdf schema. In ISWC, pages 54--68, 2002. Google ScholarDigital Library
- B. Cautis and E. Kharlamov. Answering queries using views over probabilistic XML: complexity and tractability. PVLDB, 5(11):1148--1159, 2012. Google ScholarDigital Library
- A. K. Chandra and P. M. Merlin. Optimal implementation of conjunctive queries in relational data bases. In STOC. ACM, 1977. Google ScholarDigital Library
- S. Chaudhuri and M. Y. Vardi. Optimization of real conjunctive queries. In PODS, pages 59--70. ACM, 1993. Google ScholarDigital Library
- M. W. Chekol, J. Euzenat, P. Genevès, and N. Laya"ida. Sparql query containment under rdfs entailment regime. In IJCAR, 2012. Google ScholarDigital Library
- C. Chekuri and A. Rajaraman. Conjunctive query containment revisited. ICDT, pages 56--70, 1997. Google ScholarDigital Library
- T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to algorithms. MIT press, 2009. Google ScholarDigital Library
- S. Dar, M. J. Franklin, B. T. Jonsson, D. Srivastava, M. Tan, et al. Semantic data caching and replacement. VLDB, 96:330--341, 1996. Google ScholarDigital Library
- V. Dritsou, P. Constantopoulos, A. Deligiannakis, and Y. Kotidis. Optimizing query shortcuts in rdf databases. ESWC, pages 77--92, 2011. Google ScholarDigital Library
- O. Erling, A. Averbuch, J. Larriba-Pey, H. Chafi, A. Gubichev, A. Prat, M.-D. Pham, and P. Boncz. The ldbc social network benchmark: Interactive workload. In SIGMOD, pages 619--630. ACM, 2015. Google ScholarDigital Library
- R. Giugno, V. Bonnici, N. Bombieri, A. Pulvirenti, A. Ferro, and D. Shasha. Grapes: A software for parallel searching on biological graphs targeting multi-core architectures. PloS one, 8(10):e76911, 2013.Google ScholarCross Ref
- F. Goasdoué, K. Karanasos, J. Leblay, and I. Manolescu. View selection in semantic web databases. PVLDB, 5(2):97--108, 2011. Google ScholarDigital Library
- G. Gottlob, N. Leone, and F. Scarcello. The complexity of acyclic conjunctive queries. Journal of the ACM, 48(3):431--498, 2001. Google ScholarDigital Library
- Y. Guo, Z. Pan, and J. Heflin. Lubm: A benchmark for owl knowledge base systems. J. Web Semant., 3(2--3):158--182, 2005. Google ScholarDigital Library
- A. Gupta, I. S. Mumick, et al. Maintenance of materialized views: Problems, techniques, and applications. IEEE Data Eng. Bull., 18(2):3--18, 1995.Google Scholar
- S. Gurajada, S. Seufert, I. Miliaraki, and M. Theobald. Triad: a distributed shared-nothing rdf engine based on asynchronous message passing. In SIGMOD, pages 289--300. ACM, 2014. Google ScholarDigital Library
- C. Gutierrez, C. A. Hurtado, A. O. Mendelzon, and J. Pérez. Foundations of semantic web databases. JCSS, 77(3):520--541, 2011. Google ScholarDigital Library
- V. Harinarayan, A. Rajaraman, and J. D. Ullman. Implementing data cubes efficiently. ACM SIGMOD Record, 25:205--216, 1996. Google ScholarDigital Library
- J. Huang, D. J. Abadi, and K. Ren. Scalable sparql querying of large rdf graphs. PVLDB, 4(11):1123--1134, 2011.Google ScholarDigital Library
- Y. E. Ioannidis and R. Ramakrishnan. Containment of conjunctive queries: Beyond relations as sets. TODS, 20(3):288--324, 1995. Google ScholarDigital Library
- A. Jena. semantic web framework for java, 2007.Google Scholar
- D. S. Johnson and A. Klug. Optimizing conjunctive queries that contain untyped variables. SIAM Journal on Computing, 12(4):616--640, 1983.Google ScholarDigital Library
- G. Karvounarakis, A. Magganaraki, S. Alexaki, V. Christophides, D. Plexousakis, M. Scholl, and K. Tolle. Querying the semantic web with rql. Computer networks, 42(5):617--640, 2003. Google ScholarDigital Library
- E. Kharlamov, S. Brandt, E. Jimé nez-Ruiz, Y. Kotidis, S. Lamparter, T. Mailis, C. Neuenstadt, Ö. L. Ö zcc ep, C. Pinkel, C. Svingos, D. Zheleznyakov, I. Horrocks, Y. E. Ioannidis, and R. Mö ller. Ontology-based integration of streaming and static relational data with optique. In SIGMOD, pages 2109--2112, 2016. Google ScholarDigital Library
- E. Kharlamov, Y. Kotidis, T. Mailis, C. Neuenstadt, C. Nikolaou, Ö. Özcc ep, C. Svingos, D. Zheleznyakov, S. Brandt, I. Horrocks, Y. E. Ioannidis, S. Lamparter, and R. Möller. Towards analytics aware ontology based access to static and streaming data. In ISWC, pages 344--362, 2016.Google ScholarDigital Library
- E. Kharlamov, Y. Kotidis, T. Mailis, C. Neuenstadt, C. Nikolaou, Ö. Özcc ep, C. Svingos, D. Zheleznyakov, Y. Ioannnidis, S. Lamparter, R. Möller, and A. Waaler. An ontology-mediated analytics-aware approach to support monitoring and diagnostics of static and streaming data. J. Web Semant., 2019.Google Scholar
- K. Klein, N. Kriege, and P. Mutzel. Ct-index: Fingerprint-based graph indexing combining cycles and trees. In ICDE, pages 1115--1126, 2011. Google ScholarDigital Library
- A. Klug. On conjunctive queries containing inequalities. JACM, 35(1):146--160, 1988. Google ScholarDigital Library
- Y. Kotidis and N. Roussopoulos. A case for dynamic view management. TODS, 26(4):388--423, 2001. Google ScholarDigital Library
- A. Letelier, J. Pérez, R. Pichler, and S. Skritek. Static analysis and optimization of semantic web queries. TODS, 38(4):25, 2013. Google ScholarDigital Library
- A. Y. Levy, A. O. Mendelzon, and Y. Sagiv. Answering queries using views. In PODS, pages 95--104. ACM, 1995. Google ScholarDigital Library
- T. Mailis, G. Stoilos, and G. Stamou. Expressive reasoning with horn rules and fuzzy description logics. Knowledge and information systems, 25(1):105--136, 2010.Google Scholar
- S. Malyshev, M. Krötzsch, L. González, J. Gonsior, and A. Bielefeldt. Getting the most out of wikidata: Semantic technology usage in wikipedia's knowledge graph. In ISWC, pages 376--394, 2018.Google ScholarDigital Library
- B. McBride. Jena: Implementing the rdf model and syntax specification. In ISWC, pages 23--28. CEUR-WS. org, 2001.Google Scholar
- M. Meimaris, G. Papastefanatos, N. Mamoulis, and I. Anagnostopoulos. Extended characteristic sets: graph indexing for sparql query optimization. In ICDE, pages 497--508, 2017.Google ScholarCross Ref
- H. Mistry, P. Roy, S. Sudarshan, and K. Ramamritham. Materialized view selection and maintenance using multi-query optimization. In ACM SIGMOD Record, volume 30, pages 307--318. ACM, 2001. Google ScholarDigital Library
- K. Morfonios, S. Konakas, Y. Ioannidis, and N. Kotsis. Rolap implementations of the data cube. ACM Computing Surveys (CSUR), 39(4):12, 2007. Google ScholarDigital Library
- D. R. Morrison. Patricia--practical algorithm to retrieve information coded in alphanumeric. JACM, 15(4):514--534, 1968. Google ScholarDigital Library
- Y. Nenov, R. Piro, B. Motik, I. Horrocks, Z. Wu, and J. Banerjee. Rdfox: A highly-scalable rdf store. In ISWC, pages 3--20, 2015.Google ScholarDigital Library
- T. Neumann and G. Weikum. x-rdf-3x: fast querying, high update rates, and consistency for rdf databases. PVLDB, 3(1--2):256--263, 2010. Google ScholarDigital Library
- J. Neyman. X--outline of a theory of statistical estimation based on the classical theory of probability. Phil. Trans. R. Soc. Lond. A, 236(767):333--380, 1937.Google ScholarCross Ref
- N. Papailiou, D. Tsoumakos, P. Karras, and N. Koziris. Graph-aware, workload-adaptive sparql query caching. In SIGMOD, pages 1777--1792. ACM, 2015.Google ScholarDigital Library
- J. Pérez, M. Arenas, and C. Gutierrez. Semantics and complexity of sparql. In ISWC, pages 30--43, 2006. Google ScholarDigital Library
- F. Picalausa and S. Vansummeren. What are real sparql queries like? In SWIM, page 7. ACM, 2011. Google ScholarDigital Library
- R. Pichler and S. Skritek. Containment and equivalence of well-designed sparql. In PODS, pages 39--50. ACM, 2014. Google ScholarDigital Library
- A. Polleres. From sparql to rules (and back). In WWW, pages 787--796. ACM, 2007.Google ScholarDigital Library
- E. Prud, A. Seaborne, et al. Sparql query language for rdf. https://www.w3.org/TR/rdf-sparql-query/, 2006.Google Scholar
- Y. Sagiv and M. Yannakakis. Equivalences among relational expressions with the union and difference operators. JACM, 27(4):633--655, 1980. Google ScholarDigital Library
- G. Serfiotis, I. Koffina, V. Christophides, and V. Tannen. Containment and minimization of rdf/s query patterns. In ISWC, 2005. Google ScholarDigital Library
- L. Sidirourgos, R. Goncalves, M. Kersten, N. Nes, and S. Manegold. Column-store support for rdf data management: not all swans are white. PVLDB, 1(2):1553--1563, 2008. Google ScholarDigital Library
- A. Soylu, E. Kharlamov, D. Zheleznyakov, E. Jimé nez-Ruiz, M. Giese, M. G. Skjæveland, D. Hovland, R. Schlatte, S. Brandt, H. Lie, and I. Horrocks. Optiquevqs: A visual query system over ontologies for industry. Semantic Web, 9(5):627--660, 2018.Google ScholarDigital Library
- V. Spyropoulos and Y. K. Kotidis. Digree: Building a distributed graph processing engine out of single-node graph database installations. ACM SIGMOD Record, 46(4):22--27, 2018.Google ScholarDigital Library
- C. Svingos, T. Mailis, H. Kllapi, L. Stamatogiannakis, Y. Kotidis, and Y. Ioannidis. Real time processing of streaming and static information. In IEEE Big Data, pages 410--415, 2016.Google ScholarCross Ref
- T. Tran, G. Ladwig, and S. Rudolph. Managing structured and semistructured rdf data using structure indexes. IEEE Transactions on Knowledge and Data Engineering, 25(9):2076--2089, 2013. Google ScholarDigital Library
- J. Wang, Z. Liu, S. Ma, N. Ntarmos, and P. Triantafillou. GC: A graph caching system for subgraph/supergraph queries. PVLDB, 11(12):2022--2025, 2018. Google ScholarDigital Library
- J. Wang, N. Ntarmos, and P. Triantafillou. Indexing query graphs to speedup graph query processing. In EDBT, pages 41--52, 2016.Google Scholar
- J. Wang, N. Ntarmos, and P. Triantafillou. Graphcache: A caching system for graph queries. In EDBT, pages 13--24, 2017.Google Scholar
- C. Weiss, P. Karras, and A. Bernstein. Hexastore: sextuple indexing for semantic web data management. PVLDB, 1(1):1008--1019, 2008. Google ScholarDigital Library
- M. Wudage, J. Euzenat, P. Genevès, and N. Layaida. Sparql query containment under shi axioms. In Proceedings 26th AAAI Conference on Artificial Intelligence, pages 10--16, 2012. Google ScholarDigital Library
- M. Yannakakis. Algorithms for acyclic database schemes. In VLDB, volume 81, pages 82--94, 1981.Google Scholar
- X. Zhang, L. Chen, Y. Tong, and M. Wang. Eagre: Towards scalable i/o efficient sparql query evaluation on the cloud. In ICDE, 2013. Google ScholarDigital Library
Index Terms
- An Efficient Index for RDF Query Containment
Recommendations
Query containment under bag and bag-set semantics
Conjunctive queries (CQs) are at the core of query languages encountered in many logic-based research fields such as AI, or database systems. The majority of existing work assumes set semantics but often in real applications the manipulation of ...
Decidable containment of recursive queries
Database theoryOne of the most important reasoning tasks on queries is checking containment, i.e., verifying whether one query yields necessarily a subset of the result of another one. Query containment is crucial in several contexts, such as query optimization, query ...
Conjunctive query containment and answering under description logic constraints
Query containment and query answering are two important computational tasks in databases. While query answering amounts to computing the result of a query over a database, query containment is the problem of checking whether, for every database, the ...
Comments