skip to main content
10.1145/3299869.3319864acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

An Efficient Index for RDF Query Containment

Published:25 June 2019Publication History

ABSTRACT

Query containment is a fundamental operation used to expedite query processing in view materialisation and query caching techniques. Since query containment has been shown to be NP-complete for arbitrary conjunctive queries on RDF graphs, we introduce a simpler form of conjunctive queries that we name f-graph queries. We first show that containment checking for f-graph queries can be solved in polynomial time. Based on this observation, we propose a novel indexing structure, named mv-index, that allows for fast containment checking between a single f-graph query and an arbitrary number of stored queries. Search is performed in polynomial time in the combined size of the query and the index. We then show how our algorithms and structures can be extended for arbitrary conjunctive queries on RDF graphs by introducing f-graph witnesses, i.e., f-graph representatives of conjunctive queries. F-graph witnesses have the following interesting property, a conjunctive query for RDF graphs is contained in another query only if its corresponding f-graph witness is also contained in it. The latter allows to use our indexing structure for the general case of conjunctive query containment. This translates in practice to microseconds or less for the containment test against hundreds of thousands of queries that are indexed within our structure.

References

  1. D. J. Abadi, A. Marcus, S. R. Madden, and K. Hollenbach. Sw-store: a vertically partitioned dbms for semantic web data management. VLDB J., 18(2):385--406, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. F. N. Afrati, M. Damigos, and M. Gergatsoulis. Query containment under bag and bag-set semantics. Information Processing Letters, 110(10):360--369, 2010.Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. S. Agrawal, S. Chaudhuri, and V. R. Narasayya. Automated selection of materialized views and indexes in sql databases. In VLDB, volume 2000, pages 496--505, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. G. Alucc, O. Hartig, M. T. Özsu, and K. Daudjee. Diversified stress testing of rdf data management systems. In ISWC, pages 197--212, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. R. Angles and C. Gutierrez. The expressive power of sparql. ISWC, pages 114--129, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. M. Arenas, B. C. Grau, E. Kharlamov, S. Marciuska, and D. Zheleznyakov. Faceted search over rdf-based knowledge graphs. J. Web Sem., 37--38:55--74, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. M. Atre, V. Chaoji, M. J. Zaki, and J. A. Hendler. Matrix bit loaded: a scalable lightweight join query processor for rdf data. In WWW, pages 41--50. ACM, 2010.Google ScholarGoogle Scholar
  8. S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, and Z. Ives. Dbpedia: A nucleus for a web of open data. ISWC/ASWC, pages 722--735, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. C. Bizer and A. Schultz. The berlin sparql benchmark. International Journal on Semantic Web and Information Systems (IJSWIS), 5(2):1--24, 2009.Google ScholarGoogle ScholarCross RefCross Ref
  10. D. Bleco and Y. Kotidis. Graph analytics on massive collections of small graphs. In EDBT, pages 523--534. Citeseer, 2014.Google ScholarGoogle Scholar
  11. D. Bleco and Y. Kotidis. Using entropy metrics for pruning very large graph cubes. Information Systems, 81:49--62, 2019.Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. A. Bonifati, W. Martens, and T. Timm. An analytical study of large sparql query logs. PVLDB, 11(2):149--161, 2017. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. S. Borgwardt, T. Mailis, R. Pe naloza, and A.-Y. Turhan. Answering fuzzy conjunctive queries over finitely valued fuzzy ontologies. Journal on Data Semantics, 5(2):55--75, 2016.Google ScholarGoogle ScholarCross RefCross Ref
  14. D. Brickley and R. V. Guha. Rdf vocabulary description language 1.0: Rdf schema. https://www.w3.org/TR/rdf-schema/, 2004.Google ScholarGoogle Scholar
  15. J. Broekstra, A. Kampman, and F. Van Harmelen. Sesame: A generic architecture for storing and querying rdf and rdf schema. In ISWC, pages 54--68, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. B. Cautis and E. Kharlamov. Answering queries using views over probabilistic XML: complexity and tractability. PVLDB, 5(11):1148--1159, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. A. K. Chandra and P. M. Merlin. Optimal implementation of conjunctive queries in relational data bases. In STOC. ACM, 1977. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. S. Chaudhuri and M. Y. Vardi. Optimization of real conjunctive queries. In PODS, pages 59--70. ACM, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. M. W. Chekol, J. Euzenat, P. Genevès, and N. Laya"ida. Sparql query containment under rdfs entailment regime. In IJCAR, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. C. Chekuri and A. Rajaraman. Conjunctive query containment revisited. ICDT, pages 56--70, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to algorithms. MIT press, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. S. Dar, M. J. Franklin, B. T. Jonsson, D. Srivastava, M. Tan, et al. Semantic data caching and replacement. VLDB, 96:330--341, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. V. Dritsou, P. Constantopoulos, A. Deligiannakis, and Y. Kotidis. Optimizing query shortcuts in rdf databases. ESWC, pages 77--92, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. O. Erling, A. Averbuch, J. Larriba-Pey, H. Chafi, A. Gubichev, A. Prat, M.-D. Pham, and P. Boncz. The ldbc social network benchmark: Interactive workload. In SIGMOD, pages 619--630. ACM, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. R. Giugno, V. Bonnici, N. Bombieri, A. Pulvirenti, A. Ferro, and D. Shasha. Grapes: A software for parallel searching on biological graphs targeting multi-core architectures. PloS one, 8(10):e76911, 2013.Google ScholarGoogle ScholarCross RefCross Ref
  26. F. Goasdoué, K. Karanasos, J. Leblay, and I. Manolescu. View selection in semantic web databases. PVLDB, 5(2):97--108, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. G. Gottlob, N. Leone, and F. Scarcello. The complexity of acyclic conjunctive queries. Journal of the ACM, 48(3):431--498, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Y. Guo, Z. Pan, and J. Heflin. Lubm: A benchmark for owl knowledge base systems. J. Web Semant., 3(2--3):158--182, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. A. Gupta, I. S. Mumick, et al. Maintenance of materialized views: Problems, techniques, and applications. IEEE Data Eng. Bull., 18(2):3--18, 1995.Google ScholarGoogle Scholar
  30. S. Gurajada, S. Seufert, I. Miliaraki, and M. Theobald. Triad: a distributed shared-nothing rdf engine based on asynchronous message passing. In SIGMOD, pages 289--300. ACM, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. C. Gutierrez, C. A. Hurtado, A. O. Mendelzon, and J. Pérez. Foundations of semantic web databases. JCSS, 77(3):520--541, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. V. Harinarayan, A. Rajaraman, and J. D. Ullman. Implementing data cubes efficiently. ACM SIGMOD Record, 25:205--216, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. J. Huang, D. J. Abadi, and K. Ren. Scalable sparql querying of large rdf graphs. PVLDB, 4(11):1123--1134, 2011.Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Y. E. Ioannidis and R. Ramakrishnan. Containment of conjunctive queries: Beyond relations as sets. TODS, 20(3):288--324, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. A. Jena. semantic web framework for java, 2007.Google ScholarGoogle Scholar
  36. D. S. Johnson and A. Klug. Optimizing conjunctive queries that contain untyped variables. SIAM Journal on Computing, 12(4):616--640, 1983.Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. G. Karvounarakis, A. Magganaraki, S. Alexaki, V. Christophides, D. Plexousakis, M. Scholl, and K. Tolle. Querying the semantic web with rql. Computer networks, 42(5):617--640, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. E. Kharlamov, S. Brandt, E. Jimé nez-Ruiz, Y. Kotidis, S. Lamparter, T. Mailis, C. Neuenstadt, Ö. L. Ö zcc ep, C. Pinkel, C. Svingos, D. Zheleznyakov, I. Horrocks, Y. E. Ioannidis, and R. Mö ller. Ontology-based integration of streaming and static relational data with optique. In SIGMOD, pages 2109--2112, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. E. Kharlamov, Y. Kotidis, T. Mailis, C. Neuenstadt, C. Nikolaou, Ö. Özcc ep, C. Svingos, D. Zheleznyakov, S. Brandt, I. Horrocks, Y. E. Ioannidis, S. Lamparter, and R. Möller. Towards analytics aware ontology based access to static and streaming data. In ISWC, pages 344--362, 2016.Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. E. Kharlamov, Y. Kotidis, T. Mailis, C. Neuenstadt, C. Nikolaou, Ö. Özcc ep, C. Svingos, D. Zheleznyakov, Y. Ioannnidis, S. Lamparter, R. Möller, and A. Waaler. An ontology-mediated analytics-aware approach to support monitoring and diagnostics of static and streaming data. J. Web Semant., 2019.Google ScholarGoogle Scholar
  41. K. Klein, N. Kriege, and P. Mutzel. Ct-index: Fingerprint-based graph indexing combining cycles and trees. In ICDE, pages 1115--1126, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. A. Klug. On conjunctive queries containing inequalities. JACM, 35(1):146--160, 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Y. Kotidis and N. Roussopoulos. A case for dynamic view management. TODS, 26(4):388--423, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. A. Letelier, J. Pérez, R. Pichler, and S. Skritek. Static analysis and optimization of semantic web queries. TODS, 38(4):25, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. A. Y. Levy, A. O. Mendelzon, and Y. Sagiv. Answering queries using views. In PODS, pages 95--104. ACM, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. T. Mailis, G. Stoilos, and G. Stamou. Expressive reasoning with horn rules and fuzzy description logics. Knowledge and information systems, 25(1):105--136, 2010.Google ScholarGoogle Scholar
  47. S. Malyshev, M. Krötzsch, L. González, J. Gonsior, and A. Bielefeldt. Getting the most out of wikidata: Semantic technology usage in wikipedia's knowledge graph. In ISWC, pages 376--394, 2018.Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. B. McBride. Jena: Implementing the rdf model and syntax specification. In ISWC, pages 23--28. CEUR-WS. org, 2001.Google ScholarGoogle Scholar
  49. M. Meimaris, G. Papastefanatos, N. Mamoulis, and I. Anagnostopoulos. Extended characteristic sets: graph indexing for sparql query optimization. In ICDE, pages 497--508, 2017.Google ScholarGoogle ScholarCross RefCross Ref
  50. H. Mistry, P. Roy, S. Sudarshan, and K. Ramamritham. Materialized view selection and maintenance using multi-query optimization. In ACM SIGMOD Record, volume 30, pages 307--318. ACM, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. K. Morfonios, S. Konakas, Y. Ioannidis, and N. Kotsis. Rolap implementations of the data cube. ACM Computing Surveys (CSUR), 39(4):12, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. D. R. Morrison. Patricia--practical algorithm to retrieve information coded in alphanumeric. JACM, 15(4):514--534, 1968. Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. Y. Nenov, R. Piro, B. Motik, I. Horrocks, Z. Wu, and J. Banerjee. Rdfox: A highly-scalable rdf store. In ISWC, pages 3--20, 2015.Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. T. Neumann and G. Weikum. x-rdf-3x: fast querying, high update rates, and consistency for rdf databases. PVLDB, 3(1--2):256--263, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. J. Neyman. X--outline of a theory of statistical estimation based on the classical theory of probability. Phil. Trans. R. Soc. Lond. A, 236(767):333--380, 1937.Google ScholarGoogle ScholarCross RefCross Ref
  56. N. Papailiou, D. Tsoumakos, P. Karras, and N. Koziris. Graph-aware, workload-adaptive sparql query caching. In SIGMOD, pages 1777--1792. ACM, 2015.Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. J. Pérez, M. Arenas, and C. Gutierrez. Semantics and complexity of sparql. In ISWC, pages 30--43, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. F. Picalausa and S. Vansummeren. What are real sparql queries like? In SWIM, page 7. ACM, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. R. Pichler and S. Skritek. Containment and equivalence of well-designed sparql. In PODS, pages 39--50. ACM, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. A. Polleres. From sparql to rules (and back). In WWW, pages 787--796. ACM, 2007.Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. E. Prud, A. Seaborne, et al. Sparql query language for rdf. https://www.w3.org/TR/rdf-sparql-query/, 2006.Google ScholarGoogle Scholar
  62. Y. Sagiv and M. Yannakakis. Equivalences among relational expressions with the union and difference operators. JACM, 27(4):633--655, 1980. Google ScholarGoogle ScholarDigital LibraryDigital Library
  63. G. Serfiotis, I. Koffina, V. Christophides, and V. Tannen. Containment and minimization of rdf/s query patterns. In ISWC, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  64. L. Sidirourgos, R. Goncalves, M. Kersten, N. Nes, and S. Manegold. Column-store support for rdf data management: not all swans are white. PVLDB, 1(2):1553--1563, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  65. A. Soylu, E. Kharlamov, D. Zheleznyakov, E. Jimé nez-Ruiz, M. Giese, M. G. Skjæveland, D. Hovland, R. Schlatte, S. Brandt, H. Lie, and I. Horrocks. Optiquevqs: A visual query system over ontologies for industry. Semantic Web, 9(5):627--660, 2018.Google ScholarGoogle ScholarDigital LibraryDigital Library
  66. V. Spyropoulos and Y. K. Kotidis. Digree: Building a distributed graph processing engine out of single-node graph database installations. ACM SIGMOD Record, 46(4):22--27, 2018.Google ScholarGoogle ScholarDigital LibraryDigital Library
  67. C. Svingos, T. Mailis, H. Kllapi, L. Stamatogiannakis, Y. Kotidis, and Y. Ioannidis. Real time processing of streaming and static information. In IEEE Big Data, pages 410--415, 2016.Google ScholarGoogle ScholarCross RefCross Ref
  68. T. Tran, G. Ladwig, and S. Rudolph. Managing structured and semistructured rdf data using structure indexes. IEEE Transactions on Knowledge and Data Engineering, 25(9):2076--2089, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  69. J. Wang, Z. Liu, S. Ma, N. Ntarmos, and P. Triantafillou. GC: A graph caching system for subgraph/supergraph queries. PVLDB, 11(12):2022--2025, 2018. Google ScholarGoogle ScholarDigital LibraryDigital Library
  70. J. Wang, N. Ntarmos, and P. Triantafillou. Indexing query graphs to speedup graph query processing. In EDBT, pages 41--52, 2016.Google ScholarGoogle Scholar
  71. J. Wang, N. Ntarmos, and P. Triantafillou. Graphcache: A caching system for graph queries. In EDBT, pages 13--24, 2017.Google ScholarGoogle Scholar
  72. C. Weiss, P. Karras, and A. Bernstein. Hexastore: sextuple indexing for semantic web data management. PVLDB, 1(1):1008--1019, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  73. M. Wudage, J. Euzenat, P. Genevès, and N. Layaida. Sparql query containment under shi axioms. In Proceedings 26th AAAI Conference on Artificial Intelligence, pages 10--16, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  74. M. Yannakakis. Algorithms for acyclic database schemes. In VLDB, volume 81, pages 82--94, 1981.Google ScholarGoogle Scholar
  75. X. Zhang, L. Chen, Y. Tong, and M. Wang. Eagre: Towards scalable i/o efficient sparql query evaluation on the cloud. In ICDE, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. An Efficient Index for RDF Query Containment

                  Recommendations

                  Comments

                  Login options

                  Check if you have access through your login credentials or your institution to get full access on this article.

                  Sign in
                  • Published in

                    cover image ACM Conferences
                    SIGMOD '19: Proceedings of the 2019 International Conference on Management of Data
                    June 2019
                    2106 pages
                    ISBN:9781450356435
                    DOI:10.1145/3299869

                    Copyright © 2019 ACM

                    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

                    Publisher

                    Association for Computing Machinery

                    New York, NY, United States

                    Publication History

                    • Published: 25 June 2019

                    Permissions

                    Request permissions about this article.

                    Request Permissions

                    Check for updates

                    Qualifiers

                    • research-article

                    Acceptance Rates

                    SIGMOD '19 Paper Acceptance Rate88of430submissions,20%Overall Acceptance Rate785of4,003submissions,20%

                  PDF Format

                  View or Download as a PDF file.

                  PDF

                  eReader

                  View online with eReader.

                  eReader