Abstract
Despite the intense interest towards realizing the Semantic Web vision, most existing RDF data management schemes are constrained in terms of efficiency and scalability. Still, the growing popularity of the RDF format arguably calls for an effort to offset these drawbacks. Viewed from a relational-database perspective, these constraints are derived from the very nature of the RDF data model, which is based on a triple format. Recent research has attempted to address these constraints using a vertical-partitioning approach, in which separate two-column tables are constructed for each property. However, as we show, this approach suffers from similar scalability drawbacks on queries that are not bound by RDF property value. In this paper, we propose an RDF storage scheme that uses the triple nature of RDF as an asset. This scheme enhances the vertical partitioning idea and takes it to its logical conclusion. RDF data is indexed in six possible ways, one for each possible ordering of the three RDF elements. Each instance of an RDF element is associated with two vectors; each such vector gathers elements of one of the other types, along with lists of the third-type resources attached to each vector element. Hence, a sextuple-indexing scheme emerges. This format allows for quick and scalable general-purpose query processing; it confers significant advantages (up to five orders of magnitude) compared to previous approaches for RDF data management, at the price of a worst-case five-fold increase in index space. We experimentally document the advantages of our approach on real-world and synthetic data sets with practical queries.
- Longwell browser, http://simile.mit.edu/longwell.Google Scholar
- MIT Libraries Barton Catalog Data. http://simile.mit.edu/rdf-test-data/barton/.Google Scholar
- The SIMILE Project, http://simile.mit.edu/.Google Scholar
- D. J. Abadi, S. R. Madden, and M. Ferreira. Integrating compression and execution in column-oriented database systems. In SIGMOD, 2006. Google ScholarDigital Library
- D. J. Abadi, A. Marcus, S. R. Madden, and K. Hollenbach. Scalable Semantic Web Data Management using vertical partitioning. In VLDB, 2007. Google ScholarDigital Library
- D. J. Abadi, A. Marcus, S. R. Madden, and K. Hollenbach. Using the Barton Libraries dataset as an RDF benchmark. Technical Report MIT-CSAIL-TR-2007-036, MIT, 2007.Google Scholar
- D. J. Abadi, D. S. Myers, D. J. DeWitt, and S. Madden. Materialization strategies in a column-oriented DBMS. In ICDE, 2007.Google ScholarCross Ref
- S. Alexaki, V. Christophides, G. Karvounarakis, and D. Plexousakis. On storing voluminous RDF descriptions: The case of web portal catalogs. In WebDB, 2001.Google Scholar
- S. Alexaki, V. Christophides, G. Karvounarakis, D. Plexousakis, and K. Tolle. The ICS-FORTH RDFSuite: Managing voluminous RDF description bases. In SemWeb, 2001.Google ScholarDigital Library
- R. Angles and C. Gutiérrez. Querying RDF data from a graph database perspective. In ESWC, 2005. Google ScholarDigital Library
- D. Beckett. The design and implementation of the Redland RDF application framework. In WWW, 2001. Google ScholarDigital Library
- J. L. Beckmann, A. Halverson, R. Krishnamurthy, and J. F. Naughton. Extending RDBMSs to support sparse datasets using an interpreted attribute storage format. In ICDE, 2006. Google ScholarDigital Library
- T. Berners-Lee, J. Hendler, and O. Lassila. The Semantic Web. Scientific American, 284(5):34--43, 2001.Google ScholarCross Ref
- P. A. Boncz and M. L. Kersten. MIL primitives for querying a fragmented world. VLDB Jounral, 8(2):101--119, 1999. Google ScholarDigital Library
- P. A. Boncz, M. Zukowski, and N. Nes. MonetDB/X100: Hyper-pipelining query execution. In CIDR, 2005.Google Scholar
- V. Bönström, A. Hinze, and H. Schweppe. Storing RDF as a graph. In LA-WEB, 2003.Google ScholarCross Ref
- J. Broekstra, A. Kampman, and F. van Harmelen. Sesame: A generic architecture for storing and querying RDF and RDF Schema. In ISWC, 2002. Google ScholarDigital Library
- K. S. Candan, H. Liu, and R. Suvarna. Resource Description Framework: Metadata and its applications. SIGKDD Explorations Newsletter, 3(1):6--19, 2001. Google ScholarDigital Library
- J. J. Carroll, I. Dickinson, C. Dollin, D. Reynolds, A. Seaborne, and K. Wilkinson. Jena: implementing the Semantic Web recommendations. In WWW (Alternate Track Papers & Posters), 2004. Google ScholarDigital Library
- E. I. Chong, S. Das, G. Eadon, and J. Srinivasan. An efficient SQL-based RDF querying scheme. In VLDB, 2005. Google ScholarDigital Library
- S. Dar and R. Ramakrishnan. A performance study of transitive closure algorithms. In SIGMOD, 1994. Google ScholarDigital Library
- R. V. Guha. rdfDB: An RDF database. http://www.guha.com/rdfdb/.Google Scholar
- Y. Guo, J. Heflin, and Z. Pan. Benchmarking DAML+OIL repositories. In ISWC, 2003.Google ScholarDigital Library
- Y. Guo, Z. Pan, and J. Heflin. An evaluation of knowledge base systems for large OWL datasets. In ISWC, 2004.Google ScholarCross Ref
- S. Harris and N. Gibbins. 3store: Efficient bulk RDF storage. In PSSS, 2003.Google Scholar
- S. Harris and N. Shadbolt. SPARQL query processing with conventional relational database systems. In SSWS, 2005. Google ScholarDigital Library
- A. Harth and S. Decker. Optimized index structures for querying rdf from the web. In LA-WEB, 2005. Google ScholarDigital Library
- J. Hayes and C. Gutiérrez. Bipartite graphs as intermediate model for RDF. In ISWC, 2004.Google ScholarDigital Library
- S. Idreos, M. L. Kersten, and S. Manegold. Database cracking. In CIDR, 2007.Google Scholar
- S. Idreos, M. L. Kersten, and S. Manegold. Updating a cracked database. In SIGMOD, 2007. Google ScholarDigital Library
- Y. Ioannidis, R. Ramakrishnan, and L. Winger. Transitive closure algorithms based on graph traversal. ACM TODS, 18(3):512--576, 1993. Google ScholarDigital Library
- M. L. Kersten and S. Manegold. Cracking the database store. In CIDR, 2005.Google Scholar
- C. Kiefer, A. Bernstein, and M. Stocker. The fundamentals of iSPARQL - a virtual triple approach for similarity-based Semantic Web tasks. In ISWC, 2007. Google ScholarDigital Library
- Y. Kim, B. Kim, J. Lee, and H. Lim. The path index for query processing on RDF and RDF Schema. In ICACT, 2005.Google Scholar
- E. Liarou, S. Idreos, and M. Koubarakis. Continuous RDF query processing over DHTs. In ISWC, 2007. Google ScholarDigital Library
- L. Ma, Z. Su, Y. Pan, L. Zhang, and T. Liu. RStar: an RDF storage and query system for enterprise resource management. In CIKM, 2004. Google ScholarDigital Library
- F. Manola and E. Miller, editors. RDF Primer. W3C Recommendation. WWW Consortium, 2004.Google Scholar
- A. Matono, T. Amagasa, M. Yoshikawa, and S. Uemura. A path-based relational RDF database. In ADC, 2005. Google ScholarDigital Library
- Z. Pan and J. Heflin. DLDB: Extending relational databases to support Semantic Web queries. In PSSS, 2003.Google Scholar
- M. Petrovic, H. Liu, and H.-A. Jacobsen. G-ToPSS: Fast filtering of graph-based metadata. In WWW, 2005. Google ScholarDigital Library
- M. Stocker, A. Seaborne, A. Bernstein, C. Kiefer, and D. Reynolds. SPARQL basic graph pattern optimization using selectivity estimation. In WWW, 2008. Google ScholarDigital Library
- M. Stonebraker, D. J. Abadi, A. Batkin, X. Chen, M. Cherniack, M. Ferreira, E. Lau, A. Lin, S. R. Madden, E. J. O'Neil, P. E. O'Neil, A. Rasin, N. Tran, and S. B. Zdonik. C-store: a column-oriented DBMS. In VLDB, 2005. Google ScholarDigital Library
- J. D. Ullman and M. Yannakakis. The input/output complexity of transitive closure. In SIGMOD, 1990. Google ScholarDigital Library
- R. Volz, D. Oberle, S. Staab, and B. Motik. KAON SERVER - A Semantic Web Management System. In WWW (Alternate Paper Tracks), 2003.Google Scholar
- K. Wilkinson. Jena property table implementation. In SSWS, 2006.Google Scholar
- K. Wilkinson, C. Sayers, H. A. Kuno, and D. Reynolds. Efficient RDF storage and retrieval in Jena2. In SWDB, 2003.Google Scholar
- D. Wood, P. Gearon, and T. Adams. Kowari: A platform for Semantic Web storage and analysis. In XTech, 2005.Google Scholar
Index Terms
- Hexastore: sextuple indexing for semantic web data management
Recommendations
The bio-zen plus ontology
Towards a Metaontology for the Biomedical DomainBio-zen plus is an OWL DL ontology for the domain of biomedical research. It incorporates several existing Semantic Web ontologies: the DOLCE foundational ontology, the Simple Knowledge Organisation System (SKOS), the Semantically Interlinked Online ...
The bio-zen plus ontology
Towards a Metaontology for the Biomedical DomainBio-zen plus is an OWL DL ontology for the domain of biomedical research. It incorporates several existing Semantic Web ontologies: the DOLCE foundational ontology, the Simple Knowledge Organisation System (SKOS), the Semantically Interlinked Online ...
RDFS(FA): Connecting RDF(S) and OWL DL
Semantic Web (SW) languages are supposed to be compatible with each other in a meaningful way, so as to facilitate machine understanding. Recent research, however, shows that the semantics of the standard SW annotation language RDF (as well as its ...
Comments