skip to main content
research-article

Hexastore: sextuple indexing for semantic web data management

Published:01 August 2008Publication History
Skip Abstract Section

Abstract

Despite the intense interest towards realizing the Semantic Web vision, most existing RDF data management schemes are constrained in terms of efficiency and scalability. Still, the growing popularity of the RDF format arguably calls for an effort to offset these drawbacks. Viewed from a relational-database perspective, these constraints are derived from the very nature of the RDF data model, which is based on a triple format. Recent research has attempted to address these constraints using a vertical-partitioning approach, in which separate two-column tables are constructed for each property. However, as we show, this approach suffers from similar scalability drawbacks on queries that are not bound by RDF property value. In this paper, we propose an RDF storage scheme that uses the triple nature of RDF as an asset. This scheme enhances the vertical partitioning idea and takes it to its logical conclusion. RDF data is indexed in six possible ways, one for each possible ordering of the three RDF elements. Each instance of an RDF element is associated with two vectors; each such vector gathers elements of one of the other types, along with lists of the third-type resources attached to each vector element. Hence, a sextuple-indexing scheme emerges. This format allows for quick and scalable general-purpose query processing; it confers significant advantages (up to five orders of magnitude) compared to previous approaches for RDF data management, at the price of a worst-case five-fold increase in index space. We experimentally document the advantages of our approach on real-world and synthetic data sets with practical queries.

References

  1. Longwell browser, http://simile.mit.edu/longwell.Google ScholarGoogle Scholar
  2. MIT Libraries Barton Catalog Data. http://simile.mit.edu/rdf-test-data/barton/.Google ScholarGoogle Scholar
  3. The SIMILE Project, http://simile.mit.edu/.Google ScholarGoogle Scholar
  4. D. J. Abadi, S. R. Madden, and M. Ferreira. Integrating compression and execution in column-oriented database systems. In SIGMOD, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. D. J. Abadi, A. Marcus, S. R. Madden, and K. Hollenbach. Scalable Semantic Web Data Management using vertical partitioning. In VLDB, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. D. J. Abadi, A. Marcus, S. R. Madden, and K. Hollenbach. Using the Barton Libraries dataset as an RDF benchmark. Technical Report MIT-CSAIL-TR-2007-036, MIT, 2007.Google ScholarGoogle Scholar
  7. D. J. Abadi, D. S. Myers, D. J. DeWitt, and S. Madden. Materialization strategies in a column-oriented DBMS. In ICDE, 2007.Google ScholarGoogle ScholarCross RefCross Ref
  8. S. Alexaki, V. Christophides, G. Karvounarakis, and D. Plexousakis. On storing voluminous RDF descriptions: The case of web portal catalogs. In WebDB, 2001.Google ScholarGoogle Scholar
  9. S. Alexaki, V. Christophides, G. Karvounarakis, D. Plexousakis, and K. Tolle. The ICS-FORTH RDFSuite: Managing voluminous RDF description bases. In SemWeb, 2001.Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. R. Angles and C. Gutiérrez. Querying RDF data from a graph database perspective. In ESWC, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. D. Beckett. The design and implementation of the Redland RDF application framework. In WWW, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. J. L. Beckmann, A. Halverson, R. Krishnamurthy, and J. F. Naughton. Extending RDBMSs to support sparse datasets using an interpreted attribute storage format. In ICDE, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. T. Berners-Lee, J. Hendler, and O. Lassila. The Semantic Web. Scientific American, 284(5):34--43, 2001.Google ScholarGoogle ScholarCross RefCross Ref
  14. P. A. Boncz and M. L. Kersten. MIL primitives for querying a fragmented world. VLDB Jounral, 8(2):101--119, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. P. A. Boncz, M. Zukowski, and N. Nes. MonetDB/X100: Hyper-pipelining query execution. In CIDR, 2005.Google ScholarGoogle Scholar
  16. V. Bönström, A. Hinze, and H. Schweppe. Storing RDF as a graph. In LA-WEB, 2003.Google ScholarGoogle ScholarCross RefCross Ref
  17. J. Broekstra, A. Kampman, and F. van Harmelen. Sesame: A generic architecture for storing and querying RDF and RDF Schema. In ISWC, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. K. S. Candan, H. Liu, and R. Suvarna. Resource Description Framework: Metadata and its applications. SIGKDD Explorations Newsletter, 3(1):6--19, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. J. J. Carroll, I. Dickinson, C. Dollin, D. Reynolds, A. Seaborne, and K. Wilkinson. Jena: implementing the Semantic Web recommendations. In WWW (Alternate Track Papers & Posters), 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. E. I. Chong, S. Das, G. Eadon, and J. Srinivasan. An efficient SQL-based RDF querying scheme. In VLDB, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. S. Dar and R. Ramakrishnan. A performance study of transitive closure algorithms. In SIGMOD, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. R. V. Guha. rdfDB: An RDF database. http://www.guha.com/rdfdb/.Google ScholarGoogle Scholar
  23. Y. Guo, J. Heflin, and Z. Pan. Benchmarking DAML+OIL repositories. In ISWC, 2003.Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Y. Guo, Z. Pan, and J. Heflin. An evaluation of knowledge base systems for large OWL datasets. In ISWC, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  25. S. Harris and N. Gibbins. 3store: Efficient bulk RDF storage. In PSSS, 2003.Google ScholarGoogle Scholar
  26. S. Harris and N. Shadbolt. SPARQL query processing with conventional relational database systems. In SSWS, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. A. Harth and S. Decker. Optimized index structures for querying rdf from the web. In LA-WEB, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. J. Hayes and C. Gutiérrez. Bipartite graphs as intermediate model for RDF. In ISWC, 2004.Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. S. Idreos, M. L. Kersten, and S. Manegold. Database cracking. In CIDR, 2007.Google ScholarGoogle Scholar
  30. S. Idreos, M. L. Kersten, and S. Manegold. Updating a cracked database. In SIGMOD, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Y. Ioannidis, R. Ramakrishnan, and L. Winger. Transitive closure algorithms based on graph traversal. ACM TODS, 18(3):512--576, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. M. L. Kersten and S. Manegold. Cracking the database store. In CIDR, 2005.Google ScholarGoogle Scholar
  33. C. Kiefer, A. Bernstein, and M. Stocker. The fundamentals of iSPARQL - a virtual triple approach for similarity-based Semantic Web tasks. In ISWC, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Y. Kim, B. Kim, J. Lee, and H. Lim. The path index for query processing on RDF and RDF Schema. In ICACT, 2005.Google ScholarGoogle Scholar
  35. E. Liarou, S. Idreos, and M. Koubarakis. Continuous RDF query processing over DHTs. In ISWC, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. L. Ma, Z. Su, Y. Pan, L. Zhang, and T. Liu. RStar: an RDF storage and query system for enterprise resource management. In CIKM, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. F. Manola and E. Miller, editors. RDF Primer. W3C Recommendation. WWW Consortium, 2004.Google ScholarGoogle Scholar
  38. A. Matono, T. Amagasa, M. Yoshikawa, and S. Uemura. A path-based relational RDF database. In ADC, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Z. Pan and J. Heflin. DLDB: Extending relational databases to support Semantic Web queries. In PSSS, 2003.Google ScholarGoogle Scholar
  40. M. Petrovic, H. Liu, and H.-A. Jacobsen. G-ToPSS: Fast filtering of graph-based metadata. In WWW, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. M. Stocker, A. Seaborne, A. Bernstein, C. Kiefer, and D. Reynolds. SPARQL basic graph pattern optimization using selectivity estimation. In WWW, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. M. Stonebraker, D. J. Abadi, A. Batkin, X. Chen, M. Cherniack, M. Ferreira, E. Lau, A. Lin, S. R. Madden, E. J. O'Neil, P. E. O'Neil, A. Rasin, N. Tran, and S. B. Zdonik. C-store: a column-oriented DBMS. In VLDB, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. J. D. Ullman and M. Yannakakis. The input/output complexity of transitive closure. In SIGMOD, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. R. Volz, D. Oberle, S. Staab, and B. Motik. KAON SERVER - A Semantic Web Management System. In WWW (Alternate Paper Tracks), 2003.Google ScholarGoogle Scholar
  45. K. Wilkinson. Jena property table implementation. In SSWS, 2006.Google ScholarGoogle Scholar
  46. K. Wilkinson, C. Sayers, H. A. Kuno, and D. Reynolds. Efficient RDF storage and retrieval in Jena2. In SWDB, 2003.Google ScholarGoogle Scholar
  47. D. Wood, P. Gearon, and T. Adams. Kowari: A platform for Semantic Web storage and analysis. In XTech, 2005.Google ScholarGoogle Scholar

Index Terms

  1. Hexastore: sextuple indexing for semantic web data management

                Recommendations

                Comments

                Login options

                Check if you have access through your login credentials or your institution to get full access on this article.

                Sign in

                Full Access

                PDF Format

                View or Download as a PDF file.

                PDF

                eReader

                View online with eReader.

                eReader