Abstract
The volume of RDF data has continued to grow over the past decade, and many well-known RDF datasets contain billions of triples. A grand challenge in managing RDF data at this scale is accessing it efficiently. A popular approach to addressing the problem is to build a full set of permutations of (S, P, O) indexes. Although this approach has been shown to accelerate joins by orders of magnitude, its large space overhead limits scalability and makes it heavyweight. In this paper, we present TripleBit, a fast and compact system for storing and accessing RDF data. The design of TripleBit has three salient features. First, the compact design of TripleBit reduces both the size of stored RDF data and the size of its indexes. Second, TripleBit introduces two auxiliary index structures, the ID-Chunk bit matrix and the ID-Predicate bit matrix, to minimize the cost of index selection during query evaluation. Third, its query processor dynamically generates an optimal execution ordering for join queries, leading to fast query execution and an effective reduction in the size of intermediate results. Our experiments show that TripleBit outperforms RDF-3X, MonetDB, and BitMat on LUBM, UniProt, and BTC 2012 benchmark queries, and that it offers orders-of-magnitude performance improvements for some complex join queries.
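To make the space overhead of the permutation-index approach concrete, the following is a minimal Python sketch, not TripleBit's design or any particular system's implementation. It builds one sorted index per ordering of (S, P, O) over a toy dataset; the helper names `build_permutation_indexes` and `lookup` and the sample triples are hypothetical, chosen only for illustration.

```python
from bisect import bisect_left
from itertools import permutations


def build_permutation_indexes(triples):
    """Build one sorted index per ordering of (S, P, O): six in total."""
    indexes = {}
    for order in permutations(range(3)):
        name = "".join("SPO"[i] for i in order)  # e.g. "POS"
        indexes[name] = sorted(tuple(t[i] for i in order) for t in triples)
    return indexes


def lookup(index, prefix):
    """Return all rows of a sorted index that start with the given prefix."""
    start = bisect_left(index, prefix)  # first row >= prefix
    rows = []
    for row in index[start:]:
        if row[:len(prefix)] != prefix:
            break
        rows.append(row)
    return rows


# Hypothetical toy data, for illustration only.
triples = [
    ("alice", "knows", "bob"),
    ("alice", "worksAt", "acme"),
    ("bob", "knows", "carol"),
]

indexes = build_permutation_indexes(triples)

# Every triple is stored once per ordering, so storage grows sixfold.
stored_entries = sum(len(rows) for rows in indexes.values())

# A pattern with only the predicate bound is a prefix scan on the POS index.
knows_rows = lookup(indexes["POS"], ("knows",))
```

Any triple pattern with bound components can be answered by a prefix scan on one of the six orderings, which is what makes joins fast. The cost is that each triple is materialized six times, which is the overhead the abstract describes as limiting scalability.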
Index Terms
- TripleBit: a fast and compact system for large scale RDF data