
TripleBit: a fast and compact system for large scale RDF data

Authors:
Pingpeng Yuan, Pu Liu, Buwen Wu, Hai Jin, Wenya Zhang: Services Computing Tech. and System Lab., School of Computer Science & Technology, Huazhong University of Science and Technology, China
Ling Liu: Distributed Data Intensive Systems Lab., School of Computer Science, College of Computing, Georgia Institute of Technology

Proceedings of the VLDB Endowment, Volume 6, Issue 7, pp. 517–528. https://doi.org/10.14778/2536349.2536352

Published: 01 May 2013

Abstract

The volume of RDF data has continued to grow over the past decade, and many well-known RDF datasets now contain billions of triples. A grand challenge in managing such large RDF datasets is how to access them efficiently. A popular approach to this problem is to build a full set of (S, P, O) permutation indexes. Although this approach has been shown to accelerate joins by orders of magnitude, its large space overhead limits its scalability and makes it heavyweight. In this paper, we present TripleBit, a fast and compact system for storing and accessing RDF data. The design of TripleBit has three salient features. First, its compact design reduces both the size of the stored RDF data and the size of its indexes. Second, TripleBit introduces two auxiliary index structures, the ID-Chunk bit matrix and the ID-Predicate bit matrix, to minimize the cost of index selection during query evaluation. Third, its query processor dynamically generates an optimal execution ordering for join queries, leading to fast query execution and an effective reduction in the size of intermediate results. Our experiments show that TripleBit outperforms RDF-3X, MonetDB, and BitMat on the LUBM, UniProt, and BTC 2012 benchmark queries, and that it offers orders-of-magnitude performance improvements for some complex join queries.
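The abstract describes the ID-Predicate bit matrix only as an auxiliary structure that cuts the cost of index selection. The sketch below is a minimal, hypothetical illustration of the general idea of a predicate-by-ID bit matrix, assuming one bit row per predicate that marks the entity IDs occurring as subjects of that predicate. It is not TripleBit's actual layout (the paper's chunked, compressed storage is not reproduced here), and the triples, IDs, and function names are invented for illustration. It shows how ANDing bit rows can prune candidate bindings for a shared variable before any triple index is scanned.

```python
from collections import defaultdict

# Hypothetical toy data: dictionary-encoded (subject, predicate, object) triples.
# IDs and predicate names are made up for illustration only.
TRIPLES = [
    (1, "advisor", 10),
    (1, "memberOf", 20),
    (2, "memberOf", 20),
    (3, "advisor", 11),
    (3, "memberOf", 21),
    (4, "takesCourse", 30),
]


def build_id_predicate_matrix(triples):
    """Assumed layout: one row per predicate, one bit per entity ID, set when
    that ID occurs as the subject of the predicate. Python ints serve as
    arbitrary-length bit vectors (a real system would use compressed bitmaps)."""
    rows = defaultdict(int)
    for s, p, _o in triples:
        rows[p] |= 1 << s
    return dict(rows)


def ids_from_bits(bits):
    """Decode a bit vector into the ascending list of IDs whose bits are set."""
    ids, i = [], 0
    while bits:
        if bits & 1:
            ids.append(i)
        bits >>= 1
        i += 1
    return ids


def candidate_subjects(matrix, predicates):
    """Candidate bindings for ?x in the patterns (?x p1 ?y1), (?x p2 ?y2), ...:
    AND the predicate rows together instead of joining triple indexes."""
    if not predicates:
        return []
    bits = matrix.get(predicates[0], 0)
    for p in predicates[1:]:
        bits &= matrix.get(p, 0)
    return ids_from_bits(bits)


if __name__ == "__main__":
    matrix = build_id_predicate_matrix(TRIPLES)
    print(candidate_subjects(matrix, ["advisor", "memberOf"]))  # [1, 3]
```

In this toy example only IDs 1 and 3 survive the bitwise filter, so a subsequent join would only need to probe the indexes for those two subjects; an empty result (for example, `["takesCourse", "advisor"]`) lets a query processor skip the join entirely.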

References

Semantic web challenge 2012. http://challenge.semanticweb.org/2012/.
UniProt RDF. http://dev.isb-sib.ch/projects/uniprot-rdf/.
D. J. Abadi, A. Marcus, S. R. Madden, and K. Hollenbach. Scalable semantic web data management using vertical partitioning. In Proc. of VLDB 2007, pages 411-422. ACM, 2007.
S. Álvarez García, N. R. Brisaboa, J. D. Fernández, and M. A. Martínez-Prieto. Compressed k²-triples for full-in-memory RDF engines. In Proc. of AMCIS 2011.
M. Atre, V. Chaoji, M. J. Zaki, and J. A. Hendler. Matrix bit loaded: A scalable lightweight join query processor for RDF data. In Proc. of WWW 2010, pages 41-50. ACM, 2010.
P. A. Bernstein and D.-M. W. Chiu. Using semi-joins to solve relational queries. Journal of the Association for Computing Machinery, 28(1):25-40, 1981.
V. Bonstrom, A. Hinze, and H. Schweppe. Storing RDF as a graph. In Proc. of LA-WEB 2003, pages 27-36.
J. Broekstra, A. Kampman, and F. van Harmelen. Sesame: A generic architecture for storing and querying RDF and RDF schema. In Proc. of ISWC 2002, pages 54-68.
A. Harth, J. Umbrich, A. Hogan, and S. Decker. YARS2: A federated repository for querying graph structured data from the web. In Proc. of ISWC/ASWC 2007, pages 211-224.
O. Hartig and R. Heese. The SPARQL query graph model for query optimization. In Proc. of ESWC 2007, pages 564-578.
J. Huang, D. J. Abadi, and K. Ren. Scalable SPARQL querying of large RDF graphs. PVLDB, 4(11):1123-1134.
M. Janik and K. Kochut. BRAHMS: A workbench RDF store and high performance memory system for semantic association discovery. In Proc. of ISWC 2005, pages 431-445. Springer, Berlin, 2005.
LUBM. http://swat.cse.lehigh.edu/projects/lubm/.
A. Matono, T. Amagasa, M. Yoshikawa, and S. Uemura. A path-based relational RDF database. In Proc. of 16th ADC.
B. Motik, I. Horrocks, and S. M. Kim. Delta-reasoner: a semantic web reasoner for an intelligent mobile platform. In Proc. of WWW 2012. ACM, 2012.
T. Neumann and G. Weikum. Scalable join processing on very large RDF graphs. In Proc. of SIGMOD 2009, pages 627-639. ACM, 2009.
T. Neumann and G. Weikum. The RDF-3X engine for scalable management of RDF data. The VLDB Journal, 19(1):91-113, 2010.
T. Neumann and G. Weikum. x-RDF-3X: Fast querying, high update rates, and consistency for RDF databases. PVLDB, 3(1-2):256-263, 2010.
L. Sidirourgos, R. Goncalves, M. Kersten, N. Nes, and S. Manegold. Column-store support for RDF data management: Not all swans are white. PVLDB, 1(2):1553-1563, 2008.
K. Stocker, D. Kossmann, R. Braumandl, and A. Kemper. Integrating semi-join-reducers into state of the art query processors. In Proc. of ICDE 2001, pages 575-584, 2001.
M. Stocker, A. Seaborne, A. Bernstein, C. Kiefer, and D. Reynolds. SPARQL basic graph pattern optimization using selectivity estimation. In Proc. of WWW 2008, pages 595-604. ACM, 2008.
SWEO Community Project. Linking open data on the semantic web. http://www.w3.org/wiki/SweoIG/TaskForces/CommunityProjects/LinkingOpenData.
W3C. SPARQL query language for RDF. http://www.w3.org/TR/rdf-sparql-query/, 2008.
C. Weiss, P. Karras, and A. Bernstein. Hexastore: Sextuple indexing for semantic web data management. PVLDB, 1(1):1008-1019, 2008.
K. Wilkinson. Jena property table implementation. In Proc. of SSWS 2006, pages 35-46, 2006.
K. Wu, E. J. Otoo, and A. Shoshani. Optimizing bitmap indices with efficient compression. ACM Transactions on Database Systems, 31(1):1-38, March 2006.
Y. Yan, C. Wang, A. Zhou, W. Qian, L. Ma, and Y. Pan. Efficiently querying RDF data in triple stores. In Proc. of WWW 2008, pages 1053-1054. ACM, 2008.
L. Zou, J. Mo, L. Chen, M. T. Özsu, and D. Zhao. gStore: Answering SPARQL queries via subgraph matching. PVLDB, (8):482-493, 2011.

Index Terms

TripleBit: a fast and compact system for large scale RDF data
1. Information systems
  1. Data management systems
    1. Database management system engines
      1. Database query processing
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Database query processing and optimization (theory)

Index terms have been assigned to the content through auto-classification.


Published in

Proceedings of the VLDB Endowment, Volume 6, Issue 7
May 2013
48 pages
ISSN: 2150-8097

Publisher: VLDB Endowment

Publication History
- Published: 1 May 2013
Published in PVLDB Volume 6, Issue 7