
TripleBit: a fast and compact system for large scale RDF data

Published: 01 May 2013

Abstract

The volume of RDF data has continued to grow over the past decade, and many known RDF datasets contain billions of triples. A grand challenge in managing such huge RDF data is how to access it efficiently. A popular approach to this problem is to build a full set of permutations of (S, P, O) indexes. Although this approach has been shown to accelerate joins by orders of magnitude, its large space overhead limits its scalability and makes it heavyweight. In this paper, we present TripleBit, a fast and compact system for storing and accessing RDF data. The design of TripleBit has three salient features. First, its compact design reduces both the size of the stored RDF data and the size of its indexes. Second, TripleBit introduces two auxiliary index structures, the ID-Chunk bit matrix and the ID-Predicate bit matrix, to minimize the cost of index selection during query evaluation. Third, its query processor dynamically generates an optimal execution ordering for join queries, leading to fast query execution and an effective reduction in the size of intermediate results. Our experiments show that TripleBit outperforms RDF-3X, MonetDB, and BitMat on LUBM, UniProt, and BTC 2012 benchmark queries, and that it offers performance improvements of orders of magnitude for some complex join queries.
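The abstract contrasts TripleBit with the "full set of permutations of (S, P, O) indexes" approach. The sketch below illustrates both sides of that trade-off, assuming plain dictionary-encoded integer triples held in sorted Python lists with no compression (real systems such as RDF-3X compress their indexes). Any bound prefix of a pattern becomes a binary-search range scan, but every triple is stored six times. The data and the helper names `build_permutation_indexes` and `lookup_prefix` are hypothetical and are not taken from TripleBit.

```python
from bisect import bisect_left, bisect_right
from itertools import permutations

# Hypothetical dictionary-encoded triples: (subject, predicate, object) as integer IDs.
triples = [
    (1, 10, 2),
    (1, 11, 3),
    (2, 10, 3),
    (4, 12, 1),
]

def build_permutation_indexes(triples):
    """Build one sorted index per ordering of the (S, P, O) positions.

    Each index keeps every triple rearranged into its key order, so the six
    indexes together hold six uncompressed copies of the data.
    """
    indexes = {}
    for order in permutations(range(3)):  # (0, 1, 2) -> "SPO", (2, 1, 0) -> "OPS", ...
        name = "".join("SPO"[i] for i in order)
        indexes[name] = sorted(tuple(t[i] for i in order) for t in triples)
    return indexes

def lookup_prefix(index, prefix):
    """Return all entries of a sorted index whose leading IDs equal `prefix`."""
    lo = bisect_left(index, prefix)
    # Pad the prefix with +inf so the upper bound lands after every matching entry.
    hi = bisect_right(index, prefix + (float("inf"),) * (3 - len(prefix)))
    return index[lo:hi]

indexes = build_permutation_indexes(triples)
stored_ids = sum(len(entries) * 3 for entries in indexes.values())

print(sorted(indexes))                         # ['OPS', 'OSP', 'POS', 'PSO', 'SOP', 'SPO']
print(stored_ids)                              # 72
print(lookup_prefix(indexes["POS"], (10, 3)))  # [(10, 3, 2)]: subject 2 has predicate 10, object 3
```

In this toy data, the 12 IDs of raw triples become 72 stored IDs, which is the kind of space overhead the abstract attributes to the full-permutation approach.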

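The abstract names the ID-Predicate bit matrix but does not describe its layout. The sketch below is a minimal illustration under an assumed layout: one bitmask per entity ID and per role (subject or object), stored as a Python `int`, where bit p is set if the ID occurs in that role with predicate ID p. It shows how such a matrix could prune candidate bindings before any triple data is read. The names `build_id_predicate_bits` and `has_predicate` are hypothetical, and this is not TripleBit's actual implementation.

```python
from collections import defaultdict

# Hypothetical dictionary-encoded triples: (subject_id, predicate_id, object_id).
triples = [(1, 0, 2), (1, 1, 3), (2, 0, 3), (4, 2, 1)]

def build_id_predicate_bits(triples):
    """Return two maps ID -> bitmask, one for the subject role and one for
    the object role. Bit p is set when the ID appears in that role in at
    least one triple whose predicate ID is p (an assumed layout)."""
    as_subject = defaultdict(int)
    as_object = defaultdict(int)
    for s, p, o in triples:
        as_subject[s] |= 1 << p
        as_object[o] |= 1 << p
    return as_subject, as_object

def has_predicate(bits, entity_id, predicate_id):
    """Constant-time test: does entity_id occur with predicate_id in this role?"""
    # .get() avoids inserting unknown IDs into the defaultdict.
    return bool((bits.get(entity_id, 0) >> predicate_id) & 1)

as_subject, as_object = build_id_predicate_bits(triples)

# Prune candidate bindings for ?x in the pattern (?x, predicate 1, ?y)
# before touching any stored triples.
candidates = [1, 2, 4]
pruned = [x for x in candidates if has_predicate(as_subject, x, 1)]
print(pruned)  # [1]
```

Because each check is a shift and a mask on an in-memory bitmask, a query processor could discard bindings this way before scanning any chunk. How TripleBit actually lays out and compresses its bit matrices is not described in the abstract.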


Published in: Proceedings of the VLDB Endowment, Volume 6, Issue 7 (May 2013), 48 pages
Publisher: VLDB Endowment
