skip to main content
research-article

Scalable SPARQL querying of large RDF graphs

Published:01 August 2011Publication History
Skip Abstract Section

Abstract

The generation of RDF data has accelerated to the point where many data sets need to be partitioned across multiple machines in order to achieve reasonable performance when querying the data. Although tremendous progress has been made in the Semantic Web community for achieving high performance data management on a single node, current solutions that allow the data to be partitioned across multiple machines are highly inefficient. In this paper, we introduce a scalable RDF data management system that is up to three orders of magnitude more efficient than popular multi-node RDF data management systems. In so doing, we introduce techniques for (1) leveraging state-of-the-art single node RDF-store technology (2) partitioning the data across nodes in a manner that helps accelerate query processing through locality optimizations and (3) decomposing SPARQL queries into high performance fragments that take advantage of how data is partitioned in a cluster.

References

  1. METIS. http://glaros.dtc.umn.edu/gkhome/views/metis.Google ScholarGoogle Scholar
  2. RDF Primer. W3C Recommendation. http://www.w3.org/TR/rdf-primer, 2004.Google ScholarGoogle Scholar
  3. SPARQL Query Language for RDF. W3C Working Draft 4 October 2006. http://www.w3.org/TR/rdf-sparql-query/, 2006.Google ScholarGoogle Scholar
  4. D. J. Abadi, A. Marcus, S. R. Madden, and K. Hollenbach. Scalable semantic web data management using vertical partitioning. In VLDB, pages 411--422, Vienna, Austria, 2007.Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. A. Abouzeid, K. Bajda-Pawlikowski, D. J. Abadi, A. Rasin, and A. Silberschatz. Hadoopdb: An architectural hybrid of mapreduce and dbms technologies for analytical workloads. PVLDB, 2(1):922--933, 2009.Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. I. Abraham, C. Gavoille, D. Malkhi, and U. Wieder. Strong-diameter decompositions of minor free graphs. SPAA '07, pages 16--24, 2007.Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. S. Alexaki, V. Christophides, G. Karvounarakis, K. Tolle, and D. Plexousakis. The ICS-FORTH RDFSuite: Managing voluminous RDF description bases. In SemWeb, 2001.Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. B. Awerbuch and D. Peleg. Sparse partitions. In In IEEE Symposium on Foundations of Computer Science, pages 503--513, 1990.Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. J. Broekstra, A. Kampman, and F. van Harmelen. Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema. In ISWC, pages 54--68, 2002.Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. A. Chan and F. Dehne. Cgmgraph/cgmlib: Implementing and testing cgm graph algorithms on pc clusters. In International Journal of High Performance Computing Applications, pages 81--97, 2003.Google ScholarGoogle ScholarCross RefCross Ref
  11. J. Cheng, J. X. Yu, B. Ding, P. S. Yu, and H. Wang. Fast graph pattern matching. In ICDE, pages 913--922. IEEE, 2008.Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. E. I. Chong, S. Das, G. Eadon, and J. Srinivasan. An Efficient SQL-based RDF Querying Scheme. In VLDB, pages 1216--1227, 2005.Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. D. J. Cook, L. B. Holder, G. Galal, and R. Maglothin. Approaches to parallel graph-based knowledge discovery. Journal of Parallel and Distributed Computing, 61, 2001.Google ScholarGoogle Scholar
  14. J. Fox and B. Sudakov. Decompositions into subgraphs of small diameter. Comb. Probab. Comput., 19:753--774.Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Franz Inc. AllegroGraph 4.2 Introduction. http://www.franz.com/agraph/support/documentation/v4/agraph-introduction.html.Google ScholarGoogle Scholar
  16. D. Gregor and A. Lumsdaine. The parallel bgl: A generic library for distributed graph computations. In POOSC, 2005.Google ScholarGoogle Scholar
  17. Y. Guo, Z. Pan, and J. Heflin. Lubm: A benchmark for owl knowledge base systems. J. Web Sem., 3(2--3):158--182, 2005.Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. S. Harris and N. Gibbins. 3store: Efficient bulk RDF storage. In In Proc. of PSSS'03, pages 1--15, 2003.Google ScholarGoogle Scholar
  19. A. Harth, J. Umbrich, A. Hogan, and S. Decker. Yars2: a federated repository for querying graph structured data from the web. ISWC'07/ASWC'07, pages 211--224, 2007.Google ScholarGoogle ScholarCross RefCross Ref
  20. N. Linial and M. Saks. Low diameter graph decompositions. Combinatorica, 13:441--454, 1993.Google ScholarGoogle ScholarCross RefCross Ref
  21. A. Lumsdaine, D. Gregor, B. Hendrickson, and J. Berry. Challenges in parallel graph processing. Parallel Processing Letters, 17(1):5--20, 2007 2007.Google ScholarGoogle ScholarCross RefCross Ref
  22. G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: a system for large-scale graph processing. In SIGMOD, pages 135--146, 2010.Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. J. Myung, J. Yeon, and S.-g. Lee. Sparql basic graph pattern processing with iterative mapreduce. MDAC '10, pages 6:1--6:6.Google ScholarGoogle Scholar
  24. T. Neumann and G. Weikum. The rdf-3x engine for scalable management of rdf data. The VLDB Journal, 19:91--113, 2010.Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Ontotext. OWLIM Benchmarking: LUBM. http://www.ontotext.com/owlim/lubm.html.Google ScholarGoogle Scholar
  26. OpenLink Software. Towards Web-Scale RDF. http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VOSArticleWebScaleRDF.Google ScholarGoogle Scholar
  27. K. Rohloff and R. Schantz. High-performance, massively scalable distributed systems using the mapreduce software framework: The shard triple-store. International Workshop on Programming Support Innovations for Emerging Distributed Applications, 2010.Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. D. Shasha, J. T. L. Wang, and R. Giugno. Algorithmics and applications of tree and graph searching. In PODS, pages 39--52, 2002.Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. L. Sidirourgos, R. Goncalves, M. L. Kersten, N. Nes, and S. Manegold. Column-store support for rdf data management: not all swans are white. PVLDB, 1(2):1553--1563, 2008.Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. J. Urbani, S. Kotoulas, E. Oren, and F. Harmelen. Scalable distributed reasoning using mapreduce. In Proceedings of the 8th International Semantic Web Conference, ISWC '09, pages 634--649, Berlin, Heidelberg, 2009. Springer-Verlag.Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. J. Weaver and J. A. Hendler. Parallel materialization of the finite rdfs closure for hundreds of millions of triples. ISWC '09, pages 682--697, Berlin, Heidelberg, 2009. Springer-Verlag.Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. C. Weiss, P. Karras, and A. Bernstein. Hexastore: sextuple indexing for semantic web data management. Proc. VLDB Endow., 1:1008--1019, August 2008.Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. K. Wilkinson, C. Sayers, H. Kuno, and D. Reynolds. Efficient RDF Storage and Retrieval in Jena2. In SWDB, pages 131--150, 2003.Google ScholarGoogle Scholar
  34. L. Zou, L. C. 0002, and M. T. Özsu. Distancejoin: Pattern match query in a large graph database. PVLDB, pages 886--897, 2009.Google ScholarGoogle Scholar

Index Terms

  1. Scalable SPARQL querying of large RDF graphs
      Index terms have been assigned to the content through auto-classification.

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image Proceedings of the VLDB Endowment
        Proceedings of the VLDB Endowment  Volume 4, Issue 11
        August 2011
        520 pages

        Publisher

        VLDB Endowment

        Publication History

        • Published: 1 August 2011
        Published in pvldb Volume 4, Issue 11

        Qualifiers

        • research-article

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader