Abstract
The generation of RDF data has accelerated to the point where many data sets need to be partitioned across multiple machines in order to achieve reasonable performance when querying the data. Although tremendous progress has been made in the Semantic Web community for achieving high performance data management on a single node, current solutions that allow the data to be partitioned across multiple machines are highly inefficient. In this paper, we introduce a scalable RDF data management system that is up to three orders of magnitude more efficient than popular multi-node RDF data management systems. In so doing, we introduce techniques for (1) leveraging state-of-the-art single node RDF-store technology (2) partitioning the data across nodes in a manner that helps accelerate query processing through locality optimizations and (3) decomposing SPARQL queries into high performance fragments that take advantage of how data is partitioned in a cluster.
- METIS. http://glaros.dtc.umn.edu/gkhome/views/metis.Google Scholar
- RDF Primer. W3C Recommendation. http://www.w3.org/TR/rdf-primer, 2004.Google Scholar
- SPARQL Query Language for RDF. W3C Working Draft 4 October 2006. http://www.w3.org/TR/rdf-sparql-query/, 2006.Google Scholar
- D. J. Abadi, A. Marcus, S. R. Madden, and K. Hollenbach. Scalable semantic web data management using vertical partitioning. In VLDB, pages 411--422, Vienna, Austria, 2007.Google ScholarDigital Library
- A. Abouzeid, K. Bajda-Pawlikowski, D. J. Abadi, A. Rasin, and A. Silberschatz. Hadoopdb: An architectural hybrid of mapreduce and dbms technologies for analytical workloads. PVLDB, 2(1):922--933, 2009.Google ScholarDigital Library
- I. Abraham, C. Gavoille, D. Malkhi, and U. Wieder. Strong-diameter decompositions of minor free graphs. SPAA '07, pages 16--24, 2007.Google ScholarDigital Library
- S. Alexaki, V. Christophides, G. Karvounarakis, K. Tolle, and D. Plexousakis. The ICS-FORTH RDFSuite: Managing voluminous RDF description bases. In SemWeb, 2001.Google ScholarDigital Library
- B. Awerbuch and D. Peleg. Sparse partitions. In In IEEE Symposium on Foundations of Computer Science, pages 503--513, 1990.Google ScholarDigital Library
- J. Broekstra, A. Kampman, and F. van Harmelen. Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema. In ISWC, pages 54--68, 2002.Google ScholarDigital Library
- A. Chan and F. Dehne. Cgmgraph/cgmlib: Implementing and testing cgm graph algorithms on pc clusters. In International Journal of High Performance Computing Applications, pages 81--97, 2003.Google ScholarCross Ref
- J. Cheng, J. X. Yu, B. Ding, P. S. Yu, and H. Wang. Fast graph pattern matching. In ICDE, pages 913--922. IEEE, 2008.Google ScholarDigital Library
- E. I. Chong, S. Das, G. Eadon, and J. Srinivasan. An Efficient SQL-based RDF Querying Scheme. In VLDB, pages 1216--1227, 2005.Google ScholarDigital Library
- D. J. Cook, L. B. Holder, G. Galal, and R. Maglothin. Approaches to parallel graph-based knowledge discovery. Journal of Parallel and Distributed Computing, 61, 2001.Google Scholar
- J. Fox and B. Sudakov. Decompositions into subgraphs of small diameter. Comb. Probab. Comput., 19:753--774.Google ScholarDigital Library
- Franz Inc. AllegroGraph 4.2 Introduction. http://www.franz.com/agraph/support/documentation/v4/agraph-introduction.html.Google Scholar
- D. Gregor and A. Lumsdaine. The parallel bgl: A generic library for distributed graph computations. In POOSC, 2005.Google Scholar
- Y. Guo, Z. Pan, and J. Heflin. Lubm: A benchmark for owl knowledge base systems. J. Web Sem., 3(2--3):158--182, 2005.Google ScholarDigital Library
- S. Harris and N. Gibbins. 3store: Efficient bulk RDF storage. In In Proc. of PSSS'03, pages 1--15, 2003.Google Scholar
- A. Harth, J. Umbrich, A. Hogan, and S. Decker. Yars2: a federated repository for querying graph structured data from the web. ISWC'07/ASWC'07, pages 211--224, 2007.Google ScholarCross Ref
- N. Linial and M. Saks. Low diameter graph decompositions. Combinatorica, 13:441--454, 1993.Google ScholarCross Ref
- A. Lumsdaine, D. Gregor, B. Hendrickson, and J. Berry. Challenges in parallel graph processing. Parallel Processing Letters, 17(1):5--20, 2007 2007.Google ScholarCross Ref
- G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: a system for large-scale graph processing. In SIGMOD, pages 135--146, 2010.Google ScholarDigital Library
- J. Myung, J. Yeon, and S.-g. Lee. Sparql basic graph pattern processing with iterative mapreduce. MDAC '10, pages 6:1--6:6.Google Scholar
- T. Neumann and G. Weikum. The rdf-3x engine for scalable management of rdf data. The VLDB Journal, 19:91--113, 2010.Google ScholarDigital Library
- Ontotext. OWLIM Benchmarking: LUBM. http://www.ontotext.com/owlim/lubm.html.Google Scholar
- OpenLink Software. Towards Web-Scale RDF. http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VOSArticleWebScaleRDF.Google Scholar
- K. Rohloff and R. Schantz. High-performance, massively scalable distributed systems using the mapreduce software framework: The shard triple-store. International Workshop on Programming Support Innovations for Emerging Distributed Applications, 2010.Google ScholarDigital Library
- D. Shasha, J. T. L. Wang, and R. Giugno. Algorithmics and applications of tree and graph searching. In PODS, pages 39--52, 2002.Google ScholarDigital Library
- L. Sidirourgos, R. Goncalves, M. L. Kersten, N. Nes, and S. Manegold. Column-store support for rdf data management: not all swans are white. PVLDB, 1(2):1553--1563, 2008.Google ScholarDigital Library
- J. Urbani, S. Kotoulas, E. Oren, and F. Harmelen. Scalable distributed reasoning using mapreduce. In Proceedings of the 8th International Semantic Web Conference, ISWC '09, pages 634--649, Berlin, Heidelberg, 2009. Springer-Verlag.Google ScholarDigital Library
- J. Weaver and J. A. Hendler. Parallel materialization of the finite rdfs closure for hundreds of millions of triples. ISWC '09, pages 682--697, Berlin, Heidelberg, 2009. Springer-Verlag.Google ScholarDigital Library
- C. Weiss, P. Karras, and A. Bernstein. Hexastore: sextuple indexing for semantic web data management. Proc. VLDB Endow., 1:1008--1019, August 2008.Google ScholarDigital Library
- K. Wilkinson, C. Sayers, H. Kuno, and D. Reynolds. Efficient RDF Storage and Retrieval in Jena2. In SWDB, pages 131--150, 2003.Google Scholar
- L. Zou, L. C. 0002, and M. T. Özsu. Distancejoin: Pattern match query in a large graph database. PVLDB, pages 886--897, 2009.Google Scholar
Index Terms
- Scalable SPARQL querying of large RDF graphs
Recommendations
Querying distributed RDF data sources with SPARQL
ESWC'08: Proceedings of the 5th European semantic web conference on The semantic web: research and applicationsIntegrated access to multiple distributed and autonomous RDF data sources is a key challenge for many semantic web applications. As a reaction to this challenge, SPARQL, the W3C Recommendation for an RDF query language, supports querying of multiple RDF ...
RDF, Jena, SparQL and the 'Semantic Web'
SIGUCCS '09: Proceedings of the 37th annual ACM SIGUCCS fall conference: communication and collaborationThe Resource Description Format (RDF) is used to represent information modeled as a "graph": a set of individual objects, along with a set of connections among those objects. In that role, RDF is one of the pillars of the so-called Semantic Web. This ...
Parallel Processing SPARQL Theta Join on Large Scale RDF Graphs
2018 IEEE Global Communications Conference (GLOBECOM)Theta join is commonly employed in real practices. Although SPARQL is a popular RDF query language, SPARQL does not define the specification of theta join until 2013. After that, few RDF stores except the stores based on RDBMS or key-value stores can ...
Comments