research-article

Scalable SPARQL querying of large RDF graphs

Authors:
Jiewen Huang

Yale University

Yale University
View Profile

,
Daniel J. Abadi

Yale University

Yale University
View Profile

,
Kun Ren

Northwestern Polytechnical University, China

Northwestern Polytechnical University, China
View Profile

Proceedings of the VLDB Endowment Volume 4 Issue 11pp 1123–1134https://doi.org/10.14778/3402707.3402747

Published:01 August 2011Publication History

Proceedings of the VLDB Endowment

Abstract

The generation of RDF data has accelerated to the point where many data sets need to be partitioned across multiple machines in order to achieve reasonable performance when querying the data. Although tremendous progress has been made in the Semantic Web community for achieving high performance data management on a single node, current solutions that allow the data to be partitioned across multiple machines are highly inefficient. In this paper, we introduce a scalable RDF data management system that is up to three orders of magnitude more efficient than popular multi-node RDF data management systems. In so doing, we introduce techniques for (1) leveraging state-of-the-art single node RDF-store technology (2) partitioning the data across nodes in a manner that helps accelerate query processing through locality optimizations and (3) decomposing SPARQL queries into high performance fragments that take advantage of how data is partitioned in a cluster.

References

METIS. http://glaros.dtc.umn.edu/gkhome/views/metis.Google Scholar
RDF Primer. W3C Recommendation. http://www.w3.org/TR/rdf-primer, 2004.Google Scholar
SPARQL Query Language for RDF. W3C Working Draft 4 October 2006. http://www.w3.org/TR/rdf-sparql-query/, 2006.Google Scholar
D. J. Abadi, A. Marcus, S. R. Madden, and K. Hollenbach. Scalable semantic web data management using vertical partitioning. In VLDB, pages 411--422, Vienna, Austria, 2007.Google ScholarDigital Library
A. Abouzeid, K. Bajda-Pawlikowski, D. J. Abadi, A. Rasin, and A. Silberschatz. Hadoopdb: An architectural hybrid of mapreduce and dbms technologies for analytical workloads. PVLDB, 2(1):922--933, 2009.Google ScholarDigital Library
I. Abraham, C. Gavoille, D. Malkhi, and U. Wieder. Strong-diameter decompositions of minor free graphs. SPAA '07, pages 16--24, 2007.Google ScholarDigital Library
S. Alexaki, V. Christophides, G. Karvounarakis, K. Tolle, and D. Plexousakis. The ICS-FORTH RDFSuite: Managing voluminous RDF description bases. In SemWeb, 2001.Google ScholarDigital Library
B. Awerbuch and D. Peleg. Sparse partitions. In In IEEE Symposium on Foundations of Computer Science, pages 503--513, 1990.Google ScholarDigital Library
J. Broekstra, A. Kampman, and F. van Harmelen. Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema. In ISWC, pages 54--68, 2002.Google ScholarDigital Library
A. Chan and F. Dehne. Cgmgraph/cgmlib: Implementing and testing cgm graph algorithms on pc clusters. In International Journal of High Performance Computing Applications, pages 81--97, 2003.Google ScholarCross Ref
J. Cheng, J. X. Yu, B. Ding, P. S. Yu, and H. Wang. Fast graph pattern matching. In ICDE, pages 913--922. IEEE, 2008.Google ScholarDigital Library
E. I. Chong, S. Das, G. Eadon, and J. Srinivasan. An Efficient SQL-based RDF Querying Scheme. In VLDB, pages 1216--1227, 2005.Google ScholarDigital Library
D. J. Cook, L. B. Holder, G. Galal, and R. Maglothin. Approaches to parallel graph-based knowledge discovery. Journal of Parallel and Distributed Computing, 61, 2001.Google Scholar
J. Fox and B. Sudakov. Decompositions into subgraphs of small diameter. Comb. Probab. Comput., 19:753--774.Google ScholarDigital Library
Franz Inc. AllegroGraph 4.2 Introduction. http://www.franz.com/agraph/support/documentation/v4/agraph-introduction.html.Google Scholar
D. Gregor and A. Lumsdaine. The parallel bgl: A generic library for distributed graph computations. In POOSC, 2005.Google Scholar
Y. Guo, Z. Pan, and J. Heflin. Lubm: A benchmark for owl knowledge base systems. J. Web Sem., 3(2--3):158--182, 2005.Google ScholarDigital Library
S. Harris and N. Gibbins. 3store: Efficient bulk RDF storage. In In Proc. of PSSS'03, pages 1--15, 2003.Google Scholar
A. Harth, J. Umbrich, A. Hogan, and S. Decker. Yars2: a federated repository for querying graph structured data from the web. ISWC'07/ASWC'07, pages 211--224, 2007.Google ScholarCross Ref
N. Linial and M. Saks. Low diameter graph decompositions. Combinatorica, 13:441--454, 1993.Google ScholarCross Ref
A. Lumsdaine, D. Gregor, B. Hendrickson, and J. Berry. Challenges in parallel graph processing. Parallel Processing Letters, 17(1):5--20, 2007 2007.Google ScholarCross Ref
G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: a system for large-scale graph processing. In SIGMOD, pages 135--146, 2010.Google ScholarDigital Library
J. Myung, J. Yeon, and S.-g. Lee. Sparql basic graph pattern processing with iterative mapreduce. MDAC '10, pages 6:1--6:6.Google Scholar
T. Neumann and G. Weikum. The rdf-3x engine for scalable management of rdf data. The VLDB Journal, 19:91--113, 2010.Google ScholarDigital Library
Ontotext. OWLIM Benchmarking: LUBM. http://www.ontotext.com/owlim/lubm.html.Google Scholar
OpenLink Software. Towards Web-Scale RDF. http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VOSArticleWebScaleRDF.Google Scholar
K. Rohloff and R. Schantz. High-performance, massively scalable distributed systems using the mapreduce software framework: The shard triple-store. International Workshop on Programming Support Innovations for Emerging Distributed Applications, 2010.Google ScholarDigital Library
D. Shasha, J. T. L. Wang, and R. Giugno. Algorithmics and applications of tree and graph searching. In PODS, pages 39--52, 2002.Google ScholarDigital Library
L. Sidirourgos, R. Goncalves, M. L. Kersten, N. Nes, and S. Manegold. Column-store support for rdf data management: not all swans are white. PVLDB, 1(2):1553--1563, 2008.Google ScholarDigital Library
J. Urbani, S. Kotoulas, E. Oren, and F. Harmelen. Scalable distributed reasoning using mapreduce. In Proceedings of the 8th International Semantic Web Conference, ISWC '09, pages 634--649, Berlin, Heidelberg, 2009. Springer-Verlag.Google ScholarDigital Library
J. Weaver and J. A. Hendler. Parallel materialization of the finite rdfs closure for hundreds of millions of triples. ISWC '09, pages 682--697, Berlin, Heidelberg, 2009. Springer-Verlag.Google ScholarDigital Library
C. Weiss, P. Karras, and A. Bernstein. Hexastore: sextuple indexing for semantic web data management. Proc. VLDB Endow., 1:1008--1019, August 2008.Google ScholarDigital Library
K. Wilkinson, C. Sayers, H. Kuno, and D. Reynolds. Efficient RDF Storage and Retrieval in Jena2. In SWDB, pages 131--150, 2003.Google Scholar
L. Zou, L. C. 0002, and M. T. Özsu. Distancejoin: Pattern match query in a large graph database. PVLDB, pages 886--897, 2009.Google Scholar

Index Terms

Scalable SPARQL querying of large RDF graphs
1. Information systems
  1. Data management systems
    1. Database management system engines
      1. Database query processing
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Database query processing and optimization (theory)

Index terms have been assigned to the content through auto-classification.

Recommendations

Querying distributed RDF data sources with SPARQL
ESWC'08: Proceedings of the 5th European semantic web conference on The semantic web: research and applications

Integrated access to multiple distributed and autonomous RDF data sources is a key challenge for many semantic web applications. As a reaction to this challenge, SPARQL, the W3C Recommendation for an RDF query language, supports querying of multiple RDF ...
Read More
RDF, Jena, SparQL and the 'Semantic Web'
SIGUCCS '09: Proceedings of the 37th annual ACM SIGUCCS fall conference: communication and collaboration

The Resource Description Format (RDF) is used to represent information modeled as a "graph": a set of individual objects, along with a set of connections among those objects. In that role, RDF is one of the pillars of the so-called Semantic Web. This ...
Read More
Parallel Processing SPARQL Theta Join on Large Scale RDF Graphs
2018 IEEE Global Communications Conference (GLOBECOM)
Theta join is commonly employed in real practices. Although SPARQL is a popular RDF query language, SPARQL does not define the specification of theta join until 2013. After that, few RDF stores except the stores based on RDBMS or key-value stores can ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
Proceedings of the VLDB Endowment Volume 4, Issue 11
August 2011
520 pages
ISSN:2150-8097
Editor:
H. V. Jagadish
Issue’s Table of Contents
Sponsors
In-Cooperation
Publisher
VLDB Endowment
Publication History
- Published: 1 August 2011
Published in pvldb Volume 4, Issue 11
Qualifiers
- research-article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 85
  Total Citations
  View Citations
- 238
  Total Downloads
- Downloads (Last 12 months)61
- Downloads (Last 6 weeks)6
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Scalable SPARQL querying of large RDF graphs

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Index Terms

Recommendations

Querying distributed RDF data sources with SPARQL

RDF, Jena, SparQL and the 'Semantic Web'

Parallel Processing SPARQL Theta Join on Large Scale RDF Graphs

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Scalable SPARQL querying of large RDF graphs

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Index Terms

Recommendations

Querying distributed RDF data sources with SPARQL

RDF, Jena, SparQL and the 'Semantic Web'

Parallel Processing SPARQL Theta Join on Large Scale RDF Graphs

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media