Abstract
RDF is a data representation format for schema-free structured information that is gaining momentum in the context of Semantic-Web corpora, life sciences, and also Web 2.0 platforms. The "pay-as-you-go" nature of RDF and the flexible pattern-matching capabilities of its query language SPARQL entail efficiency and scalability challenges for complex queries including long join paths. This paper presents the RDF-3X engine, an implementation of SPARQL that achieves excellent performance by pursuing a RISC-style architecture with a streamlined architecture and carefully designed, puristic data structures and operations. The salient points of RDF-3X are: 1) a generic solution for storing and indexing RDF triples that completely eliminates the need for physical-design tuning, 2) a powerful yet simple query processor that leverages fast merge joins to the largest possible extent, and 3) a query optimizer for choosing optimal join orders using a cost model based on statistical synopses for entire join paths. The performance of RDF-3X, in comparison to the previously best state-of-the-art systems, has been measured on several large-scale datasets with more than 50 million RDF triples and benchmark queries that include pattern matching and long join paths in the underlying data graphs.
- D. J. Abadi, A. Marcus, S. Madden, and K. J. Hollenbach. Scalable semantic web data management using vertical partitioning. In VLDB, 2007. Google ScholarDigital Library
- S. Alexaki et al. The ics-forth rdfsuite: Managing voluminous rdf description bases. In SemWeb, 2001.Google ScholarDigital Library
- K. Anyanwu, A. Maduko, and A. P. Sheth. Sparq21: towards support for subgraph extraction queries in rdf databases. In WWW, 2007. Google ScholarDigital Library
- S. Auer et al. Dbpedia: A nucleus for a web of open data. In ISWC/ASWC, 2007. Google ScholarDigital Library
- L. Baolin and H. Bo. Path queries based rdf index. In SKG, 2005.Google Scholar
- L. Baolin and H. Bo. Hprd: A high performance rdf database. In NPC, 2007. Google ScholarDigital Library
- BioPAX: Biological Pathways Exchange. http://www.biopax.org/.Google Scholar
- J. Broekstra, A. Kampman, and F. van Harmelen. Sesame: An architecture for storing and querying rdf data and schema information. In Spinning the Semantic Web, 2003. Google ScholarDigital Library
- S. Chaudhuri and K. Shim. Optimization of queries with user-defined predicates. TODS, 24(2), 1999. Google ScholarDigital Library
- S. Chaudhuri and G. Weikum. Rethinking database system architecture: Towards a self-tuning risc-style database system. In VLDB, 2000. Google ScholarDigital Library
- E. I. Chong et al. An efficient sql-based rdf querying scheme. In VLDB, 2005. Google ScholarDigital Library
- E. Chu, J. L. Beckmann, and J. F. Naughton. The case for a wide-table approach to manage sparse relational data sets. In SIGMOD, 2007. Google ScholarDigital Library
- D. DeHaan and F. W. Tompa. Optimal top-down join enumeration. In SIGMOD, 2007. Google ScholarDigital Library
- A. Eickler, C. A. Gerlhof, and D. Kossmann. A performance evaluation of oid mapping techniques. In VLDB, 1995. Google ScholarDigital Library
- C. A. Galindo-Legaria, A. Pellenkoft, and M. L. Kersten. Fast, randomized join-order selection - why use transformations? In VLDB, 1994. Google ScholarDigital Library
- L. Getoor and C. P. Diehl. Link mining: a survey. SIGKDD Explorations, 7(2), 2005. Google ScholarDigital Library
- G. Graefe. Query evaluation techniques for large databases. ACM Comput. Surv., 25(2), 1993. Google ScholarDigital Library
- A. Y. Halevy, M. J. Franklin, and D. Maier. Principles of dataspace systems. In PODS, 2006. Google ScholarDigital Library
- A. Harth, J. Umbrich, A. Hogan, and S. Decker. Yars2: A federated repository for querying graph structured data from the web. In ISWC/ASWC, 2007. Google ScholarDigital Library
- O. Hartig and R. Heese. The sparql query graph model for query optimization. In ESWC, 2007. Google ScholarDigital Library
- A. Hogan and A. Harth. The expertfinder corpus 2007 for the benchmarking development of expert-finding systems. In International ExpertFinder Workshop, 2007.Google Scholar
- D. Huynh, S. Mazzocchi, and D. R. Karger. Piggy bank: Experience the semantic web inside your web browser. J. Web Sem., 5(1), 2007. Google ScholarDigital Library
- Jena: a Semantic Web Framework for Java. http://jena.sourceforge.net/.Google Scholar
- M. Kersten and A. P. Siebes. An organic database system. Technical report, CWI, 1999. Google ScholarDigital Library
- A. Maduko et al. Estimating the cardinality of rdf graph patterns. In WWW, 2007. Google ScholarDigital Library
- G. Moerkotte and T. Neumann. Analysis of two existing and one new dynamic programming algorithm for the generation of optimal bushy join trees without cross products. In VLDB, 2006. Google ScholarDigital Library
- Monetdb. http://monetdb.cwi.nl/.Google Scholar
- Openrdf. http://www.openrdf.org/index.jsp.Google Scholar
- Oracle technical network, semantic technologies center. http://www.oracle.com/technology/tech/semantic_technologies/index.html.Google Scholar
- W3C: Resource Description Framework (RDF). http://www.w3.org/RDF/.Google Scholar
- RDFizers. http://simile.mit.edu/wiki/RDFizers.Google Scholar
- ICS-FORTH RDF suite. http://athena.ics.forth.gr:9090/RDF/.Google Scholar
- F. Scholer et al. Compression of inverted indexes for fast query evaluation. In SIGIR, 2002. Google ScholarDigital Library
- P. G. Selinger et al. Access path selection in a relational database management system. In SIGMOD, 1979. Google ScholarDigital Library
- D. E. Simmen, E. J. Shekita, and T. Malkemus. Fundamental techniques for order optimization. In SIGMOD, 1996. Google ScholarDigital Library
- W3C: SPARQL Query Language for RDF. http://www.w3.org/TR/rdf-sparql-query/.Google Scholar
- M. Steinbrunn et al. Bypassing joins in disjunctive queries. In VLDB, 1995. Google ScholarDigital Library
- M. Stocker et al. Sparql basic graph pattern optimization using selectivity estimation. In WWW, 2008. Google ScholarDigital Library
- M. Stonebraker et al. One size fits all? part 2: Benchmarking studies. In CIDR, 2007.Google Scholar
- F. M. Suchanek, G. Kasneci, and G. Weikum. Yago: a core of semantic knowledge. In WWW, 2007. Google ScholarDigital Library
- Y. Theoharis, V. Christophides, and G. Karvounarakis. Benchmarking database representations of rdf/s stores. In International Semantic Web Conference, 2005. Google ScholarDigital Library
- O. Udrea, A. Pugliese, and V. S. Subrahmanian. Grin: A graph based rdf index. In AAAI, 2007. Google ScholarDigital Library
- uniprot RDF. http://dev.isb-sib.ch/projects/uniprot-rdf/.Google Scholar
- N. Vanetik and E. Gudes. Mining frequent labeled and partially labeled graph patterns. In ICDE, 2004. Google ScholarDigital Library
- T. Westmann et al. The implementation and performance of compressed databases. SIGMOD Record, 29(3), 2000. Google ScholarDigital Library
- K. Wilkinson et al. Efficient rdf storage and retrieval in jena2. In SWDB, 2003.Google Scholar
- W3C: RDF/OWL representation of WordNet. http://www.w3.org/TR/wordnet-rdf/.Google Scholar
- Yars2. http://sw.deri.org/svn/sw/2004/06/yars.Google Scholar
- F. Zhu, X. Yan, J. Han, and P. S. Yu. gprune: A constraint pushing framework for graph pattern mining. In PAKDD, 2007. Google ScholarDigital Library
- J. Zobel and A. Moffat. Inverted files for text search engines. ACM Comput. Surv., 38(2), 2006. Google ScholarDigital Library
Index Terms
- RDF-3X: a RISC-style engine for RDF
Recommendations
x-RDF-3X: fast querying, high update rates, and consistency for RDF databases
The RDF data model is gaining importance for applications in computational biology, knowledge sharing, and social communities. Recent work on RDF engines has focused on scalable performance for querying, and has largely disregarded updates. In addition ...
The RDF-3X engine for scalable management of RDF data
RDF is a data model for schema-free structured information that is gaining momentum in the context of Semantic-Web data, life sciences, and also Web 2.0 platforms. The "pay-as-you-go" nature of RDF and the flexible pattern-matching capabilities of its ...
Comments