skip to main content
research-article

RDF-3X: a RISC-style engine for RDF

Authors Info & Claims
Published:01 August 2008Publication History
Skip Abstract Section

Abstract

RDF is a data representation format for schema-free structured information that is gaining momentum in the context of Semantic-Web corpora, life sciences, and also Web 2.0 platforms. The "pay-as-you-go" nature of RDF and the flexible pattern-matching capabilities of its query language SPARQL entail efficiency and scalability challenges for complex queries including long join paths. This paper presents the RDF-3X engine, an implementation of SPARQL that achieves excellent performance by pursuing a RISC-style architecture with a streamlined architecture and carefully designed, puristic data structures and operations. The salient points of RDF-3X are: 1) a generic solution for storing and indexing RDF triples that completely eliminates the need for physical-design tuning, 2) a powerful yet simple query processor that leverages fast merge joins to the largest possible extent, and 3) a query optimizer for choosing optimal join orders using a cost model based on statistical synopses for entire join paths. The performance of RDF-3X, in comparison to the previously best state-of-the-art systems, has been measured on several large-scale datasets with more than 50 million RDF triples and benchmark queries that include pattern matching and long join paths in the underlying data graphs.

References

  1. D. J. Abadi, A. Marcus, S. Madden, and K. J. Hollenbach. Scalable semantic web data management using vertical partitioning. In VLDB, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. S. Alexaki et al. The ics-forth rdfsuite: Managing voluminous rdf description bases. In SemWeb, 2001.Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. K. Anyanwu, A. Maduko, and A. P. Sheth. Sparq21: towards support for subgraph extraction queries in rdf databases. In WWW, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. S. Auer et al. Dbpedia: A nucleus for a web of open data. In ISWC/ASWC, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. L. Baolin and H. Bo. Path queries based rdf index. In SKG, 2005.Google ScholarGoogle Scholar
  6. L. Baolin and H. Bo. Hprd: A high performance rdf database. In NPC, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. BioPAX: Biological Pathways Exchange. http://www.biopax.org/.Google ScholarGoogle Scholar
  8. J. Broekstra, A. Kampman, and F. van Harmelen. Sesame: An architecture for storing and querying rdf data and schema information. In Spinning the Semantic Web, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. S. Chaudhuri and K. Shim. Optimization of queries with user-defined predicates. TODS, 24(2), 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. S. Chaudhuri and G. Weikum. Rethinking database system architecture: Towards a self-tuning risc-style database system. In VLDB, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. E. I. Chong et al. An efficient sql-based rdf querying scheme. In VLDB, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. E. Chu, J. L. Beckmann, and J. F. Naughton. The case for a wide-table approach to manage sparse relational data sets. In SIGMOD, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. D. DeHaan and F. W. Tompa. Optimal top-down join enumeration. In SIGMOD, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. A. Eickler, C. A. Gerlhof, and D. Kossmann. A performance evaluation of oid mapping techniques. In VLDB, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. C. A. Galindo-Legaria, A. Pellenkoft, and M. L. Kersten. Fast, randomized join-order selection - why use transformations? In VLDB, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. L. Getoor and C. P. Diehl. Link mining: a survey. SIGKDD Explorations, 7(2), 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. G. Graefe. Query evaluation techniques for large databases. ACM Comput. Surv., 25(2), 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. A. Y. Halevy, M. J. Franklin, and D. Maier. Principles of dataspace systems. In PODS, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. A. Harth, J. Umbrich, A. Hogan, and S. Decker. Yars2: A federated repository for querying graph structured data from the web. In ISWC/ASWC, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. O. Hartig and R. Heese. The sparql query graph model for query optimization. In ESWC, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. A. Hogan and A. Harth. The expertfinder corpus 2007 for the benchmarking development of expert-finding systems. In International ExpertFinder Workshop, 2007.Google ScholarGoogle Scholar
  22. D. Huynh, S. Mazzocchi, and D. R. Karger. Piggy bank: Experience the semantic web inside your web browser. J. Web Sem., 5(1), 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Jena: a Semantic Web Framework for Java. http://jena.sourceforge.net/.Google ScholarGoogle Scholar
  24. M. Kersten and A. P. Siebes. An organic database system. Technical report, CWI, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. A. Maduko et al. Estimating the cardinality of rdf graph patterns. In WWW, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. G. Moerkotte and T. Neumann. Analysis of two existing and one new dynamic programming algorithm for the generation of optimal bushy join trees without cross products. In VLDB, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Monetdb. http://monetdb.cwi.nl/.Google ScholarGoogle Scholar
  28. Openrdf. http://www.openrdf.org/index.jsp.Google ScholarGoogle Scholar
  29. Oracle technical network, semantic technologies center. http://www.oracle.com/technology/tech/semantic_technologies/index.html.Google ScholarGoogle Scholar
  30. W3C: Resource Description Framework (RDF). http://www.w3.org/RDF/.Google ScholarGoogle Scholar
  31. RDFizers. http://simile.mit.edu/wiki/RDFizers.Google ScholarGoogle Scholar
  32. ICS-FORTH RDF suite. http://athena.ics.forth.gr:9090/RDF/.Google ScholarGoogle Scholar
  33. F. Scholer et al. Compression of inverted indexes for fast query evaluation. In SIGIR, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. P. G. Selinger et al. Access path selection in a relational database management system. In SIGMOD, 1979. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. D. E. Simmen, E. J. Shekita, and T. Malkemus. Fundamental techniques for order optimization. In SIGMOD, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. W3C: SPARQL Query Language for RDF. http://www.w3.org/TR/rdf-sparql-query/.Google ScholarGoogle Scholar
  37. M. Steinbrunn et al. Bypassing joins in disjunctive queries. In VLDB, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. M. Stocker et al. Sparql basic graph pattern optimization using selectivity estimation. In WWW, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. M. Stonebraker et al. One size fits all? part 2: Benchmarking studies. In CIDR, 2007.Google ScholarGoogle Scholar
  40. F. M. Suchanek, G. Kasneci, and G. Weikum. Yago: a core of semantic knowledge. In WWW, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Y. Theoharis, V. Christophides, and G. Karvounarakis. Benchmarking database representations of rdf/s stores. In International Semantic Web Conference, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. O. Udrea, A. Pugliese, and V. S. Subrahmanian. Grin: A graph based rdf index. In AAAI, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. uniprot RDF. http://dev.isb-sib.ch/projects/uniprot-rdf/.Google ScholarGoogle Scholar
  44. N. Vanetik and E. Gudes. Mining frequent labeled and partially labeled graph patterns. In ICDE, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. T. Westmann et al. The implementation and performance of compressed databases. SIGMOD Record, 29(3), 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. K. Wilkinson et al. Efficient rdf storage and retrieval in jena2. In SWDB, 2003.Google ScholarGoogle Scholar
  47. W3C: RDF/OWL representation of WordNet. http://www.w3.org/TR/wordnet-rdf/.Google ScholarGoogle Scholar
  48. Yars2. http://sw.deri.org/svn/sw/2004/06/yars.Google ScholarGoogle Scholar
  49. F. Zhu, X. Yan, J. Han, and P. S. Yu. gprune: A constraint pushing framework for graph pattern mining. In PAKDD, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. J. Zobel and A. Moffat. Inverted files for text search engines. ACM Comput. Surv., 38(2), 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. RDF-3X: a RISC-style engine for RDF

                Recommendations

                Comments

                Login options

                Check if you have access through your login credentials or your institution to get full access on this article.

                Sign in

                Full Access

                PDF Format

                View or Download as a PDF file.

                PDF

                eReader

                View online with eReader.

                eReader