Abstract
Due to the increasing use of RDF data, efficient processing of SPARQL queries over RDF datasets has become an important issue. However, existing solutions suffer from two limitations: 1) they cannot answer SPARQL queries with wildcards in a scalable manner; and 2) they cannot handle frequent updates in RDF repositories efficiently. Thus, most of them have to reprocess the dataset from scratch. In this paper, we propose a graph-based approach to store and query RDF data. Rather than mapping RDF triples into a relational database as most existing methods do, we store RDF data as a large graph. A SPARQL query is then converted into a corresponding subgraph matching query. In order to speed up query processing, we develop a novel index, together with some effective pruning rules and efficient search algorithms. Our method can answer exact SPARQL queries and queries with wildcards in a uniform manner. We also propose an effective maintenance algorithm to handle online updates over RDF repositories. Extensive experiments confirm the efficiency and effectiveness of our solution.
- D. J. Abadi, A. Marcus, S. Madden, and K. Hollenbach. Sw-store: a vertically partitioned dbms for semantic web data management. VLDB J., 18(2):385--406, 2009. Google ScholarDigital Library
- D. J. Abadi, A. Marcus, S. Madden, and K. J. Hollenbach. Scalable semantic web data management using vertical partitioning. In VLDB, pages 411--422, 2007. Google ScholarDigital Library
- C. Bizer, J. Lehmann, G. Kobilarov, S. Auer, C. Becker, R. Cyganiak, and S. Hellmann. Dbpedia - a crystallization point for the web of data. J. Web Sem., 7(3):154--165, 2009. Google ScholarDigital Library
- V. Bönström, A. Hinze, and H. Schweppe. Storing RDF as a graph. In LA-WEB, pages 27--36, 2003. Google ScholarDigital Library
- J. Broekstra, A. Kampman, and F. van Harmelen. Sesame: A generic architecture for storing and querying RDF and RDF schema. In ISWC, pages 54--68, 2002. Google ScholarDigital Library
- T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms. The MIT Press, 2001. Google ScholarDigital Library
- U. Deppisch. S-tree: A dynamic balanced signature index for office retrieval. In SIGIR, pages 77--87, 1986. Google ScholarDigital Library
- C. Faloutsos and S. Christodoulakis. Signature files: An access method for documents and its analytical performance evaluation. ACM Trans. Inf. Syst., 2(4):267--288, 1984. Google ScholarDigital Library
- L. Gravano, P. G. Ipeirotis, H. V. Jagadish, N. Koudas, S. Muthukrishnan, L. Pietarinen, and D. Srivastava. Using q-grams in a DBMS for approximate string processing. IEEE Data Eng. Bull., 24(4):28--34, 2001.Google Scholar
- L. Gravano, P. G. Ipeirotis, N. Koudas, and D. Srivastava. Text joins in an RDBMS for web data integration. In WWW, pages 90--101, 2003. Google ScholarDigital Library
- A. Harth, J. Umbrich, A. Hogan, and S. Decker. Yars2: A federated repository for querying graph structured data from the web. In ISWC/ASWC, pages 211--224, 2007. Google ScholarDigital Library
- T. Neumann and G. Weikum. RDF-3X: a risc-style engine for RDF. PVLDB, 1(1):647--659, 2008. Google ScholarDigital Library
- T. Neumann and G. Weikum. Scalable join processing on very large RDF graphs. In SIGMOD, pages 627--640, 2009. Google ScholarDigital Library
- T. Neumann and G. Weikum. The RDF-3X engine for scalable management of RDF data. VLDB J., 19(1):91--113, 2010. Google ScholarDigital Library
- T. Neumann and G. Weikum. X-RDF-3X: Fast querying, high update rates, and consistency for RDF databases. PVLDB, 1(1):256--263, 2010. Google ScholarDigital Library
- J. Pérez, M. Arenas, and C. Gutierrez. Semantics and complexity of SPARQL. ACM Trans. Database Syst., 34(3):16:1--16:45, 2009. Google ScholarDigital Library
- D. Shasha, J. T.-L. Wang, and R. Giugno. Algorithmics and applications of tree and graph searching. In PODS, pages 39--52, 2002. Google ScholarDigital Library
- M. Stocker, A. Seaborne, A. Bernstein, C. Kiefer, and D. Reynolds. SPARQL basic graph pattern optimization using selectivity estimation. In WWW, pages 595--604, 2008. Google ScholarDigital Library
- F. M. Suchanek, G. Kasneci, and G. Weikum. Yago: a core of semantic knowledge. In WWW, pages 697--706, 2007. Google ScholarDigital Library
- E. Tousidou, P. Bozanis, and Y. Manolopoulos. Signature-based structures for objects with set-valued attributes. Inf. Syst., 27(2):93--121, 2002. Google ScholarDigital Library
- O. Udrea, A. Pugliese, and V. S. Subrahmanian. Grin: A graph based RDF index. In AAAI, pages 1465--1470, 2007. Google ScholarDigital Library
- C. Weiss, P. Karras, and A. Bernstein. Hexastore: sextuple indexing for semantic web data management. PVLDB, 1(1):1008--1019, 2008. Google ScholarDigital Library
- K. Wilkinson, C. Sayers, H. A. Kuno, and D. Reynolds. Efficient RDF storage and retrieval in jena2. In SWDB, pages 131--150, 2003.Google Scholar
- X. Yan, P. S. Yu, and J. Han. Graph indexing: A frequent structure-based approach. In SIGMOD, pages 335--346, 2004. Google ScholarDigital Library
- Y. Yan, C. Wang, A. Zhou, W. Qian, L. Ma, and Y. Pan. Efficient indices using graph partitioning in RDF triple stores. In ICDE, pages 1263--1266, 2009. Google ScholarDigital Library
- L. Zou, L. Chen, J. X. Yu, and Y. Lu. A novel spectral coding in a large graph database. In EDBT, pages 181--192, 2008. Google ScholarDigital Library
Index Terms
- gStore: answering SPARQL queries via subgraph matching
Recommendations
gStore: a graph-based SPARQL query engine
We address efficient processing of SPARQL queries over RDF datasets. The proposed techniques, incorporated into the gStore system, handle, in a uniform and scalable manner, SPARQL queries with wildcards and aggregate operators over dynamic RDF datasets. ...
Redesign of the gStore system
gStore is an open-source native Resource Description Framework (RDF) triple store that answers SPARQL queries by subgraph matching over RDF graphs. However, there are some deficiencies in the original system design, such as answering simple queries (...
A New CPU-FPGA Heterogeneous gStore System
Web and Big DataAbstractIn this demonstration, we present a new CPU-FPGA heterogeneous gStore system. The previous gStore system is based on CPU and has low join query performance when the data size is too big. We implement a FPGA-based join module to speed up join ...
Comments