Skip to main content
Erschienen in: The VLDB Journal 4/2014

01.08.2014 | Regular Paper

gStore: a graph-based SPARQL query engine

verfasst von: Lei Zou, M. Tamer Özsu, Lei Chen, Xuchuan Shen, Ruizhe Huang, Dongyan Zhao

Erschienen in: The VLDB Journal | Ausgabe 4/2014

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

We address efficient processing of SPARQL queries over RDF datasets. The proposed techniques, incorporated into the gStore system, handle, in a uniform and scalable manner, SPARQL queries with wildcards and aggregate operators over dynamic RDF datasets. Our approach is graph based. We store RDF data as a large graph and also represent a SPARQL query as a query graph. Thus, the query answering problem is converted into a subgraph matching problem. To achieve efficient and scalable query processing, we develop an index, together with effective pruning rules and efficient search algorithms. We propose techniques that use this infrastructure to answer aggregation queries. We also propose an effective maintenance algorithm to handle online updates over RDF repositories. Extensive experiments confirm the efficiency and effectiveness of our solutions.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Anhänge
Nur mit Berechtigung zugänglich
Fußnoten
1
The literature on aggregation and aggregate queries frequently refer to these attributes as dimensions. We follow the same convention.
 
2
Note that, this is not necessarily \(O_n\), since node identifiers are arbitrarily assigned to help with presentation.
 
3
Although dimension list is a set, when the order is important, we specify them as a list enclosed in ( ).
 
4
We revised the original 14 to remove type reasoning that gStore does not currently support; the resulting queries return larger result sets since there is no filtering as a result of type reasoning. For completeness, these are included in Online Supplements as Table 15.
 
Literatur
1.
Zurück zum Zitat Abadi, D.J., Marcus, A., Madden, S., Hollenbach, K.J.: Scalable semantic web data management using vertical partitioning. In: Proceedings of the 33rd International Conference on Very Large Data Bases, pp. 411–422 (2007) Abadi, D.J., Marcus, A., Madden, S., Hollenbach, K.J.: Scalable semantic web data management using vertical partitioning. In: Proceedings of the 33rd International Conference on Very Large Data Bases, pp. 411–422 (2007)
2.
Zurück zum Zitat Abadi, D.J., Marcus, A., Madden, S., Hollenbach, K.: SW-Store: a vertically partitioned DBMS for semantic web data management. VLDB J. 18(2), 385–406 (2009)CrossRef Abadi, D.J., Marcus, A., Madden, S., Hollenbach, K.: SW-Store: a vertically partitioned DBMS for semantic web data management. VLDB J. 18(2), 385–406 (2009)CrossRef
3.
Zurück zum Zitat Atre, M., Chaoji, V., Zaki, M.J., Hendler, J.A.: Matrix “bit” loaded: a scalable lightweight join query processor for RDF data. In: Proceedings of the 19th International World Wide Web Conference, pp. 41–50 (2010) Atre, M., Chaoji, V., Zaki, M.J., Hendler, J.A.: Matrix “bit” loaded: a scalable lightweight join query processor for RDF data. In: Proceedings of the 19th International World Wide Web Conference, pp. 41–50 (2010)
5.
Zurück zum Zitat Bönström, V., Hinze, A., Schweppe, H.: Storing RDF as a graph. In: Proceedings of the 1st Latin American Web Congress, pp. 27–36 (2003) Bönström, V., Hinze, A., Schweppe, H.: Storing RDF as a graph. In: Proceedings of the 1st Latin American Web Congress, pp. 27–36 (2003)
6.
Zurück zum Zitat Broekstra, J., Kampman, A., van Harmelen, F.: Sesame: a generic architecture for storing and querying RDF and RDF schema. In: Proceedings of the 1st International Semantic Web Conference, pp. 54–68 (2002) Broekstra, J., Kampman, A., van Harmelen, F.: Sesame: a generic architecture for storing and querying RDF and RDF schema. In: Proceedings of the 1st International Semantic Web Conference, pp. 54–68 (2002)
7.
Zurück zum Zitat Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms. The MIT Press, Cambridge (2001)MATH Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms. The MIT Press, Cambridge (2001)MATH
8.
Zurück zum Zitat Deppisch, U.: S-tree: a dynamic balanced signature index for office retrieval. In: Proceedings of the 9th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 77–87 (1986) Deppisch, U.: S-tree: a dynamic balanced signature index for office retrieval. In: Proceedings of the 9th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 77–87 (1986)
9.
Zurück zum Zitat Faloutsos, C., Christodoulakis, S.: Signature files: an access method for documents and its analytical performance evaluation. ACM Trans. Inf. Syst. 2(4), 267–288 (1984)CrossRef Faloutsos, C., Christodoulakis, S.: Signature files: an access method for documents and its analytical performance evaluation. ACM Trans. Inf. Syst. 2(4), 267–288 (1984)CrossRef
10.
Zurück zum Zitat Gravano, L., Ipeirotis, P.G., Koudas, N., Srivastava, D.: Text joins in an RDBMS for web data integration. In: Proceedings of the 12th International World Wide Web Conference, pp. 90–101 (2003) Gravano, L., Ipeirotis, P.G., Koudas, N., Srivastava, D.: Text joins in an RDBMS for web data integration. In: Proceedings of the 12th International World Wide Web Conference, pp. 90–101 (2003)
11.
Zurück zum Zitat Gravano, L., Ipeirotis, P.G., Jagadish, H.V., Koudas, N., Muthukrishnan, S., Pietarinen, L., Srivastava, D.: Using \(q\)-grams in a DBMS for approximate string processing. IEEE Data Eng. Bull. 24(4), 28–34 (2001) Gravano, L., Ipeirotis, P.G., Jagadish, H.V., Koudas, N., Muthukrishnan, S., Pietarinen, L., Srivastava, D.: Using \(q\)-grams in a DBMS for approximate string processing. IEEE Data Eng. Bull. 24(4), 28–34 (2001)
12.
Zurück zum Zitat Guo, Y., Pan, Z., Heflin, J.: LUBM: a benchmark for OWL knowledge base systems. J. Web Semant. 3(2–3), 158–182 (2005)CrossRef Guo, Y., Pan, Z., Heflin, J.: LUBM: a benchmark for OWL knowledge base systems. J. Web Semant. 3(2–3), 158–182 (2005)CrossRef
13.
Zurück zum Zitat Gupta, A., Dallan Quass, V.H.: Aggregate-query processing in data warehousing environments. In: Proceedings of the 21st International Conference on Very Large Data Bases, pp. 358–369 (1995) Gupta, A., Dallan Quass, V.H.: Aggregate-query processing in data warehousing environments. In: Proceedings of the 21st International Conference on Very Large Data Bases, pp. 358–369 (1995)
14.
Zurück zum Zitat Harth, A., Umbrich, J., Hogan, A., Decker, S.: YARS2: a federated repository for querying graph structured data from the web. In: Proceedings of the 6th International Semantic Web Conference, pp. 211–224 (2007) Harth, A., Umbrich, J., Hogan, A., Decker, S.: YARS2: a federated repository for querying graph structured data from the web. In: Proceedings of the 6th International Semantic Web Conference, pp. 211–224 (2007)
15.
Zurück zum Zitat Hoffart, J., Suchanek, F.M., Berberich, K., Kelham, E.L., de Melo, G., Weikum, G.: YAGO2: exploring and querying world knowledge in time, space, context, and many languages. In: Proceedings of the 20th International World Wide Web Conference, pp. 229–232 (2011) Hoffart, J., Suchanek, F.M., Berberich, K., Kelham, E.L., de Melo, G., Weikum, G.: YAGO2: exploring and querying world knowledge in time, space, context, and many languages. In: Proceedings of the 20th International World Wide Web Conference, pp. 229–232 (2011)
16.
Zurück zum Zitat Hung, E., Deng, Y., Subrahmanian, V.S.: RDF aggregate queries and views. In: Proceedings of the 21st International Conference on Data Engineering, pp. 717–728 (2005) Hung, E., Deng, Y., Subrahmanian, V.S.: RDF aggregate queries and views. In: Proceedings of the 21st International Conference on Data Engineering, pp. 717–728 (2005)
17.
Zurück zum Zitat Johnson, T., Shasha, D.: B-trees with inserts and deletes: why free-at-empty is better than merge-at-half. J. Comput. Syst. Sci. 47(1), 45–76 (1993)CrossRefMATHMathSciNet Johnson, T., Shasha, D.: B-trees with inserts and deletes: why free-at-empty is better than merge-at-half. J. Comput. Syst. Sci. 47(1), 45–76 (1993)CrossRefMATHMathSciNet
18.
Zurück zum Zitat Kitagawa, H., Ishikawa, Y.: False drop analysis of set retrieval with signature files. IEICE Trans. Inf. Syst. E80–D(6), 1–12 (1997) Kitagawa, H., Ishikawa, Y.: False drop analysis of set retrieval with signature files. IEICE Trans. Inf. Syst. E80–D(6), 1–12 (1997)
19.
Zurück zum Zitat Neumann, T., Weikum, G.: Scalable join processing on very large RDF graphs. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 627–640 (2009) Neumann, T., Weikum, G.: Scalable join processing on very large RDF graphs. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 627–640 (2009)
20.
Zurück zum Zitat Neumann, T., Weikum, G.: RDF-3X: a RISC-style engine for RDF. Proc. VLDB Endow. 1(1), 647–659 (2008)CrossRef Neumann, T., Weikum, G.: RDF-3X: a RISC-style engine for RDF. Proc. VLDB Endow. 1(1), 647–659 (2008)CrossRef
21.
Zurück zum Zitat Neumann, T., Weikum, G.: The RDF-3X engine for scalable management of RDF data. VLDB J. 19(1), 91–113 (2010)CrossRef Neumann, T., Weikum, G.: The RDF-3X engine for scalable management of RDF data. VLDB J. 19(1), 91–113 (2010)CrossRef
22.
Zurück zum Zitat Neumann, T., Weikum, G.: x-RDF-3x: Fast querying, high update rates, and consistency for RDF databases. Proc. VLDB Endow. 1(1), 256–263 (2010)CrossRef Neumann, T., Weikum, G.: x-RDF-3x: Fast querying, high update rates, and consistency for RDF databases. Proc. VLDB Endow. 1(1), 256–263 (2010)CrossRef
23.
Zurück zum Zitat Pérez, J., Arenas, M., Gutierrez, C.: Semantics and complexity of SPARQL. ACM Trans. Database Syst. 34(3), 16:1–16:45 (2009) Pérez, J., Arenas, M., Gutierrez, C.: Semantics and complexity of SPARQL. ACM Trans. Database Syst. 34(3), 16:1–16:45 (2009)
24.
Zurück zum Zitat Seid, D.Y., Mehrotra, S.: Grouping and aggregate queries over semantic web databases. In: Proceedings of the International Conference on Semantic Computing, pp. 775–782 (2007) Seid, D.Y., Mehrotra, S.: Grouping and aggregate queries over semantic web databases. In: Proceedings of the International Conference on Semantic Computing, pp. 775–782 (2007)
25.
Zurück zum Zitat Shasha, D., Wang, J.T.-L., Giugno, R.: Algorithmics and applications of tree and graph searching. In: Proceedings of the 21st ACM Symposium on Principles of Database Systems, pp. 39–52 (2002) Shasha, D., Wang, J.T.-L., Giugno, R.: Algorithmics and applications of tree and graph searching. In: Proceedings of the 21st ACM Symposium on Principles of Database Systems, pp. 39–52 (2002)
26.
Zurück zum Zitat Stocker, M., Seaborne, A., Bernstein, A., Kiefer, C., Reynolds, D.: SPARQL basic graph pattern optimization using selectivity estimation. In: Proceedings of the 17th International World Wide Web Conference, pp. 595–604 (2008) Stocker, M., Seaborne, A., Bernstein, A., Kiefer, C., Reynolds, D.: SPARQL basic graph pattern optimization using selectivity estimation. In: Proceedings of the 17th International World Wide Web Conference, pp. 595–604 (2008)
27.
Zurück zum Zitat Tousidou, E., Nanopoulos, A., Manolopoulos, Y.: Improved methods for signature-tree construction. Comput. J. 43(4), 301–314 (2000)CrossRefMATH Tousidou, E., Nanopoulos, A., Manolopoulos, Y.: Improved methods for signature-tree construction. Comput. J. 43(4), 301–314 (2000)CrossRefMATH
28.
Zurück zum Zitat Tousidou, E., Bozanis, P., Manolopoulos, Y.: Signature-based structures for objects with set-valued attributes. Inf. Syst. 27(2), 93–121 (2002)CrossRefMATH Tousidou, E., Bozanis, P., Manolopoulos, Y.: Signature-based structures for objects with set-valued attributes. Inf. Syst. 27(2), 93–121 (2002)CrossRefMATH
29.
Zurück zum Zitat Udrea, O., Pugliese, A., Subrahmanian, V.S.: GRIN: a graph based RDF index. In: Proceedings of the 22nd National Conference on Artificial Intelligence, pp. 1465–1470 (2007) Udrea, O., Pugliese, A., Subrahmanian, V.S.: GRIN: a graph based RDF index. In: Proceedings of the 22nd National Conference on Artificial Intelligence, pp. 1465–1470 (2007)
30.
Zurück zum Zitat Weiss, C., Karras, P., Bernstein, A.: Hexastore: sextuple indexing for semantic web data management. Proc. VLDB Endow. 1(1), 1008–1019 (2008)CrossRef Weiss, C., Karras, P., Bernstein, A.: Hexastore: sextuple indexing for semantic web data management. Proc. VLDB Endow. 1(1), 1008–1019 (2008)CrossRef
31.
Zurück zum Zitat Wilkinson, K., Sayers, C. , Kuno, H.A., Reynolds, D.: Efficient RDF storage and retrieval in Jena2. In: Proceedings of the 1st Inter national Workshop on Semantic Web and Databases, pp. 131–150 (2003) Wilkinson, K., Sayers, C. , Kuno, H.A., Reynolds, D.: Efficient RDF storage and retrieval in Jena2. In: Proceedings of the 1st Inter national Workshop on Semantic Web and Databases, pp. 131–150 (2003)
32.
Zurück zum Zitat Yan, Y., Wang, C., Zhou, A., Qian, W., Ma, L., Pan, Y.: Efficient indices using graph partitioning in RDF triple stores. In: Proceedings of the 25th International Conference on Data Engineering, pp. 1263–1266 (2009) Yan, Y., Wang, C., Zhou, A., Qian, W., Ma, L., Pan, Y.: Efficient indices using graph partitioning in RDF triple stores. In: Proceedings of the 25th International Conference on Data Engineering, pp. 1263–1266 (2009)
33.
Zurück zum Zitat Yan, X., Yu, P.S., Han, J.: Graph indexing: a frequent structure-based approach. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 335–346 (2004) Yan, X., Yu, P.S., Han, J.: Graph indexing: a frequent structure-based approach. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 335–346 (2004)
34.
Zurück zum Zitat Yuan, P., Liu, P., Jin, H., Zhang, W., Liu, L.: TripleBit: a fast and compact system for large scale RDF data. Proc. VLDB Endow. 6(7), 517–528 (2013)CrossRef Yuan, P., Liu, P., Jin, H., Zhang, W., Liu, L.: TripleBit: a fast and compact system for large scale RDF data. Proc. VLDB Endow. 6(7), 517–528 (2013)CrossRef
Metadaten
Titel
gStore: a graph-based SPARQL query engine
verfasst von
Lei Zou
M. Tamer Özsu
Lei Chen
Xuchuan Shen
Ruizhe Huang
Dongyan Zhao
Publikationsdatum
01.08.2014
Verlag
Springer Berlin Heidelberg
Erschienen in
The VLDB Journal / Ausgabe 4/2014
Print ISSN: 1066-8888
Elektronische ISSN: 0949-877X
DOI
https://doi.org/10.1007/s00778-013-0337-7

Weitere Artikel der Ausgabe 4/2014

The VLDB Journal 4/2014 Zur Ausgabe