2019 | Original Paper | Book Chapter

Schema-Independent Querying and Manipulation for Heterogeneous Collections in NoSQL Document Stores

Written by: Hamdi Ben Hamadou, Faiza Ghozzi, André Péninou, Olivier Teste

Published in: Enterprise Information Systems

Publisher: Springer International Publishing

Abstract

NoSQL document stores natively support the efficient storage of documents with different schemas within the same collection. However, this flexibility makes it difficult to formulate queries over, or to manipulate, collections with multiple schemas: the user has to build complex queries, or reformulate existing ones, whenever new schemas appear in the collection. In this paper, we propose a novel approach, grounded in formal foundations, that enables schema-independent queries for querying and maintaining multi-structured documents. We introduce a query reformulation mechanism that consults a pre-constructed dictionary. This dictionary binds each possible path in the documents to all of its corresponding absolute paths across all the documents. We automate query reformulation via a set of rules that reformulate most document store operators, such as select, project and aggregate. In addition, we automate the reformulation of the classical manipulation operators (insert, delete and update queries) in order to update the dictionary according to the structural changes made in the collection. Both processes produce queries that are compatible with the native query engine of the underlying document store. To evaluate our approach, we conduct experiments on synthetic datasets. Our results show that the overhead induced when querying or updating can be acceptable when compared to the effort required to restructure the data and the time required to execute several queries corresponding to the different schemas inside the collection.
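
The dictionary-based reformulation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`absolute_paths`, `build_dictionary`, `reformulate_select`), the sample documents, and the choice to index every path suffix are assumptions made for illustration only. The sketch rewrites a single equality selection into a MongoDB-style `$or` filter over all absolute paths bound to the path the user wrote.

```python
from collections import defaultdict


def absolute_paths(doc, prefix=""):
    """Yield every dot-separated absolute path found in a nested dict."""
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        yield path
        if isinstance(value, dict):
            yield from absolute_paths(value, path)


def build_dictionary(collection):
    """Bind each path suffix (assumed here to be what a user may write)
    to the sorted list of absolute paths that end with it."""
    dictionary = defaultdict(set)
    for doc in collection:
        for path in absolute_paths(doc):
            parts = path.split(".")
            for i in range(len(parts)):
                dictionary[".".join(parts[i:])].add(path)
    return {key: sorted(paths) for key, paths in dictionary.items()}


def reformulate_select(dictionary, path, value):
    """Rewrite the condition `path == value` as a disjunction over
    every absolute path the dictionary binds to `path`."""
    targets = dictionary.get(path, [])
    if not targets:
        # Unknown path: leave the condition unchanged.
        return {path: value}
    return {"$or": [{target: value} for target in targets]}


# Hypothetical collection with three different schemas.
collection = [
    {"title": "Heat", "year": 1995},
    {"film": {"title": "Alien", "year": 1979}},
    {"info": {"details": {"title": "Ran"}}},
]

dictionary = build_dictionary(collection)
query = reformulate_select(dictionary, "title", "Alien")
print(query)
```

The resulting filter uses only dot-notation field names and `$or`, so under these assumptions it could be passed unchanged to the store's native query engine. Keeping the dictionary in step with inserts, deletes and updates is not shown here.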

Metadata
Title
Schema-Independent Querying and Manipulation for Heterogeneous Collections in NoSQL Document Stores
Written by
Hamdi Ben Hamadou
Faiza Ghozzi
André Péninou
Olivier Teste
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-26169-6_16