2019 | OriginalPaper | Chapter

Schema-Independent Querying and Manipulation for Heterogeneous Collections in NoSQL Document Stores

Authors: Hamdi Ben Hamadou, Faiza Ghozzi, André Péninou, Olivier Teste

Published in: Enterprise Information Systems

Publisher: Springer International Publishing

Abstract

NoSQL document stores offer native support for efficiently storing documents with different schemas within the same collection. However, this flexibility makes it difficult to formulate queries or to manipulate collections that contain multiple schemas: the user has to build complex queries, or reformulate existing ones, whenever new schemas appear in the collection. In this paper, we propose a novel approach, grounded on formal foundations, that enables schema-independent queries for querying and maintaining multi-structured documents. We introduce a query reformulation mechanism that consults a pre-constructed dictionary. This dictionary binds each possible path in the documents to all of its corresponding absolute paths across the collection. We automate query reformulation via a set of rules that reformulate most document store operators, such as select, project and aggregate. In addition, we automate the reformulation of the classical manipulation operators (insert, delete and update) so that the dictionary is updated according to the structural changes made to the collection. Both processes produce queries that are compatible with the native query engine of the underlying document store. To evaluate our approach, we conduct experiments on synthetic datasets. Our results show that the overhead induced when querying or updating can be acceptable when compared with the effort required to restructure the data and the time required to execute several queries corresponding to the different schemas inside the collection.
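To make the dictionary idea concrete, the following is a minimal Python sketch rather than the authors' implementation. It assumes that a "possible path" is any suffix of an absolute path in dot notation, that arrays are ignored, and that the reformulated select is expressed as a MongoDB-style `$or` filter. The function names (`absolute_paths`, `suffixes`, `register_insert`, `build_dictionary`, `reformulate_select`) and the sample documents are hypothetical and do not appear in the chapter.

```python
from collections import defaultdict


def absolute_paths(doc, prefix=""):
    """Yield every absolute path (dot notation) in a nested document.

    Assumption: only nested objects are traversed; arrays are ignored.
    """
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        yield path
        if isinstance(value, dict):
            yield from absolute_paths(value, path)


def suffixes(path):
    """Return all partial paths (suffixes) of an absolute path, itself included."""
    parts = path.split(".")
    return [".".join(parts[i:]) for i in range(len(parts))]


def register_insert(doc, dictionary):
    """Update the dictionary with the paths of a newly inserted document."""
    for abs_path in absolute_paths(doc):
        for partial in suffixes(abs_path):
            dictionary[partial].add(abs_path)


def build_dictionary(collection):
    """Bind each path to the set of absolute paths it may refer to."""
    dictionary = defaultdict(set)
    for doc in collection:
        register_insert(doc, dictionary)
    return dictionary


def reformulate_select(path, value, dictionary):
    """Rewrite `path == value` as a disjunction over all absolute paths.

    The result is shaped like a MongoDB `$or` filter (an assumption of this sketch).
    """
    targets = sorted(dictionary.get(path, {path}))
    return {"$or": [{p: value} for p in targets]}


# Two hypothetical documents storing the same information under different schemas.
collection = [
    {"title": "A", "director": {"name": "X"}},
    {"film": {"title": "B", "director": {"name": "Y"}}},
]
dictionary = build_dictionary(collection)
query = reformulate_select("title", "B", dictionary)
```

In this sketch the user writes a single predicate on `title`, and the reformulation expands it to cover both `title` and `film.title`. Calling `register_insert` for a document with a new structure extends the dictionary, so later queries on `title` would also cover the new absolute path without being rewritten by hand.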

Metadata
Title
Schema-Independent Querying and Manipulation for Heterogeneous Collections in NoSQL Document Stores
Authors
Hamdi Ben Hamadou
Faiza Ghozzi
André Péninou
Olivier Teste
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-26169-6_16
