Published in: The Journal of Supercomputing 3/2024

08-09-2023

Schema generation for document stores using workload-driven approach

Authors: Neha Bansal, Shelly Sachdeva, Lalit K. Awasthi

Abstract

Numerous data modeling tools exist for relational databases, but data modeling for NoSQL databases calls for a different perspective. These databases (a) do not define any explicit schema, (b) store data in a denormalized manner, and (c) allow many structural alternatives. The decision of how to structure the data always relies on rules of thumb, which do not guarantee an optimal structure. Motivated by this, the paper proposes a workload-driven model for the logical schema design of a NoSQL document database. The model consists of three phases: model input, intermediate transformation, and final schema generation. It takes the conceptual schema (an EER model) and the application workload (queries and anticipated data volume) as input and describes a procedure for converting them into a logical model for NoSQL document stores. The conversion first turns the application queries into query graphs. The query graphs, together with the anticipated data volume, are used to generate query labels, which are then assigned to the schema graph derived from the EER model. Finally, the schema graph and its labels are used to transform the EER model into the appropriate logical schema, according to the actions defined for each label. We evaluate the model using a case study in the eCommerce application domain. The experimental evaluation shows that the proposed model outperforms the existing conventional, optimized, and query path graph models in multiple aspects, including query performance, storage space efficiency, aggregate pipeline efficiency, read–write latency, collection-wise performance, scalability, throughput, and latency. By addressing the challenges of managing the variety and volume of big data through a well-designed schema, the proposed model significantly reduces the time, cost, and effort required for schema development and repair.
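
The pipeline described in the abstract (queries to query graphs, query graphs plus data volume to labels, labels to schema actions) can be illustrated with a small sketch. The abstract does not name the labels, the labelling rules, or the per-label actions, so everything concrete below is an assumption made for illustration only: the `EMBED` and `REFERENCE` labels, the `min_frequency` and `max_embedded` thresholds, the `Relationship` and `Query` types, and the sample eCommerce entities are hypothetical and are not taken from the paper.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    """One edge of the schema graph (hypothetical representation)."""
    parent: str
    child: str
    avg_children: int  # anticipated child records per parent (data volume)


@dataclass(frozen=True)
class Query:
    """A query graph reduced to the schema edges it traverses."""
    edges: tuple       # tuple of (parent, child) pairs
    frequency: int     # anticipated executions per unit of time


def label_relationships(relationships, queries, min_frequency=100, max_embedded=50):
    """Assign an illustrative label to every schema-graph edge.

    Assumed rule (not from the paper): embed a child when the edge is
    traversed often and the expected fan-out is small; otherwise reference.
    """
    traversals = Counter()
    for query in queries:
        for edge in query.edges:
            traversals[edge] += query.frequency
    labels = {}
    for rel in relationships:
        edge = (rel.parent, rel.child)
        hot = traversals[edge] >= min_frequency
        small = rel.avg_children <= max_embedded
        labels[edge] = "EMBED" if hot and small else "REFERENCE"
    return labels


def generate_schema(entities, relationships, labels):
    """Apply the assumed per-label action and return top-level collections.

    Simplification: chains of nested embedding are not expanded here.
    """
    docs = {name: {"fields": list(fields), "embeds": [], "references": []}
            for name, fields in entities.items()}
    embedded = set()
    for rel in relationships:
        if labels[(rel.parent, rel.child)] == "EMBED":
            docs[rel.parent]["embeds"].append(rel.child)
            embedded.add(rel.child)
        else:
            docs[rel.child]["references"].append(rel.parent)
    return {name: doc for name, doc in docs.items() if name not in embedded}


# Hypothetical eCommerce input, invented for this sketch.
entities = {
    "Customer": ["name", "email"],
    "Order": ["date", "total"],
    "OrderItem": ["sku", "quantity"],
    "Product": ["title", "price"],
    "Review": ["rating", "text"],
}
relationships = [
    Relationship("Customer", "Order", avg_children=200),
    Relationship("Order", "OrderItem", avg_children=3),
    Relationship("Product", "Review", avg_children=5000),
]
queries = [
    Query(edges=(("Order", "OrderItem"),), frequency=500),
    Query(edges=(("Customer", "Order"),), frequency=300),
    Query(edges=(("Product", "Review"),), frequency=20),
]

labels = label_relationships(relationships, queries)
schema = generate_schema(entities, relationships, labels)
for name, doc in schema.items():
    print(name, doc)
```

With this input, only the frequently traversed, low fan-out `Order` to `OrderItem` edge is embedded; the other two edges become references held by the child collection. The paper's actual label set and actions may differ substantially from this two-label scheme.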

Metadata
Title: Schema generation for document stores using workload-driven approach
Authors: Neha Bansal, Shelly Sachdeva, Lalit K. Awasthi
Publication date: 08-09-2023
Publisher: Springer US
Published in: The Journal of Supercomputing, Issue 3/2024
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI: https://doi.org/10.1007/s11227-023-05613-5
