Skip to main content
Erschienen in: The VLDB Journal 4/2021

24.03.2021 | Regular Paper

A cost model for random access queries in document stores

verfasst von: Moditha Hewasinghage, Alberto Abelló, Jovan Varga, Esteban Zimányi

Erschienen in: The VLDB Journal | Ausgabe 4/2021

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Document stores have become one of the key NoSQL storage solutions. They have been widely adopted in different domains due to their ability to store semi-structured data and expressive query capabilities. However, implementations differ in terms of concrete data storage and retrieval. Unfortunately, a standard framework for data and query optimization for document stores is nonexistent, and only implementation-specific design and query guidelines are used. Hence, the goal of this work is to aid automating the data design for document stores based on query costs instead of generic design rules. For this, we define a generic storage and query cost model based on disk access and memory allocation that allows estimating the impact of design decisions. Since all document stores carry out data operations in memory, we first estimate the memory usage by considering characteristics of the stored documents, their access patterns, and memory management algorithms. Then, using this estimation and metadata storage size, we introduce a cost model for random access queries. We validate our work on two well-known document store implementations: MongoDB and Couchbase. The results show that the memory usage estimates have the average precision of 91% and predicted costs are highly correlated to the actual execution times. During this work, we have managed to suggest several improvements to document storage systems. Thus, this cost model also contributes to identifying discordance between document store implementations and their theoretical expectations.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Anhänge
Nur mit Berechtigung zugänglich
Literatur
1.
Zurück zum Zitat Beal, L.D.R., Hill, D.C., Martin, R.A., Hedengren, J.D.: GEKKO Optimization Suite. Processes 6(8), 106–131 (2018)CrossRef Beal, L.D.R., Hill, D.C., Martin, R.A., Hedengren, J.D.: GEKKO Optimization Suite. Processes 6(8), 106–131 (2018)CrossRef
2.
Zurück zum Zitat Bertino, E., Foscoli, P.: On modeling cost functions for object-oriented databases. IEEE Trans. Knowl. Data Eng. 9(3), 500–508 (1997)CrossRef Bertino, E., Foscoli, P.: On modeling cost functions for object-oriented databases. IEEE Trans. Knowl. Data Eng. 9(3), 500–508 (1997)CrossRef
3.
Zurück zum Zitat Cattell, R.: Scalable SQL and NoSQL data stores. SIGMOD Record 39(4), 12–27 (2010)CrossRef Cattell, R.: Scalable SQL and NoSQL data stores. SIGMOD Record 39(4), 12–27 (2010)CrossRef
4.
Zurück zum Zitat Dan, A., Towsley, D.: An approximate analysis of the LRU and FIFO buffer replacement schemes. SIGMETRICS Perform. Eval. Rev. 18(1), 143–152 (1990)CrossRef Dan, A., Towsley, D.: An approximate analysis of the LRU and FIFO buffer replacement schemes. SIGMETRICS Perform. Eval. Rev. 18(1), 143–152 (1990)CrossRef
5.
Zurück zum Zitat de la Vega, A., García-Saiz, D., Blanco, C., Zorrilla, M.E., Sánchez, P.: Mortadelo: Automatic generation of NoSQL stores from platform-independent data models. Future Gener. Comput. Syst. 105, (2020) de la Vega, A., García-Saiz, D., Blanco, C., Zorrilla, M.E., Sánchez, P.: Mortadelo: Automatic generation of NoSQL stores from platform-independent data models. Future Gener. Comput. Syst. 105, (2020)
6.
7.
Zurück zum Zitat Garcia-Molina, H., Salem, K.: Main memory database systems: an overview. IEEE Trans. Knowl. Data Eng. 4(6), 509–516 (1992)CrossRef Garcia-Molina, H., Salem, K.: Main memory database systems: an overview. IEEE Trans. Knowl. Data Eng. 4(6), 509–516 (1992)CrossRef
8.
Zurück zum Zitat Gardarin, G., Gruser, J., Tang, Z.: A cost model for clustered object-oriented databases. In: International Conference on Very Large Data Bases, pp. 323–334 (1995) Gardarin, G., Gruser, J., Tang, Z.: A cost model for clustered object-oriented databases. In: International Conference on Very Large Data Bases, pp. 323–334 (1995)
9.
Zurück zum Zitat Gou, G., Chirkova, R.: Efficiently querying large XML data repositories: A survey. IEEE Trans. Knowl. Data Eng. 19(10), 1381–1403 (2007)CrossRef Gou, G., Chirkova, R.: Efficiently querying large XML data repositories: A survey. IEEE Trans. Knowl. Data Eng. 19(10), 1381–1403 (2007)CrossRef
10.
Zurück zum Zitat Guo, F., Solihin, Y.: An analytical model for cache replacement policy performance. SIGMETRICS Perform. Evalu. Rev. 34(1), 228–239 (2006)CrossRef Guo, F., Solihin, Y.: An analytical model for cache replacement policy performance. SIGMETRICS Perform. Evalu. Rev. 34(1), 228–239 (2006)CrossRef
11.
Zurück zum Zitat Hecht, R., Jablonski, S.: NoSQL Evaluation: A Use Case Oriented Survey. In: IEEE International Conference on Cloud and Service Computing, pp. 336–341 (2011) Hecht, R., Jablonski, S.: NoSQL Evaluation: A Use Case Oriented Survey. In: IEEE International Conference on Cloud and Service Computing, pp. 336–341 (2011)
12.
Zurück zum Zitat Hewasinghage, M., Abelló, A., Varga, J., Zimányi, E.: DocDesign: Cost-Based Database Design for Document Stores. Presented at the (2020) Hewasinghage, M., Abelló, A., Varga, J., Zimányi, E.: DocDesign: Cost-Based Database Design for Document Stores. Presented at the (2020)
13.
Zurück zum Zitat Imam, A.A., Basri, S., Ahmad, R., Watada, J., Gonzalez-Aparicio, M.T., Almomani, M.A.: Data modeling guidelines for NoSQL document-store databases. Int. J. Adv. Comput. Sci. Appl. 9(10), 544–555 (2018) Imam, A.A., Basri, S., Ahmad, R., Watada, J., Gonzalez-Aparicio, M.T., Almomani, M.A.: Data modeling guidelines for NoSQL document-store databases. Int. J. Adv. Comput. Sci. Appl. 9(10), 544–555 (2018)
14.
Zurück zum Zitat Ioannidis, Y.E.: Query optimization. ACM Comput. Surv. 28(1), 121–123 (1996)CrossRef Ioannidis, Y.E.: Query optimization. ACM Comput. Surv. 28(1), 121–123 (1996)CrossRef
15.
Zurück zum Zitat Jiang, B., Nain, P., Towsley, D.: LRU cache under stationary requests. SIGMETRICS Perform. Evalu. Rev. 45(2), 24–26 (2017)CrossRef Jiang, B., Nain, P., Towsley, D.: LRU cache under stationary requests. SIGMETRICS Perform. Evalu. Rev. 45(2), 24–26 (2017)CrossRef
16.
Zurück zum Zitat Kim, J., Lee, W., Lee, K.: The Cost Model for XML Documents in Relational Database Systems. In: IEEE International Conference on Computer Systems and Applications, pp. 185–187, (2001) Kim, J., Lee, W., Lee, K.: The Cost Model for XML Documents in Relational Database Systems. In: IEEE International Conference on Computer Systems and Applications, pp. 185–187, (2001)
17.
Zurück zum Zitat King III, W.F.: Analysis of Demand Paging Algorithms. IFIP Congress 1, pp. 485–490 (1971) King III, W.F.: Analysis of Demand Paging Algorithms. IFIP Congress 1, pp. 485–490 (1971)
18.
Zurück zum Zitat Lightstone, S., Teorey, T.J., Nadeau, T.P.: Physical Database Design: The Database Professional’s Guide to Exploiting Indexes, Views, Storage, and More. Morgan Kaufmann, San Francisc (2007) Lightstone, S., Teorey, T.J., Nadeau, T.P.: Physical Database Design: The Database Professional’s Guide to Exploiting Indexes, Views, Storage, and More. Morgan Kaufmann, San Francisc (2007)
19.
Zurück zum Zitat Manegold, S., Boncz, P., Kersten, M.: Generic database cost models for hierarchical memory systems. [INS]. CWI., (Technical Report INS-R0203), Jan. (2002) Manegold, S., Boncz, P., Kersten, M.: Generic database cost models for hierarchical memory systems. [INS]. CWI., (Technical Report INS-R0203), Jan. (2002)
20.
Zurück zum Zitat Manegold, S., Boncz, P. A., Kersten, M. L.: Generic Database Cost Models for Hierarchical Memory Systems. In: International Conference on Very Large Data Bases, pp. 191–202, (2002) Manegold, S., Boncz, P. A., Kersten, M. L.: Generic Database Cost Models for Hierarchical Memory Systems. In: International Conference on Very Large Data Bases, pp. 191–202, (2002)
21.
Zurück zum Zitat Megiddo, N., Modha, D.S.: Outperforming LRU with an adaptive replacement cache algorithm. IEEE Comput. 37(4), 58–65 (2004)CrossRef Megiddo, N., Modha, D.S.: Outperforming LRU with an adaptive replacement cache algorithm. IEEE Comput. 37(4), 58–65 (2004)CrossRef
22.
Zurück zum Zitat Michels, J., Hare, K., Kulkarni, K., Zuzarte, C., Liu, Z.H., Hammerschmidt, B., Zemke, F.: The new and improved SQL: 2016 standard. ACM SIGMOD Record 47(2), 51–60 (2018)CrossRef Michels, J., Hare, K., Kulkarni, K., Zuzarte, C., Liu, Z.H., Hammerschmidt, B., Zemke, F.: The new and improved SQL: 2016 standard. ACM SIGMOD Record 47(2), 51–60 (2018)CrossRef
23.
Zurück zum Zitat Mior, M.J., Salem, K., Aboulnaga, A., Liu, R.: NoSE: Schema design for NoSQL applications. IEEE Trans. Knowl. Data Eng. 29(10), (2017) Mior, M.J., Salem, K., Aboulnaga, A., Liu, R.: NoSE: Schema design for NoSQL applications. IEEE Trans. Knowl. Data Eng. 29(10), (2017)
24.
Zurück zum Zitat Özsu, M.T., Valduriez, P.: Principles of Distributed Database Systems, 4th edn. Springer, Berlin (2020)CrossRef Özsu, M.T., Valduriez, P.: Principles of Distributed Database Systems, 4th edn. Springer, Berlin (2020)CrossRef
25.
Zurück zum Zitat Ramakrishnan, R., Gehrke, J.: Database Management Systems, 3rd edn. McGraw-Hill, New York (2003)MATH Ramakrishnan, R., Gehrke, J.: Database Management Systems, 3rd edn. McGraw-Hill, New York (2003)MATH
26.
Zurück zum Zitat Schindler, J.: I/O characteristics of NoSQL databases. Proc. VLDB Endow. 5(12), 2020–2021 (2012)CrossRef Schindler, J.: I/O characteristics of NoSQL databases. Proc. VLDB Endow. 5(12), 2020–2021 (2012)CrossRef
27.
Zurück zum Zitat Smith, A.J.: Cache memories. ACM Comput. Surv. 14(3), 473–530 (1982)CrossRef Smith, A.J.: Cache memories. ACM Comput. Surv. 14(3), 473–530 (1982)CrossRef
28.
Zurück zum Zitat Smith, A.J.: Cache evaluation and the impact of workload choice. SIGARCH Comput. Archit. News 13(3), 64–73 (1985)CrossRef Smith, A.J.: Cache evaluation and the impact of workload choice. SIGARCH Comput. Archit. News 13(3), 64–73 (1985)CrossRef
29.
Zurück zum Zitat Vafaei, N., Ribeiro, R.A., Camarinha-Matos, L.M.: Data normalisation techniques in decision making: Case study with TOPSIS method. Int. J. Inf. Decis. Sci. 10(1), 19–38 (2018) Vafaei, N., Ribeiro, R.A., Camarinha-Matos, L.M.: Data normalisation techniques in decision making: Case study with TOPSIS method. Int. J. Inf. Decis. Sci. 10(1), 19–38 (2018)
30.
Zurück zum Zitat Vajk, T., Deák, L., Fekete, K., Mezei, G.: Automatic NoSQL schema development: A case study. In: Artificial Intelligence and Applications (2013) Vajk, T., Deák, L., Fekete, K., Mezei, G.: Automatic NoSQL schema development: A case study. In: Artificial Intelligence and Applications (2013)
31.
Zurück zum Zitat Waldspurger, C., Saemundsson, T., Ahmad, I., Park, N.: Cache Modeling and Optimization using Miniature Simulations. In: USENIX Annual Technical Conference, pp. 487–498 (2017) Waldspurger, C., Saemundsson, T., Ahmad, I., Park, N.: Cache Modeling and Optimization using Miniature Simulations. In: USENIX Annual Technical Conference, pp. 487–498 (2017)
32.
Zurück zum Zitat Yao, J.: An Efficient Storage Model of Tree-Like Structure in MongoDB. In: IEEE International Conference on Semantics, Knowledge and Grids, pp. 166–169 (2016) Yao, J.: An Efficient Storage Model of Tree-Like Structure in MongoDB. In: IEEE International Conference on Semantics, Knowledge and Grids, pp. 166–169 (2016)
Metadaten
Titel
A cost model for random access queries in document stores
verfasst von
Moditha Hewasinghage
Alberto Abelló
Jovan Varga
Esteban Zimányi
Publikationsdatum
24.03.2021
Verlag
Springer Berlin Heidelberg
Erschienen in
The VLDB Journal / Ausgabe 4/2021
Print ISSN: 1066-8888
Elektronische ISSN: 0949-877X
DOI
https://doi.org/10.1007/s00778-021-00660-x

Weitere Artikel der Ausgabe 4/2021

The VLDB Journal 4/2021 Zur Ausgabe

Premium Partner