Top

Published in:

2023 | OriginalPaper | Chapter

SQL Query Optimization in Distributed NoSQL Databases for Cloud-Based Applications

Authors : Aristeidis Karras, Christos Karras, Antonios Pervanas, Spyros Sioutas, Christos Zaroliagis

Published in: Algorithmic Aspects of Cloud Computing

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config

AI-assisted search

Off

Abstract

A method for query optimization is presented by utilizing Spark SQL, a module of Apache Spark that integrates relational data processing. The goal of this paper is to explore NoSQL databases and their effective usage in conjunction with distributed environments to optimize query execution time, in order to accommodate the user complex demands in a cloud computing setting that necessitate the real-time generation of dynamic pages and the provision of dynamic information.

In this work, we investigate query optimization using various query execution paths by combining MongoDB and Spark SQL, aiming to reduce the average query execution time. We achieve this goal by improving the query execution time through a sequence of query execution path scenarios that split the initial query into sub-queries between MongoDB and Spark SQL, along with the use of a mediator between Apache Spark and MongoDB. This mediator transfers either the entire database from MongoDB to Spark, or transfers a subset of the results for those sub-queries executed in MongoDB. Our experimental results with eight different query execution path scenarios and six difference database sizes demonstrate the clear superiority and scalability of a specific scenario.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

previous chapter Cloud-Based Urban Mobility Services

next chapter MAGMA: Proposing a Massive Historical Graph Management System

Available at: https://www.mongodb.com/products/compass.

Available at: https://www.mongodb.com/docs/spark-connector/current/.

The standard range notation is used where “[” and “]” denote inclusive bounds and “(” and “)” denote exclusive bounds.

Hotspots in sharded clusters refer to situations where a specific chunk of data receives a disproportionate amount of read and write operations, causing performance issues.

Abdel-Fattah, M.A., Mohamed, W., Abdelgaber, S.: A comprehensive spark-based layer for converting relational databases to NoSQL. Big Data Cogn. Comput. 6(3), 71 (2022). https://doi.org/10.3390/bdcc6030071

Ali, W., Saleem, M., Yao, B., Hogan, A., Ngomo, A.-C.N.: A survey of RDF stores & SPARQL engines for querying knowledge graphs. VLDB J. 31, 1–26 (2021). https://doi.org/10.1007/s00778-021-00711-3CrossRef

Anusha, K., Usha Rani, K.: Performance evaluation of spark SQL for batch processing. In: Venkata Krishna, P., Obaidat, M.S. (eds.) Emerging Research in Data Engineering Systems and Computer Communications. AISC, vol. 1054, pp. 145–153. Springer, Singapore (2020). https://doi.org/10.1007/978-981-15-0135-7_13CrossRef

Apache: Hadoop. https://hadoop.apache.org/. Accessed 17 Jan 2023

Apache: HBase. http://hbase.apache.org/. Accessed 17 Jan 2023

Apache: Spark. https://spark.apache.org/. Accessed 17 Jan 2023

Apache: Storm. https://storm.apache.org/. Accessed 17 Jan 2023

Babcock, B., Chaudhuri, S.: Towards a robust query optimizer: a principled and practical approach. In: Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, pp. 119–130 (2005)

Behm, A., Behm, A., et al.: ASTERIX: towards a scalable, semistructured data platform for evolving-world models. Distrib. Parall. Databases 29(3), 185–216 (2011)CrossRef

10.

Celesti, A., et al.: Information management in IoT cloud-based tele-rehabilitation as a service for smart cities: Comparison of NoSQL approaches. Measurement 151, 107218 (2020). https://doi.org/10.1016/j.measurement.2019.107218CrossRef

11.

Chambers, C., et al.: Flumejava: easy, efficient data-parallel pipelines. ACM SIGPLAN Notices 45(6), 363–375 (2010)CrossRef

12.

Chawla, T., Singh, G., Pilli, E.S., Govil, M.: Storage, partitioning, indexing and retrieval in big RDF frameworks: a survey. Comput. Sci. Rev. 38, 100309 (2020). https://doi.org/10.1016/j.cosrev.2020.100309CrossRef

13.

Chen, Y., Özsu, M.T., Xiao, G., Tang, Z., Li, K.: GSmart: an efficient SPARQL query engine using sparse matrix algebra - full version. arXiv preprint arXiv:2106.14038 (2021)

14.

Dean, J., Ghemawat, S.: MapReduce: Simplified Data Processing on Large Clusters. Commun. ACM 51(1), 107–113 (2008). https://doi.org/10.1145/1327452.1327492

15.

Eyada, M.M., Saber, W., El Genidy, M.M., Amer, F.: Performance evaluation of IoT data management using MongoDB versus MySQL databases in different cloud environments. IEEE Access 8, 110656–110668 (2020). https://doi.org/10.1109/ACCESS.2020.3002164CrossRef

16.

Gupta, A., Jain, S.: Optimizing performance of real-time big data stateful streaming applications on cloud. In: 2022 IEEE International Conference on Big Data and Smart Computing (BigComp), pp. 1–4 (2022). https://doi.org/10.1109/BigComp54360.2022.00010

17.

Győrödi, C., Győrödi, R., Pecherle, G., Olah, A.: A comparative study: MongoDB vs. MySQL. In: 2015 13th International Conference on Engineering of Modern Electric Systems (EMES), pp. 1–6. IEEE (2015)

18.

Isard, M., Yu, Y.: Distributed data-parallel computing using a high-level programming language. In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, pp. 987–994 (2009)

19.

Izenov, Y., Datta, A., Rusu, F., Shin, J.H.: COMPASS: Online sketch-based query optimization for in-memory databases. In: Proceedings of the 2021 International Conference on Management of Data, pp. 804–816 (2021)

20.

Karras, A., Karras, C., Samoladas, D., Giotopoulos, K.C., Sioutas, S.: Query optimization in NoSQL databases using an enhanced localized R-tree index. In: Pardede, E., Delir Haghighi, P., Khalil, I., Kotsis, G. (eds.) Information Integration and Web Intelligence, pp. 391–398. Springer Nature Switzerland, Cham (2022)CrossRef

21.

Li, Z.: Geospatial big data handling with high performance computing: current approaches and future directions. In: Tang, W., Wang, S. (eds.) High Performance Computing for Geospatial Applications. GE, vol. 23, pp. 53–76. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-47998-5_4CrossRef

22.

Makris, A., Tserpes, K., Andronikou, V., Anagnostopoulos, D.: A classification of NoSQL data stores based on key design characteristics. Procedia Comput. Sci. 97, 94–103 (2016). https://doi.org/10.1016/j.procs.2016.08.284, 2nd International Conference on Cloud Forward: From Distributed to Complete Computing

23.

Marcus, R., Negi, P., Mao, H., Tatbul, N., Alizadeh, M., Kraska, T.: Bao: making learned query optimization practical. ACM SIGMOD Rec. 51(1), 6–13 (2022)CrossRef

24.

Marcus, R., et al.: Neo: a Learned Query Optimizer. Proc. VLDB Endow. 12(11), 1705–1718 (2019). https://doi.org/10.14778/3342263.3342644

25.

Markl, V., Lohman, G.M., Raman, V.: LEO: An autonomic query optimizer for DB2. IBM Syst. J. 42(1), 98–106 (2003)CrossRef

26.

Melnik, S., et al.: Dremel: interactive analysis of web-scale datasets. Proceed. VLDB Endow. 3(1–2), 330–339 (2010)CrossRef

27.

MongoDB Inc.: MongoDB. https://www.mongodb.com/. Accessed 24 Dec 2022

28.

Salloum, S., Dautov, R., Chen, X., Peng, P.X., Huang, J.Z.: Big data analytics on Apache Spark. Int. J. Data Sci. Anal. 1(3), 145–164 (2016). https://doi.org/10.1007/s41060-016-0027-9CrossRef

29.

Sellami, R., Defude, B.: Complex queries optimization and evaluation over relational and NoSQL data stores in cloud environments. IEEE Trans. Big Data 4(2), 217–230 (2017)

30.

Shvachko, K., Kuang, H., Radia, S., Chansler, R.: The Hadoop distributed file system. In: 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), pp. 1–10. IEEE (2010)

31.

Thusoo, A., et al.: Hive-a petabyte scale data warehouse using Hadoop. In: 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010), pp. 996–1005. IEEE (2010)

32.

Vaisman, A., Zimányi, E.: Recent Developments in Big Data Warehouses. In: Data Warehouse Systems. Data-Centric Systems and Applications, pp. 561–631. Springer, Heidelberg (2022). https://doi.org/10.1007/978-3-662-65167-4_15

33.

Xin, R.S., Rosen, J., Zaharia, M., Franklin, M.J., Shenker, S., Stoica, I.: Shark: SQL and rich analytics at scale. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, pp. 13–24 (2013)

34.

Zaharia, M., et al.: Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: 9th USENIX Symposium on Networked Systems Design and Implementation (NSDI 12), pp. 15–28 (2012)

Title: SQL Query Optimization in Distributed NoSQL Databases for Cloud-Based Applications
Authors: Aristeidis Karras
Christos Karras
Antonios Pervanas
Spyros Sioutas
Christos Zaroliagis
Publisher: Springer International Publishing
Book: Algorithmic Aspects of Cloud Computing
Print ISBN: 978-3-031-33436-8

Electronic ISBN: 978-3-031-33437-5

Copyright Year: 2023
DOI: https://doi.org/10.1007/978-3-031-33437-5_2

Springer Professional

Abstract

Please log in to get access to your license.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Premium Partner