Published in: World Wide Web 5/2023

12-06-2023

Coral: federated query join order optimization based on deep reinforcement learning

Authors: Rong Gu, Yi Zhang, Liangliang Yin, Lingyi Song, Wenjie Huang, Chunfeng Yuan, Zhaokang Wang, Guanghui Zhu, Yihua Huang


Abstract

The rise of diversified data engines has created the need for federated queries. A federated query analyzes data drawn from multiple data engines. Because the query data originates from multiple data engines, federated queries usually rely on join operations and data migration to complete, and therefore tend to take a long time. The challenges of optimizing federated queries lie in join order selection and data migration coordination. Enumerating all join orders, however, is impractical because the set of join orders grows exponentially with the number of relations to be joined. To improve the performance of federated queries, we present a deep reinforcement learning-based approach to optimizing join order and join engine selection for federated queries, and design a deep Q-network-based (DQN-based) optimizer. The DQN-based optimizer generates join search policies that optimize join order selection for datasets under a given cost model. Based on the DQN-based optimizer, we implement Coral, a federated query system that optimizes the join order selection of federated queries. With the optimized join order, Coral transforms a federated query into a set of subqueries, which are assigned to and executed on different data engines. We also propose a subquery cache optimization to reduce data migration during query execution. Extensive experimental evaluation demonstrates that Coral significantly reduces the query latency of federated queries, achieving a speedup of up to 5.03× over cutting-edge federated query systems.
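The claim that exhaustive enumeration is impractical can be made concrete by counting join orders. The sketch below is illustrative only and is not taken from Coral. It assumes cross products are allowed and that swapping the two inputs of a join yields a distinct plan. Under those assumptions, n relations admit n! left-deep orders and (2n - 2)!/(n - 1)! bushy join trees. All function names are hypothetical.

```python
from math import factorial


def left_deep_orders(n: int) -> int:
    """Left-deep join orders over n relations (cross products allowed): n!."""
    return factorial(n)


def bushy_orders(n: int) -> int:
    """Bushy join trees over n relations (cross products allowed).

    Equals (2n - 2)! / (n - 1)!, i.e. n! times the Catalan number C(n - 1).
    """
    return factorial(2 * n - 2) // factorial(n - 1)


def enumerate_bushy(relations):
    """Brute-force every bushy join tree; only feasible for very small inputs.

    A tree is either a relation name or a (left, right) tuple of subtrees.
    """
    items = list(relations)
    n = len(items)
    if n == 1:
        return [items[0]]
    trees = []
    # Each bitmask picks a non-empty, proper subset as the left input;
    # the remaining relations form the right input (order matters).
    for mask in range(1, (1 << n) - 1):
        left = [items[i] for i in range(n) if mask >> i & 1]
        right = [items[i] for i in range(n) if not mask >> i & 1]
        for l in enumerate_bushy(left):
            for r in enumerate_bushy(right):
                trees.append((l, r))
    return trees


if __name__ == "__main__":
    for n in (2, 4, 8, 12):
        print(f"n={n:2d}  left-deep={left_deep_orders(n):,}  bushy={bushy_orders(n):,}")
```

The brute-force enumerator serves only as a cross-check of the closed-form count on tiny inputs. Even at a dozen relations the bushy search space runs into the tens of trillions, which motivates a learned search policy (such as the DQN-based optimizer described above) in place of exhaustive enumeration.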

Metadata
Publisher
Springer US
Print ISSN: 1386-145X
Electronic ISSN: 1573-1413
DOI
https://doi.org/10.1007/s11280-023-01156-0