DOI: 10.1145/3514221.3522565
tutorial
Open Access

HTAP Databases: What is New and What is Next

Published: 11 June 2022

ABSTRACT

Processing mixed workloads of transactions and analytical queries in a single database system can eliminate the ETL process and enable real-time analysis of transactional data. However, there is no free lunch: because OLTP and OLAP workloads are interleaved, such systems must balance the trade-off between workload isolation and data freshness. Since Gartner coined the term Hybrid Transactional/Analytical Processing (HTAP), various database systems have emerged to support it. One common feature is that they combine the strengths of row stores and column stores to deliver high-quality HTAP. Because they use disparate storage strategies and processing techniques to satisfy the requirements of various HTAP applications, it is essential to understand, compare, and evaluate their key techniques. In this tutorial, we offer a comprehensive survey of HTAP databases. We introduce a taxonomy of state-of-the-art HTAP databases according to their storage strategies and architectures. We then take a deep dive into their key techniques for transaction processing, analytical processing, data synchronization, query optimization, and resource scheduling. We also introduce existing HTAP benchmarks. Finally, we discuss the research challenges and open problems for HTAP.
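To make the isolation versus freshness trade-off concrete, the following is a minimal toy sketch, not taken from the tutorial or from any surveyed system. It assumes a hypothetical `ToyHTAPStore` in which transactional writes go to a row store and an in-memory delta log, while analytical scans read a column store that only sees those writes after an explicit `sync()`. All names here are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ToyHTAPStore:
    """Toy illustration only: a row store for OLTP plus a column store for OLAP."""

    rows: dict = field(default_factory=dict)     # primary key -> row dict (OLTP side)
    delta: list = field(default_factory=list)    # committed writes not yet merged
    columns: dict = field(default_factory=dict)  # column name -> {key: value} (OLAP side)

    def upsert(self, key, row):
        """OLTP write: update the row store and log the change for later merging."""
        self.rows[key] = dict(row)
        self.delta.append((key, dict(row)))

    def sync(self):
        """Merge pending changes into the column store; return how many were merged."""
        merged = len(self.delta)
        for key, row in self.delta:
            for name, value in row.items():
                self.columns.setdefault(name, {})[key] = value
        self.delta.clear()
        return merged

    def column_sum(self, name, fresh=False):
        """OLAP read: scan one column; fresh=True pays the sync cost first."""
        if fresh:
            self.sync()
        return sum(self.columns.get(name, {}).values())

    def staleness(self):
        """Number of committed writes the column store has not yet seen."""
        return len(self.delta)


store = ToyHTAPStore()
store.upsert(1, {"amount": 10})
store.upsert(2, {"amount": 5})
stale_total = store.column_sum("amount")              # column store not yet synced
fresh_total = store.column_sum("amount", fresh=True)  # merges the delta first
```

In this sketch, a scan without `fresh=True` never touches the write path (better isolation) but may miss recent transactions, whereas a fresh scan sees every committed write at the cost of doing merge work before it can answer.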


Supplemental Material

SIGMOD22-HTAP-Tutorial.m4v (m4v, 561.3 MB)

Published in

SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data
June 2022
2597 pages
ISBN: 9781450392495
DOI: 10.1145/3514221

Copyright © 2022 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 785 of 4,003 submissions, 20%
