ABSTRACT
Processing mixed workloads of transactions and analytical queries in a single database system can eliminate the ETL process and enable real-time analysis of transactional data. However, there is no free lunch: because OLTP and OLAP workloads are interleaved, such systems must balance the trade-off between workload isolation and data freshness. Since Gartner coined the term Hybrid Transactional/Analytical Processing (HTAP), we have witnessed the emergence of various database systems that support HTAP. One common feature is that they combine the strengths of row stores and column stores to serve HTAP workloads well. Because they adopt disparate storage strategies and processing techniques to satisfy the requirements of various HTAP applications, it is essential to understand, compare, and evaluate their key techniques. In this tutorial, we offer a comprehensive survey of HTAP databases. We introduce a taxonomy of state-of-the-art HTAP databases according to their storage strategies and architectures. We then take a deep dive into their key techniques for transaction processing, analytical processing, data synchronization, query optimization, and resource scheduling. We also introduce existing HTAP benchmarks. Finally, we discuss research challenges and open problems for HTAP.