research-article

Hybrid Transactional/Analytical Processing: A Survey

Authors:
Fatma Özcan, IBM Research - Almaden, San Jose, CA, USA
Yuanyuan Tian, IBM Research - Almaden, San Jose, CA, USA
Pinar Tözün, IBM Research - Almaden, San Jose, CA, USA

SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of Data, May 2017, Pages 1771–1775, https://doi.org/10.1145/3035918.3054784

Published: 09 May 2017

ABSTRACT

The popularity of large-scale real-time analytics applications (real-time inventory/pricing, recommendations from mobile apps, fraud detection, risk analysis, IoT, etc.) keeps rising. These applications require distributed data management systems that can handle fast concurrent transactions (OLTP) and analytics on recent data. Some even need to run analytical queries (OLAP) as part of transactions. However, processing transactional and analytical requests efficiently calls for different optimizations and architectural decisions when building a data management system.
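To make "analytical queries as part of transactions" concrete, the following is a minimal sketch, not taken from the tutorial, that uses Python's built-in sqlite3 module on a single node. The `payments` table, the `record_payment` and `make_demo_db` helpers, and the `limit_factor` threshold are all hypothetical names chosen for illustration; a real HTAP system would run this pattern over distributed, much larger data.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with a hypothetical payments table."""
    # isolation_level=None puts the connection in autocommit mode, so the
    # explicit BEGIN/COMMIT statements below fully control transactions.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE payments (account_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO payments (account_id, amount) VALUES (?, ?)",
        [(1, 10.0), (1, 12.0), (1, 11.0)],
    )
    return conn


def record_payment(conn, account_id, amount, limit_factor=3.0):
    """Insert a payment unless it looks anomalous.

    The aggregate (the analytical part) and the insert (the transactional
    part) run inside one explicit transaction. Returns True if the payment
    was stored, False if it was rejected as suspicious.
    """
    conn.execute("BEGIN IMMEDIATE")
    try:
        avg, n = conn.execute(
            "SELECT AVG(amount), COUNT(*) FROM payments WHERE account_id = ?",
            (account_id,),
        ).fetchone()
        # Assumed rule: with at least 3 prior payments, reject anything
        # above limit_factor times the account's average amount.
        suspicious = n >= 3 and amount > limit_factor * avg
        if not suspicious:
            conn.execute(
                "INSERT INTO payments (account_id, amount) VALUES (?, ?)",
                (account_id, amount),
            )
        conn.execute("COMMIT")
        return not suspicious
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

Calling `record_payment(make_demo_db(), 1, 500.0)` would reject the payment under the assumed rule, because the decision depends on an aggregate over the account's most recent committed data.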

For the kind of data processing that requires both analytics and transactions, Gartner recently coined the term Hybrid Transactional/Analytical Processing (HTAP). Many HTAP solutions targeting these new applications are emerging from both industry and academia. While some of these are single-system solutions, others loosely couple OLTP databases or NoSQL systems with analytical big data platforms such as Spark. The goal of this tutorial is to (1) quickly review the historical progression of OLTP and OLAP systems, (2) discuss the driving factors for HTAP, and (3) provide a deep technical analysis of existing and emerging HTAP solutions, detailing their key architectural differences and trade-offs.

Published in
SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of Data
May 2017
1810 pages
ISBN: 9781450341974
DOI: 10.1145/3035918
General Chairs: Rada Chirkova (North Carolina State University, USA), Jun Yang (Duke University, USA)
Program Chair: Dan Suciu (University of Washington, USA)
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
analytics
htap
hybrid transaction and analytics processing
olap
oltp
transactions
Acceptance Rates
Overall Acceptance Rate: 785 of 4,003 submissions, 20%