DOI: 10.1145/3035918.3054784
research-article

Hybrid Transactional/Analytical Processing: A Survey

Published: 09 May 2017

ABSTRACT

The popularity of large-scale real-time analytics applications (real-time inventory/pricing, recommendations from mobile apps, fraud detection, risk analysis, IoT, etc.) keeps rising. These applications require distributed data management systems that can handle fast concurrent transactions (OLTP) and analytics on recent data. Some of them even need to run analytical queries (OLAP) as part of transactions. Processing each kind of request efficiently, however, calls for different optimizations and architectural decisions when building a data management system.

For the kind of data processing that requires both analytics and transactions, Gartner recently coined the term Hybrid Transactional/Analytical Processing (HTAP). Many HTAP solutions targeting these new applications are emerging from both industry and academia. While some are single-system solutions, others couple OLTP databases or NoSQL systems more loosely with analytical big data platforms such as Spark. The goal of this tutorial is to (1) quickly review the historical progression of OLTP and OLAP systems, (2) discuss the driving factors for HTAP, and (3) provide a deep technical analysis of existing and emerging HTAP solutions, detailing their key architectural differences and trade-offs.


      • Published in

        SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of Data
        May 2017
        1810 pages
        ISBN: 9781450341974
        DOI: 10.1145/3035918

        Copyright © 2017 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        Overall Acceptance Rate: 785 of 4,003 submissions, 20%
