skip to main content
research-article

Hierarchical Residual Encoding for Multiresolution Time Series Compression

Published:30 May 2023Publication History
Skip Abstract Section

Abstract

Data compression is a key technique for reducing the cost of data transfer from storage to compute nodes. Increasingly, modern data scales necessitate lossy compression techniques, where exactness is sacrificed for a smaller compressed representation. One challenge in lossy compression is that different applications may have different accuracy demands. Today's compression techniques struggle in this setting either forcing the user to compress at the strictest accuracy demand, or to re-encode the data at multiple resolutions. This paper proposes a simple, but effective multiresolution compression algorithm for time series data, where a single encoding can effectively be decompressed at multiple output resolutions. There are a number of benefits over current state-of-the-art techniques for time series compression. (1) The storage footprint of this encoding is smaller than re-encoding the data at multiple resolutions. (2) Similarly, the compression latency is generally smaller than re-encoding at multiple resolutions. (3) Finally, the decompression latency of our encoding is significantly faster than single encodings at the strictest accuracy demand.

Skip Supplemental Material Section

Supplemental Material

PACMMOD-V1V1mod099.mp4

Presentation video for SIGMOD 2023

mp4

25.3 MB

References

  1. Cuneyt G Akcora, Yitao Li, Yulia R Gel, and Murat Kantarcioglu. 2021. BitcoinHeist: topological data analysis for ransomware prediction on the bitcoin blockchain. In Proceedings of the Twenty-Ninth International Conference on International Joint Conferences on Artificial Intelligence. 4439--4445.Google ScholarGoogle Scholar
  2. Sebastian Angel, Mihir Nanavati, and Siddhartha Sen. 2020. Disaggregation and the Application. In 12th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 20).Google ScholarGoogle Scholar
  3. Gennady Antoshenkov. 1997. Dictionary-based order-preserving string compression. The VLDB Journal 6, 1 (1997), 26--39.Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Davis Blalock, Samuel Madden, and John Guttag. 2018. Sprintz: Time series compression for the internet of things. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 2, 3 (2018), 1--23.Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Francesco Buccafurri, Filippo Furfaro, Giuseppe M Mazzeo, and Domenico Saccà. 2011. A quad-tree based multiresolution approach for two-dimensional summary data. Information Systems 36, 7 (2011), 1082--1103.Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. M. Burrows and D. J. Wheeler. 1994. A block-sorting lossless data compression algorithm. Technical Report.Google ScholarGoogle Scholar
  7. Franck Cappello, Sheng Di, Sihuan Li, Xin Liang, Ali Murat Gok, Dingwen Tao, Chun Hong Yoon, Xin-Chuan Wu, Yuri Alexeev, and Frederic T Chong. 2019. Use cases of lossy compression for floating-point data in scientific data sets. The International Journal of High Performance Computing Applications 33, 6 (2019), 1201--1220.Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Kaushik Chakrabarti, Minos Garofalakis, Rajeev Rastogi, and Kyuseok Shim. 2001. Approximate query processing using wavelets. The VLDB Journal 10, 2 (2001), 199--223.Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Kaushik Chakrabarti, Eamonn Keogh, Sharad Mehrotra, and Michael Pazzani. 2002. Locally Adaptive Dimensionality Reduction for Indexing Large Time Series Databases. ACM Trans. Database Syst. 27, 2 (jun 2002), 188--228. https://doi.org/10.1145/568518.568520Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Shubham Chandak, Kedar Tatwawadi, Chengtao Wen, Lingyun Wang, Juan Aparicio Ojea, and Tsachy Weissman. 2020. LFZip: Lossy Compression of Multivariate Floating-Point Time Series Data via Improved Prediction. In 2020 Data Compression Conference (DCC). 342--351. https://doi.org/10.1109/DCC47342.2020.00042Google ScholarGoogle ScholarCross RefCross Ref
  11. Chris Chatfield. 1978. The Holt-winters forecasting procedure. Journal of the Royal Statistical Society: Series C (Applied Statistics) 27, 3 (1978), 264--279.Google ScholarGoogle ScholarCross RefCross Ref
  12. Graham Cormode, Minos Garofalakis, Peter J Haas, Chris Jermaine, et al. 2011. Synopses for massive data: Samples, histograms, wavelets, sketches. Foundations and Trends® in Databases 4, 1--3 (2011), 1--294.Google ScholarGoogle Scholar
  13. Sheng Di and Franck Cappello. 2016. Fast error-bounded lossy HPC data compression with SZ. In 2016 ieee international parallel and distributed processing symposium (ipdps). IEEE, 730--739.Google ScholarGoogle Scholar
  14. James Diffenderfer, Alyson L Fox, Jeffrey A Hittinger, Geoffrey Sanders, and Peter G Lindstrom. 2019. Error analysis of zfp compression for floating-point data. SIAM Journal on Scientific Computing 41, 3 (2019), A1867--A1898.Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Dheeru Dua and Casey Graff. 2017. UCI Machine Learning Repository. http://archive.ics.uci.edu/mlGoogle ScholarGoogle Scholar
  16. Franz Färber, Norman May, Wolfgang Lehner, Philipp Große, Ingo Müller, Hannes Rauhe, and Jonathan Dees. 2012. The SAP HANA Database--An Architecture Overview. IEEE Data Eng. Bull. 35, 1 (2012), 28--33.Google ScholarGoogle Scholar
  17. Johannes Fischer and Volker Heun. 2007. A new succinct representation of RMQ-information and improvements in the enhanced suffix array. In International Symposium on Combinatorics, Algorithms, Probabilistic and Experimental Methodologies. Springer, 459--470.Google ScholarGoogle ScholarCross RefCross Ref
  18. Yihan Gao and Aditya Parameswaran. 2016. Squish: Near-optimal compression for archival of relational datasets. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1575--1584.Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Anurag Gupta, Deepak Agarwal, Derek Tan, Jakub Kulesza, Rahul Pathak, Stefano Stefani, and Vidhya Srinivasan. 2015. Amazon redshift and the case for simpler data warehouses. In Proceedings of the 2015 ACM SIGMOD international conference on management of data. 1917--1923.Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Joseph M Hellerstein, Peter J Haas, and Helen J Wang. 1997. Online aggregation. In Proceedings of the 1997 ACM SIGMOD international conference on Management of data. 171--182.Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Ramón Huerta, Thiago Mosqueiro, Jordi Fonollosa, Nikolai Rulkov, and Irene Rodriguez-Lujan. 2016. Online Decorrelation of Humidity and Temperature in Chemical Sensors for Continuous Monitoring. Chemometrics and Intelligent Laboratory Systems 157 (07 2016). https://doi.org/10.1016/j.chemolab.2016.07.004Google ScholarGoogle ScholarCross RefCross Ref
  22. BR Hutchinson and GD Raithby. 1986. A multigrid method based on the additive correction strategy. Numerical Heat Transfer, Part A: Applications 9, 5 (1986), 511--537.Google ScholarGoogle ScholarCross RefCross Ref
  23. Amir Ilkhechi, Andrew Crotty, Alex Galakatos, Yicong Mao, Grace Fan, Xiran Shi, and Ugur Cetintemel. 2020. Deep-Squeeze: deep semantic compression for tabular data. In Proceedings of the 2020 ACM SIGMOD international conference on management of data. 1733--1746.Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Eamonn Keogh, Selina Chu, David Hart, and Michael Pazzani. 2001. An online algorithm for segmenting time series. In Proceedings 2001 IEEE international conference on data mining. IEEE, 289--296.Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Eamonn Keogh, Selina Chu, David Hart, and Michael Pazzani. 2004. Segmenting time series: A survey and novel approach. In Data mining in time series databases. World Scientific, 1--21.Google ScholarGoogle Scholar
  26. Eamonn J Keogh and Michael J Pazzani. 2000. Scaling up dynamic time warping for datamining applications. In Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining. 285--289.Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Risi Kondor, Nedelina Teneva, and Vikas Garg. 2014. Multiresolution matrix factorization. In International Conference on Machine Learning. PMLR, 1620--1628.Google ScholarGoogle Scholar
  28. David Krasowska, Julie Bessac, Robert Underwood, Jon C Calhoun, Sheng Di, and Franck Cappello. 2021. Exploring Lossy Compressibility through Statistical Correlations of Scientific Datasets. In 2021 7th International Workshop on Data Analysis and Reduction for Big Scientific Data (DRBSD-7). IEEE, 47--53.Google ScholarGoogle ScholarCross RefCross Ref
  29. Iosif Lazaridis and Sharad Mehrotra. 2001. Progressive approximate aggregate queries with a multi-resolution tree structure. Acm sigmod record 30, 2 (2001), 401--412.Google ScholarGoogle Scholar
  30. Xin Liang, Sheng Di, Dingwen Tao, Sihuan Li, Shaomeng Li, Hanqi Guo, Zizhong Chen, and Franck Cappello. 2018. Error-controlled lossy compression optimized for high compression ratios of scientific datasets. In 2018 IEEE International Conference on Big Data (Big Data). IEEE, 438--447.Google ScholarGoogle ScholarCross RefCross Ref
  31. Xi Liang, Stavros Sintos, Zechao Shang, and Sanjay Krishnan. 2021. Combining aggregation and sampling (nearly) optimally for approximate query processing. In Proceedings of the 2021 International Conference on Management of Data. 1129--1141.Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Chunwei Liu, Hao Jiang, John Paparrizos, and Aaron J. Elmore. 2021. Decomposed Bounded Floats for Fast Compression and Queries. Proc. VLDB Endow. 14, 11 (jul 2021), 2586--2598. https://doi.org/10.14778/3476249.3476305Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. John Paparrizos, Chunwei Liu, Bruno Barbarioli, Johnny Hwang, Ikraduya Edian, Aaron J Elmore, Michael J Franklin, and Sanjay Krishnan. 2021. VergeDB: A Database for IoT Analytics on Edge Devices. In CIDR.Google ScholarGoogle Scholar
  34. John Paparrizos, Chunwei Liu, Aaron J Elmore, and Michael J Franklin. 2020. Debunking four long-standing mis- conceptions of time-series distance measures. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. 1887--1905.Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Tuomas Pelkonen, Scott Franklin, Justin Teller, Paul Cavallaro, Qi Huang, Justin Meza, and Kaushik Veeraraghavan. 2015. Gorilla: A fast, scalable, in-memory time series database. Proceedings of the VLDB Endowment 8, 12 (2015), 1816--1827.Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Tuomas Pelkonen, Scott Franklin, Justin Teller, Paul Cavallaro, Qi Huang, Justin Meza, and Kaushik Veeraraghavan. 2015. Gorilla: A Fast, Scalable, in-Memory Time Series Database. Proc. VLDB Endow. 8, 12 (aug 2015), 1816--1827. https://doi.org/10.14778/2824032.2824078Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Navneet Potti and Jignesh M Patel. 2015. Daq: a new paradigm for approximate query processing. Proceedings of the VLDB Endowment 8, 9 (2015), 898--909.Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Powturbo. 2022. Turbo Range Coder. Retrieved March 23, 2023 from https://github.com/powturbo/Turbo-Range-CoderGoogle ScholarGoogle Scholar
  39. Julian Seward. 1996. bzip2 and libbzip2. avaliable at http://www. bzip. org (1996).Google ScholarGoogle Scholar
  40. John Shahid. 2019. InfluxDB documentation.Google ScholarGoogle Scholar
  41. Seung Woo Son, Zhengzhang Chen, William Hendrix, Ankit Agrawal, Wei-keng Liao, and Alok Choudhary. 2014. Data compression for the exascale computing era-survey. Supercomputing frontiers and innovations 1, 2 (2014), 76--88.Google ScholarGoogle Scholar
  42. Allan Stisen, Henrik Blunck, Sourav Bhattacharya, Thor Siiger Prentow, Mikkel Baun Kjærgaard, Anind Dey, Tobias Sonne, and Mads Møller Jensen. 2015. Smart Devices Are Different: Assessing and MitigatingMobile Sensing Heterogeneities for Activity Recognition. In Proceedings of the 13th ACM Conference on Embedded Networked Sensor Systems (Seoul, South Korea) (SenSys '15). Association for Computing Machinery, New York, NY, USA, 127--140. https://doi.org/10.1145/2809695.2809718Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Gregory K Wallace. 1992. The JPEG still picture compression standard. IEEE transactions on consumer electronics 38, 1 (1992), xviii--xxxiv.Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Zeke Wang, Kaan Kara, Hantian Zhang, Gustavo Alonso, Onur Mutlu, and Ce Zhang. 2019. Accelerating generalized linear models with MLWeaving: A one-size-fits-all system for any-precision learning. Proceedings of the VLDB Endowment 12, 7 (2019), 807--821.Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Thomas Wiegand, Gary J Sullivan, Gisle Bjontegaard, and Ajay Luthra. 2003. Overview of the H. 264/AVC video coding standard. IEEE Transactions on circuits and systems for video technology 13, 7 (2003), 560--576.Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Ian H. Witten, Radford M. Neal, and John G. Cleary. 1987. Arithmetic Coding for Data Compression. Commun. ACM 30, 6 (jun 1987), 520--540. https://doi.org/10.1145/214762.214771Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Steffen Zeuch, Eleni Tzirita Zacharatou, Shuhao Zhang, Xenofon Chatziliadis, Ankit Chaudhary, Bonaventura Del Monte, Dimitrios Giouroukis, Philipp M Grulich, Ariane Ziehn, and Volker Mark. 2020. Nebulastream: Complex analytics beyond the cloud. Open Journal of Internet Of Things (OJIOT) 6, 1 (2020), 66--81.Google ScholarGoogle Scholar
  48. Kai Zhao, Sheng Di, Maxim Dmitriev, Thierry-Laurent D Tonellot, Zizhong Chen, and Franck Cappello. 2021. Optimizing error-bounded lossy compression for scientific data by dynamic spline interpolation. In 2021 IEEE 37th International Conference on Data Engineering (ICDE). IEEE, 1643--1654.Google ScholarGoogle ScholarCross RefCross Ref

Index Terms

  1. Hierarchical Residual Encoding for Multiresolution Time Series Compression

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in

          Full Access

          • Published in

            cover image Proceedings of the ACM on Management of Data
            Proceedings of the ACM on Management of Data  Volume 1, Issue 1
            PACMMOD
            May 2023
            2807 pages
            EISSN:2836-6573
            DOI:10.1145/3603164
            Issue’s Table of Contents

            Copyright © 2023 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 30 May 2023
            Published in pacmmod Volume 1, Issue 1

            Permissions

            Request permissions about this article.

            Request Permissions

            Qualifiers

            • research-article

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader