research-article

Hierarchical Residual Encoding for Multiresolution Time Series Compression

Authors:
Bruno Barbarioli

The University of Chicago, Chicago, IL, USA

The University of Chicago, Chicago, IL, USA

0000-0001-7438-5634
View Profile

,
Gabriel Mersy

The University of Chicago, Chicago, IL, USA

The University of Chicago, Chicago, IL, USA

0000-0002-4302-0525
View Profile

,
Stavros Sintos

University of Illinois Chicago, Chicago, IL, USA

University of Illinois Chicago, Chicago, IL, USA

0000-0002-2114-8886
View Profile

,
Sanjay Krishnan

The University of Chicago, Chicago, IL, USA

The University of Chicago, Chicago, IL, USA

0000-0001-6968-4090
View Profile

Proceedings of the ACM on Management of Data Volume 1 Issue 1Article No.: 99pp 1–26https://doi.org/10.1145/3588953

Published:30 May 2023Publication History

Proceedings of the ACM on Management of Data

Abstract

Data compression is a key technique for reducing the cost of data transfer from storage to compute nodes. Increasingly, modern data scales necessitate lossy compression techniques, where exactness is sacrificed for a smaller compressed representation. One challenge in lossy compression is that different applications may have different accuracy demands. Today's compression techniques struggle in this setting either forcing the user to compress at the strictest accuracy demand, or to re-encode the data at multiple resolutions. This paper proposes a simple, but effective multiresolution compression algorithm for time series data, where a single encoding can effectively be decompressed at multiple output resolutions. There are a number of benefits over current state-of-the-art techniques for time series compression. (1) The storage footprint of this encoding is smaller than re-encoding the data at multiple resolutions. (2) Similarly, the compression latency is generally smaller than re-encoding at multiple resolutions. (3) Finally, the decompression latency of our encoding is significantly faster than single encodings at the strictest accuracy demand.

Supplemental Material

PACMMOD-V1V1mod099.mp4

Presentation video for SIGMOD 2023

mp4

25.3 MB

Download

References

Cuneyt G Akcora, Yitao Li, Yulia R Gel, and Murat Kantarcioglu. 2021. BitcoinHeist: topological data analysis for ransomware prediction on the bitcoin blockchain. In Proceedings of the Twenty-Ninth International Conference on International Joint Conferences on Artificial Intelligence. 4439--4445.Google Scholar
Sebastian Angel, Mihir Nanavati, and Siddhartha Sen. 2020. Disaggregation and the Application. In 12th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 20).Google Scholar
Gennady Antoshenkov. 1997. Dictionary-based order-preserving string compression. The VLDB Journal 6, 1 (1997), 26--39.Google ScholarDigital Library
Davis Blalock, Samuel Madden, and John Guttag. 2018. Sprintz: Time series compression for the internet of things. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 2, 3 (2018), 1--23.Google ScholarDigital Library
Francesco Buccafurri, Filippo Furfaro, Giuseppe M Mazzeo, and Domenico Saccà. 2011. A quad-tree based multiresolution approach for two-dimensional summary data. Information Systems 36, 7 (2011), 1082--1103.Google ScholarDigital Library
M. Burrows and D. J. Wheeler. 1994. A block-sorting lossless data compression algorithm. Technical Report.Google Scholar
Franck Cappello, Sheng Di, Sihuan Li, Xin Liang, Ali Murat Gok, Dingwen Tao, Chun Hong Yoon, Xin-Chuan Wu, Yuri Alexeev, and Frederic T Chong. 2019. Use cases of lossy compression for floating-point data in scientific data sets. The International Journal of High Performance Computing Applications 33, 6 (2019), 1201--1220.Google ScholarDigital Library
Kaushik Chakrabarti, Minos Garofalakis, Rajeev Rastogi, and Kyuseok Shim. 2001. Approximate query processing using wavelets. The VLDB Journal 10, 2 (2001), 199--223.Google ScholarDigital Library
Kaushik Chakrabarti, Eamonn Keogh, Sharad Mehrotra, and Michael Pazzani. 2002. Locally Adaptive Dimensionality Reduction for Indexing Large Time Series Databases. ACM Trans. Database Syst. 27, 2 (jun 2002), 188--228. https://doi.org/10.1145/568518.568520Google ScholarDigital Library
Shubham Chandak, Kedar Tatwawadi, Chengtao Wen, Lingyun Wang, Juan Aparicio Ojea, and Tsachy Weissman. 2020. LFZip: Lossy Compression of Multivariate Floating-Point Time Series Data via Improved Prediction. In 2020 Data Compression Conference (DCC). 342--351. https://doi.org/10.1109/DCC47342.2020.00042Google ScholarCross Ref
Chris Chatfield. 1978. The Holt-winters forecasting procedure. Journal of the Royal Statistical Society: Series C (Applied Statistics) 27, 3 (1978), 264--279.Google ScholarCross Ref
Graham Cormode, Minos Garofalakis, Peter J Haas, Chris Jermaine, et al. 2011. Synopses for massive data: Samples, histograms, wavelets, sketches. Foundations and Trends® in Databases 4, 1--3 (2011), 1--294.Google Scholar
Sheng Di and Franck Cappello. 2016. Fast error-bounded lossy HPC data compression with SZ. In 2016 ieee international parallel and distributed processing symposium (ipdps). IEEE, 730--739.Google Scholar
James Diffenderfer, Alyson L Fox, Jeffrey A Hittinger, Geoffrey Sanders, and Peter G Lindstrom. 2019. Error analysis of zfp compression for floating-point data. SIAM Journal on Scientific Computing 41, 3 (2019), A1867--A1898.Google ScholarDigital Library
Dheeru Dua and Casey Graff. 2017. UCI Machine Learning Repository. http://archive.ics.uci.edu/mlGoogle Scholar
Franz Färber, Norman May, Wolfgang Lehner, Philipp Große, Ingo Müller, Hannes Rauhe, and Jonathan Dees. 2012. The SAP HANA Database--An Architecture Overview. IEEE Data Eng. Bull. 35, 1 (2012), 28--33.Google Scholar
Johannes Fischer and Volker Heun. 2007. A new succinct representation of RMQ-information and improvements in the enhanced suffix array. In International Symposium on Combinatorics, Algorithms, Probabilistic and Experimental Methodologies. Springer, 459--470.Google ScholarCross Ref
Yihan Gao and Aditya Parameswaran. 2016. Squish: Near-optimal compression for archival of relational datasets. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1575--1584.Google ScholarDigital Library
Anurag Gupta, Deepak Agarwal, Derek Tan, Jakub Kulesza, Rahul Pathak, Stefano Stefani, and Vidhya Srinivasan. 2015. Amazon redshift and the case for simpler data warehouses. In Proceedings of the 2015 ACM SIGMOD international conference on management of data. 1917--1923.Google ScholarDigital Library
Joseph M Hellerstein, Peter J Haas, and Helen J Wang. 1997. Online aggregation. In Proceedings of the 1997 ACM SIGMOD international conference on Management of data. 171--182.Google ScholarDigital Library
Ramón Huerta, Thiago Mosqueiro, Jordi Fonollosa, Nikolai Rulkov, and Irene Rodriguez-Lujan. 2016. Online Decorrelation of Humidity and Temperature in Chemical Sensors for Continuous Monitoring. Chemometrics and Intelligent Laboratory Systems 157 (07 2016). https://doi.org/10.1016/j.chemolab.2016.07.004Google ScholarCross Ref
BR Hutchinson and GD Raithby. 1986. A multigrid method based on the additive correction strategy. Numerical Heat Transfer, Part A: Applications 9, 5 (1986), 511--537.Google ScholarCross Ref
Amir Ilkhechi, Andrew Crotty, Alex Galakatos, Yicong Mao, Grace Fan, Xiran Shi, and Ugur Cetintemel. 2020. Deep-Squeeze: deep semantic compression for tabular data. In Proceedings of the 2020 ACM SIGMOD international conference on management of data. 1733--1746.Google ScholarDigital Library
Eamonn Keogh, Selina Chu, David Hart, and Michael Pazzani. 2001. An online algorithm for segmenting time series. In Proceedings 2001 IEEE international conference on data mining. IEEE, 289--296.Google ScholarDigital Library
Eamonn Keogh, Selina Chu, David Hart, and Michael Pazzani. 2004. Segmenting time series: A survey and novel approach. In Data mining in time series databases. World Scientific, 1--21.Google Scholar
Eamonn J Keogh and Michael J Pazzani. 2000. Scaling up dynamic time warping for datamining applications. In Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining. 285--289.Google ScholarDigital Library
Risi Kondor, Nedelina Teneva, and Vikas Garg. 2014. Multiresolution matrix factorization. In International Conference on Machine Learning. PMLR, 1620--1628.Google Scholar
David Krasowska, Julie Bessac, Robert Underwood, Jon C Calhoun, Sheng Di, and Franck Cappello. 2021. Exploring Lossy Compressibility through Statistical Correlations of Scientific Datasets. In 2021 7th International Workshop on Data Analysis and Reduction for Big Scientific Data (DRBSD-7). IEEE, 47--53.Google ScholarCross Ref
Iosif Lazaridis and Sharad Mehrotra. 2001. Progressive approximate aggregate queries with a multi-resolution tree structure. Acm sigmod record 30, 2 (2001), 401--412.Google Scholar
Xin Liang, Sheng Di, Dingwen Tao, Sihuan Li, Shaomeng Li, Hanqi Guo, Zizhong Chen, and Franck Cappello. 2018. Error-controlled lossy compression optimized for high compression ratios of scientific datasets. In 2018 IEEE International Conference on Big Data (Big Data). IEEE, 438--447.Google ScholarCross Ref
Xi Liang, Stavros Sintos, Zechao Shang, and Sanjay Krishnan. 2021. Combining aggregation and sampling (nearly) optimally for approximate query processing. In Proceedings of the 2021 International Conference on Management of Data. 1129--1141.Google ScholarDigital Library
Chunwei Liu, Hao Jiang, John Paparrizos, and Aaron J. Elmore. 2021. Decomposed Bounded Floats for Fast Compression and Queries. Proc. VLDB Endow. 14, 11 (jul 2021), 2586--2598. https://doi.org/10.14778/3476249.3476305Google ScholarDigital Library
John Paparrizos, Chunwei Liu, Bruno Barbarioli, Johnny Hwang, Ikraduya Edian, Aaron J Elmore, Michael J Franklin, and Sanjay Krishnan. 2021. VergeDB: A Database for IoT Analytics on Edge Devices. In CIDR.Google Scholar
John Paparrizos, Chunwei Liu, Aaron J Elmore, and Michael J Franklin. 2020. Debunking four long-standing mis- conceptions of time-series distance measures. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. 1887--1905.Google ScholarDigital Library
Tuomas Pelkonen, Scott Franklin, Justin Teller, Paul Cavallaro, Qi Huang, Justin Meza, and Kaushik Veeraraghavan. 2015. Gorilla: A fast, scalable, in-memory time series database. Proceedings of the VLDB Endowment 8, 12 (2015), 1816--1827.Google ScholarDigital Library
Tuomas Pelkonen, Scott Franklin, Justin Teller, Paul Cavallaro, Qi Huang, Justin Meza, and Kaushik Veeraraghavan. 2015. Gorilla: A Fast, Scalable, in-Memory Time Series Database. Proc. VLDB Endow. 8, 12 (aug 2015), 1816--1827. https://doi.org/10.14778/2824032.2824078Google ScholarDigital Library
Navneet Potti and Jignesh M Patel. 2015. Daq: a new paradigm for approximate query processing. Proceedings of the VLDB Endowment 8, 9 (2015), 898--909.Google ScholarDigital Library
Powturbo. 2022. Turbo Range Coder. Retrieved March 23, 2023 from https://github.com/powturbo/Turbo-Range-CoderGoogle Scholar
Julian Seward. 1996. bzip2 and libbzip2. avaliable at http://www. bzip. org (1996).Google Scholar
John Shahid. 2019. InfluxDB documentation.Google Scholar
Seung Woo Son, Zhengzhang Chen, William Hendrix, Ankit Agrawal, Wei-keng Liao, and Alok Choudhary. 2014. Data compression for the exascale computing era-survey. Supercomputing frontiers and innovations 1, 2 (2014), 76--88.Google Scholar
Allan Stisen, Henrik Blunck, Sourav Bhattacharya, Thor Siiger Prentow, Mikkel Baun Kjærgaard, Anind Dey, Tobias Sonne, and Mads Møller Jensen. 2015. Smart Devices Are Different: Assessing and MitigatingMobile Sensing Heterogeneities for Activity Recognition. In Proceedings of the 13th ACM Conference on Embedded Networked Sensor Systems (Seoul, South Korea) (SenSys '15). Association for Computing Machinery, New York, NY, USA, 127--140. https://doi.org/10.1145/2809695.2809718Google ScholarDigital Library
Gregory K Wallace. 1992. The JPEG still picture compression standard. IEEE transactions on consumer electronics 38, 1 (1992), xviii--xxxiv.Google ScholarDigital Library
Zeke Wang, Kaan Kara, Hantian Zhang, Gustavo Alonso, Onur Mutlu, and Ce Zhang. 2019. Accelerating generalized linear models with MLWeaving: A one-size-fits-all system for any-precision learning. Proceedings of the VLDB Endowment 12, 7 (2019), 807--821.Google ScholarDigital Library
Thomas Wiegand, Gary J Sullivan, Gisle Bjontegaard, and Ajay Luthra. 2003. Overview of the H. 264/AVC video coding standard. IEEE Transactions on circuits and systems for video technology 13, 7 (2003), 560--576.Google ScholarDigital Library
Ian H. Witten, Radford M. Neal, and John G. Cleary. 1987. Arithmetic Coding for Data Compression. Commun. ACM 30, 6 (jun 1987), 520--540. https://doi.org/10.1145/214762.214771Google ScholarDigital Library
Steffen Zeuch, Eleni Tzirita Zacharatou, Shuhao Zhang, Xenofon Chatziliadis, Ankit Chaudhary, Bonaventura Del Monte, Dimitrios Giouroukis, Philipp M Grulich, Ariane Ziehn, and Volker Mark. 2020. Nebulastream: Complex analytics beyond the cloud. Open Journal of Internet Of Things (OJIOT) 6, 1 (2020), 66--81.Google Scholar
Kai Zhao, Sheng Di, Maxim Dmitriev, Thierry-Laurent D Tonellot, Zizhong Chen, and Franck Cappello. 2021. Optimizing error-bounded lossy compression for scientific data by dynamic spline interpolation. In 2021 IEEE 37th International Conference on Data Engineering (ICDE). IEEE, 1643--1654.Google ScholarCross Ref

Index Terms

Hierarchical Residual Encoding for Multiresolution Time Series Compression
1. Information systems
  1. Data management systems
    1. Data structures
      1. Data layout
        Data compression
    2. Database design and models
      1. Data model extensions
        Temporal data
  2. Information systems applications
    1. Spatial-temporal systems
      1. Data streaming
      2. Sensor networks

Recommendations

Lossless compression of VLSI layout image data

We present a novel lossless compression algorithm called Context Copy Combinatorial Code (C4), which integrates the advantages of two very disparate compression techniques: context-based modeling and Lempel-Ziv (LZ) style copying. While the algorithm ...
Read More
Chain code lossless compression using move-to-front transform and adaptive run-length encoding

Chain codes are the most size-efficient representations of rasterised binary shapes and contours. This paper considers a new lossless chain code compression method based on move-to-front transform and an adaptive run-length encoding. The former reduces ...
Read More
FPGA bitstream compression and decompression using LZ and golomb coding (abstract only)
FPGA '13: Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays

In this paper we propose an optimized bitstream compression algorithm based on LZ and a novel architecture of decompressor, the proposed algorithm improves the Compression Ratio by fully utilizing the regularity of configuration bits of CLB (...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
Proceedings of the ACM on Management of Data Volume 1, Issue 1
PACMMOD
May 2023
2807 pages
EISSN:2836-6573
DOI:10.1145/3603164
Editor:
Divyakant Agrawal
UC Santa Barbara, United States
Issue’s Table of Contents
Copyright © 2023 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 30 May 2023
Published in pacmmod Volume 1, Issue 1

Permissions
Request permissions about this article.
Request Permissions
Author Tags
compression
storage
temporal databases
Qualifiers
- research-article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 3
  Total Citations
  View Citations
- 259
  Total Downloads
- Downloads (Last 12 months)259
- Downloads (Last 6 weeks)18
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Hierarchical Residual Encoding for Multiresolution Time Series Compression

Proceedings of the ACM on Management of Data

Abstract

Supplemental Material

References

Cited By

Index Terms

Recommendations

Lossless compression of VLSI layout image data

Chain code lossless compression using move-to-front transform and adaptive run-length encoding

FPGA bitstream compression and decompression using LZ and golomb coding (abstract only)

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Author Tags

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Hierarchical Residual Encoding for Multiresolution Time Series Compression

Proceedings of the ACM on Management of Data

Abstract

Supplemental Material

References

Cited By

Index Terms

Recommendations

Lossless compression of VLSI layout image data

Chain code lossless compression using move-to-front transform and adaptive run-length encoding

FPGA bitstream compression and decompression using LZ and golomb coding (abstract only)

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Author Tags

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media