Abstract
Data compression is a key technique for reducing the cost of data transfer from storage to compute nodes. Increasingly, modern data scales necessitate lossy compression techniques, where exactness is sacrificed for a smaller compressed representation. One challenge in lossy compression is that different applications may have different accuracy demands. Today's compression techniques struggle in this setting either forcing the user to compress at the strictest accuracy demand, or to re-encode the data at multiple resolutions. This paper proposes a simple, but effective multiresolution compression algorithm for time series data, where a single encoding can effectively be decompressed at multiple output resolutions. There are a number of benefits over current state-of-the-art techniques for time series compression. (1) The storage footprint of this encoding is smaller than re-encoding the data at multiple resolutions. (2) Similarly, the compression latency is generally smaller than re-encoding at multiple resolutions. (3) Finally, the decompression latency of our encoding is significantly faster than single encodings at the strictest accuracy demand.
Supplemental Material
- Cuneyt G Akcora, Yitao Li, Yulia R Gel, and Murat Kantarcioglu. 2021. BitcoinHeist: topological data analysis for ransomware prediction on the bitcoin blockchain. In Proceedings of the Twenty-Ninth International Conference on International Joint Conferences on Artificial Intelligence. 4439--4445.Google Scholar
- Sebastian Angel, Mihir Nanavati, and Siddhartha Sen. 2020. Disaggregation and the Application. In 12th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 20).Google Scholar
- Gennady Antoshenkov. 1997. Dictionary-based order-preserving string compression. The VLDB Journal 6, 1 (1997), 26--39.Google ScholarDigital Library
- Davis Blalock, Samuel Madden, and John Guttag. 2018. Sprintz: Time series compression for the internet of things. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 2, 3 (2018), 1--23.Google ScholarDigital Library
- Francesco Buccafurri, Filippo Furfaro, Giuseppe M Mazzeo, and Domenico Saccà. 2011. A quad-tree based multiresolution approach for two-dimensional summary data. Information Systems 36, 7 (2011), 1082--1103.Google ScholarDigital Library
- M. Burrows and D. J. Wheeler. 1994. A block-sorting lossless data compression algorithm. Technical Report.Google Scholar
- Franck Cappello, Sheng Di, Sihuan Li, Xin Liang, Ali Murat Gok, Dingwen Tao, Chun Hong Yoon, Xin-Chuan Wu, Yuri Alexeev, and Frederic T Chong. 2019. Use cases of lossy compression for floating-point data in scientific data sets. The International Journal of High Performance Computing Applications 33, 6 (2019), 1201--1220.Google ScholarDigital Library
- Kaushik Chakrabarti, Minos Garofalakis, Rajeev Rastogi, and Kyuseok Shim. 2001. Approximate query processing using wavelets. The VLDB Journal 10, 2 (2001), 199--223.Google ScholarDigital Library
- Kaushik Chakrabarti, Eamonn Keogh, Sharad Mehrotra, and Michael Pazzani. 2002. Locally Adaptive Dimensionality Reduction for Indexing Large Time Series Databases. ACM Trans. Database Syst. 27, 2 (jun 2002), 188--228. https://doi.org/10.1145/568518.568520Google ScholarDigital Library
- Shubham Chandak, Kedar Tatwawadi, Chengtao Wen, Lingyun Wang, Juan Aparicio Ojea, and Tsachy Weissman. 2020. LFZip: Lossy Compression of Multivariate Floating-Point Time Series Data via Improved Prediction. In 2020 Data Compression Conference (DCC). 342--351. https://doi.org/10.1109/DCC47342.2020.00042Google ScholarCross Ref
- Chris Chatfield. 1978. The Holt-winters forecasting procedure. Journal of the Royal Statistical Society: Series C (Applied Statistics) 27, 3 (1978), 264--279.Google ScholarCross Ref
- Graham Cormode, Minos Garofalakis, Peter J Haas, Chris Jermaine, et al. 2011. Synopses for massive data: Samples, histograms, wavelets, sketches. Foundations and Trends® in Databases 4, 1--3 (2011), 1--294.Google Scholar
- Sheng Di and Franck Cappello. 2016. Fast error-bounded lossy HPC data compression with SZ. In 2016 ieee international parallel and distributed processing symposium (ipdps). IEEE, 730--739.Google Scholar
- James Diffenderfer, Alyson L Fox, Jeffrey A Hittinger, Geoffrey Sanders, and Peter G Lindstrom. 2019. Error analysis of zfp compression for floating-point data. SIAM Journal on Scientific Computing 41, 3 (2019), A1867--A1898.Google ScholarDigital Library
- Dheeru Dua and Casey Graff. 2017. UCI Machine Learning Repository. http://archive.ics.uci.edu/mlGoogle Scholar
- Franz Färber, Norman May, Wolfgang Lehner, Philipp Große, Ingo Müller, Hannes Rauhe, and Jonathan Dees. 2012. The SAP HANA Database--An Architecture Overview. IEEE Data Eng. Bull. 35, 1 (2012), 28--33.Google Scholar
- Johannes Fischer and Volker Heun. 2007. A new succinct representation of RMQ-information and improvements in the enhanced suffix array. In International Symposium on Combinatorics, Algorithms, Probabilistic and Experimental Methodologies. Springer, 459--470.Google ScholarCross Ref
- Yihan Gao and Aditya Parameswaran. 2016. Squish: Near-optimal compression for archival of relational datasets. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1575--1584.Google ScholarDigital Library
- Anurag Gupta, Deepak Agarwal, Derek Tan, Jakub Kulesza, Rahul Pathak, Stefano Stefani, and Vidhya Srinivasan. 2015. Amazon redshift and the case for simpler data warehouses. In Proceedings of the 2015 ACM SIGMOD international conference on management of data. 1917--1923.Google ScholarDigital Library
- Joseph M Hellerstein, Peter J Haas, and Helen J Wang. 1997. Online aggregation. In Proceedings of the 1997 ACM SIGMOD international conference on Management of data. 171--182.Google ScholarDigital Library
- Ramón Huerta, Thiago Mosqueiro, Jordi Fonollosa, Nikolai Rulkov, and Irene Rodriguez-Lujan. 2016. Online Decorrelation of Humidity and Temperature in Chemical Sensors for Continuous Monitoring. Chemometrics and Intelligent Laboratory Systems 157 (07 2016). https://doi.org/10.1016/j.chemolab.2016.07.004Google ScholarCross Ref
- BR Hutchinson and GD Raithby. 1986. A multigrid method based on the additive correction strategy. Numerical Heat Transfer, Part A: Applications 9, 5 (1986), 511--537.Google ScholarCross Ref
- Amir Ilkhechi, Andrew Crotty, Alex Galakatos, Yicong Mao, Grace Fan, Xiran Shi, and Ugur Cetintemel. 2020. Deep-Squeeze: deep semantic compression for tabular data. In Proceedings of the 2020 ACM SIGMOD international conference on management of data. 1733--1746.Google ScholarDigital Library
- Eamonn Keogh, Selina Chu, David Hart, and Michael Pazzani. 2001. An online algorithm for segmenting time series. In Proceedings 2001 IEEE international conference on data mining. IEEE, 289--296.Google ScholarDigital Library
- Eamonn Keogh, Selina Chu, David Hart, and Michael Pazzani. 2004. Segmenting time series: A survey and novel approach. In Data mining in time series databases. World Scientific, 1--21.Google Scholar
- Eamonn J Keogh and Michael J Pazzani. 2000. Scaling up dynamic time warping for datamining applications. In Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining. 285--289.Google ScholarDigital Library
- Risi Kondor, Nedelina Teneva, and Vikas Garg. 2014. Multiresolution matrix factorization. In International Conference on Machine Learning. PMLR, 1620--1628.Google Scholar
- David Krasowska, Julie Bessac, Robert Underwood, Jon C Calhoun, Sheng Di, and Franck Cappello. 2021. Exploring Lossy Compressibility through Statistical Correlations of Scientific Datasets. In 2021 7th International Workshop on Data Analysis and Reduction for Big Scientific Data (DRBSD-7). IEEE, 47--53.Google ScholarCross Ref
- Iosif Lazaridis and Sharad Mehrotra. 2001. Progressive approximate aggregate queries with a multi-resolution tree structure. Acm sigmod record 30, 2 (2001), 401--412.Google Scholar
- Xin Liang, Sheng Di, Dingwen Tao, Sihuan Li, Shaomeng Li, Hanqi Guo, Zizhong Chen, and Franck Cappello. 2018. Error-controlled lossy compression optimized for high compression ratios of scientific datasets. In 2018 IEEE International Conference on Big Data (Big Data). IEEE, 438--447.Google ScholarCross Ref
- Xi Liang, Stavros Sintos, Zechao Shang, and Sanjay Krishnan. 2021. Combining aggregation and sampling (nearly) optimally for approximate query processing. In Proceedings of the 2021 International Conference on Management of Data. 1129--1141.Google ScholarDigital Library
- Chunwei Liu, Hao Jiang, John Paparrizos, and Aaron J. Elmore. 2021. Decomposed Bounded Floats for Fast Compression and Queries. Proc. VLDB Endow. 14, 11 (jul 2021), 2586--2598. https://doi.org/10.14778/3476249.3476305Google ScholarDigital Library
- John Paparrizos, Chunwei Liu, Bruno Barbarioli, Johnny Hwang, Ikraduya Edian, Aaron J Elmore, Michael J Franklin, and Sanjay Krishnan. 2021. VergeDB: A Database for IoT Analytics on Edge Devices. In CIDR.Google Scholar
- John Paparrizos, Chunwei Liu, Aaron J Elmore, and Michael J Franklin. 2020. Debunking four long-standing mis- conceptions of time-series distance measures. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. 1887--1905.Google ScholarDigital Library
- Tuomas Pelkonen, Scott Franklin, Justin Teller, Paul Cavallaro, Qi Huang, Justin Meza, and Kaushik Veeraraghavan. 2015. Gorilla: A fast, scalable, in-memory time series database. Proceedings of the VLDB Endowment 8, 12 (2015), 1816--1827.Google ScholarDigital Library
- Tuomas Pelkonen, Scott Franklin, Justin Teller, Paul Cavallaro, Qi Huang, Justin Meza, and Kaushik Veeraraghavan. 2015. Gorilla: A Fast, Scalable, in-Memory Time Series Database. Proc. VLDB Endow. 8, 12 (aug 2015), 1816--1827. https://doi.org/10.14778/2824032.2824078Google ScholarDigital Library
- Navneet Potti and Jignesh M Patel. 2015. Daq: a new paradigm for approximate query processing. Proceedings of the VLDB Endowment 8, 9 (2015), 898--909.Google ScholarDigital Library
- Powturbo. 2022. Turbo Range Coder. Retrieved March 23, 2023 from https://github.com/powturbo/Turbo-Range-CoderGoogle Scholar
- Julian Seward. 1996. bzip2 and libbzip2. avaliable at http://www. bzip. org (1996).Google Scholar
- John Shahid. 2019. InfluxDB documentation.Google Scholar
- Seung Woo Son, Zhengzhang Chen, William Hendrix, Ankit Agrawal, Wei-keng Liao, and Alok Choudhary. 2014. Data compression for the exascale computing era-survey. Supercomputing frontiers and innovations 1, 2 (2014), 76--88.Google Scholar
- Allan Stisen, Henrik Blunck, Sourav Bhattacharya, Thor Siiger Prentow, Mikkel Baun Kjærgaard, Anind Dey, Tobias Sonne, and Mads Møller Jensen. 2015. Smart Devices Are Different: Assessing and MitigatingMobile Sensing Heterogeneities for Activity Recognition. In Proceedings of the 13th ACM Conference on Embedded Networked Sensor Systems (Seoul, South Korea) (SenSys '15). Association for Computing Machinery, New York, NY, USA, 127--140. https://doi.org/10.1145/2809695.2809718Google ScholarDigital Library
- Gregory K Wallace. 1992. The JPEG still picture compression standard. IEEE transactions on consumer electronics 38, 1 (1992), xviii--xxxiv.Google ScholarDigital Library
- Zeke Wang, Kaan Kara, Hantian Zhang, Gustavo Alonso, Onur Mutlu, and Ce Zhang. 2019. Accelerating generalized linear models with MLWeaving: A one-size-fits-all system for any-precision learning. Proceedings of the VLDB Endowment 12, 7 (2019), 807--821.Google ScholarDigital Library
- Thomas Wiegand, Gary J Sullivan, Gisle Bjontegaard, and Ajay Luthra. 2003. Overview of the H. 264/AVC video coding standard. IEEE Transactions on circuits and systems for video technology 13, 7 (2003), 560--576.Google ScholarDigital Library
- Ian H. Witten, Radford M. Neal, and John G. Cleary. 1987. Arithmetic Coding for Data Compression. Commun. ACM 30, 6 (jun 1987), 520--540. https://doi.org/10.1145/214762.214771Google ScholarDigital Library
- Steffen Zeuch, Eleni Tzirita Zacharatou, Shuhao Zhang, Xenofon Chatziliadis, Ankit Chaudhary, Bonaventura Del Monte, Dimitrios Giouroukis, Philipp M Grulich, Ariane Ziehn, and Volker Mark. 2020. Nebulastream: Complex analytics beyond the cloud. Open Journal of Internet Of Things (OJIOT) 6, 1 (2020), 66--81.Google Scholar
- Kai Zhao, Sheng Di, Maxim Dmitriev, Thierry-Laurent D Tonellot, Zizhong Chen, and Franck Cappello. 2021. Optimizing error-bounded lossy compression for scientific data by dynamic spline interpolation. In 2021 IEEE 37th International Conference on Data Engineering (ICDE). IEEE, 1643--1654.Google ScholarCross Ref
Index Terms
- Hierarchical Residual Encoding for Multiresolution Time Series Compression
Recommendations
Lossless compression of VLSI layout image data
We present a novel lossless compression algorithm called Context Copy Combinatorial Code (C4), which integrates the advantages of two very disparate compression techniques: context-based modeling and Lempel-Ziv (LZ) style copying. While the algorithm ...
Chain code lossless compression using move-to-front transform and adaptive run-length encoding
Chain codes are the most size-efficient representations of rasterised binary shapes and contours. This paper considers a new lossless chain code compression method based on move-to-front transform and an adaptive run-length encoding. The former reduces ...
FPGA bitstream compression and decompression using LZ and golomb coding (abstract only)
FPGA '13: Proceedings of the ACM/SIGDA international symposium on Field programmable gate arraysIn this paper we propose an optimized bitstream compression algorithm based on LZ and a novel architecture of decompressor, the proposed algorithm improves the Compression Ratio by fully utilizing the regularity of configuration bits of CLB (...
Comments