Abstract
Thanks to the rapid proliferation of connected devices, sensor-generated time series constitute a large and growing portion of the world's data. Often, this data is collected from distributed, resource-constrained devices and centralized at one or more servers. A key challenge in this setup is reducing the size of the transmitted data without sacrificing its quality. Lower quality reduces the data's utility, but smaller size enables both reduced network and storage costs at the servers and reduced power consumption in sensing devices. A natural solution is to compress the data at the sensing devices. Unfortunately, existing compression algorithms either violate the memory and latency constraints common for these devices or, as we show experimentally, perform poorly on sensor-generated time series.
We introduce a time series compression algorithm that achieves state-of-the-art compression ratios while requiring less than 1KB of memory and adding virtually no latency. This method is suitable not only for low-power devices collecting data, but also for servers storing and querying data; in the latter context, it can decompress at over 3GB/s in a single thread, even faster than many algorithms with much lower compression ratios. A key component of our method is a high-speed forecasting algorithm that can be trained online and significantly outperforms alternatives such as delta coding.
Extensive experiments on datasets from many domains show that these results hold not only for sensor data but also across a wide array of other time series.
- Information technology -- generic coding of moving pictures and associated audio information -- part 7: Advanced audio coding (aac), 2006. https://www.iso.org/standard/43345.html.Google Scholar
- Digi-key electronics, 2017. https://www.digikey.com/products/en/integrated-circuits-ics/data-acquisition-analog-to-digital-converters-adc700?k=adc8k=8pkeyword=adc8pv1989=0.Google Scholar
- Intel quark microcontrollers, 2017. https://www.intel.com/content/www/us/en/embedded/products/quark/overview.html.Google Scholar
- N. Aharony, W. Pan, C. Ip, I. Khayal, and A. Pentland. Social fmri: Investigating and shaping social mechanisms in the real world. Pervasive and Mobile Computing, 7(6):643--659, 2011. Google ScholarDigital Library
- J. Alakuijala and Z. Szabadka. Brotli compressed data format. Technical report, 2016.Google Scholar
- M. P. Andersen and D. E. Culler. Btrdb: Optimizing storage system design for time-series processing. In FAST, pages 39--52, 2016. Google ScholarDigital Library
- V. N. Anh and A. Moffat. Index compression using 64-bit words. Software: Practice and Experience, 40(2):131--147, 2010. Google ScholarDigital Library
- Apple. Apple lossless audio codec, 2011. https://github.com/macosforge/alac.Google Scholar
- S. Arrabi and J. Lach. Adaptive lossless compression in wireless body sensor networks. In Proceedings of the Fourth International Conference on Body Area Networks, page 19. ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering), 2009. Google ScholarDigital Library
- S. Beckett. Influxdb, 2017. https://influxdata.com.Google Scholar
- D. W. Blalock and J. V. Guttag. Extract: Strong examples from weakly-labeled sensor data. In Data Mining (ICDM), 2016 IEEE 16th International Conference on, pages 799--804. IEEE, 2016.Google ScholarCross Ref
- T. Bose, S. Bandyopadhyay, S. Kumar, A. Bhattacharyya, and A. Pal. Signal characteristics on sensor data compression in iot-an investigation. In Sensing, Communication, and Networking (SECON), 2016 13th Annual IEEE International Conference on, pages 1--6. IEEE, 2016.Google Scholar
- T. Boutell. Png (portable network graphics) specification version 1.0. 1997.Google Scholar
- M. Buevich, A. Wright, R. Sargent, and A. Rowe. Respawn: A distributed multi-resolution timeseries datastore. In Real-Time Systems Symposium (RTSS), 2013 IEEE 34th, pages 288--297. IEEE, 2013. Google ScholarDigital Library
- A. Camerra, T. Palpanas, J. Shieh, and E. Keogh. isax 2.0: Indexing and mining one billion time series. In Data Mining (ICDM), 2010 IEEE 10th International Conference on, pages 58--67. IEEE, 2010. Google ScholarDigital Library
- Y. Chen, E. Keogh, B. Hu, N. Begum, A. Bagnall, A. Mueen, and G. Batista. The ucr time series classification archive, July 2015. www.cs.ucr.edu/~eamonn/time_series_data/.Google Scholar
- J. Coalson. Flac-free lossless audio codec, 2008. http://flac.sourceforge.net.Google Scholar
- Y. Collet. Finite state entropy. https://github.com/Cyan4973/FiniteStateEntropy.Google Scholar
- Y. Collet. Lz4--extremely fast compression, 2017. https://github.com/Cyan4973/lz4.Google Scholar
- Y. Collet. Zstandard - fast real-time compression algorithm, 2017. https://facebook.github.io/zstd/.Google Scholar
- D. D. I. F. Committee et al. Dwarf debugging information format, version 4. Free Standards Group, 2010.Google Scholar
- J. Dean and S. Ghemawat. Mapreduce: simplified data processing on large clusters. Communications of the ACM, 51(1):107--113, 2008. Google ScholarDigital Library
- J. Demšar. Statistical comparisons of classifiers over multiple data sets. Journal of Machine learning research, 7(Jan):1--30, 2006. Google ScholarDigital Library
- L. P. Deutsch. Deflate compressed data format specification version 1.3. 1996.Google Scholar
- P. Deutsch and J.-L. Gailly. Zlib compressed data format specification version 3.3. Technical report, 1996. Google ScholarDigital Library
- J. Duda. Asymmetric numeral systems: entropy coding combining speed of huffman coding with compression rate of arithmetic coding. arXiv preprint arXiv:1311.2540, 2013.Google Scholar
- F. Eichinger, P. Efros, S. Karnouskos, and K. Böhm. A timeseries compression technique and its application to the smart grid. The VLDB Journal, 24(2):193--218, 2015. Google ScholarDigital Library
- J. Fonollosa, S. Sheik, R. Huerta, and S. Marco. Reservoir computing compensates slow response of chemosensor arrays exposed to fast varying gas concentrations in continuous monitoring. Sensors and Actuators B: Chemical, 215:618--629, 2015.Google ScholarCross Ref
- S. Fothergill, H. M. Mentis, P. Kohli, and S. Nowozin. Instructing people for training gestural interactive systems. In J. A. Konstan, E. H. Chi, and K. Höök, editors, CHI, pages 1737--1746. ACM, 2012. Google ScholarDigital Library
- J.-L. Gailly and M. Adler. The gzip home page, 2003. https://www.gzip.org/.Google Scholar
- S. Golomb. Run-length encodings. IEEE transactions on information theory, 12(3):399--401, 1966. Google ScholarDigital Library
- Google. Protocol buffers encoding, 2001. https://developers.google.com/protocol-buffers/docs/encoding#types.Google Scholar
- S. Gunderson. Snappy: A fast compressor/decompressor, 2015. https://code.google.com/p/snappy.Google Scholar
- M. A. Hanson, H. C. Powell Jr, A. T. Barth, K. Ringgenberg, B. H. Calhoun, J. H. Aylor, and J. Lach. Body area sensor networks: Challenges and opportunities. Computer, 42(1), 2009. Google ScholarDigital Library
- B. Hawkins. Kairos db: Fast time series database on cassandra. https://github.com/kairosdb/kairosdb.Google Scholar
- B. Hu, T. Rakthanmanon, Y. Hao, S. Evans, S. Lonardi, and E. Keogh. Discovering the intrinsic cardinality and dimensionality of time series using mdl. In Data Mining (ICDM), 2011 IEEE 11th International Conference on, pages 1086--1091. IEEE, 2011. Google ScholarDigital Library
- N. Q. V. Hung, H. Jeung, and K. Aberer. An evaluation of model-based approaches to sensor data compression. IEEE Transactions on Knowledge and Data Engineering, 25(11):2434--2447, 2013. Google ScholarDigital Library
- T. Instruments. 2.4-ghz bluetoothÂő low energy system-on-chip, 2013. http://www.ti.com/lit/ds/symlink/cc2540.pdf.Google Scholar
- T. Instruments. Cc2640 simplelink bluetooth wireless mcu, 2016. http://www.ti.com/lit/ds/swrs176b/swrs176b.pdf.Google Scholar
- E. Keogh, K. Chakrabarti, M. Pazzani, and S. Mehrotra. Dimensionality reduction for fast similarity search in large time series databases. Knowledge and information Systems, 3(3):263--286, 2001.Google Scholar
- E. Keogh, K. Chakrabarti, M. Pazzani, and S. Mehrotra. Locally adaptive dimensionality reduction for indexing large time series databases. ACM Sigmod Record, 30(2):151--162, 2001. Google ScholarDigital Library
- E. Keogh, S. Chu, D. Hart, and M. Pazzani. An online algorithm for segmenting time series. In Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on, pages 289--296. IEEE, 2001. Google ScholarDigital Library
- E. Keogh, J. Lin, and A. Fu. Hot sax: Efficiently finding the most unusual time series subsequence. In Data mining, fifth IEEE international conference on, pages 8--pp. Ieee, 2005. Google ScholarDigital Library
- E. Lazin. Akumuli timeseries database. https://akumuli.org.Google Scholar
- D. Lemire. A better alternative to piecewise linear time series segmentation. In Proceedings of the 2007 SIAM International Conference on Data Mining, pages 545--550. SIAM, 2007.Google ScholarCross Ref
- D. Lemire. Microbenchmarking calls for idealized conditions, 2018. https://lemire.me/blog/2018/01/16/microbenchmarking-calls-for-idealized-conditions/.Google Scholar
- D. Lemire and L. Boytsov. Decoding billions of integers per second through vectorization. Software: Practice and Experience, 45(1):1--29, 2015. Google ScholarDigital Library
- J. Lin, E. Keogh, S. Lonardi, and B. Chiu. A symbolic representation of time series, with implications for streaming algorithms. In Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery, pages 2--11. ACM, 2003. Google ScholarDigital Library
- Y. Liu, M. De Vos, and S. Van Huffel. Compressed sensing of multichannel eeg signals: the simultaneous cosparsity and low-rank optimization. IEEE Transactions on Biomedical Engineering, 62(8):2055--2061, 2015.Google ScholarCross Ref
- S. Makonin, B. Ellert, I. V. Bajic, and F. Popowich. Electricity, water, and natural gas consumption of a residential house in Canada from 2012 to 2014. Scientific Data, 3(160037):1--12, 2016.Google Scholar
- S.-G. Miaou and H.-L. Yen. Multichannel ecg compression using multichannel adaptive vector quantization. IEEE transactions on biomedical engineering, 48(10):1203--1207, 2001.Google Scholar
- J. Moffitt. Ogg vorbis. Linux journal, 2001(81es):9, 2001.Google Scholar
- P. Nemenyi. Distribution-free multiple comparisons. In Biometrics, volume 18, page 263. INTERNATIONAL BIOMETRIC SOC 1441 I ST, NW, SUITE 700, WASHINGTON, DC 20005-2210, 1962.Google Scholar
- M. Oberhumer. Lzo-a real-time data compression library. http://www.oberhumer.com/opensource/lzo/, 2008.Google Scholar
- T. Pelkonen, S. Franklin, J. Teller, P. Cavallaro, Q. Huang, J. Meza, and K. Veeraraghavan. Gorilla: A fast, scalable, in-memory time series database. Proceedings of the VLDB Endowment, 8(12):1816--1827, 2015. Google ScholarDigital Library
- T. Rakthanmanon and E. Keogh. Fast shapelets: A scalable algorithm for discovering time series shapelets. In Proceedings of the 2013 SIAM International Conference on Data Mining, pages 668--676. SIAM, 2013.Google ScholarCross Ref
- T. Rakthanmanon, E. J. Keogh, S. Lonardi, and S. Evans. Time series epenthesis: Clustering time series streams requires ignoring some data. In Data Mining (ICDM), 2011 IEEE 11th International Conference on, pages 547--556. IEEE, 2011. Google ScholarDigital Library
- A. Reiss and D. Stricker. Towards global aerobic activity monitoring. In Proceedings of the 4th International Conference on PErvasive Technologies Related to Assistive Environments, page 12. ACM, 2011. Google ScholarDigital Library
- S. Rhea, E. Wang, E. Wong, E. Atkins, and N. Storer. Littletable: a timeseries database and its uses. In Proceedings of the 2017 ACM International Conference on Management of Data, pages 125--138. ACM, 2017. Google ScholarDigital Library
- R. F. Rice. Some practical universal noiseless coding techniques, part 3, module psl14, k+. 1991.Google Scholar
- T. Robinson. Shorten: Simple lossless and near-lossless waveform compression, 1994.Google Scholar
- B. Schlegel, R. Gemulla, and W. Lehner. Fast integer compression using simd instructions. In Proceedings of the Sixth International Workshop on Data Management on New Hardware, pages 34--40. ACM, 2010. Google ScholarDigital Library
- M. Seidemann and B. Seeger. Chronicledb: A high-performance event store. In EDBT, pages 144--155, 2017.Google Scholar
- P. Senin and S. Malinchik. Sax-vsm: Interpretable time series classification using sax and vector space model. In Data Mining (ICDM), 2013 IEEE 13th International Conference on, pages 1175--1180. IEEE, 2013.Google ScholarCross Ref
- J. Shieh and E. Keogh. isax: disk-aware mining and indexing of massive time series datasets. Data Mining and Knowledge Discovery, 19(1):24--57, 2009. Google ScholarDigital Library
- K. Shvachko, H. Kuang, S. Radia, and R. Chansler. The hadoop distributed file system. In Mass storage systems and technologies (MSST), 2010 IEEE 26th symposium on, pages 1--10. IEEE, 2010. Google ScholarDigital Library
- H. Siedelmann, A. Wender, and M. Fuchs. High speed lossless image compression. In German Conference on Pattern Recognition, pages 343--355. Springer, 2015.Google ScholarDigital Library
- B. Sigoure. Opentsdb: The distributed, scalable time series database. Proc. OSCON, 11, 2010.Google Scholar
- A. A. Stepanov, A. R. Gangolli, D. E. Rose, R.J. Ernst, and P. S. Oberoi. Simd-based decoding of posting lists. In Proceedings of the 20th ACM international conference on Information and knowledge management, pages 317--326. ACM, 2011. Google ScholarDigital Library
- F. D. E. Team. Rocksdb: A persistent key-value store for fast storage environments. http://rocksdb.org.Google Scholar
- A. Thusoo, J. S. Sarma, N. Jain, Z. Shao, P. Chakka, N. Zhang, S. Antony, H. Liu, and R. Murthy. Hive-a petabyte scale data warehouse using hadoop. In Data Engineering (ICDE), 2010 IEEE 26th International Conference on, pages 996--1005. IEEE, 2010.Google ScholarCross Ref
- A. Ukil, S. Bandyopadhyay, and A. Pal. Iot data compression: Sensor-agnostic approach. In Data Compression Conference (DCC), 2015, pages 303--312. IEEE, 2015. Google ScholarDigital Library
- J.-M. Valin, K. Vos, and T. Terriberry. Definition of the opus audio codec. Technical report, 2012.Google Scholar
- N. Verma, A. Shoeb, J. Bohorquez, J. Dawson, J. Guttag, and A. P. Chandrakasan. A micro-power eeg acquisition soc with integrated feature extraction processor for a chronic seizure detection system. IEEE Journal of Solid-State Circuits, 45(4):804--816, 2010.Google ScholarCross Ref
- F. Yang, E. Tschetter, X. Léauté, N. Ray, G. Merlino, and D. Ganguli. Druid: A real-time analytical data store. In Proceedings of the 2014 ACM SIGMOD international conference on Management of data, pages 157--168. ACM, 2014. Google ScholarDigital Library
- M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica. Spark: Cluster computing with working sets. HotCloud, 10(10-10):95, 2010. Google ScholarDigital Library
- W. X. Zhao, X. Zhang, D. Lemire, D. Shan, J.-Y. Nie, H. Yan, and J.-R. Wen. A general simd-based approach to accelerating compression algorithms. ACM Transactions on Information Systems (TOIS), 33(3):15, 2015. Google ScholarDigital Library
- M. Zukowski, S. Heman, N. Nes, and P. Boncz. Super-scalar ram-cpu cache compression. In Data Engineering, 2006. ICDE'06. Proceedings of the 22nd International Conference on, pages 59--59. IEEE, 2006. Google ScholarDigital Library
Index Terms
- Sprintz: Time Series Compression for the Internet of Things
Recommendations
Configuration compression for FPGA-based embedded systems
Field programmable gate arrays (FPGAs) are a promising technology for developing high-performance embedded systems. The density and performance of FPGAs have drastically improved over the past few years. Consequently, the size of the configuration ...
Second compression for pixelated images under edge-based compression algorithms: JPEG-LS as an example
This paper details the examination of a particular case of data compression, where the compression algorithm removes the redundancy from data, which occurs when edge-based compression algorithms compress (previously compressed) pixelated images. The newly ...
LOCO-I: a low complexity, context-based, lossless image compression algorithm
DCC '96: Proceedings of the Conference on Data CompressionLOCO-I (low complexity lossless compression for images) is a novel lossless compression algorithm for continuous-tone images which combines the simplicity of Huffman coding with the compression potential of context models, thus "enjoying the best of ...
Comments