skip to main content
research-article
Public Access

Sprintz: Time Series Compression for the Internet of Things

Published:18 September 2018Publication History
Skip Abstract Section

Abstract

Thanks to the rapid proliferation of connected devices, sensor-generated time series constitute a large and growing portion of the world's data. Often, this data is collected from distributed, resource-constrained devices and centralized at one or more servers. A key challenge in this setup is reducing the size of the transmitted data without sacrificing its quality. Lower quality reduces the data's utility, but smaller size enables both reduced network and storage costs at the servers and reduced power consumption in sensing devices. A natural solution is to compress the data at the sensing devices. Unfortunately, existing compression algorithms either violate the memory and latency constraints common for these devices or, as we show experimentally, perform poorly on sensor-generated time series.

We introduce a time series compression algorithm that achieves state-of-the-art compression ratios while requiring less than 1KB of memory and adding virtually no latency. This method is suitable not only for low-power devices collecting data, but also for servers storing and querying data; in the latter context, it can decompress at over 3GB/s in a single thread, even faster than many algorithms with much lower compression ratios. A key component of our method is a high-speed forecasting algorithm that can be trained online and significantly outperforms alternatives such as delta coding.

Extensive experiments on datasets from many domains show that these results hold not only for sensor data but also across a wide array of other time series.

References

  1. Information technology -- generic coding of moving pictures and associated audio information -- part 7: Advanced audio coding (aac), 2006. https://www.iso.org/standard/43345.html.Google ScholarGoogle Scholar
  2. Digi-key electronics, 2017. https://www.digikey.com/products/en/integrated-circuits-ics/data-acquisition-analog-to-digital-converters-adc700?k=adc8k=8pkeyword=adc8pv1989=0.Google ScholarGoogle Scholar
  3. Intel quark microcontrollers, 2017. https://www.intel.com/content/www/us/en/embedded/products/quark/overview.html.Google ScholarGoogle Scholar
  4. N. Aharony, W. Pan, C. Ip, I. Khayal, and A. Pentland. Social fmri: Investigating and shaping social mechanisms in the real world. Pervasive and Mobile Computing, 7(6):643--659, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. J. Alakuijala and Z. Szabadka. Brotli compressed data format. Technical report, 2016.Google ScholarGoogle Scholar
  6. M. P. Andersen and D. E. Culler. Btrdb: Optimizing storage system design for time-series processing. In FAST, pages 39--52, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. V. N. Anh and A. Moffat. Index compression using 64-bit words. Software: Practice and Experience, 40(2):131--147, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Apple. Apple lossless audio codec, 2011. https://github.com/macosforge/alac.Google ScholarGoogle Scholar
  9. S. Arrabi and J. Lach. Adaptive lossless compression in wireless body sensor networks. In Proceedings of the Fourth International Conference on Body Area Networks, page 19. ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering), 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. S. Beckett. Influxdb, 2017. https://influxdata.com.Google ScholarGoogle Scholar
  11. D. W. Blalock and J. V. Guttag. Extract: Strong examples from weakly-labeled sensor data. In Data Mining (ICDM), 2016 IEEE 16th International Conference on, pages 799--804. IEEE, 2016.Google ScholarGoogle ScholarCross RefCross Ref
  12. T. Bose, S. Bandyopadhyay, S. Kumar, A. Bhattacharyya, and A. Pal. Signal characteristics on sensor data compression in iot-an investigation. In Sensing, Communication, and Networking (SECON), 2016 13th Annual IEEE International Conference on, pages 1--6. IEEE, 2016.Google ScholarGoogle Scholar
  13. T. Boutell. Png (portable network graphics) specification version 1.0. 1997.Google ScholarGoogle Scholar
  14. M. Buevich, A. Wright, R. Sargent, and A. Rowe. Respawn: A distributed multi-resolution timeseries datastore. In Real-Time Systems Symposium (RTSS), 2013 IEEE 34th, pages 288--297. IEEE, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. A. Camerra, T. Palpanas, J. Shieh, and E. Keogh. isax 2.0: Indexing and mining one billion time series. In Data Mining (ICDM), 2010 IEEE 10th International Conference on, pages 58--67. IEEE, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Y. Chen, E. Keogh, B. Hu, N. Begum, A. Bagnall, A. Mueen, and G. Batista. The ucr time series classification archive, July 2015. www.cs.ucr.edu/~eamonn/time_series_data/.Google ScholarGoogle Scholar
  17. J. Coalson. Flac-free lossless audio codec, 2008. http://flac.sourceforge.net.Google ScholarGoogle Scholar
  18. Y. Collet. Finite state entropy. https://github.com/Cyan4973/FiniteStateEntropy.Google ScholarGoogle Scholar
  19. Y. Collet. Lz4--extremely fast compression, 2017. https://github.com/Cyan4973/lz4.Google ScholarGoogle Scholar
  20. Y. Collet. Zstandard - fast real-time compression algorithm, 2017. https://facebook.github.io/zstd/.Google ScholarGoogle Scholar
  21. D. D. I. F. Committee et al. Dwarf debugging information format, version 4. Free Standards Group, 2010.Google ScholarGoogle Scholar
  22. J. Dean and S. Ghemawat. Mapreduce: simplified data processing on large clusters. Communications of the ACM, 51(1):107--113, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. J. Demšar. Statistical comparisons of classifiers over multiple data sets. Journal of Machine learning research, 7(Jan):1--30, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. L. P. Deutsch. Deflate compressed data format specification version 1.3. 1996.Google ScholarGoogle Scholar
  25. P. Deutsch and J.-L. Gailly. Zlib compressed data format specification version 3.3. Technical report, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. J. Duda. Asymmetric numeral systems: entropy coding combining speed of huffman coding with compression rate of arithmetic coding. arXiv preprint arXiv:1311.2540, 2013.Google ScholarGoogle Scholar
  27. F. Eichinger, P. Efros, S. Karnouskos, and K. Böhm. A timeseries compression technique and its application to the smart grid. The VLDB Journal, 24(2):193--218, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. J. Fonollosa, S. Sheik, R. Huerta, and S. Marco. Reservoir computing compensates slow response of chemosensor arrays exposed to fast varying gas concentrations in continuous monitoring. Sensors and Actuators B: Chemical, 215:618--629, 2015.Google ScholarGoogle ScholarCross RefCross Ref
  29. S. Fothergill, H. M. Mentis, P. Kohli, and S. Nowozin. Instructing people for training gestural interactive systems. In J. A. Konstan, E. H. Chi, and K. Höök, editors, CHI, pages 1737--1746. ACM, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. J.-L. Gailly and M. Adler. The gzip home page, 2003. https://www.gzip.org/.Google ScholarGoogle Scholar
  31. S. Golomb. Run-length encodings. IEEE transactions on information theory, 12(3):399--401, 1966. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Google. Protocol buffers encoding, 2001. https://developers.google.com/protocol-buffers/docs/encoding#types.Google ScholarGoogle Scholar
  33. S. Gunderson. Snappy: A fast compressor/decompressor, 2015. https://code.google.com/p/snappy.Google ScholarGoogle Scholar
  34. M. A. Hanson, H. C. Powell Jr, A. T. Barth, K. Ringgenberg, B. H. Calhoun, J. H. Aylor, and J. Lach. Body area sensor networks: Challenges and opportunities. Computer, 42(1), 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. B. Hawkins. Kairos db: Fast time series database on cassandra. https://github.com/kairosdb/kairosdb.Google ScholarGoogle Scholar
  36. B. Hu, T. Rakthanmanon, Y. Hao, S. Evans, S. Lonardi, and E. Keogh. Discovering the intrinsic cardinality and dimensionality of time series using mdl. In Data Mining (ICDM), 2011 IEEE 11th International Conference on, pages 1086--1091. IEEE, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. N. Q. V. Hung, H. Jeung, and K. Aberer. An evaluation of model-based approaches to sensor data compression. IEEE Transactions on Knowledge and Data Engineering, 25(11):2434--2447, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. T. Instruments. 2.4-ghz bluetoothÂő low energy system-on-chip, 2013. http://www.ti.com/lit/ds/symlink/cc2540.pdf.Google ScholarGoogle Scholar
  39. T. Instruments. Cc2640 simplelink bluetooth wireless mcu, 2016. http://www.ti.com/lit/ds/swrs176b/swrs176b.pdf.Google ScholarGoogle Scholar
  40. E. Keogh, K. Chakrabarti, M. Pazzani, and S. Mehrotra. Dimensionality reduction for fast similarity search in large time series databases. Knowledge and information Systems, 3(3):263--286, 2001.Google ScholarGoogle Scholar
  41. E. Keogh, K. Chakrabarti, M. Pazzani, and S. Mehrotra. Locally adaptive dimensionality reduction for indexing large time series databases. ACM Sigmod Record, 30(2):151--162, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. E. Keogh, S. Chu, D. Hart, and M. Pazzani. An online algorithm for segmenting time series. In Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on, pages 289--296. IEEE, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. E. Keogh, J. Lin, and A. Fu. Hot sax: Efficiently finding the most unusual time series subsequence. In Data mining, fifth IEEE international conference on, pages 8--pp. Ieee, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. E. Lazin. Akumuli timeseries database. https://akumuli.org.Google ScholarGoogle Scholar
  45. D. Lemire. A better alternative to piecewise linear time series segmentation. In Proceedings of the 2007 SIAM International Conference on Data Mining, pages 545--550. SIAM, 2007.Google ScholarGoogle ScholarCross RefCross Ref
  46. D. Lemire. Microbenchmarking calls for idealized conditions, 2018. https://lemire.me/blog/2018/01/16/microbenchmarking-calls-for-idealized-conditions/.Google ScholarGoogle Scholar
  47. D. Lemire and L. Boytsov. Decoding billions of integers per second through vectorization. Software: Practice and Experience, 45(1):1--29, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. J. Lin, E. Keogh, S. Lonardi, and B. Chiu. A symbolic representation of time series, with implications for streaming algorithms. In Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery, pages 2--11. ACM, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. Y. Liu, M. De Vos, and S. Van Huffel. Compressed sensing of multichannel eeg signals: the simultaneous cosparsity and low-rank optimization. IEEE Transactions on Biomedical Engineering, 62(8):2055--2061, 2015.Google ScholarGoogle ScholarCross RefCross Ref
  50. S. Makonin, B. Ellert, I. V. Bajic, and F. Popowich. Electricity, water, and natural gas consumption of a residential house in Canada from 2012 to 2014. Scientific Data, 3(160037):1--12, 2016.Google ScholarGoogle Scholar
  51. S.-G. Miaou and H.-L. Yen. Multichannel ecg compression using multichannel adaptive vector quantization. IEEE transactions on biomedical engineering, 48(10):1203--1207, 2001.Google ScholarGoogle Scholar
  52. J. Moffitt. Ogg vorbis. Linux journal, 2001(81es):9, 2001.Google ScholarGoogle Scholar
  53. P. Nemenyi. Distribution-free multiple comparisons. In Biometrics, volume 18, page 263. INTERNATIONAL BIOMETRIC SOC 1441 I ST, NW, SUITE 700, WASHINGTON, DC 20005-2210, 1962.Google ScholarGoogle Scholar
  54. M. Oberhumer. Lzo-a real-time data compression library. http://www.oberhumer.com/opensource/lzo/, 2008.Google ScholarGoogle Scholar
  55. T. Pelkonen, S. Franklin, J. Teller, P. Cavallaro, Q. Huang, J. Meza, and K. Veeraraghavan. Gorilla: A fast, scalable, in-memory time series database. Proceedings of the VLDB Endowment, 8(12):1816--1827, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. T. Rakthanmanon and E. Keogh. Fast shapelets: A scalable algorithm for discovering time series shapelets. In Proceedings of the 2013 SIAM International Conference on Data Mining, pages 668--676. SIAM, 2013.Google ScholarGoogle ScholarCross RefCross Ref
  57. T. Rakthanmanon, E. J. Keogh, S. Lonardi, and S. Evans. Time series epenthesis: Clustering time series streams requires ignoring some data. In Data Mining (ICDM), 2011 IEEE 11th International Conference on, pages 547--556. IEEE, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. A. Reiss and D. Stricker. Towards global aerobic activity monitoring. In Proceedings of the 4th International Conference on PErvasive Technologies Related to Assistive Environments, page 12. ACM, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. S. Rhea, E. Wang, E. Wong, E. Atkins, and N. Storer. Littletable: a timeseries database and its uses. In Proceedings of the 2017 ACM International Conference on Management of Data, pages 125--138. ACM, 2017. Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. R. F. Rice. Some practical universal noiseless coding techniques, part 3, module psl14, k+. 1991.Google ScholarGoogle Scholar
  61. T. Robinson. Shorten: Simple lossless and near-lossless waveform compression, 1994.Google ScholarGoogle Scholar
  62. B. Schlegel, R. Gemulla, and W. Lehner. Fast integer compression using simd instructions. In Proceedings of the Sixth International Workshop on Data Management on New Hardware, pages 34--40. ACM, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  63. M. Seidemann and B. Seeger. Chronicledb: A high-performance event store. In EDBT, pages 144--155, 2017.Google ScholarGoogle Scholar
  64. P. Senin and S. Malinchik. Sax-vsm: Interpretable time series classification using sax and vector space model. In Data Mining (ICDM), 2013 IEEE 13th International Conference on, pages 1175--1180. IEEE, 2013.Google ScholarGoogle ScholarCross RefCross Ref
  65. J. Shieh and E. Keogh. isax: disk-aware mining and indexing of massive time series datasets. Data Mining and Knowledge Discovery, 19(1):24--57, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  66. K. Shvachko, H. Kuang, S. Radia, and R. Chansler. The hadoop distributed file system. In Mass storage systems and technologies (MSST), 2010 IEEE 26th symposium on, pages 1--10. IEEE, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  67. H. Siedelmann, A. Wender, and M. Fuchs. High speed lossless image compression. In German Conference on Pattern Recognition, pages 343--355. Springer, 2015.Google ScholarGoogle ScholarDigital LibraryDigital Library
  68. B. Sigoure. Opentsdb: The distributed, scalable time series database. Proc. OSCON, 11, 2010.Google ScholarGoogle Scholar
  69. A. A. Stepanov, A. R. Gangolli, D. E. Rose, R.J. Ernst, and P. S. Oberoi. Simd-based decoding of posting lists. In Proceedings of the 20th ACM international conference on Information and knowledge management, pages 317--326. ACM, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  70. F. D. E. Team. Rocksdb: A persistent key-value store for fast storage environments. http://rocksdb.org.Google ScholarGoogle Scholar
  71. A. Thusoo, J. S. Sarma, N. Jain, Z. Shao, P. Chakka, N. Zhang, S. Antony, H. Liu, and R. Murthy. Hive-a petabyte scale data warehouse using hadoop. In Data Engineering (ICDE), 2010 IEEE 26th International Conference on, pages 996--1005. IEEE, 2010.Google ScholarGoogle ScholarCross RefCross Ref
  72. A. Ukil, S. Bandyopadhyay, and A. Pal. Iot data compression: Sensor-agnostic approach. In Data Compression Conference (DCC), 2015, pages 303--312. IEEE, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  73. J.-M. Valin, K. Vos, and T. Terriberry. Definition of the opus audio codec. Technical report, 2012.Google ScholarGoogle Scholar
  74. N. Verma, A. Shoeb, J. Bohorquez, J. Dawson, J. Guttag, and A. P. Chandrakasan. A micro-power eeg acquisition soc with integrated feature extraction processor for a chronic seizure detection system. IEEE Journal of Solid-State Circuits, 45(4):804--816, 2010.Google ScholarGoogle ScholarCross RefCross Ref
  75. F. Yang, E. Tschetter, X. Léauté, N. Ray, G. Merlino, and D. Ganguli. Druid: A real-time analytical data store. In Proceedings of the 2014 ACM SIGMOD international conference on Management of data, pages 157--168. ACM, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  76. M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica. Spark: Cluster computing with working sets. HotCloud, 10(10-10):95, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  77. W. X. Zhao, X. Zhang, D. Lemire, D. Shan, J.-Y. Nie, H. Yan, and J.-R. Wen. A general simd-based approach to accelerating compression algorithms. ACM Transactions on Information Systems (TOIS), 33(3):15, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  78. M. Zukowski, S. Heman, N. Nes, and P. Boncz. Super-scalar ram-cpu cache compression. In Data Engineering, 2006. ICDE'06. Proceedings of the 22nd International Conference on, pages 59--59. IEEE, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Sprintz: Time Series Compression for the Internet of Things

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        Full Access

        • Published in

          cover image Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies
          Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies  Volume 2, Issue 3
          September 2018
          1536 pages
          EISSN:2474-9567
          DOI:10.1145/3279953
          Issue’s Table of Contents

          Copyright © 2018 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 18 September 2018
          • Accepted: 1 September 2018
          • Revised: 1 July 2018
          • Received: 1 May 2018
          Published in imwut Volume 2, Issue 3

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article
          • Research
          • Refereed

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader