Published in: The VLDB Journal 3/2024

12-02-2024 | Regular Paper

Time series data encoding in Apache IoTDB: comparative analysis and recommendation

Authors: Tianrui Xia, Jinzhao Xiao, Yuxiang Huang, Changyu Hu, Shaoxu Song, Xiangdong Huang, Jianmin Wang


Abstract

Both the wide range of applications and the distinct features of time series data have driven the rapid growth of time series database management systems such as Apache IoTDB, InfluxDB, and OpenTSDB. Almost all of these systems employ columnar storage with effective encoding of time series data. Because time series differ in their features, different encoding strategies may perform differently. In this study, we first summarize the features of time series data that may affect encoding performance, and we introduce the latest feature extraction results for these features. We then introduce the storage scheme of a typical time series database, Apache IoTDB, which prescribes the limits on implementing encoding algorithms in the system. A qualitative analysis of encoding effectiveness is then presented for the studied algorithms. To this end, we develop a benchmark for evaluating encoding algorithms, including a data generator and several real-world datasets, and we present an extensive experimental evaluation. Notably, a quantitative analysis of encoding effectiveness with respect to data features is conducted in Apache IoTDB. Finally, we recommend the best encoding algorithm for different time series according to their data features. Machine learning models are trained for the recommendation and evaluated over real-world datasets.
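
The claim that data features influence encoding performance can be illustrated with a minimal sketch. It uses two generic textbook techniques, delta encoding and run-length encoding, and is not taken from the paper or from the Apache IoTDB code base; the function names and sample series are hypothetical and chosen only for illustration.

```python
from itertools import groupby


def delta_encode(values):
    """Keep the first value, then store the difference to the previous value."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def run_length_encode(values):
    """Collapse each run of equal consecutive values into a (value, count) pair."""
    return [(v, len(list(group))) for v, group in groupby(values)]


# Hypothetical series: evenly spaced timestamps versus irregular readings.
regular = [1000, 1010, 1020, 1030, 1040]
irregular = [3, 7, 2, 9, 4]

# Evenly spaced data yields constant deltas, which run-length encoding collapses.
regular_pairs = run_length_encode(delta_encode(regular))
# Irregular data yields distinct deltas, so the same pipeline saves nothing.
irregular_pairs = run_length_encode(delta_encode(irregular))

print(regular_pairs)    # [(1000, 1), (10, 4)]
print(irregular_pairs)  # [(3, 1), (4, 1), (-5, 1), (7, 1), (-5, 1)]
```

In this sketch the same pipeline reduces the evenly spaced series to two pairs but leaves the irregular series at five, which is the kind of feature-dependent behavior that motivates choosing an encoding per series.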


Metadata
Title
Time series data encoding in Apache IoTDB: comparative analysis and recommendation
Authors
Tianrui Xia
Jinzhao Xiao
Yuxiang Huang
Changyu Hu
Shaoxu Song
Xiangdong Huang
Jianmin Wang
Publication date
12-02-2024
Publisher
Springer Berlin Heidelberg
Published in
The VLDB Journal / Issue 3/2024
Print ISSN: 1066-8888
Electronic ISSN: 0949-877X
DOI
https://doi.org/10.1007/s00778-024-00840-5
