Published in: International Journal of Parallel Programming 3/2015

01-06-2015

Data Reduction Analysis for Climate Data Sets

Authors: Songbin Liu, Xiaomeng Huang, Haohuan Fu, Guangwen Yang, Zhenya Song

Abstract

Global climate modeling not only demands substantial computing capability but also poses tough challenges for data storage systems. Its input and output data sets generally require hundreds or even thousands of terabytes of storage. Storage reduction methods, such as content deduplication and various data compression methods, are therefore extremely important for reducing the storage requirements of climate modeling. However, little work has investigated how effective these data reduction methods are for climate data sets. In this paper, we study the potential benefit of data reduction for climate data by investigating a total of 46.5 TB of climate data sets, comprising 3 observation data sets (14.1 TB) and 3 climate model output data sets (32.4 TB). We apply five data compression algorithms and two types of content deduplication mechanisms to these data sets to study the achievable data reduction. Furthermore, we examine the compressibility of data from different climate components. Our work demonstrates the potential of applying data reduction methods in climate modeling platforms and provides guidance for selecting suitable methods for different kinds of climate data sets. We find that the compression method LCFP provides the best compression ratio; however, its throughputs, especially its inflate throughput, are much lower than those of all the other methods. To strike a better balance between compression ratio and throughput, we propose a new compression method for the model output data. The new method achieves a comparable compression ratio while attaining about 20 times higher inflate throughput than LCFP.
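
The three quantities the abstract compares (compression ratio, inflate throughput, and deduplication effectiveness) can be illustrated with a minimal Python sketch. It is not the authors' method: zlib stands in for the unnamed compression algorithms, fixed-size chunking with SHA-1 fingerprints stands in for one possible deduplication mechanism, and the ratio is taken as original size over compressed size, which may differ from the definition used in the paper. All function names and the sample buffer are hypothetical.

```python
import hashlib
import time
import zlib


def compression_ratio(raw: bytes, level: int = 6) -> float:
    """Original size divided by compressed size (assumed definition)."""
    return len(raw) / len(zlib.compress(raw, level))


def inflate_throughput(raw: bytes, level: int = 6) -> float:
    """Bytes of restored data per second for one decompression pass."""
    packed = zlib.compress(raw, level)
    start = time.perf_counter()
    restored = zlib.decompress(packed)
    elapsed = time.perf_counter() - start
    assert restored == raw  # the round trip must be lossless
    return len(raw) / max(elapsed, 1e-9)


def dedup_ratio(raw: bytes, chunk_size: int = 4096) -> float:
    """Total chunks divided by unique chunks, using fixed-size chunking."""
    chunks = [raw[i:i + chunk_size] for i in range(0, len(raw), chunk_size)]
    unique = {hashlib.sha1(chunk).digest() for chunk in chunks}
    return len(chunks) / len(unique)


# Hypothetical sample: eight identical zero-filled chunks plus one patterned chunk.
sample = bytes(4096) * 8 + bytes(range(256)) * 16

if __name__ == "__main__":
    print("compression ratio:", compression_ratio(sample))
    print("inflate throughput (B/s):", inflate_throughput(sample))
    print("dedup ratio:", dedup_ratio(sample))
```

A single timed pass gives only a rough throughput figure; a real comparison would repeat the measurement over large inputs and report both deflate and inflate speeds.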

Metadata
Title
Data Reduction Analysis for Climate Data Sets
Authors
Songbin Liu
Xiaomeng Huang
Haohuan Fu
Guangwen Yang
Zhenya Song
Publication date
01-06-2015
Publisher
Springer US
Published in
International Journal of Parallel Programming / Issue 3/2015
Print ISSN: 0885-7458
Electronic ISSN: 1573-7640
DOI
https://doi.org/10.1007/s10766-013-0287-0
