Published in: Cluster Computing 2/2014

1 June 2014

neCODEC: nearline data compression for scientific applications

Authors: Yuan Tian, Cong Xu, Weikuan Yu, Jeffrey S. Vetter, Scott Klasky, Honggao Liu, Saad Biaz

Abstract

Advances in multicore technologies are leading to processors with tens, and soon hundreds, of cores in a single socket, resulting in an ever-growing gap between computing power and the memory and I/O bandwidth available for data handling. It would be beneficial if some of this computing power could be converted into gains in I/O efficiency, thereby reducing the speed disparity between computing and I/O. In this paper, we design and implement a NEarline data COmpression and DECompression (neCODEC) scheme for data-intensive parallel applications. neCODEC introduces several salient techniques, including asynchronous compression threads, elastic file representation, distributed metadata handling, and balanced subfile distribution. Our performance evaluation indicates that neCODEC can improve the performance of a variety of data-intensive microbenchmarks and scientific applications. In particular, neCODEC is capable of increasing the effective bandwidth of S3D, a combustion simulation code, by more than 5 times.
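
To make the idea of asynchronous compression threads more concrete, the following is a minimal Python sketch, not neCODEC's actual implementation. It assumes a general-purpose compressor (zlib from the standard library) and hands data chunks to background worker threads through a queue. The function names `compress_chunks_async` and `size_reduction_factor`, the worker count, and the compression level are all hypothetical choices made for illustration; the abstract does not specify the compressor or the threading design.

```python
import queue
import threading
import zlib


def compress_chunks_async(chunks, num_workers=2, level=6):
    """Compress a list of byte chunks on background threads.

    Illustrative only: zlib and the queue-based design are assumptions,
    not details taken from the paper. Results keep the input order.
    """
    tasks = queue.Queue()
    results = [None] * len(chunks)

    def worker():
        while True:
            item = tasks.get()
            if item is None:  # sentinel: no more work for this thread
                break
            index, data = item
            # Each worker writes to a distinct slot, so no lock is needed.
            results[index] = zlib.compress(data, level)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for i, chunk in enumerate(chunks):
        tasks.put((i, chunk))
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return results


def size_reduction_factor(raw_chunks, compressed_chunks):
    """Ratio of raw bytes to compressed bytes (hypothetical helper)."""
    raw = sum(len(c) for c in raw_chunks)
    packed = sum(len(c) for c in compressed_chunks)
    return raw / packed
```

In a sketch like this, the calling code could continue with other work between enqueueing chunks and joining the threads. The size reduction factor only bounds the bandwidth gain that compression could yield; it says nothing about the gains reported for S3D.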

Metadata
Title
neCODEC: nearline data compression for scientific applications
Authors
Yuan Tian
Cong Xu
Weikuan Yu
Jeffrey S. Vetter
Scott Klasky
Honggao Liu
Saad Biaz
Publication date
1 June 2014
Publisher
Springer US
Published in
Cluster Computing / Issue 2/2014
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-013-0265-8