Cluster Computing

Published: 1 June 2014

neCODEC: nearline data compression for scientific applications

Authors: Yuan Tian, Cong Xu, Weikuan Yu, Jeffrey S. Vetter, Scott Klasky, Honggao Liu, Saad Biaz

Published in: Cluster Computing | Issue 2/2014

Abstract

Advances in multicore technologies are leading to processors with tens, and soon hundreds, of cores in a single socket, resulting in an ever-growing gap between computing power and the memory and I/O bandwidth available for data handling. It would be beneficial if some of this computing power could be converted into gains in I/O efficiency, thereby reducing the speed disparity between computing and I/O. In this paper, we design and implement a NEarline data COmpression and DECompression (neCODEC) scheme for data-intensive parallel applications. neCODEC introduces several salient techniques, including asynchronous compression threads, elastic file representation, distributed metadata handling, and balanced subfile distribution. Our performance evaluation indicates that neCODEC can improve the performance of a variety of data-intensive microbenchmarks and scientific applications. In particular, neCODEC is capable of increasing the effective bandwidth of S3D, a combustion simulation code, by more than five times.
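
The abstract names asynchronous compression threads as one of neCODEC's techniques without describing them further. As a rough illustration of the general idea only, and not the authors' implementation, the Python sketch below hands data chunks to a background thread that compresses them while the caller keeps producing data. The function name `compress_async`, the `sink` callback, the queue depth of 4, and the use of zlib are all assumptions made for this example.

```python
import queue
import threading
import zlib


def compress_async(chunks, sink, level=6):
    """Compress chunks on a background thread while the caller keeps producing.

    `chunks` is any iterable of bytes objects. `sink` is a hypothetical
    callback that receives (index, compressed_bytes), for example a function
    that writes to a subfile. Returns the number of uncompressed bytes.
    """
    # Bounded queue: the producer blocks only when the compressor falls behind.
    pending = queue.Queue(maxsize=4)

    def worker():
        while True:
            item = pending.get()
            if item is None:  # sentinel: the producer has finished
                break
            index, data = item
            sink(index, zlib.compress(data, level))

    thread = threading.Thread(target=worker)
    thread.start()

    raw_bytes = 0
    for index, data in enumerate(chunks):
        pending.put((index, data))  # hand off the chunk and continue
        raw_bytes += len(data)

    pending.put(None)  # tell the worker to stop
    thread.join()      # wait until every chunk has been compressed
    return raw_bytes
```

A caller could collect the output in a dictionary with `stored = {}` followed by `compress_async(chunks, stored.__setitem__)`, then compare the returned byte count with the total size of the stored values to see how much less data would reach storage. For brevity the sketch does not handle an exception raised inside `sink`, which would stop the worker and could leave the producer blocked.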

Title: neCODEC: nearline data compression for scientific applications
Authors: Yuan Tian
Cong Xu
Weikuan Yu
Jeffrey S. Vetter
Scott Klasky
Honggao Liu
Saad Biaz
Publication date: 1 June 2014
Publisher: Springer US
Published in: Cluster Computing / Issue 2/2014
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI: https://doi.org/10.1007/s10586-013-0265-8