Mixing Hadoop and HPC workloads on parallel filesystems
MapReduce-tailored distributed filesystems---such as HDFS for Hadoop MapReduce---and parallel high-performance computing filesystems are designed for considerably different workloads. The purpose of our work is to examine the performance of each ...
DiskReduce: RAID for data-intensive scalable computing
Data-intensive file systems, developed for Internet services and popular in cloud computing, provide high reliability and availability by replicating data, typically keeping three copies of everything. Alternatively, high-performance computing, which has ...
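The trade-off this abstract alludes to can be put in rough numbers. A minimal sketch (the block counts below are illustrative, not taken from the paper) comparing the capacity overhead of triple replication with a RAID-style parity group:

```python
def storage_overhead(data_blocks: int, redundancy_blocks: int) -> float:
    """Extra capacity consumed by redundancy, as a fraction of user data."""
    return redundancy_blocks / data_blocks

# Triple replication: every block is stored 3 times -> 2 extra copies per block.
replication = storage_overhead(1, 2)   # 200% overhead

# A hypothetical RAID-6-style group: 8 data blocks protected by 2 parity blocks.
parity_group = storage_overhead(8, 2)  # 25% overhead

print(f"3x replication overhead: {replication:.0%}")
print(f"8+2 parity overhead:     {parity_group:.0%}")
```

The eightfold difference in overhead is the kind of gap that motivates bringing RAID-style encoding to replicated data-intensive file systems.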
Data layout optimization for petascale file systems
In this study, the authors propose a simple performance model to promote a better integration between the parallel I/O middleware layer and parallel file systems. They show that application-specific data layout optimization can improve overall data ...
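Layout optimization of this kind hinges on how a parallel file system maps file offsets onto storage servers. A minimal sketch, assuming simple round-robin striping (the parameters are illustrative, not the paper's model):

```python
def stripe_location(offset: int, stripe_size: int, num_servers: int):
    """Map a file byte offset to (server index, local stripe index)
    under round-robin striping, as in Lustre- or PVFS-style layouts."""
    stripe = offset // stripe_size
    return stripe % num_servers, stripe // num_servers

# 1 MiB stripes across 4 servers: byte offset 5 MiB falls in global
# stripe 5, which lands on server 1 as that server's second stripe.
print(stripe_location(5 * 2**20, 2**20, 4))  # (1, 1)
```

An application whose access pattern is aligned with this mapping touches fewer servers per request, which is the sort of effect an application-specific layout optimization exploits.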
Case studies in storage access by loosely coupled petascale applications
A large number of real-world scientific applications can be characterized as loosely coupled: the communication among tasks is infrequent and can be performed by using file operations. While these applications may be ported to large scale machines ...
...and eat it too: high read performance in write-optimized HPC I/O middleware file formats
- Milo Polte,
- Jay Lofstead,
- John Bent,
- Garth Gibson,
- Scott A. Klasky,
- Qing Liu,
- Manish Parashar,
- Norbert Podhorszki,
- Karsten Schwan,
- Meghan Wingate,
- Matthew Wolf
As HPC applications run on increasingly high process counts on larger and larger machines, both the frequency of checkpoints needed for fault tolerance [14] and the resolution and size of Data Analysis Dumps are expected to increase proportionally. In ...
Scalable I/O tracing and analysis
As supercomputer performance has approached and then surpassed the petaflop level, I/O performance has become a major bottleneck for many scientific applications. Several tools exist to collect I/O traces to assist in the analysis of I/O ...
pNFS, POSIX, and MPI-IO: a tale of three semantics
MPI-IO is emerging as the standard mechanism for file I/O within HPC applications. While pNFS demonstrates high-performance I/O for bulk data transfers, its performance and scalability with MPI-IO is unproven. To attain success, the consistency ...
Uncovering errors: the cost of detecting silent data corruption
Data integrity is pivotal to the usefulness of any storage system. It ensures that the data stored is free from any modification throughout its existence on the storage medium. Hash functions such as cyclic redundancy checks (CRCs) or checksums are frequently ...
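The detection mechanism the abstract refers to can be sketched in a few lines: a checksum computed at write time exposes a later silent bit flip that the storage stack reports no error for. A minimal illustration using Python's `zlib.crc32` (the data and flipped bit are invented for the example):

```python
import zlib

def checksum(block: bytes) -> int:
    """CRC32 over a data block; stored alongside the block on write."""
    return zlib.crc32(block)

data = bytearray(b"important scientific data")
stored_crc = checksum(bytes(data))

# Silent corruption: a single bit flips on the medium, no I/O error is raised.
data[3] ^= 0x01

detected = checksum(bytes(data)) != stored_crc
print("corruption detected:", detected)  # True
```

The cost the paper examines is exactly this extra hashing work on every read and write, traded against the ability to catch such otherwise invisible corruption.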
Fusing data management services with file systems
File systems are the backbone of large-scale data processing for scientific applications. Motivated by the need to provide an extensible and flexible framework beyond the abstractions provided by API libraries for files to manage and analyze large-scale ...
Using the Active Storage Fabrics model to address petascale storage challenges
We present the Active Storage Fabrics (ASF) model for storage embedded parallel processing as a way to address petascale data intensive challenges. ASF is aimed at emerging scalable system-on-a-chip, storage class memory architectures, but may be ...