2004 | OriginalPaper | Chapter

Providing Efficient I/O Redundancy in MPI Environments

Authors: William D. Gropp, Robert Ross, Neill Miller

Published in: Recent Advances in Parallel Virtual Machine and Message Passing Interface

Publisher: Springer Berlin Heidelberg

Included in: Professional Book Archive

Highly parallel applications often use either highly parallel file systems or large numbers of independent disks. Either approach can provide the high data rates necessary for parallel applications. However, the failure of a single disk or server can render the data useless. Conventional techniques, such as those based on applying erasure-correcting codes to each file write, are prohibitively expensive for massively parallel scientific applications because of the granularity of access at which the codes are applied. In this paper we demonstrate a scalable method for recovering from single disk failures that is optimized for typical scientific data sets. This approach exploits coarser-grained (but precise) semantics to reduce the overhead of constructing recovery data and makes use of parallel computation (proportional to the data size and independent of the number of processors) to construct data. Experiments are presented showing the efficiency of this approach on a cluster with independent disks, and a technique is described for hiding the creation of redundant data within the MPI-IO implementation.
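To make the idea of single-failure recovery concrete, the following is a minimal sketch that assumes simple XOR parity over equal-sized per-process data blocks. The abstract does not specify the encoding used in the paper, so XOR parity is an illustrative assumption here, and the names `xor_blocks` and `recover_block` are hypothetical. The blocks are plain in-memory byte strings standing in for the data each process would write to its own disk; no MPI calls are involved.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the bytewise XOR of a list of equal-length byte blocks."""
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def recover_block(surviving_blocks, parity):
    """Rebuild the single missing block from the survivors and the parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


# Hypothetical data: one block per process (assumed layout, not from the paper).
blocks = [b"rank0 data", b"rank1 data", b"rank2 data"]
parity = xor_blocks(blocks)

# Simulate losing the disk that held block 1, then rebuild it.
lost = 1
survivors = [b for i, b in enumerate(blocks) if i != lost]
recovered = recover_block(survivors, parity)
```

In this sketch the parity is computed once over whole blocks rather than on every write, which mirrors the coarser-grained approach the abstract describes only in spirit. The work is one XOR per data byte, so it grows with the data size.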
