2004 | OriginalPaper | Chapter
Providing Efficient I/O Redundancy in MPI Environments
Authors : Willam D. Gropp, Robert Ross, Neill Miller
Published in: Recent Advances in Parallel Virtual Machine and Message Passing Interface
Publisher: Springer Berlin Heidelberg
Included in: Professional Book Archive
Activate our intelligent search to find suitable subject content or patents.
Select sections of text to find matching patents with Artificial Intelligence. powered by
Select sections of text to find additional relevant content using AI-assisted search. powered by
Highly parallel applications often use either highly parallel file systems or large numbers of independent disks. Either approach can provide the high data rates necessary for parallel applications. However, the failure of a single disk or server can render the data useless. Conventional techniques, such as those based on applying erasure correcting codes to each file write, are prohibitively expensive for massively parallel scientific applications because of the granularity of access at which the codes are applied. In this paper we demonstrate a scalable method for recovering from single disk failures that is optimized for typical scientific data sets. This approach exploits coarser-grained (but precise) semantics to reduce the overhead of constructing recovery data and makes use of parallel computation (proportional to the data size and independent of number of processors) to construct data. Experiments are presented showing the efficiency of this approach on a cluster with independent disks, and a technique is described for hiding the creation of redundant data within the MPI-IO implementation.