ABSTRACT
Preserving data over long periods in the face of faults, large and small, is crucial to the design of reliable archival storage systems. However, the survivability of data differs from the reliability of storage, because data are typically stored on more than one storage system at any given moment. Previous reliability studies ignore data survivability. We present a framework that relates data survivability to storage reliability, and we use it to gauge the impact of rare but large-scale events on data survivability. We also present a method for tracking, without interruption over the whole lifetime of the data, all copies of the data and the condition of all the online and offline media, devices, and systems on which they are stored. With this method, the survivability of the data can be closely monitored and potential dangers can be handled in a timely manner. A better understanding of data survivability can also help eliminate unnecessary replicas, thus reducing cost.