research-article

Understanding data survivability in archival storage systems

Authors:
Yan Li, University of California, Santa Cruz
Ethan L. Miller, University of California, Santa Cruz
Darrell D. E. Long, University of California, Santa Cruz
SYSTOR '12: Proceedings of the 5th Annual International Systems and Storage Conference, June 2012, Article No.: 16, Pages 1–12, https://doi.org/10.1145/2367589.2367605

Published: 04 June 2012

ABSTRACT

Preserving data for long periods in the face of faults, large and small, is crucial for designing reliable archival storage systems. However, the survivability of data differs from the reliability of storage because data are typically stored on more than one storage system at any given moment. Previous reliability studies ignore data survivability. We present a framework that relates data survivability to storage reliability, and use it to gauge the impact of rare but large-scale events on data survivability. We also present a method for continuously tracking, over the whole lifetime of the data, all copies of the data and the condition of all the online and offline media, devices, and systems on which they are stored. With this method, the survivability of the data can be closely monitored and potential dangers handled in a timely manner. A better understanding of data survivability can also help eliminate unnecessary replicas, thus reducing cost.
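As a rough illustration of why data survivability differs from the reliability of any single store, the sketch below computes the probability that at least one copy of the data survives a given period. It is not the framework presented in the paper: the function name, the independence assumption between replicas, the single rare event, and all numeric values are assumptions chosen only for illustration.

```python
def survival_probability(replica_survival, p_event=0.0, hit_by_event=()):
    """Probability that at least one replica survives the period.

    replica_survival: per-replica survival probabilities, assumed independent
                      when no large-scale event occurs (illustrative assumption).
    p_event:          probability of one rare large-scale event in the period.
    hit_by_event:     indices of replicas destroyed outright if the event occurs.
    """
    def at_least_one(probs):
        # 1 minus the probability that every listed replica is lost.
        all_lost = 1.0
        for p in probs:
            all_lost *= (1.0 - p)
        return 1.0 - all_lost

    hit = set(hit_by_event)
    # Replicas outside the event's reach keep their ordinary survival odds.
    survivors_if_event = [p for i, p in enumerate(replica_survival) if i not in hit]
    no_event = at_least_one(replica_survival)
    with_event = at_least_one(survivors_if_event)
    # Total probability over "event happens" and "event does not happen".
    return (1.0 - p_event) * no_event + p_event * with_event


if __name__ == "__main__":
    replicas = [0.99, 0.99, 0.99]  # hypothetical per-replica survival odds
    print(survival_probability(replicas))
    # Hypothetical event with 1% probability that wipes out replicas 0 and 1.
    print(survival_probability(replicas, p_event=0.01, hit_by_event=(0, 1)))
```

With these made-up numbers, three independent replicas give a survival probability of about 0.999999, while a 1% chance of an event that destroys two co-located replicas lowers it to about 0.9999. That is the kind of gap between storage reliability and data survivability that rare but large-scale events can open.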

Published in
SYSTOR '12: Proceedings of the 5th Annual International Systems and Storage Conference
June 2012
183 pages
ISBN: 9781450314480
DOI: 10.1145/2367589
General Chair: Michael Vinov, IBM Haifa
Program Chairs: Dan Tsafrir, Technion; Erez Zadok, Stony Brook University
Copyright © 2012 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Overall Acceptance Rate: 94 of 285 submissions, 33%