Dynamo: Amazon's highly available key-value store

Authors:
Giuseppe DeCandia (Amazon.com, Seattle, WA)
Deniz Hastorun (Amazon.com, Seattle, WA)
Madan Jampani (Amazon.com, Seattle, WA)
Gunavardhan Kakulapati (Amazon.com, Seattle, WA)
Avinash Lakshman (Amazon.com, Seattle, WA)
Alex Pilchin (Amazon.com, Seattle, WA)
Swaminathan Sivasubramanian (Amazon.com, Seattle, WA)
Peter Vosshall (Amazon.com, Seattle, WA)
Werner Vogels (Amazon.com, Seattle, WA)

SOSP '07: Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles, October 2007, Pages 205–220. https://doi.org/10.1145/1294261.1294281

Published: 14 October 2007

ABSTRACT

Reliability at massive scale is one of the biggest challenges we face at Amazon.com, one of the largest e-commerce operations in the world; even the slightest outage has significant financial consequences and impacts customer trust. The Amazon.com platform, which provides services for many web sites worldwide, is implemented on top of an infrastructure of tens of thousands of servers and network components located in many datacenters around the world. At this scale, small and large components fail continuously and the way persistent state is managed in the face of these failures drives the reliability and scalability of the software systems.

This paper presents the design and implementation of Dynamo, a highly available key-value storage system that some of Amazon's core services use to provide an "always-on" experience. To achieve this level of availability, Dynamo sacrifices consistency under certain failure scenarios. It makes extensive use of object versioning and application-assisted conflict resolution in a manner that provides a novel interface for developers to use.
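The idea of object versioning with application-assisted conflict resolution can be sketched in a few lines. The example below is a single-process illustration, not Dynamo's implementation: it assumes vector clocks as the versioning scheme (the abstract does not name the mechanism), and every name in it (`Store`, `Version`, `descends`, `merge_clocks`, `merge_carts`, the node ids) is hypothetical. A read returns all concurrent versions plus a context; the application merges them and writes the result back with that context.

```python
from dataclasses import dataclass, field

# Illustrative sketch only. All names here are hypothetical, and vector
# clocks are an assumed versioning scheme, not taken from the abstract.

Clock = dict[str, int]  # node id -> per-node update counter


def descends(a: Clock, b: Clock) -> bool:
    """True if clock a is equal to or newer than clock b on every node."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def merge_clocks(clocks: list[Clock]) -> Clock:
    """Element-wise maximum of several clocks."""
    merged: Clock = {}
    for clock in clocks:
        for node, count in clock.items():
            merged[node] = max(merged.get(node, 0), count)
    return merged


@dataclass
class Version:
    value: frozenset[str]
    clock: Clock


@dataclass
class Store:
    data: dict[str, list[Version]] = field(default_factory=dict)

    def get(self, key: str) -> tuple[list[frozenset[str]], Clock]:
        """Return every concurrent version plus a context covering them all."""
        versions = self.data.get(key, [])
        context = merge_clocks([v.clock for v in versions])
        return [v.value for v in versions], context

    def put(self, key: str, value: set[str], context: Clock, node: str) -> None:
        """Write a version that supersedes only what the context has seen."""
        existing = self.data.get(key, [])
        known = merge_clocks([v.clock for v in existing])
        clock = dict(context)
        # The coordinating node advances its own counter past anything it issued.
        clock[node] = max(clock.get(node, 0), known.get(node, 0)) + 1
        # Versions the new clock descends from are obsolete; others are siblings.
        survivors = [v for v in existing if not descends(clock, v.clock)]
        self.data[key] = survivors + [Version(frozenset(value), clock)]


def merge_carts(siblings: list[frozenset[str]]) -> frozenset[str]:
    """Application-level reconciliation: here, the union of cart contents."""
    return frozenset().union(*siblings)


store = Store()
store.put("cart", {"book"}, {}, "node_a")
_, ctx = store.get("cart")

# Two clients update from the same context through different nodes.
store.put("cart", {"book", "pen"}, ctx, "node_a")
store.put("cart", {"book", "lamp"}, ctx, "node_b")

siblings, ctx = store.get("cart")  # two conflicting versions come back
store.put("cart", merge_carts(siblings), ctx, "node_a")
final_values, _ = store.get("cart")
```

The store never picks a winner between the two concurrent writes; it keeps both and leaves the choice to `merge_carts`. A union suits a shopping-cart-like value where losing an addition is worse than resurrecting an item, but other applications would plug in a different merge function.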

Published in
SOSP '07: Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
October 2007, 378 pages
ISBN: 9781595935915
DOI: 10.1145/1294261
General Chair: Thomas C. Bressoud (Denison University, USA)
Program Chair: M. Frans Kaashoek (Massachusetts Institute of Technology, USA)
ACM SIGOPS Operating Systems Review, Volume 41, Issue 6 (SOSP '07)
December 2007, 363 pages
ISSN: 0163-5980
DOI: 10.1145/1323293
Copyright © 2007 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags: performance, reliability, scalability