ABSTRACT
Geo-replicated, distributed data stores that support complex online applications, such as social networks, must provide an "always-on" experience in which operations always complete with low latency. Today's systems often sacrifice strong consistency to achieve these goals, exposing inconsistencies to their clients and necessitating complex application logic. In this paper, we identify and define a consistency model---causal consistency with convergent conflict handling, or causal+---that is the strongest consistency model achieved to date under these constraints.
We present the design and implementation of COPS, a key-value store that delivers this consistency model across the wide area. A key contribution of COPS is its scalability: it can enforce causal dependencies between keys stored across an entire cluster, rather than on a single server as in previous systems. The central approach in COPS is tracking causal dependencies between keys and explicitly checking that those dependencies are satisfied in the local cluster before exposing writes. Further, in COPS-GT, we introduce get transactions, which provide a consistent view of multiple keys without locking or blocking. Our evaluation shows that COPS completes operations in less than a millisecond, provides throughput similar to previous systems when using one server per cluster, and scales well as we increase the number of servers in each cluster. It also shows that COPS-GT provides latency, throughput, and scaling similar to COPS for common workloads.
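The dependency-checking idea described above can be sketched as follows. This is an illustrative model only, not COPS's actual implementation (the real system is built in C++ on FAWN-KV); the names `Write`, `LocalCluster`, and `apply_replicated` are hypothetical. Each write carries the (key, version) pairs it causally depends on, and a receiving cluster buffers a replicated write until every dependency is already visible locally:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    key: str
    version: int
    value: str
    deps: tuple = ()  # (key, version) pairs this write causally depends on


class LocalCluster:
    """Exposes a replicated write only after all of its causal
    dependencies are already visible in this cluster."""

    def __init__(self):
        self.store = {}    # key -> (version, value) of the newest visible write
        self.pending = []  # replicated writes whose dependencies are unmet

    def _deps_satisfied(self, w):
        # A dependency (k, v) is met if we have seen version v (or newer) of k.
        return all(self.store.get(k, (0,))[0] >= v for k, v in w.deps)

    def apply_replicated(self, w):
        self.pending.append(w)
        self._drain()

    def _drain(self):
        # Repeatedly expose any buffered write whose dependencies are now met;
        # exposing one write may unblock others, so loop until a fixed point.
        progress = True
        while progress:
            progress = False
            for w in list(self.pending):
                if self._deps_satisfied(w):
                    self.store[w.key] = (w.version, w.value)
                    self.pending.remove(w)
                    progress = True
```

For example, if a comment that depends on a photo arrives at a remote cluster before the photo does, the comment stays buffered (invisible to readers) until the photo write arrives, so no client can ever read the comment without the photo it refers to.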
Don't settle for eventual: scalable causal consistency for wide-area storage with COPS