ABSTRACT
Geo-replicated storage systems are at the core of current Internet services. The designers of the replication protocols used by these systems must choose between either supporting low-latency, eventually-consistent operations, or ensuring strong consistency to ease application correctness. We propose an alternative consistency model, Explicit Consistency, that strengthens eventual consistency with a guarantee to preserve specific invariants defined by the applications. Given these application-specific invariants, a system that supports Explicit Consistency identifies which operations would be unsafe under concurrent execution, and allows programmers to select either violation-avoidance or invariant-repair techniques. We show how to achieve the former, while allowing operations to complete locally in the common case, by relying on a reservation system that moves coordination off the critical path of operation execution. The latter, in turn, allows operations to execute without restriction, and restore invariants by applying a repair operation to the database state. We present the design and evaluation of Indigo, a middleware that provides Explicit Consistency on top of a causally-consistent data store. Indigo guarantees strong application invariants while providing similar latency to an eventually-consistent system in the common case.
Supplemental Material
- M. Abadi and L. Lamport. The Existence of Refinement Mappings. Theor. Comput. Sci., 82(2): 253--284, May 1991. Google ScholarDigital Library
- S. Almeida, J. Leitão, and L. Rodrigues. ChainReaction: A Causal+ Consistent Datastore Based on Chain Replication. In Proceedings of the 8th ACM European Conference on Computer Systems, EuroSys '13, pages 85--98, New York, NY, USA, 2013. ACM. Google ScholarDigital Library
- P. Bailis, A. Ghodsi, J. M. Hellerstein, and I. Stoica. Bolton Causal Consistency. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, SIGMOD '13, pages 761--772, New York, NY, USA, 2013. ACM. Google ScholarDigital Library
- P. Bailis, A. Fekete, J. M. Hellerstein, A. Ghodsi, and I. Stoica. Scalable Atomic Visibility with RAMP Transactions. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, SIGMOD '14, pages 27--38, New York, NY, USA, 2014. ACM. Google ScholarDigital Library
- P. Bailis, A. Fekete, M. J. Franklin, A. Ghodsi, J. M. Hellerstein, and I. Stoica. Coordination Avoidance in Database Systems. Proc. VLDB Endow. (to appear), 2015. Google ScholarDigital Library
- D. Barbará-Millá and H. Garcia-Molina. The Demarcation Protocol: A Technique for Maintaining Constraints in Distributed Database Systems. The VLDB Journal, 3(3): 325--353, July 1994. Google ScholarDigital Library
- Basho. Riak. http://basho.com/riak/, 2014. Accessed Oct/2014.Google Scholar
- E. Cohen, M. Dahlweid, M. Hillebrand, D. Leinenbach, M. Moskal, T. Santen, W. Schulte, and S. Tobies. VCC: A practical system for verifying concurrent C. In Theorem Proving in Higher Order Logics, pages 23--42. Springer, 2009. Google ScholarDigital Library
- B. F. Cooper, R. Ramakrishnan, U. Srivastava, A. Silberstein, P. Bohannon, H.-A. Jacobsen, N. Puz, D. Weaver, and R. Yerneni. PNUTS: Yahoo!'s Hosted Data Serving Platform. Proc. VLDB Endow., 1(2): 1277--1288, Aug. 2008. Google ScholarDigital Library
- J. C. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost, J. J. Furman, S. Ghemawat, A. Gubarev, C. Heiser, P. Hochschild, W. Hsieh, S. Kanthak, E. Kogan, H. Li, A. Lloyd, S. Melnik, D. Mwaura, D. Nagle, S. Quinlan, R. Rao, L. Rolig, Y. Saito, M. Szymaniak, C. Taylor, R. Wang, and D. Woodford. Spanner: Google's Globally-distributed Database. In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, OSDI'12, pages 251--264, Berkeley, CA, USA, 2012. USENIX Association. Google ScholarDigital Library
- L. De Moura and N. Bjørner. Z3: An Efficient SMT Solver. In Proceedings of the 14th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS'08/ETAPS'08, pages 337--340, Berlin, Heidelberg, 2008. Springer-Verlag. Google ScholarDigital Library
- G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels. Dynamo: Amazon's Highly Available Key-value Store. In Proceedings of Twenty-first ACM SIGOPS Symposium on Operating Systems Principles, SOSP '07, pages 205--220, New York, NY, USA, 2007. ACM. Google ScholarDigital Library
- D. L. Detlefs, K. R. M. Leino, G. Nelson, and J. B. Saxe. Extended static checking. Technical Report 159, Compaq Systems Research Center, 12 1998.Google Scholar
- J. Du, S. Elnikety, A. Roy, and W. Zwaenepoel. Orbe: Scalable Causal Consistency Using Dependency Matrices and Physical Clocks. In Proceedings of the 4th Annual Symposium on Cloud Computing, SOCC '13, pages 11:1--11:14, New York, NY, USA, 2013. ACM. Google ScholarDigital Library
- ed Liu, T. Magrino, O. Arden, M. D. George, and A. C. Myers. Warranties for Faster Strong Consistency. In Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation, NSDI'14, Berkeley, CA, USA, 2014. USENIX Association. Google ScholarDigital Library
- H. Garcia-Molina. Using Semantic Knowledge for Transaction Processing in a Distributed Database. ACM Trans. Database Syst., 8(2): 186--213, June 1983. Google ScholarDigital Library
- J. N. Gray, R. A. Lorie, G. R. Putzolu, and I. L. Traiger. Readings in Database Systems. chapter Granularity of Locks and Degrees of Consistency in a Shared Data Base, pages 94--121. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1988. Google ScholarDigital Library
- C. A. R. Hoare. An axiomatic basis for computer programming. Communications of the ACM, 12(10): 576--580, 1969. Google ScholarDigital Library
- H. B. Hunt and D. J. Rosenkrantz. The Complexity of Testing Predicate Locks. In Proceedings of the 1979 ACM SIGMOD International Conference on Management of Data, SIGMOD '79, pages 127--133, New York, NY, USA, 1979. ACM. Google ScholarDigital Library
- B. Jacobs, J. Smans, P. Philippaerts, F. Vogels, W. Penninckx, and F. Piessens. VeriFast: A Powerful, Sound, Predictable, Fast Verifier for C and Java. In Proceedings of the Third International Conference on NASA Formal Methods, NFM'11, pages 41--55. Springer-Verlag, Berlin, Heidelberg, 2011. Google ScholarDigital Library
- T. Kraska, M. Hentschel, G. Alonso, and D. Kossmann. Consistency Rationing in the Cloud: Pay Only when It Matters. Proc. VLDB Endow., 2(1): 253--264, Aug. 2009. Google ScholarDigital Library
- T. Kraska, G. Pang, M. J. Franklin, S. Madden, and A. Fekete. MDCC: Multi-data Center Consistency. In Proceedings of the 8th ACM European Conference on Computer Systems, EuroSys '13, pages 113--126, New York, NY, USA, 2013. ACM. Google ScholarDigital Library
- A. Lakshman and P. Malik. Cassandra: A Decentralized Structured Storage System. SIGOPS Oper. Syst. Rev., 44(2): 35--40, Apr. 2010. Google ScholarDigital Library
- L. Lamport. The Temporal Logic of Actions. ACM Trans. Program. Lang. Syst., 16(3): 872--923, May 1994. Google ScholarDigital Library
- C. Li, D. Porto, A. Clement, J. Gehrke, N. Preguiça, and R. Rodrigues. Making Geo-replicated Systems Fast As Possible, Consistent when Necessary. In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, OSDI'12, pages 265--278, Berkeley, CA, USA, 2012. USENIX Association. Google ScholarDigital Library
- C. Li, J. Leitão, A. Clement, N. Preguiça, R. Rodrigues, and V. Vafeiadis. Automating the Choice of Consistency Levels in Replicated Systems. In Proceedings of the 2014 USENIX Conference on USENIX Annual Technical Conference, USENIX ATC'14, pages 281--292, Berkeley, CA, USA, 2014. USENIX Association. Google ScholarDigital Library
- W. Lloyd, M. J. Freedman, M. Kaminsky, and D. G. Andersen. Don'T Settle for Eventual: Scalable Causal Consistency for Wide-area Storage with COPS. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles, SOSP '11, pages 401--416, New York, NY, USA, 2011. ACM. Google ScholarDigital Library
- W. Lloyd, M. J. Freedman, M. Kaminsky, and D. G. Andersen. Stronger Semantics for Low-latency Geo-replicated Storage. In Proceedings of the 10th USENIX Conference on Networked Systems Design and Implementation, NSDI'13, pages 313--328, Berkeley, CA, USA, 2013. USENIX Association. Google ScholarDigital Library
- H. Mahmoud, F. Nawab, A. Pucher, D. Agrawal, and A. El Abbadi. Low-latency Multi-datacenter Databases Using Replicated Commit. Proc. VLDB Endow., 6(9): 661--672, July 2013. Google ScholarDigital Library
- S. Martin, M. Ahmed-Nacer, and P. Urso. Abstract unordered and ordered trees CRDT. Research Report RR-7825, INRIA, Dec. 2011.Google Scholar
- S. Martin, M. Ahmed-Nacer, and P. Urso. Controlled conflict resolution for replicated document. In Proceedings of the 8th International Conference on Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom), pages 471--480. IEEE, Oct 2012.Google ScholarDigital Library
- P. E. O'Neil. The Escrow Transactional Method. ACM Trans. Database Syst., 11(4): 405--430, Dec. 1986. Google ScholarDigital Library
- S. S. Owicki. Axiomatic Proof Techniques for Parallel Programs. PhD thesis, Cornell University, Ithaca, NY, USA, 1975. Google ScholarDigital Library
- A. Pnueli. The Temporal Logic of Programs. In Proceedings of the 18th Annual Symposium on Foundations of Computer Science, SFCS '77, pages 46--57, Washington, DC, USA, 1977. IEEE Computer Society. Google ScholarDigital Library
- N. Preguiça, J. L. Martins, M. Cunha, and H. Domingos. Reservations for Conflict Avoidance in a Mobile Database System. In Proceedings of the 1st International Conference on Mobile Systems, Applications and Services, MobiSys '03, pages 43--56, New York, NY, USA, 2003. ACM. Google ScholarDigital Library
- K. Ramamritham and C. Pu. A Formal Characterization of Epsilon Serializability. IEEE Trans. on Knowl. and Data Eng., 7(6): 997--1007, Dec. 1995. Google ScholarDigital Library
- S. Roy, L. Kot, N. Foster, J. Gehrke, H. Hojjat, and C. Koch. The Homeostasis Protocol: Avoiding Transaction Coordination Through Program Analysis. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (to appear), SIGMOD '15. ACM, May-June 2015.Google ScholarDigital Library
- M. Shapiro, N. Preguiça, C. Baquero, and M. Zawirski. Conflict-free Replicated Data Types. In Proceedings of the 13th International Conference on Stabilization, Safety, and Security of Distributed Systems, SSS'11, pages 386--400, Berlin, Heidelberg, 2011. Springer-Verlag. Google ScholarDigital Library
- L. Shrira, H. Tian, and D. Terry. Exo-leasing: Escrow Synchronization for Mobile Clients of Commodity Storage Servers. In Proceedings of the 9th ACM/IFIP/USENIX International Conference on Middleware, Middleware '08, pages 42--61, New York, NY, USA, 2008. Springer-Verlag New York, Inc. Google ScholarDigital Library
- S. Sivasubramanian. Amazon dynamoDB: A Seamlessly Scalable Non-relational Database Service. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, SIGMOD '12, pages 729--730, New York, NY, USA, 2012. ACM. Google ScholarDigital Library
- Y. Sovran, R. Power, M. K. Aguilera, and J. Li. Transactional Storage for Geo-replicated Systems. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles, SOSP '11, pages 385--400, New York, NY, USA, 2011. ACM. Google ScholarDigital Library
- D. B. Terry, M. M. Theimer, K. Petersen, A. J. Demers, M. J. Spreitzer, and C. H. Hauser. Managing Update Conflicts in Bayou, a Weakly Connected Replicated Storage System. In Proceedings of the Fifteenth ACM Symposium on Operating Systems Principles, SOSP '95, pages 172--182, New York, NY, USA, 1995. ACM. Google ScholarDigital Library
- D. B. Terry, V. Prabhakaran, R. Kotla, M. Balakrishnan, M. K. Aguilera, and H. Abu-Libdeh. Consistency-based Service Level Agreements for Cloud Storage. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, SOSP '13, pages 309--324, New York, NY, USA, 2013. ACM. Google ScholarDigital Library
- G. D. Walborn and P. K. Chrysanthis. Supporting Semantics-based Transaction Processing in Mobile Database Applications. In Proceedings of the 14th Symposium on Reliable Distributed Systems, SRDS '95, pages 31--40, Washington, DC, USA, 1995. IEEE Computer Society. Google ScholarDigital Library
- H. Yu and A. Vahdat. Design and Evaluation of a Continuous Consistency Model for Replicated Services. In Proceedings of the 4th Conference on Symposium on Operating System Design & Implementation - Volume 4, OSDI'00, Berkeley, CA, USA, 2000. USENIX Association. Google ScholarDigital Library
- M. Zawirski, A. Bieniusa, V. Balegas, S. Duarte, C. Baquero, M. Shapiro, and N. Preguiça. SwiftCloud: Fault-Tolerant Geo-Replication Integrated all the Way to the Client Machine. Research Report RR-8347, INRIA, Oct. 2013.Google Scholar
- Y. Zhang, R. Power, S. Zhou, Y. Sovran, M. K. Aguilera, and J. Li. Transaction Chains: Achieving Serializability with Low Latency in Geo-distributed Storage Systems. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, SOSP '13, pages 276--291, New York, NY, USA, 2013. ACM. Google ScholarDigital Library
Index Terms
- Putting consistency back into eventual consistency
Recommendations
Constraining the eventual in eventual consistency
PaPoC '18: Proceedings of the 5th Workshop on the Principles and Practice of Consistency for Distributed DataCRDTs are highly available replicated data structures which offer strong eventual consistency in the face of concurrent operations [3]. By their definition, CRDTs eventually converge to a consistent state given enough time. However, this is not strict ...
Quantifying eventual consistency with PBS
Data replication results in a fundamental trade-off between operation latency and consistency. At the weak end of the spectrum of possible consistency models is eventual consistency, which provides no limit to the staleness of data returned. However, ...
On Mixing Eventual and Strong Consistency: Acute Cloud Types
In this article we study the properties of distributed systems that mix eventual and strong consistency. We formalize such systems through <italic>acute cloud types</italic> (ACTs), abstractions similar to conflict-free replicated data types (CRDTs), ...
Comments