Abstract
To minimize network latency and remain online during server failures and network partitions, many modern distributed data storage systems eschew transactional functionality, which provides strong semantic guarantees for groups of multiple operations over multiple data items. In this work, we consider the problem of providing Highly Available Transactions (HATs): transactional guarantees that do not suffer unavailability during system partitions or incur high network latency. We introduce a taxonomy of highly available systems and analyze existing ACID isolation and distributed data consistency guarantees to identify which can and cannot be achieved in HAT systems. This unifies the literature on weak transactional isolation, replica consistency, and highly available systems. We analytically and experimentally quantify the availability and performance benefits of HATs---often two to three orders of magnitude over wide-area networks---and discuss their necessary semantic compromises.
- D. J. Abadi. Consistency tradeoffs in modern distributed database system design: CAP is only part of the story. IEEE Computer, 45(2), 2012. Google ScholarDigital Library
- A. Adya. Weak consistency: a generalized theory and optimistic implementations for distributed transactions. PhD thesis, MIT, 1999. Google ScholarDigital Library
- M. Ahamad, G. Neiger, J. E. Burns, P. Kohli, and P. Hutto. Causal memory: Definitions, implementation and programming. Dist. Comp., 9(1), 1995.Google Scholar
- P. Alvaro, N. Conway, J. M. Hellerstein, and W. R. Marczak. Consistency analysis in Bloom: a CALM and collected approach. In CIDR 2011.Google Scholar
- ISO/IEC 9075-2: 2011 Information technology -- Database languages -- SQL -- Part 2: Foundation (SQL/Foundation).Google Scholar
- R. Attar, P. A. Bernstein, and N. Goodman. Site initialization, recovery, and backup in a distributed database system. IEEE Trans. Softw. Eng., 10(6): 645--650, Nov. 1984. Google ScholarDigital Library
- AWS. Summary of the Amazon EC2 and Amazon RDS Service Disruption in the US East Region. http://tinyurl.com/6ab6el6, April 2011.Google Scholar
- P. Bailis, A. Davidson, A. Fekete, A. Ghodsi, J. M. Hellerstein, and I. Stoica. Highly Available Transactions: Virtues and Limitations (Extended). arXiv:1302.0309. Google ScholarDigital Library
- P. Bailis, A. Fekete, A. Ghodsi, J. M. Hellerstein, and I. Stoica. HAT, not CAP: Introducing Highly Available Transactions. In HotOS 2013. Google ScholarDigital Library
- P. Bailis and A. Ghodsi. Eventual Consistency today: Limitations, extensions, and beyond. ACM Queue, 11(3), March 2013. Google ScholarDigital Library
- P. Bailis, A. Ghodsi, J. M. Hellerstein, and I. Stoica. Bolt-on causal consistency. In SIGMOD 2013. Google ScholarDigital Library
- H. Berenson, P. Bernstein, J. Gray, J. Melton, E. O'Neil, and P. O'Neil. A critique of ANSI SQL isolation levels. In SIGMOD 1995. Google ScholarDigital Library
- P. Bernstein and S. Das. Rethinking eventual consistency. In SIGMOD, 2013. Google ScholarDigital Library
- P. Bernstein, V. Hadzilacos, and N. Goodman. Concurrency control and recovery in database systems, volume 370. Addison-wesley New York, 1987. Google ScholarDigital Library
- M. Brantner, D. Florescu, D. Graf, D. Kossmann, and T. Kraska. Building a database on S3. In SIGMOD 2008. Google ScholarDigital Library
- E. Brewer. Towards robust distributed systems. Keynote at PODC 2000. Google ScholarDigital Library
- J. Brzezinski, C. Sobaniec, and D. Wawrzyniak. From session causality to causal consistency. In PDP 2004.Google ScholarCross Ref
- S. Burckhardt, A. Gotsman, and H. Yang. Understanding eventual consistency. Technical Report MSR-TR-2013-39. http://tinyurl.com/bqty9yz.Google Scholar
- S. Bykov, A. Geller, G. Kliot, J. R. Larus, R. Pandya, and J. Thelin. Orleans: cloud computing for everyone. In SOCC 2011. Google ScholarDigital Library
- A. Chan and R. Gray. Implementing distributed read-only transactions. IEEE Transactions on Software Engineering, (2): 205--212, 1985. Google ScholarDigital Library
- F. Chang, J. Dean, S. Ghemawat, et al. Bigtable: A distributed storage system for structured data. In OSDI 2006. Google ScholarDigital Library
- G. Clarke. The Register: NoSQL's CAP theorem busters: We don't drop ACID. http://tinyurl.com/bpsug4b, November 2012.Google Scholar
- B. Cooper et al. PNUTS: Yahoo!'s hosted data serving platform. In VLDB 2008. Google ScholarDigital Library
- B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears. Benchmarking cloud serving systems with YCSB. In ACM SOCC 2010. Google ScholarDigital Library
- J. C. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost, J. J. Furman, et al. Spanner: Google's globally-distributed database. In OSDI 2012. Google ScholarDigital Library
- S. Das, D. Agrawal, and A. El Abbadi. G-store: a scalable data store for transactional multi key access in the cloud. In SOCC 2010, pages 163--174. Google ScholarDigital Library
- K. Daudjee and K. Salem. Lazy database replication with ordering guarantees. In ICDE 2004, pages 424--435. Google ScholarDigital Library
- S. Davidson, H. Garcia-Molina, and D. Skeen. Consistency in partitioned networks. ACM CSUR, 17(3): 341--370, 1985. Google ScholarDigital Library
- J. Dean. Designs, lessons and advice from building large distributed systems. Keynote at LADIS 2009.Google Scholar
- G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, et al. Dynamo: Amazon's highly available key-value store. In SOSP 2007. Google ScholarDigital Library
- A. Demers et al. Epidemic algorithms for replicated database maintenance. In PODC, 1987. Google ScholarDigital Library
- P. Deutsch. The eight fallacies of distributed computing. http://tinyurl.com/c6vvtzg, 1994.Google Scholar
- R. Dillet. Update: Amazon Web Services down in North Virginia Reddit, Pinterest, Airbnb, Foursquare, Minecraft and others affected. TechCrunch http://tinyurl.com/9r43dwt, October 2012.Google Scholar
- A. Fekete, D. Liarokapis, E. O'Neil, P. O'Neil, and D. Shasha. Making snapshot isolation serializable. ACM TODS, 30(2): 492--528, June 2005. Google ScholarDigital Library
- S. Gilbert and N. Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. SIGACT News, 33(2): 51--59, 2002. Google ScholarDigital Library
- P. Gill, N. Jain, and N. Nagappan. Understanding network failures in data centers: measurement, analysis, and implications. In SIGCOMM 2011. Google ScholarDigital Library
- J. Gray. The transaction concept: Virtues and limitations. In VLDB 1981. Google ScholarDigital Library
- J. Gray, R. Lorie, G. Putzolu, and I. Traiger. Granularity of locks and degrees of consistency in a shared data base. Technical report, IBM, 1976.Google Scholar
- J. Hamilton. Stonebraker on CAP Theorem and Databases. http://tinyurl.com/d3gtfq9, April 2010.Google Scholar
- P. Helland. Life beyond distributed transactions: an apostate's opinion. In CIDR 2007.Google Scholar
- M. Herlihy and N. Shavit. The art of multiprocessor programming. 2008. Google ScholarDigital Library
- R. Kallman et al. H-store: a high-performance, distributed main memory transaction processing system. In VLDB 2008. Google ScholarDigital Library
- B. Kemme. Database replication for clusters of workstations. PhD thesis, EPFL, 2000.Google Scholar
- K. Kingsbury and P. Bailis. The network is reliable. June 2013. http://aphyr.com/posts/288-the-network-is-reliable.Google Scholar
- C. Labovitz, A. Ahuja, and F. Jahanian. Experimental study of internet stability and backbone failures. In FTCS 1999. Google ScholarDigital Library
- L. Lamport. Time, clocks, and the ordering of events in a distributed system. Commun. ACM, 21(7): 558--565, July 1978. Google ScholarDigital Library
- V. Liu, D. Halperin, A. Krishnamurthy, and T. Anderson. F10: Fault tolerant engineered networks. In NSDI 2013. Google ScholarDigital Library
- W. Lloyd, M. J. Freedman, M. Kaminsky, and D. G. Andersen. Stronger semantics for low-latency geo-replicated storage. In NSDI 2013. Google ScholarDigital Library
- P. Mahajan, L. Alvisi, and M. Dahlin. Consistency, availability, convergence. Technical Report TR-11-22, CS Department, UT Austin, May 2011.Google Scholar
- A. Markopoulou et al. Characterization of failures in an operational IP backbone network. IEEE/ACM TON, 16(4). Google ScholarDigital Library
- D. McCullagh. How Pakistan knocked YouTube offline (and how to make sure it never happens again). CNET, http://tinyurl.com/c4pffd, February 2008.Google Scholar
- R. McMillan. Research experiment disrupts internet, for some. Computerworld, http://tinyurl.com/23sqpek, August 2010.Google Scholar
- P. P. S. Narayan. Sherpa update. YDN Blog, http://tinyurl.com/c3ljuce, June 2010.Google Scholar
- F. Pedone and R. Guerraoui. On transaction liveness in replicated databases. In Pacific Rim International Symposium on Fault-Tolerant Systems, 1997. Google ScholarDigital Library
- Y. Saito and M. Shapiro. Optimistic replication. ACM Comput. Surv., 37(1), Mar. 2005. Google ScholarDigital Library
- A. Schiper and M. Raynal. From group communication to transactions in distributed systems. CACM, 39(4), 1996. Google ScholarDigital Library
- L. Segall. Internet routing glitch kicks millions offline. CNNMoney, http://tinyurl.com/cmqqac3, November 2011.Google Scholar
- M. Shapiro, N. Preguiça, C. Baquero, and M. Zawirski. A comprehensive study of convergent and commutative replicated data types. INRIA TR 7506, 2011.Google Scholar
- Y. Sovran, R. Power, M. K. Aguilera, and J. Li. Transactional storage for geo-replicated systems. In SOSP, pages 385--400, 2011. Google ScholarDigital Library
- D. B. Terry, A. J. Demers, K. Petersen, M. J. Spreitzer, M. M. Theimer, et al. Session guarantees for weakly consistent replicated data. In PDIS 1994. Google ScholarDigital Library
- A. Thomson, T. Diamond, S. Weng, K. Ren, P. Shao, and D. Abadi. Calvin: Fast distributed transactions for partitioned database systems. In SIGMOD 2012. Google ScholarDigital Library
- D. Turner, K. Levchenko, J. C. Mogul, S. Savage, and A. C. Snoeren. On failure in managed enterprise networks. HP Labs HPL-2012-101, 2012.Google Scholar
- D. Turner, K. Levchenko, A. C. Snoeren, and S. Savage. California fault lines: understanding the causes and impact of network failures. SIGCOMM 2011. Google ScholarDigital Library
- W. Vogels. Eventually consistent. CACM, 52(1): 40--44, Jan. 2009. Google ScholarDigital Library
- M. Wiesmann, F. Pedone, A. Schiper, B. Kemme, and G. Alonso. Database replication techniques: A three parameter classification. In SRDS 2000. Google ScholarDigital Library
- Y. Xu, Z. Musgrave, B. Noble, and M. Bailey. Bobtail: avoiding long tails in the cloud. In NSDI, 2013. Google ScholarDigital Library
- M. Zawirski, A. Bieniusa, V. Balegas, N. Preguica, S. Duarte, M. Shapiro, and C. Baquero. Geo-replication all the way to the edge. Personal communication and draft under submission. http://tinyurl.com/cp68svy.Google Scholar
Recommendations
Using Paxos to build a scalable, consistent, and highly available datastore
Spinnaker is an experimental datastore that is designed to run on a large cluster of commodity servers in a single datacenter. It features key-based range partitioning, 3-way replication, and a transactional get-put API with the option to choose either ...
Comments