skip to main content
10.1145/2815400.2815425acmconferencesArticle/Chapter ViewAbstractPublication PagessospConference Proceedingsconference-collections
research-article
Open Access

No compromises: distributed transactions with consistency, availability, and performance

Published:04 October 2015Publication History

ABSTRACT

Transactions with strong consistency and high availability simplify building and reasoning about distributed systems. However, previous implementations performed poorly. This forced system designers to avoid transactions completely, to weaken consistency guarantees, or to provide single-machine transactions that require programmers to partition their data. In this paper, we show that there is no need to compromise in modern data centers. We show that a main memory distributed computing platform called FaRM can provide distributed transactions with strict serializability, high performance, durability, and high availability. FaRM achieves a peak throughput of 140 million TATP transactions per second on 90 machines with a 4.9 TB database, and it recovers from a failure in less than 50 ms. Key to achieving these results was the design of new transaction, replication, and recovery protocols from first principles to leverage commodity networks with RDMA and a new, inexpensive approach to providing non-volatile DRAM.

Skip Supplemental Material Section

Supplemental Material

p54.mp4

mp4

2.3 GB

References

  1. Memcached. http://memcached.org.Google ScholarGoogle Scholar
  2. Viking Technology. http://www.vikingtechnology.com/.Google ScholarGoogle Scholar
  3. Apache Cassandra. http://cassandra.apache.org/, 2015.Google ScholarGoogle Scholar
  4. MySQL. http://www.mysql.com/, 2015.Google ScholarGoogle Scholar
  5. neo4j. http://neo4j.com/, 2015.Google ScholarGoogle Scholar
  6. redis. http://redis.io/, 2015.Google ScholarGoogle Scholar
  7. Adya, A., Dunagan, J., and Wolman, A. Centrifuge: Integrated lease management and partitioning for cloud services. In Proceedings of the 7th USENIX Symposium on Networked Systems Design and Implementation (2010), NSDI'10. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Aaguilera, M. K., Merchant, A., Shah, M., Veitch, A., and Karamanolis, C. Sinfonia: A new paradigm for building scalable distributed systems. In Proceedings of 21st ACM SIGOPS Symposium on Operating Systems Principles (2007), SOSP'07. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Chang, F., Dean, J., Ghemawat, S., Hsieh, W. C., Wallach, D. A., Burrows, M., Chandra, T., Fikes, A., and Gruber, R. E. Bigtable: A distributed storage system for structured data. In Proceedings of the 6th USENIX Symposium on Operating Systems Design and Implementation (2006), OSDI'06. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Chockler, G. V., Keidar, I., and Vitenberg, R. Group communication specifications: a comprehensive study. ACM Computing Surveys (CSUR) 33, 4 (2001). Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Corbett, J. C., Dean, J., Epstein, M., Fikes, A., Frost, C., Furman, J. J., Ghemawat, S., Gubarev, A., Heiser, C., Hochschild, P., Hsieh, W. C., Kanthak, S., Kogan, E., Li, H., Lloyd, A., Melnik, S., Mwaura, D., Nagle, D., Quinlan, S., Rao, R., Rolig, L., Saito, Y., Szymaniak, M., Taylor, C., Wang, R., and Woodford, D. Spanner: Google's globally-distributed database. In Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation (2012), OSDI'12. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Dalessandro, L., and Scott, M. L. Sandboxing transactional memory. In Proceedings of the 21st ACM International Conference on Parallel Architectures and Compilation Techniques (2012), PACT'12. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. DeCandia, G., Hastorun, D., Jampani, M., Kakulapati, G., Lakshman, A., Pilchin, A., Sivasubramanian, S., Vosshall, P., and Vogels, W. Dynamo: Amazon's highly available key-value store. In Proceedings of the the 21st ACM Symposium on Operating Systems Principles (2007), SOSP'07. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Diaconu, C., Freedman, C., Ismert, E., Larson, P.-Å., Mittal, P., Stonecipher, R., Verma, N., and Zwilling, M. Hekaton: SQL Server's memory-optimized OLTP engine. In Proceedings of the ACM SIGMOD International Conference on Management of Data (2013), SIGMOD' 13. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Dice, D., Shalev, O., and Shavit, N. Transactional locking II. In Proceedings of the 20th International Symposium on Distributed Computing (2006), DISC'06. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Dragojević, A., Narayanan, D., Hodson, O., and Castro, M. FaRM: Fast remote memory. In Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation (2014), NSDI'14. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Graefe, G. Write-optimized B-trees. In Proceedings of the 30th International Conference on Very Large Data Bases (2004), VLDB'04. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Gray, C., and Cheriton, D. Leases: An efficient fault-tolerant mechanism for distributed file cache consistency. SIGOPS Operating Systems Review (OSR) 23, 5 (1989). Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Gray, J., and Reuter, A. Transaction Processing: Concepts and Techniques. 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Guerraoui, R., and Kapalka, M. On the correctness of transactional memory. In Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (2008), PPoPP'08. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Hunt, P., Konar, M., Junqueira, F. P., and Reed, B. Zookeeper: wait-free coordination for internet-scale systems. In Proceedings of the 2010 USENIX Annual Technical Conference (2010), USENIX ATC'10. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. InfiniBand Trade Association. Supplement to InfiniBand Architecture Specification Volume 1 Release 1.2.2 Annex A16: RDMA over Converged Ethernet (RoCE), 2010.Google ScholarGoogle Scholar
  23. Kalia, A., Kaminsky, M., and Andersen, D. G. Using RDMA efficiently for key-value services. In Proceedings of the 2014 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (2014), SIGCOMM'14. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Lamport, L. The part-time parliament. ACM Transactions on Computer Systems 16, 2. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Lamport, L., Malkhi, D., and Zhou, L. Vertical Paxos and primary-backup replication. In Proceedings of the 28th ACM Symposium on Principles of Distributed Computing (2009), PODC'09. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Larson, P.-Å., Blanas, S., Diaconu, C., Freedman, C., Patel, J. M., and Zwilling, M. High-performance concurrency control mechanisms for main-memory databases. PVLDB 5, 4 (2011). Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Lehman, P. L., and Yao, S. B. Efficient locking for concurrent operations on B-trees. ACM Transactions on Database Systems 6, 4 (Dec. 1981). Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Microsoft. Scaling out SQL Server. http://www.microsoft.com/en-us/server-cloud/solutions/high-availability.aspx.Google ScholarGoogle Scholar
  29. Microsoft. Open CloudServer OCS V2 specification: Blade, 2014.Google ScholarGoogle Scholar
  30. Microsoft. OCS Open CloudServer power supply v2.0. http://www.opencompute.org/wiki/Server/SpecsAndDesigns, 2015.Google ScholarGoogle Scholar
  31. Mitchell, C., Yifeng, G., and Jinyang, L. Using one-sided RDMA reads to build a fast, CPU-efficient key-value store. In Proceedings of the 2013 USENIX Annual Technical Conference (2013), USENIX ATC'13. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Neuvonen, S., Wolski, A., Manner, M., and Raatikka, V. Telecom Application Transaction Processing benchmark. http://tatpbenchmark.sourceforge.net/.Google ScholarGoogle Scholar
  33. Ongaro, D., Rumble, S. M., Stutsman, R., Ousterhout, J., and Rosenblum, M. Fast crash recovery in RAMCloud. In Proceedings of the 23rd ACM Symposium on Operating Systems Principles (2011), SOSP'11. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Rumble, S. M., Kejriwal, A., and Ousterhout, J. Log-structured Memory for DRAM-based Storage. In Proceedings of the 12th USENIX Conference on File and Storage Technologies (2014), FAST'14. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Sethi, R. Useless actions make a difference: Strict serializability of database updates. JACM 29, 2 (1982). Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Shaun Harris. Microsoft reinvents datacenter power backup with new Open Compute project specification. http://blogs.msdn.com/b/windowsazure/archive/2012/11/13/windows-azure-benchmarks-show-top-performance-for-big-compute.aspx, 2015.Google ScholarGoogle Scholar
  37. Sowell, B., Golab, W. M., and Shah, M. A. Minuet: A scalable distributed multiversion B-tree. PVLDB 5, 9 (2012). Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Transaction Processing Performance Council (TPC). TPC benchmark C: Standard specification. http://www.tpc.org.Google ScholarGoogle Scholar
  39. Tu, S., Zheng, W., Kohler, E., Liskov, B., and Madden, S. Speedy transactions in multicore in-memory databases. In Proceedings of the 24th Symposium on Operating Systems Principles (2013), SOSP'13. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Zheng, W., Tu, S., Kohler, E., and Liskov, B. Fast databases with fast durability and recovery through multicore parallelism. In Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation (2014), OSDI'14. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. No compromises: distributed transactions with consistency, availability, and performance

              Recommendations

              Comments

              Login options

              Check if you have access through your login credentials or your institution to get full access on this article.

              Sign in
              • Published in

                cover image ACM Conferences
                SOSP '15: Proceedings of the 25th Symposium on Operating Systems Principles
                October 2015
                499 pages
                ISBN:9781450338349
                DOI:10.1145/2815400

                Copyright © 2015 Owner/Author

                Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

                Publisher

                Association for Computing Machinery

                New York, NY, United States

                Publication History

                • Published: 4 October 2015

                Check for updates

                Qualifiers

                • research-article

                Acceptance Rates

                SOSP '15 Paper Acceptance Rate30of181submissions,17%Overall Acceptance Rate131of716submissions,18%

                Upcoming Conference

                SOSP '24

              PDF Format

              View or Download as a PDF file.

              PDF

              eReader

              View online with eReader.

              eReader