skip to main content
research-article
Open Access

Programming at the edge of synchrony

Published:13 November 2020Publication History
Skip Abstract Section

Abstract

Synchronization primitives for fault-tolerant distributed systems that ensure an effective and efficient cooperation among processes are an important challenge in the programming languages community. We present a new programming abstraction, ReSync, for implementing benign and Byzantine fault-tolerant protocols. ReSync has a new round structure that offers a simple abstraction for group communication, like it is customary in synchronous systems, but also allows messages to be received one by one, like in the asynchronous systems. This extension allows implementing network and algorithm-specific policies for the message reception, which is not possible in classic round models.

The execution of ReSync programs is based on a new generic round switch protocol that generalizes the famous theoretical result about consensus in the presence of partial synchrony by of Dwork, Lynch, and Stockmeyer. We evaluate experimentally the performance of ReSync’s execution platform, by comparing consensus implementations in ReSync with LibPaxos3, etcd, and Bft-SMaRt, three consensus libraries tolerant to benign, resp. byzantine faults.

References

  1. 2019. etcd. Retrieved Nov 21, 2019 from https://etcd.io/Google ScholarGoogle Scholar
  2. Benjamin Aminof, Sasha Rubin, Ilina Stoilkovska, Josef Widder, and Florian Zuleger. 2018. Parameterized Model Checking of Synchronous Distributed Algorithms by Abstraction. In VMCAI. 1-24.Google ScholarGoogle Scholar
  3. Mathieu Baudet, Avery Ching, Andrey Chursin, George Danezis, François Garillot, Dahlia Malkhi Zekun Li, Oded Naor, Dmitri Perelman, and Alberto Sonnino. 2019. State Machine Replication in the Libra Blockchain. https://developers. libra.org/docs/assets/papers/libra-consensus-state-machine-replication-in-the-libra-blockchain.pdf.Google ScholarGoogle Scholar
  4. Alysson Neves Bessani, João Sousa, and Eduardo Adílio Pelinson Alchieri. 2014. State Machine Replication for the Masses with BFT-SMART. In 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2014, Atlanta, GA, USA, June 23-26, 2014. IEEE Computer Society, 355-362. https://doi.org/10.1109/DSN. 2014.43 Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Martin Biely, Bernadette Charron-Bost, Antoine Gaillard, Martin Hutle, André Schiper, and Josef Widder. 2007. Tolerating corrupted communication. In PODC. 244-253.Google ScholarGoogle Scholar
  6. K. Birman and T. Joseph. 1987. Exploiting Virtual Synchrony in Distributed Systems. SIGOPS Oper. Syst. Rev. 21, 5 (Nov. 1987 ), 123-138.Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Manuel Bravo, Gregory Chockler, and Alexey Gotsman. 2020. Making Byzantine consensus live. International Symposium on Distributed Computing ( 2020 ), to appear.Google ScholarGoogle Scholar
  8. Ethan Buchman. 2016. Tendermint: Byzantine Fault Tolerance in the Age of Blockchains. Master's thesis. University of Guelph. http://hdl.handle.net/10214/9769.Google ScholarGoogle Scholar
  9. Ethan Buchman, Jae Kwon, and Zarko Milosevic. 2018. The latest gossip on BFT consensus. CoRR abs/ 1807.04938 ( 2018 ). http://arxiv.org/abs/ 1807.04938Google ScholarGoogle Scholar
  10. CASSANDRA 2013. Bug report. https://issues.apache.org/jira/browse/CASSANDRA-6023. accessed May 2020.Google ScholarGoogle Scholar
  11. Miguel Castro and Barbara Liskov. 2002. Practical byzantine fault tolerance and proactive recovery. ACM Trans. Comput. Syst. 20, 4 ( 2002 ), 398-461. https://doi.org/10.1145/571637.571640 Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Tushar Deepak Chandra and Sam Toueg. 1996. Unreliable Failure Detectors for Reliable Distributed Systems. J. ACM 43, 2 ( 1996 ), 225-267.Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Mouna Chaouch-Saad, Bernadette Charron-Bost, and Stephan Merz. 2009. A Reduction Theorem for the Verification of Round-Based Distributed Algorithms. In RP (LNCS), Vol. 5797. 93-106.Google ScholarGoogle Scholar
  14. Bernadette Charron-Bost and Stephan Merz. 2009. Formal Verification of a Consensus Algorithm in the Heard-Of Model. Int. J. Software and Informatics 3, 2-3 ( 2009 ), 273-303.Google ScholarGoogle Scholar
  15. Bernadette Charron-Bost and André Schiper. 2009. The Heard-Of model: computing in distributed systems with benign faults. Distributed Computing 22, 1 ( 2009 ), 49-71.Google ScholarGoogle Scholar
  16. Tyler Crain, Vincent Gramoli, Mikel Larrea, and Michel Raynal. 2018. DBFT: Eficient Leaderless Byzantine Consensus and its Application to Blockchains. In 17th IEEE International Symposium on Network Computing and Applications, NCA 2018, Cambridge, MA, USA, November 1-3, 2018. 1-8.Google ScholarGoogle ScholarCross RefCross Ref
  17. Andrei Damian, Cezara Drăgoi, Alexandru Militaru, and Josef Widder. 2019. Communication-closed asynchronous protocols. In CAV. (to appear).Google ScholarGoogle Scholar
  18. Henri Debrat and Stephan Merz. 2012. Verifying Fault-Tolerant Distributed Algorithms in the Heard-Of Model. Archive of Formal Proofs 2012 ( 2012 ).Google ScholarGoogle Scholar
  19. Ankush Desai, Vivek Gupta, Ethan K. Jackson, Shaz Qadeer, Sriram K. Rajamani, and Damien Zuferey. 2013. P: safe asynchronous event-driven programming. In ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '13, Seattle, WA, USA, June 16-19, 2013. 321-332.Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Cezara Dragoi, Thomas A. Henzinger, Helmut Veith, Josef Widder, and Damien Zuferey. 2014. A Logic-Based Framework for Verifying Consensus Algorithms. In VMCAI, Kenneth L. McMillan and Xavier Rival (Eds.). Springer, 161-181.Google ScholarGoogle Scholar
  21. Cezara Drăgoi, Thomas A. Henzinger, and Damien Zuferey. 2016. PSync: a partially synchronous language for fault-tolerant distributed algorithms. In POPL. 400-415.Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Partha Dutta, Rachid Guerraoui, and Leslie Lamport. 2005. How Fast Can Eventual Synchrony Lead to Consensus?. In 2005 International Conference on Dependable Systems and Networks (DSN 2005 ), 28 June-1 July 2005, Yokohama, Japan, Proceedings. 22-27.Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Cynthia Dwork, Nancy Lynch, and Larry Stockmeyer. 1988. Consensus in the Presence of Partial Synchrony. JACM 35, 2 (April 1988 ), 288-323.Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Tzilla Elrad and Nissim Francez. 1982. Decomposition of Distributed Programs into Communication-Closed Layers. Sci. Comput. Program. 2, 3 ( 1982 ), 155-173.Google ScholarGoogle Scholar
  25. Colin Fidge. 1991. Logical Time in Distributed Computing Systems. Computer 24, 8 (Aug. 1991 ), 28-33.Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Ivana Filipovic, Peter W. O'Hearn, Noam Rinetzky, and Hongseok Yang. 2009. Abstraction for Concurrent Objects. In Programming Languages and Systems, 18th European Symposium on Programming, ESOP 2009, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009, York, UK, March 22-29, 2009. Proceedings (Lecture Notes in Computer Science), Giuseppe Castagna (Ed.), Vol. 5502. Springer, 252-266. https://doi.org/10.1007/978-3-642-00590-9_19 Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Michael J. Fischer, Nancy A. Lynch, and M. S. Paterson. 1985. Impossibility of Distributed Consensus with one Faulty Process. J. ACM 32, 2 (April 1985 ), 374-382.Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Eli Gafni. 1998. Round-by-Round Fault Detectors: Unifying Synchrony and Asynchrony (Extended Abstract). In Proceedings of the Seventeenth Annual ACM Symposium on Principles of Distributed Computing, PODC '98, Puerto Vallarta, Mexico, June 28-July 2, 1998, Brian A. Coan and Yehuda Afek (Eds.). 143-152.Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Flavio Paiva Junqueira, Benjamin C. Reed, and Marco Serafini. 2011. Zab: High-performance broadcast for primary-backup systems. In DSN. 245-256.Google ScholarGoogle Scholar
  30. Ramakrishna Kotla, Lorenzo Alvisi, Michael Dahlin, Allen Clement, and Edmund L. Wong. 2009. Zyzzyva: Speculative Byzantine Fault Tolerance. ACM Trans. Comput. Syst. 27, 4 ( 2009 ), 7 : 1-7 : 39.Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Avinash Lakshman and Prashant Malik. 2010. Cassandra: a decentralized structured storage system. Operating Systems Review 44, 2 ( 2010 ), 35-40.Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Leslie Lamport. 2005. Generalized Consensus and Paxos. Technical Report. 60 pages. https://www.microsoft.com/enus/research/publication/generalized-consensus-and-paxos/Google ScholarGoogle Scholar
  33. Richard J. Lipton. 1975. Reduction: A Method of Proving Properties of Parallel Programs. Commun. ACM 18, 12 ( 1975 ), 717-721.Google ScholarGoogle Scholar
  34. Yanhong A. Liu, Scott D. Stoller, Bo Lin, and Michael Gorbovitski. 2012. From clarity to eficiency for distributed algorithms. In Proceedings of the 27th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2012, part of SPLASH 2012, Tucson, AZ, USA, October 21-25, 2012, Gary T. Leavens and Matthew B. Dwyer (Eds.). ACM, 395-410. https://doi.org/10.1145/2384616.2384645 Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Nancy Lynch. 1996. Distributed Algorithms. Morgan Kaufman.Google ScholarGoogle Scholar
  36. Ognjen Marić, Christoph Sprenger, and David A. Basin. 2017. Cutof Bounds for Consensus Algorithms. In CAV. 217-237.Google ScholarGoogle Scholar
  37. Zarko Milosevic, Martin Hutle, and André Schiper. 2009. Unifying Byzantine Consensus Algorithms with Weak Interactive Consistency. In Principles of Distributed Systems, 13th International Conference, OPODIS 2009, Nîmes, France, December 15-18, 2009. Proceedings. 300-314.Google ScholarGoogle Scholar
  38. Oded Naor, Mathieu Baudet, Dahlia Malkhi, and Alexander Spiegelman. 2019. Lumière: Byzantine View Synchronization. CoRR abs/ 1909.05204 ( 2019 ). arXiv: 1909.05204 http://arxiv.org/abs/ 1909.05204Google ScholarGoogle Scholar
  39. Brian M. Oki and Barbara Liskov. 1988. Viewstamped Replication: A General Primary Copy. In PODC. 8-17.Google ScholarGoogle Scholar
  40. Diego Ongaro and John K. Ousterhout. 2014. In Search of an Understandable Consensus Algorithm. In 2014 USENIX Annual Technical Conference, USENIX ATC '14, Philadelphia, PA, USA, June 19-20, 2014., Garth Gibson and Nickolai Zeldovich (Eds.). 305-319. https://www.usenix.org/conference/atc14/technical-sessions/presentation/ongaroGoogle ScholarGoogle ScholarDigital LibraryDigital Library
  41. Oded Padon, Kenneth L. McMillan, Aurojit Panda, Mooly Sagiv, and Sharon Shoham. 2016. Ivy: safety verification by interactive generalization. In PLDI. 614-630.Google ScholarGoogle Scholar
  42. Nicola Santoro and Peter Widmayer. 1989. Time is not a healer. In STACS (LNCS), Vol. 349. 304-313.Google ScholarGoogle Scholar
  43. Nicola Santoro and Peter Widmayer. 2007. Agreement in synchronous networks with ubiquitous faults. Theor. Comput. Sci. 384, 2-3 ( 2007 ), 232-249.Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Daniele Sciascia. 2016. LibPaxos3. Retrieved July 9, 2019 from https://bitbucket.org/sciascid/libpaxos/Google ScholarGoogle Scholar
  45. Jun Shirako, David M. Peixotto, Vivek Sarkar, and William N. III Scherer. 2008. Phasers: a unified deadlock-free construct for collective and point-to-point synchronization. In Proceedings of the 22nd Annual International Conference on Supercomputing, ICS 2008, Island of Kos, Greece, June 7-12, 2008. 277-288.Google ScholarGoogle Scholar
  46. T. K. Srikanth and Sam Toueg. 1987. Simulating Authenticated Broadcasts to Derive Simple Fault-Tolerant Algorithms. Distributed Computing 2, 2 ( 1987 ), 80-94. https://doi.org/10.1007/BF01667080 Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Ilina Stoilkovska, Igor Konnov, Josef Widder, and Florian Zuleger. 2019. Verifying Safety of Synchronous Fault-Tolerant Algorithms Bounded Model Checking. In TACAS, Part II (LNCS), Vol. 11428. 357-374.Google ScholarGoogle Scholar
  48. Pierre Sutra. 2019. On the correctness of Egalitarian Paxos. CoRR abs/ 1906.10917 ( 2019 ). http://arxiv.org/abs/ 1906.10917Google ScholarGoogle Scholar
  49. Leslie G. Valiant. 1990. A Bridging Model for Parallel Computation. Commun. ACM 33, 8 ( 1990 ), 103-111. https: //doi.org/10.1145/79173.79181 Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. Robbert van Renesse, Kenneth P. Birman, and Silvano Mafeis. 1996. Horus: A Flexible Group Communication System. Commun. ACM 39, 4 ( 1996 ), 76-83. https://doi.org/10.1145/227210.227229 Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. Klaus von Gleissenthall, Rami Gökhan Kici, Alexander Bakst, Deian Stefan, and Ranjit Jhala. 2019. Pretend synchrony: synchronous verification of asynchronous distributed programs. PACMPL 3, POPL ( 2019 ), 59 : 1-59 : 30. https://doi.org/10. 1145/3290372 Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. Josef Widder and Ulrich Schmid. 2009. The Theta-Model: achieving synchrony without clocks. Distributed Comput. 22, 1 ( 2009 ), 29-47. https://doi.org/10.1007/s00446-009-0080-x Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. Maofan Yin, Dahlia Malkhi, Michael K. Reiter, Guy Golan-Gueta, and Ittai Abraham. 2019. HotStuf: BFT Consensus with Linearity and Responsiveness. In Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing, PODC 2019, Toronto, ON, Canada, July 29-August 2, 2019. 347-356.Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. ZOOKEEPER 2017. Data Inconsistency bug report ZOOKEEPER-2832. https://issues.apache.org/jira/browse/ZOOKEEPER2832. accessed May 2020.Google ScholarGoogle Scholar

Index Terms

  1. Programming at the edge of synchrony

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in

          Full Access

          • Published in

            cover image Proceedings of the ACM on Programming Languages
            Proceedings of the ACM on Programming Languages  Volume 4, Issue OOPSLA
            November 2020
            3108 pages
            EISSN:2475-1421
            DOI:10.1145/3436718
            Issue’s Table of Contents

            Copyright © 2020 Owner/Author

            This work is licensed under a Creative Commons Attribution International 4.0 License.

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 13 November 2020
            Published in pacmpl Volume 4, Issue OOPSLA

            Permissions

            Request permissions about this article.

            Request Permissions

            Check for updates

            Qualifiers

            • research-article

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader