Abstract
Synchronization primitives for fault-tolerant distributed systems that ensure an effective and efficient cooperation among processes are an important challenge in the programming languages community. We present a new programming abstraction, ReSync, for implementing benign and Byzantine fault-tolerant protocols. ReSync has a new round structure that offers a simple abstraction for group communication, like it is customary in synchronous systems, but also allows messages to be received one by one, like in the asynchronous systems. This extension allows implementing network and algorithm-specific policies for the message reception, which is not possible in classic round models.
The execution of ReSync programs is based on a new generic round switch protocol that generalizes the famous theoretical result about consensus in the presence of partial synchrony by of Dwork, Lynch, and Stockmeyer. We evaluate experimentally the performance of ReSync’s execution platform, by comparing consensus implementations in ReSync with LibPaxos3, etcd, and Bft-SMaRt, three consensus libraries tolerant to benign, resp. byzantine faults.
- 2019. etcd. Retrieved Nov 21, 2019 from https://etcd.io/Google Scholar
- Benjamin Aminof, Sasha Rubin, Ilina Stoilkovska, Josef Widder, and Florian Zuleger. 2018. Parameterized Model Checking of Synchronous Distributed Algorithms by Abstraction. In VMCAI. 1-24.Google Scholar
- Mathieu Baudet, Avery Ching, Andrey Chursin, George Danezis, François Garillot, Dahlia Malkhi Zekun Li, Oded Naor, Dmitri Perelman, and Alberto Sonnino. 2019. State Machine Replication in the Libra Blockchain. https://developers. libra.org/docs/assets/papers/libra-consensus-state-machine-replication-in-the-libra-blockchain.pdf.Google Scholar
- Alysson Neves Bessani, João Sousa, and Eduardo Adílio Pelinson Alchieri. 2014. State Machine Replication for the Masses with BFT-SMART. In 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2014, Atlanta, GA, USA, June 23-26, 2014. IEEE Computer Society, 355-362. https://doi.org/10.1109/DSN. 2014.43 Google ScholarDigital Library
- Martin Biely, Bernadette Charron-Bost, Antoine Gaillard, Martin Hutle, André Schiper, and Josef Widder. 2007. Tolerating corrupted communication. In PODC. 244-253.Google Scholar
- K. Birman and T. Joseph. 1987. Exploiting Virtual Synchrony in Distributed Systems. SIGOPS Oper. Syst. Rev. 21, 5 (Nov. 1987 ), 123-138.Google ScholarDigital Library
- Manuel Bravo, Gregory Chockler, and Alexey Gotsman. 2020. Making Byzantine consensus live. International Symposium on Distributed Computing ( 2020 ), to appear.Google Scholar
- Ethan Buchman. 2016. Tendermint: Byzantine Fault Tolerance in the Age of Blockchains. Master's thesis. University of Guelph. http://hdl.handle.net/10214/9769.Google Scholar
- Ethan Buchman, Jae Kwon, and Zarko Milosevic. 2018. The latest gossip on BFT consensus. CoRR abs/ 1807.04938 ( 2018 ). http://arxiv.org/abs/ 1807.04938Google Scholar
- CASSANDRA 2013. Bug report. https://issues.apache.org/jira/browse/CASSANDRA-6023. accessed May 2020.Google Scholar
- Miguel Castro and Barbara Liskov. 2002. Practical byzantine fault tolerance and proactive recovery. ACM Trans. Comput. Syst. 20, 4 ( 2002 ), 398-461. https://doi.org/10.1145/571637.571640 Google ScholarDigital Library
- Tushar Deepak Chandra and Sam Toueg. 1996. Unreliable Failure Detectors for Reliable Distributed Systems. J. ACM 43, 2 ( 1996 ), 225-267.Google ScholarDigital Library
- Mouna Chaouch-Saad, Bernadette Charron-Bost, and Stephan Merz. 2009. A Reduction Theorem for the Verification of Round-Based Distributed Algorithms. In RP (LNCS), Vol. 5797. 93-106.Google Scholar
- Bernadette Charron-Bost and Stephan Merz. 2009. Formal Verification of a Consensus Algorithm in the Heard-Of Model. Int. J. Software and Informatics 3, 2-3 ( 2009 ), 273-303.Google Scholar
- Bernadette Charron-Bost and André Schiper. 2009. The Heard-Of model: computing in distributed systems with benign faults. Distributed Computing 22, 1 ( 2009 ), 49-71.Google Scholar
- Tyler Crain, Vincent Gramoli, Mikel Larrea, and Michel Raynal. 2018. DBFT: Eficient Leaderless Byzantine Consensus and its Application to Blockchains. In 17th IEEE International Symposium on Network Computing and Applications, NCA 2018, Cambridge, MA, USA, November 1-3, 2018. 1-8.Google ScholarCross Ref
- Andrei Damian, Cezara Drăgoi, Alexandru Militaru, and Josef Widder. 2019. Communication-closed asynchronous protocols. In CAV. (to appear).Google Scholar
- Henri Debrat and Stephan Merz. 2012. Verifying Fault-Tolerant Distributed Algorithms in the Heard-Of Model. Archive of Formal Proofs 2012 ( 2012 ).Google Scholar
- Ankush Desai, Vivek Gupta, Ethan K. Jackson, Shaz Qadeer, Sriram K. Rajamani, and Damien Zuferey. 2013. P: safe asynchronous event-driven programming. In ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '13, Seattle, WA, USA, June 16-19, 2013. 321-332.Google ScholarDigital Library
- Cezara Dragoi, Thomas A. Henzinger, Helmut Veith, Josef Widder, and Damien Zuferey. 2014. A Logic-Based Framework for Verifying Consensus Algorithms. In VMCAI, Kenneth L. McMillan and Xavier Rival (Eds.). Springer, 161-181.Google Scholar
- Cezara Drăgoi, Thomas A. Henzinger, and Damien Zuferey. 2016. PSync: a partially synchronous language for fault-tolerant distributed algorithms. In POPL. 400-415.Google ScholarDigital Library
- Partha Dutta, Rachid Guerraoui, and Leslie Lamport. 2005. How Fast Can Eventual Synchrony Lead to Consensus?. In 2005 International Conference on Dependable Systems and Networks (DSN 2005 ), 28 June-1 July 2005, Yokohama, Japan, Proceedings. 22-27.Google ScholarDigital Library
- Cynthia Dwork, Nancy Lynch, and Larry Stockmeyer. 1988. Consensus in the Presence of Partial Synchrony. JACM 35, 2 (April 1988 ), 288-323.Google ScholarDigital Library
- Tzilla Elrad and Nissim Francez. 1982. Decomposition of Distributed Programs into Communication-Closed Layers. Sci. Comput. Program. 2, 3 ( 1982 ), 155-173.Google Scholar
- Colin Fidge. 1991. Logical Time in Distributed Computing Systems. Computer 24, 8 (Aug. 1991 ), 28-33.Google ScholarDigital Library
- Ivana Filipovic, Peter W. O'Hearn, Noam Rinetzky, and Hongseok Yang. 2009. Abstraction for Concurrent Objects. In Programming Languages and Systems, 18th European Symposium on Programming, ESOP 2009, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009, York, UK, March 22-29, 2009. Proceedings (Lecture Notes in Computer Science), Giuseppe Castagna (Ed.), Vol. 5502. Springer, 252-266. https://doi.org/10.1007/978-3-642-00590-9_19 Google ScholarDigital Library
- Michael J. Fischer, Nancy A. Lynch, and M. S. Paterson. 1985. Impossibility of Distributed Consensus with one Faulty Process. J. ACM 32, 2 (April 1985 ), 374-382.Google ScholarDigital Library
- Eli Gafni. 1998. Round-by-Round Fault Detectors: Unifying Synchrony and Asynchrony (Extended Abstract). In Proceedings of the Seventeenth Annual ACM Symposium on Principles of Distributed Computing, PODC '98, Puerto Vallarta, Mexico, June 28-July 2, 1998, Brian A. Coan and Yehuda Afek (Eds.). 143-152.Google ScholarDigital Library
- Flavio Paiva Junqueira, Benjamin C. Reed, and Marco Serafini. 2011. Zab: High-performance broadcast for primary-backup systems. In DSN. 245-256.Google Scholar
- Ramakrishna Kotla, Lorenzo Alvisi, Michael Dahlin, Allen Clement, and Edmund L. Wong. 2009. Zyzzyva: Speculative Byzantine Fault Tolerance. ACM Trans. Comput. Syst. 27, 4 ( 2009 ), 7 : 1-7 : 39.Google ScholarDigital Library
- Avinash Lakshman and Prashant Malik. 2010. Cassandra: a decentralized structured storage system. Operating Systems Review 44, 2 ( 2010 ), 35-40.Google ScholarDigital Library
- Leslie Lamport. 2005. Generalized Consensus and Paxos. Technical Report. 60 pages. https://www.microsoft.com/enus/research/publication/generalized-consensus-and-paxos/Google Scholar
- Richard J. Lipton. 1975. Reduction: A Method of Proving Properties of Parallel Programs. Commun. ACM 18, 12 ( 1975 ), 717-721.Google Scholar
- Yanhong A. Liu, Scott D. Stoller, Bo Lin, and Michael Gorbovitski. 2012. From clarity to eficiency for distributed algorithms. In Proceedings of the 27th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2012, part of SPLASH 2012, Tucson, AZ, USA, October 21-25, 2012, Gary T. Leavens and Matthew B. Dwyer (Eds.). ACM, 395-410. https://doi.org/10.1145/2384616.2384645 Google ScholarDigital Library
- Nancy Lynch. 1996. Distributed Algorithms. Morgan Kaufman.Google Scholar
- Ognjen Marić, Christoph Sprenger, and David A. Basin. 2017. Cutof Bounds for Consensus Algorithms. In CAV. 217-237.Google Scholar
- Zarko Milosevic, Martin Hutle, and André Schiper. 2009. Unifying Byzantine Consensus Algorithms with Weak Interactive Consistency. In Principles of Distributed Systems, 13th International Conference, OPODIS 2009, Nîmes, France, December 15-18, 2009. Proceedings. 300-314.Google Scholar
- Oded Naor, Mathieu Baudet, Dahlia Malkhi, and Alexander Spiegelman. 2019. Lumière: Byzantine View Synchronization. CoRR abs/ 1909.05204 ( 2019 ). arXiv: 1909.05204 http://arxiv.org/abs/ 1909.05204Google Scholar
- Brian M. Oki and Barbara Liskov. 1988. Viewstamped Replication: A General Primary Copy. In PODC. 8-17.Google Scholar
- Diego Ongaro and John K. Ousterhout. 2014. In Search of an Understandable Consensus Algorithm. In 2014 USENIX Annual Technical Conference, USENIX ATC '14, Philadelphia, PA, USA, June 19-20, 2014., Garth Gibson and Nickolai Zeldovich (Eds.). 305-319. https://www.usenix.org/conference/atc14/technical-sessions/presentation/ongaroGoogle ScholarDigital Library
- Oded Padon, Kenneth L. McMillan, Aurojit Panda, Mooly Sagiv, and Sharon Shoham. 2016. Ivy: safety verification by interactive generalization. In PLDI. 614-630.Google Scholar
- Nicola Santoro and Peter Widmayer. 1989. Time is not a healer. In STACS (LNCS), Vol. 349. 304-313.Google Scholar
- Nicola Santoro and Peter Widmayer. 2007. Agreement in synchronous networks with ubiquitous faults. Theor. Comput. Sci. 384, 2-3 ( 2007 ), 232-249.Google ScholarDigital Library
- Daniele Sciascia. 2016. LibPaxos3. Retrieved July 9, 2019 from https://bitbucket.org/sciascid/libpaxos/Google Scholar
- Jun Shirako, David M. Peixotto, Vivek Sarkar, and William N. III Scherer. 2008. Phasers: a unified deadlock-free construct for collective and point-to-point synchronization. In Proceedings of the 22nd Annual International Conference on Supercomputing, ICS 2008, Island of Kos, Greece, June 7-12, 2008. 277-288.Google Scholar
- T. K. Srikanth and Sam Toueg. 1987. Simulating Authenticated Broadcasts to Derive Simple Fault-Tolerant Algorithms. Distributed Computing 2, 2 ( 1987 ), 80-94. https://doi.org/10.1007/BF01667080 Google ScholarDigital Library
- Ilina Stoilkovska, Igor Konnov, Josef Widder, and Florian Zuleger. 2019. Verifying Safety of Synchronous Fault-Tolerant Algorithms Bounded Model Checking. In TACAS, Part II (LNCS), Vol. 11428. 357-374.Google Scholar
- Pierre Sutra. 2019. On the correctness of Egalitarian Paxos. CoRR abs/ 1906.10917 ( 2019 ). http://arxiv.org/abs/ 1906.10917Google Scholar
- Leslie G. Valiant. 1990. A Bridging Model for Parallel Computation. Commun. ACM 33, 8 ( 1990 ), 103-111. https: //doi.org/10.1145/79173.79181 Google ScholarDigital Library
- Robbert van Renesse, Kenneth P. Birman, and Silvano Mafeis. 1996. Horus: A Flexible Group Communication System. Commun. ACM 39, 4 ( 1996 ), 76-83. https://doi.org/10.1145/227210.227229 Google ScholarDigital Library
- Klaus von Gleissenthall, Rami Gökhan Kici, Alexander Bakst, Deian Stefan, and Ranjit Jhala. 2019. Pretend synchrony: synchronous verification of asynchronous distributed programs. PACMPL 3, POPL ( 2019 ), 59 : 1-59 : 30. https://doi.org/10. 1145/3290372 Google ScholarDigital Library
- Josef Widder and Ulrich Schmid. 2009. The Theta-Model: achieving synchrony without clocks. Distributed Comput. 22, 1 ( 2009 ), 29-47. https://doi.org/10.1007/s00446-009-0080-x Google ScholarDigital Library
- Maofan Yin, Dahlia Malkhi, Michael K. Reiter, Guy Golan-Gueta, and Ittai Abraham. 2019. HotStuf: BFT Consensus with Linearity and Responsiveness. In Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing, PODC 2019, Toronto, ON, Canada, July 29-August 2, 2019. 347-356.Google ScholarDigital Library
- ZOOKEEPER 2017. Data Inconsistency bug report ZOOKEEPER-2832. https://issues.apache.org/jira/browse/ZOOKEEPER2832. accessed May 2020.Google Scholar
Index Terms
- Programming at the edge of synchrony
Recommendations
Self-stabilizing 2m-clock for unidirectional rings of odd size
In this paper, we propose a self-stabilizing <i>K</i>- clock protocol for unidirectional rings with odd size, where <i>K</i> = 2<sup><i>m</i></sup> and <i>m</i> is any positive integer. Besides the variable for maintaining the clock, the proposed ...
PSync: a partially synchronous language for fault-tolerant distributed algorithms
POPL '16Fault-tolerant distributed algorithms play an important role in many critical/high-availability applications. These algorithms are notoriously difficult to implement correctly, due to asynchronous communication and the occurrence of faults, such as the ...
PSync: a partially synchronous language for fault-tolerant distributed algorithms
POPL '16: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming LanguagesFault-tolerant distributed algorithms play an important role in many critical/high-availability applications. These algorithms are notoriously difficult to implement correctly, due to asynchronous communication and the occurrence of faults, such as the ...
Comments