Abstract
A longstanding vision in distributed systems is to build reliable systems from unreliable components. An enticing formulation of this vision is Byzantine Fault-Tolerant (BFT) state machine replication, in which a group of servers collectively act as a correct server even if some of the servers misbehave or malfunction in arbitrary (“Byzantine”) ways. Despite this promise, practitioners hesitate to deploy BFT systems, at least partly because of the perception that BFT must impose high overheads.
In this article, we present Zyzzyva, a protocol that uses speculation to reduce the cost of BFT replication. In Zyzzyva, replicas reply to a client's request without first running an expensive three-phase commit protocol to agree on the order to process requests. Instead, they optimistically adopt the order proposed by a primary server, process the request, and reply immediately to the client. If the primary is faulty, replicas can become temporarily inconsistent with one another, but clients detect inconsistencies, help correct replicas converge on a single total ordering of requests, and only rely on responses that are consistent with this total order. This approach allows Zyzzyva to reduce replication overheads to near their theoretical minima and to achieve throughputs of tens of thousands of requests per second, making BFT replication practical for a broad range of demanding services.
- Abd-El-Malek, M., Ganger, G., Goodson, G., Reiter, M., and Wylie, J. 2005. Fault-Scalable Byzantine fault-tolerant services. In Proceedings of the 20th ACM Symposium on Operating Systems Principles (SOSP'05). 59--74. Google ScholarDigital Library
- Aiyer, A. S., Alvisi, L., Clement, A., Dahlin, M., Martin, J.-P., and Porth, C. 2005. BAR fault tolerance for cooperative services. In Proceedings of the 20th ACM Symposium on Operating Systems Principles (SOSP'05). 45--58. Google ScholarDigital Library
- Amazon. 2008. Amazon S3 availability event: July 20, 2008. http://status.aws.amazon.com/s3-20080720.html.Google Scholar
- Bellare, M. and Micciancio, D. 1997. A new paradigm for collision-free hashing: Incrementally at reduced cost. In Proceedings of 14th Annual Eurocrypt Conference (Eurocrypt'97). 163--192. Google ScholarDigital Library
- Castro, M. 2001. Practical Byzantine fault tolerance. Ph.D. thesis, MIT, Cambridge, MA.Google Scholar
- Castro, M. and Liskov, B. 1999. Practical Byzantine fault tolerance. In Proceedings of the 3rd USENIX Symposium on Operating Systems Design and Implementation (OSDI'99). 173--186. Google ScholarDigital Library
- Castro, M. and Liskov, B. 2000. Proactive recovery in a Byzantine-fault-tolerant system. In Proceedings of the 4th Symposium on Operating Systems Design and Implementation (OSDI'00). 273--288. Google ScholarDigital Library
- Castro, M. and Listov, B. 2002. Practical Byzantine fault tolerance and proactive recovery. ACM Trans. Comput. Syst. 20, 4, 398--461. Google ScholarDigital Library
- Chun, B.-G., Maniatis, P., Shenker, S., and Kubiatowicz, J. 2007. Attested append-only memory: Making adversaries stick to their word. SIGOPS Oper. Syst. Rev. 41, 6, 189--204. Google ScholarDigital Library
- Clement, A., Kapritsos, M., Lee, S., Wang, Y., Alvisi, L., Dahlin, M., and Riche, T. 2009a. UpRight cluster services. In Proceedings of the 22nd ACM Symposium on Operating Systems Principles (SOSP'09). 270--290. Google ScholarDigital Library
- Clement, A., Marchetti, M., Wong, E., Alvisi, L., and Dahlin, M. 2009b. Making Byzantine fault tolerant services tolerate Byzantine faults. In Proceedings of the 6th USENIX Symposium on Networked Systems Design and Implementation (NSDI'09). 153--168. Google ScholarDigital Library
- Cowling, J., Myers, D., Liskov, B., Rodrigues, R., and Shrira, L. 2006. HQ replication: A hybrid quorum protocol for Byzantine fault tolerance. In Proceedings of the 7th Symposium on Operating Systems Design and Implementation (OSDI'06). 177--190. Google ScholarDigital Library
- Dutta, P., Guerraoui, R., and Vukolić, M. 2005. Best-Case complexity of asynchronous Byzantine consensus. Tech. rep. EPFL/IC/200499, EPFL.Google Scholar
- Dwork, C., Lynch, N., and Stockmeyer, L. 1988. Consensus in the presence of partial synchrony. J. ACM, 288--323. Google ScholarDigital Library
- Fischer, M., Lynch, N., and Paterson, M. 1985. Impossibility of distributed consensus with one faulty process. J. ACM 32, 2, 374--382. Google ScholarDigital Library
- Gmail. 2006. Lost gmail emails and the future of Web apps. http://it.slashdot.org (12/29/06).Google Scholar
- Herlihy, M. and Wing, J. 1990. Linearizability: A correctness condition for concurrent objects. ACM Trans. Prog. Lang. Syst. 12, 3, 463--492. Google ScholarDigital Library
- Hotmail. 2004. Hotmail incinerates customer files. http://news.com.com, (6/3/04).Google Scholar
- Keeney, M., Kowalski, E., Cappelli, D., Moore, A., Shimeall, T., and Rogers, S. 2005. Insider threat study: Computer system sabotage in critical infrastructure sectors. http://www.cert.org/archive/pdf/insidercross051105.pdf.Google Scholar
- Kotla, R. 2008. xbft: Byzantine fault tolerance with high performance, low cost, and aggressive fault isolation. Ph.D. thesis, The University of Texas at Austin, Austin, TX. Google ScholarDigital Library
- Kotla, R., Alvisi, L., Dahlin, M., Clement, A., and Wong, E. 2007a. Zyzzyva: Speculative Byzantine fault tolerance. In Proceedings of the 21st ACM Symposium on Operating Systems Principles (SOSP'07). 45--58. Google ScholarDigital Library
- Kotla, R. and Dahlin, M. 2004. High throughput Byzantine fault tolerance. In Proceedings of the International Conference on Dependable Systems and Networks (DSN'04). 575--584. Google ScholarDigital Library
- Kotla, R., Dahlin, M., and Alvisi, L. 2007b. SafeStore: A durable and practical storage system. In Proceedings of the USENIX Annual Technical Conference. 129--142. Google ScholarDigital Library
- Lamport, Shostak, and Pease. 1982. The Byzantine generals problem. ACM Trans. Program. Lang. Syst. 4, 3, 382--401. Google ScholarDigital Library
- Lamport, L. 1978. Time, clocks, and the ordering of events in a distributed system. Comm. ACM 21, 7, 558--565. Google ScholarDigital Library
- Lamport, L. 1984. Using time instead of timeout for fault-tolerant distributed systems. ACM Trans. Program. Lang. Syst. 6, 2, 254--280. Google ScholarDigital Library
- Lamport, L. 2003. Lower bounds for asynchronous consensus. Lecture Notes in Computer Science, vol. 2584. Springer, 22--23. Google ScholarDigital Library
- Li, J. and Mazières, D. 2007. Beyond one-third faulty replicas in Byzantine fault tolerant services. In Proceedings of the 4th USENIX Symposium on Networked Systems Design and Implementation (NSDI'07). 131--144. Google ScholarDigital Library
- Liskov, B., Ghemawat, S., Gruber, R., Johnson, P., and Shrira, L. 1991. Replication in the Harp file system. In Proceedings of the 13th ACM Symposium on Operating Systems Principles. 226--238. Google ScholarDigital Library
- Martin, J.-P. and Alvisi, L. 2006. Fast Byzantine consensus. IEEE Trans. Depend. Secure. Comput. 3, 3, 202--215. Google ScholarDigital Library
- Nightingale, E., Veeraraghavan, K., Chen, P., and Flinn, J. 2006. Rethink the sync. In Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation (OSDI'06). 1--14. Google ScholarDigital Library
- Nightingale, E. B., Chen, P., and Flinn, J. 2005. Speculative execution in a distributed file system. In Proceedings of the 20th ACM Symposium on Operating Systems Principles (SOSP'05). 191--205. Google ScholarDigital Library
- OpenSSL. 2007. OpenSSL. http://www.openssl.org/.Google Scholar
- Pease, M., Shostak, R., and Lamport, L. 1980. Reaching agreement in the presence of faults. J. ACM 27, 2. Google ScholarDigital Library
- Prabhakaran, V., Bairavasundaram, L., Agrawal, N., Arpaci-Dusseau, H. G. A., and Arpaci-Dusseau, R. 2005. IRON file systems. In Proceedings of the 20th ACM Symposium on Operating Systems Principles (SOSP'05). 206--220. Google ScholarDigital Library
- Reiter, M. 1995. The Rampart toolkit for building high-integrity services. Lecture Notes in Computer Science, vol. 938. Springer, 99--110. Google ScholarDigital Library
- Rodrigues, R., Castro, M., and Liskov, B. 2001. BASE: Using abstraction to improve fault tolerance. In Proceedings of the 18th ACM Symposium on Operating Systems Principles (SOSP'01). 15--28. Google ScholarDigital Library
- Santry, D. S., Feeley, M. J., Hutchinson, N. C., Veitch, A. C., Carton, R. W., and Ofir, J. 1999. Deciding when to forget in the Elephant file system. In Proceedings of the 17th ACM Symposium on Operating Systems Principles (SOSP'99). 110--123. Google ScholarDigital Library
- Schneider, F. B. 1990. Implementing fault-tolerant services using the state machine approach: A tutorial. ACM Comput. Surv. 22, 4. Google ScholarDigital Library
- Singh, A., Das, T., Maniatis, P., Druschel, P., and Roscoe, T. 2008. BFT protocols under fire. In Proceedings of the 5th USENIX Symposium on Networked Systems Design and Implementation (NSDI'08). 189--204. Google ScholarDigital Library
- Singh, A., Fonseca, P., Kuznetsov, P., Rodrigues, R., and Maniatis, P. 2009. Zeno: Eventually consistent Byzantine fault tolerance. In Proceedings of the 6th USENIX Symposium on Networked Systems Design and Implementation (NSDI'09). 169--184. Google ScholarDigital Library
- Wester, B., Cowling, J., Nightingale, E. B., Chen, P. M., Flinn, J., and Liskov, B. 2009. Tolerating latency in replicated state machines through client speculation. In Proceedings of the 6th USENIX Symposium on Networked Systems Design and Implementation (NSDI'09). 245--260. Google ScholarDigital Library
- Wood, T., Singh, R., Venkataramani, A., and Shenoy, P. 2008. ZZ: Cheap practical BFT using virtualization. Tech. rep. TR14-08, University of Massachusetts, Amherst, MA.Google Scholar
- Yang, J., Sar, C., and Engler, D. 2006. Explode: A lightweight, general system for finding serious storage system errors. In Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation (OSDI'06). 131--146. Google ScholarDigital Library
- Yin, J., Martin, J.-P., Venkataramani, A., Alvisi, L., and Dahlin, M. 2003. Separating agreement from execution for Byzantine fault tolerant services. In Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP'03). 253--267. Google ScholarDigital Library
Index Terms
- Zyzzyva: Speculative Byzantine fault tolerance
Recommendations
Zyzzyva: speculative byzantine fault tolerance
SOSP '07: Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principlesWe present Zyzzyva, a protocol that uses speculation to reduce the cost and simplify the design of Byzantine fault tolerant state machine replication. In Zyzzyva, replicas respond to a client's request without first running an expensive three-phase ...
Zyzzyva: speculative byzantine fault tolerance
SOSP '07We present Zyzzyva, a protocol that uses speculation to reduce the cost and simplify the design of Byzantine fault tolerant state machine replication. In Zyzzyva, replicas respond to a client's request without first running an expensive three-phase ...
GRADE: Graceful Degradation in Byzantine Quorum Systems
SRDS '12: Proceedings of the 2012 IEEE 31st Symposium on Reliable Distributed SystemsDistributed storage systems are expected to provide correct services in the presence of Byzantine failures, which do not have any assumptions about the behavior of faulty servers and clients. In designing such systems, we often encounter the paradox of ...
Comments