Skip to main content
Erschienen in: Computing 2/2019

21.08.2018

Scalable Byzantine fault-tolerant state-machine replication on heterogeneous servers

verfasst von: Michael Eischer, Tobias Distler

Erschienen in: Computing | Ausgabe 2/2019

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

When provided with more powerful or extra hardware, state-of-the-art Byzantine fault-tolerant (BFT) replication protocols are unable to effectively exploit the additional computing resources: on the one hand, in settings with heterogeneous servers existing protocols cannot fully utilize servers with higher performance capabilities. On the other hand, using more servers than the minimum number of replicas required for Byzantine fault tolerance in general does not lead to improved throughput and latency, but instead actually degrades performance. In this paper, we address these problems with Omada, a BFT system architecture that is able to benefit from additional hardware resources. To achieve this property while still providing strong consistency, Omada first parallelizes agreement into multiple groups and then executes the requests handled by different groups in a deterministic order. By varying the number of requests to be ordered between groups as well as the number of groups that a replica participates in between servers, Omada offers the possibility to individually adjust the resource usage per server. Moreover, the fact that not all replicas need to take part in every group enables the architecture to exploit additional servers.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Aguilera MK, Strom RE (2000) Efficient atomic broadcast using deterministic merge. In: Proceedings of the 19th symposium on principles of distributed computing, pp 209–218 Aguilera MK, Strom RE (2000) Efficient atomic broadcast using deterministic merge. In: Proceedings of the 19th symposium on principles of distributed computing, pp 209–218
2.
Zurück zum Zitat Amir Y, Coan B, Kirsch J, Lane J (2007) Customizable fault tolerance for wide-area replication. In: Proceedings of the 26th international symposium on reliable distributed systems, pp 65–82 Amir Y, Coan B, Kirsch J, Lane J (2007) Customizable fault tolerance for wide-area replication. In: Proceedings of the 26th international symposium on reliable distributed systems, pp 65–82
3.
Zurück zum Zitat Amir Y, Coan B, Kirsch J, Lane J (2011) Prime: Byzantine replication under attack. In: IEEE transactions on dependable and secure computing, vol 8, no 4, pp 564–577 Amir Y, Coan B, Kirsch J, Lane J (2011) Prime: Byzantine replication under attack. In: IEEE transactions on dependable and secure computing, vol 8, no 4, pp 564–577
4.
Zurück zum Zitat Aublin PL, Mokhtar SB, Quéma V (2013) RBFT: Redundant Byzantine fault tolerance. In: Proceedings of the 33rd international conference on distributed computing systems, pp 297–306 Aublin PL, Mokhtar SB, Quéma V (2013) RBFT: Redundant Byzantine fault tolerance. In: Proceedings of the 33rd international conference on distributed computing systems, pp 297–306
5.
Zurück zum Zitat Babay A, Amir Y (2016) Fast total ordering for modern data centers. In: Proceedings of the 36th international conference on distributed computing systems, pp 669–679 Babay A, Amir Y (2016) Fast total ordering for modern data centers. In: Proceedings of the 36th international conference on distributed computing systems, pp 669–679
6.
Zurück zum Zitat Behl J, Distler T, Kapitza R (2015) Consensus-oriented parallelization: how to earn your first million. In: Proceedings of the 16th middleware conference, pp 173–184 Behl J, Distler T, Kapitza R (2015) Consensus-oriented parallelization: how to earn your first million. In: Proceedings of the 16th middleware conference, pp 173–184
7.
Zurück zum Zitat Bessani A, Sousa J, Alchieri EEP (2014) State machine replication for the masses with BFT-SMaRt. In: Proceedings of the 44th international conference on dependable systems networks, pp 355–362 Bessani A, Sousa J, Alchieri EEP (2014) State machine replication for the masses with BFT-SMaRt. In: Proceedings of the 44th international conference on dependable systems networks, pp 355–362
8.
Zurück zum Zitat Castro M, Liskov B (1999) Practical Byzantine fault tolerance. In: Proceedings of the 3rd symposium on operating systems design and implementation, pp 173–186 Castro M, Liskov B (1999) Practical Byzantine fault tolerance. In: Proceedings of the 3rd symposium on operating systems design and implementation, pp 173–186
9.
Zurück zum Zitat Castro M, Rodrigues R, Liskov B (2003) BASE: using abstraction to improve fault tolerance. In: ACM transactions on computer systems, vol 21, no 3, pp 236–269 Castro M, Rodrigues R, Liskov B (2003) BASE: using abstraction to improve fault tolerance. In: ACM transactions on computer systems, vol 21, no 3, pp 236–269
10.
Zurück zum Zitat Clement A, Kapritsos M, Lee S, Wang Y, Alvisi L, Dahlin M, Riche T (2009) UpRight cluster services. In: Proceedings of the 22nd symposium on operating systems principles, pp 277–290 Clement A, Kapritsos M, Lee S, Wang Y, Alvisi L, Dahlin M, Riche T (2009) UpRight cluster services. In: Proceedings of the 22nd symposium on operating systems principles, pp 277–290
11.
Zurück zum Zitat Distler T, Kapitza R, Reiser HP (2010) State transfer for hypervisor-based proactive recovery of heterogeneous replicated services. In: Proceedings of the 5th “Sicherheit, Schutz und Zuverlässigkeit” conference, pp 61–72 Distler T, Kapitza R, Reiser HP (2010) State transfer for hypervisor-based proactive recovery of heterogeneous replicated services. In: Proceedings of the 5th “Sicherheit, Schutz und Zuverlässigkeit” conference, pp 61–72
12.
Zurück zum Zitat Distler T, Cachin C, Kapitza R (2016) Resource-efficient Byzantine fault tolerance. In: IEEE transactions on computers, vol 65, no 9, pp 2807–2819 Distler T, Cachin C, Kapitza R (2016) Resource-efficient Byzantine fault tolerance. In: IEEE transactions on computers, vol 65, no 9, pp 2807–2819
13.
Zurück zum Zitat Garcia M, Bessani A, Gashi I, Neves N, Obelheiro R (2014) Analysis of operating system diversity for intrusion tolerance. In: Software—practice & experience, vol 44, no 6, pp 735–770 Garcia M, Bessani A, Gashi I, Neves N, Obelheiro R (2014) Analysis of operating system diversity for intrusion tolerance. In: Software—practice & experience, vol 44, no 6, pp 735–770
14.
Zurück zum Zitat Hunt P, Konar M, Junqueira F, Reed B (2010) ZooKeeper: wait-free coordination for Internet-scale systems. In: Proceedings of the 2010 USENIX annual technical conference, pp 145–158 Hunt P, Konar M, Junqueira F, Reed B (2010) ZooKeeper: wait-free coordination for Internet-scale systems. In: Proceedings of the 2010 USENIX annual technical conference, pp 145–158
15.
Zurück zum Zitat Junqueira F, Bhagwan R, Hevia A, Marzullo K, Voelker GM (2005) Surviving Internet catastrophes. In: Proceedings of the 2005 USENIX annual technical conference, pp 45–60 Junqueira F, Bhagwan R, Hevia A, Marzullo K, Voelker GM (2005) Surviving Internet catastrophes. In: Proceedings of the 2005 USENIX annual technical conference, pp 45–60
16.
Zurück zum Zitat Kapitza R, Behl J, Cachin C, Distler T, Kuhnle S, Mohammadi SV, Schröder-Preikschat W, Stengel K (2012) CheapBFT: resource-efficient Byzantine fault tolerance. In: Proceedings of the 7th European conference on computer systems, pp 295–308 Kapitza R, Behl J, Cachin C, Distler T, Kuhnle S, Mohammadi SV, Schröder-Preikschat W, Stengel K (2012) CheapBFT: resource-efficient Byzantine fault tolerance. In: Proceedings of the 7th European conference on computer systems, pp 295–308
17.
Zurück zum Zitat Kapritsos M, Junqueira FP (2010) Scalable agreement: toward ordering as a service. In: Proceedings of the 6th workshop on hot topics in system dependability, pp 7–12 Kapritsos M, Junqueira FP (2010) Scalable agreement: toward ordering as a service. In: Proceedings of the 6th workshop on hot topics in system dependability, pp 7–12
18.
Zurück zum Zitat Li B, Xu W, Abid MZ, Distler T, Kapitza R (2016) SAREK: optimistic parallel ordering in Byzantine fault tolerance. In: Proceedings of the 12th European dependable computing conference, pp 77–88 Li B, Xu W, Abid MZ, Distler T, Kapitza R (2016) SAREK: optimistic parallel ordering in Byzantine fault tolerance. In: Proceedings of the 12th European dependable computing conference, pp 77–88
19.
Zurück zum Zitat Mao Y, Junqueira FP, Marzullo K (2008) Mencius: building efficient replicated state machines for WANs. In: Proceedings of the 8th conference on operating systems design and implementation, pp 369–384 Mao Y, Junqueira FP, Marzullo K (2008) Mencius: building efficient replicated state machines for WANs. In: Proceedings of the 8th conference on operating systems design and implementation, pp 369–384
20.
Zurück zum Zitat Ou Z, Zhuang H, Lukyanenko A, Nurminen JK, Hui P, Mazalov V, Ylä-Jääski A (2013) Is the same instance type created equal? Exploiting heterogeneity of public clouds. In: IEEE transactions on cloud computing, vol 1, no 2, pp 201–214 Ou Z, Zhuang H, Lukyanenko A, Nurminen JK, Hui P, Mazalov V, Ylä-Jääski A (2013) Is the same instance type created equal? Exploiting heterogeneity of public clouds. In: IEEE transactions on cloud computing, vol 1, no 2, pp 201–214
21.
Zurück zum Zitat Papadimitriou CH, Steiglitz K (1998) Combinatorial optimization: algorithms and complexity. Dover Publications, New YorkMATH Papadimitriou CH, Steiglitz K (1998) Combinatorial optimization: algorithms and complexity. Dover Publications, New YorkMATH
23.
Zurück zum Zitat Veronese GS, Correia M, Bessani AN, Lung LC (2009) Spin one’s wheels? Byzantine fault tolerance with a spinning primary. In: Proceedings of the 28th international symposium on reliable distributed systems, pp 135–144 Veronese GS, Correia M, Bessani AN, Lung LC (2009) Spin one’s wheels? Byzantine fault tolerance with a spinning primary. In: Proceedings of the 28th international symposium on reliable distributed systems, pp 135–144
24.
Zurück zum Zitat Veronese GS, Correia M, Bessani AN, Lung LC (2010) EBAWA: efficient Byzantine agreement for wide-area networks. In: Proceedings of the 12th symposium on high-assurance systems engineering, pp 10–19 Veronese GS, Correia M, Bessani AN, Lung LC (2010) EBAWA: efficient Byzantine agreement for wide-area networks. In: Proceedings of the 12th symposium on high-assurance systems engineering, pp 10–19
25.
Zurück zum Zitat Yin J, Martin JP, Venkataramani A, Alvisi L, Dahlin M (2003) Separating agreement from execution for Byzantine fault tolerant services. In: Proceedings of the 19th symposium on operating systems principles, pp 253–267 Yin J, Martin JP, Venkataramani A, Alvisi L, Dahlin M (2003) Separating agreement from execution for Byzantine fault tolerant services. In: Proceedings of the 19th symposium on operating systems principles, pp 253–267
Metadaten
Titel
Scalable Byzantine fault-tolerant state-machine replication on heterogeneous servers
verfasst von
Michael Eischer
Tobias Distler
Publikationsdatum
21.08.2018
Verlag
Springer Vienna
Erschienen in
Computing / Ausgabe 2/2019
Print ISSN: 0010-485X
Elektronische ISSN: 1436-5057
DOI
https://doi.org/10.1007/s00607-018-0652-3

Weitere Artikel der Ausgabe 2/2019

Computing 2/2019 Zur Ausgabe

Premium Partner