Published in: The Journal of Supercomputing, issue 7/2015

July 1, 2015

Camel: collective-aware message logging

Authors: Esteban Meneses, Laxmikant V. Kalé

Abstract

Continuous progress in supercomputer performance has made it possible to understand many fundamental problems in science. Simulation, the third scientific pillar, constantly demands more powerful machines to run algorithms that would otherwise be infeasible. That will inevitably lead to the deployment of an exascale machine during the next decade. However, fault tolerance is a major challenge that must be overcome to make such a machine usable. With an unprecedented number of parts, machines at extreme scale will have a short mean time between failures. The popular checkpoint/restart mechanism used in today’s machines may not be effective at that scale. One promising way to revamp checkpoint/restart is to use message-logging techniques. By storing messages during execution and replaying them in case of a failure, message logging is able to shorten recovery time and save a substantial amount of energy. The downside of message logging is that its memory footprint may grow to unsustainable levels. This paper presents a technique that decreases the memory pressure in message-logging protocols by storing only the necessary messages in collective-communication operations. We introduce Camel, a protocol that has a low memory overhead for multicast and reduction operations. Our results show that Camel can reduce the memory footprint of a molecular dynamics benchmark by more than 95% on 16,384 cores.
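To make the memory argument concrete, the following is a minimal illustrative sketch and not the Camel protocol itself, whose details the abstract does not give. It assumes a sender-side log and compares two hypothetical strategies for a multicast: storing one payload copy per destination versus storing the payload once together with its destination list. The class names `NaiveSenderLog` and `CollectiveAwareLog`, and the helper `compare`, are invented for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class NaiveSenderLog:
    """Hypothetical log that stores one payload copy per destination."""
    entries: list = field(default_factory=list)

    def multicast(self, payload, destinations):
        # Each destination gets its own logged copy of the payload.
        for dest in destinations:
            self.entries.append((dest, bytes(payload)))

    def stored_bytes(self):
        return sum(len(p) for _, p in self.entries)

    def replay(self, dest):
        # Messages to resend if `dest` fails and must recover.
        return [p for d, p in self.entries if d == dest]


@dataclass
class CollectiveAwareLog:
    """Hypothetical log that stores a multicast payload only once."""
    entries: list = field(default_factory=list)

    def multicast(self, payload, destinations):
        # One payload copy, shared by every destination of the multicast.
        self.entries.append((tuple(destinations), bytes(payload)))

    def stored_bytes(self):
        return sum(len(p) for _, p in self.entries)

    def replay(self, dest):
        return [p for dests, p in self.entries if dest in dests]


def compare(payload_size=1024, fanout=64):
    """Return (naive_bytes, aware_bytes) for a single multicast."""
    payload = b"x" * payload_size
    destinations = list(range(fanout))
    naive, aware = NaiveSenderLog(), CollectiveAwareLog()
    naive.multicast(payload, destinations)
    aware.multicast(payload, destinations)
    # Both logs can replay the same message to any failed destination.
    assert naive.replay(3) == aware.replay(3)
    return naive.stored_bytes(), aware.stored_bytes()


if __name__ == "__main__":
    print(compare())
```

Under these assumptions, the per-destination log grows with the multicast fan-out while the shared log does not, which is the kind of saving the abstract attributes to collective awareness. The figures produced here are properties of the toy model only and say nothing about the measured results reported in the paper.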

Metadata
Title
Camel: collective-aware message logging
Authors
Esteban Meneses
Laxmikant V. Kalé
Publication date
July 1, 2015
Publisher
Springer US
Published in
The Journal of Supercomputing, issue 7/2015
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-015-1402-3
