
2014 | OriginalPaper | Chapter

Equidistant Checkpoint Placement for Checkpointing and Rollback Recovery

Authors: Zhenpeng Xu, Weiwei Li, Jinyong Yin

Published in: Unifying Electrical Engineering and Electronics Engineering

Publisher: Springer New York


Abstract

To derive the appropriate equidistant checkpoint interval for log-based checkpointing and rollback-recovery mechanisms, a directed state-transition model of system execution is presented under the assumption that the inter-failure time is exponentially distributed. The model takes the relevant key factors into account jointly. Using the Laplace transform, the fault-tolerance overhead ratio is derived by evaluating the expected total execution overhead of a single checkpoint interval, from which the optimal equidistant checkpoint interval is obtained. The metrics show that the derived formula is more practical for determining checkpoint placement in log-based fault-tolerant performance optimization, and that its degenerate form agrees with the previous model.
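The abstract does not reproduce the chapter's formula, so the following is only a minimal sketch of the general idea under a simpler, well-known model, not the authors' log-based derivation. It assumes exponential inter-failure times with rate λ, equidistant intervals of τ units of useful work each followed by a checkpoint of cost C, and a recovery of cost R after every failure (recovery itself may be interrupted). Under these assumptions the expected time to complete one interval is e^{λR}(e^{λ(τ+C)} − 1)/λ, and the overhead ratio is that expectation divided by τ, minus 1. The optimum is found by golden-section search (a swapped-in numerical technique, not the chapter's Laplace-transform approach) and compared with Young's first-order approximation √(2C/λ). All function names and parameter values are hypothetical.

```python
import math

# ASSUMED classic model (not the chapter's log-based model):
#   lam : failure rate; inter-failure times ~ Exponential(lam)
#   tau : useful work per interval (equidistant placement)
#   C   : checkpoint cost paid at the end of every interval
#   R   : recovery cost after a failure (recovery itself can be interrupted)


def expected_interval_time(tau: float, lam: float, C: float, R: float) -> float:
    """Expected wall-clock time to finish tau units of work plus one checkpoint."""
    # Renewal argument: E[T] = e^{lam R} (e^{lam (tau + C)} - 1) / lam
    return math.exp(lam * R) * math.expm1(lam * (tau + C)) / lam


def overhead_ratio(tau: float, lam: float, C: float, R: float) -> float:
    """Fault-tolerance overhead per unit of useful work: E[T] / tau - 1."""
    return expected_interval_time(tau, lam, C, R) / tau - 1.0


def optimal_interval(lam: float, C: float, R: float, tol: float = 1e-9) -> float:
    """Golden-section search for the tau that minimises the overhead ratio.

    The ratio is unimodal in tau for this model, so the search converges.
    The bracket [tol, 50 / lam] is an arbitrary illustrative choice.
    """
    lo, hi = tol, 50.0 / lam
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # 1 / golden ratio
    a, b = hi - inv_phi * (hi - lo), lo + inv_phi * (hi - lo)
    fa, fb = overhead_ratio(a, lam, C, R), overhead_ratio(b, lam, C, R)
    while hi - lo > tol * max(1.0, hi):
        if fa < fb:  # minimum lies in [lo, b]
            hi, b, fb = b, a, fa
            a = hi - inv_phi * (hi - lo)
            fa = overhead_ratio(a, lam, C, R)
        else:  # minimum lies in [a, hi]
            lo, a, fa = a, b, fb
            b = lo + inv_phi * (hi - lo)
            fb = overhead_ratio(b, lam, C, R)
    return 0.5 * (lo + hi)


def young_interval(lam: float, C: float) -> float:
    """Young's first-order approximation: sqrt(2 C / lam)."""
    return math.sqrt(2.0 * C / lam)


if __name__ == "__main__":
    # Hypothetical parameters: 1-hour MTBF, 30 s checkpoint, 60 s recovery.
    lam, C, R = 1.0 / 3600.0, 30.0, 60.0
    tau_opt = optimal_interval(lam, C, R)
    print(f"numerical optimum: {tau_opt:.1f} s")
    print(f"Young's estimate:  {young_interval(lam, C):.1f} s")
    print(f"overhead ratio at optimum: {overhead_ratio(tau_opt, lam, C, R):.4f}")
```

In this simplified model R only scales the expected interval time by a constant factor, so it does not move the optimum. With the hypothetical one-hour MTBF and 30 s checkpoint, the numerical optimum lands somewhat below Young's estimate, and the two converge as λC approaches 0. That is the same kind of degenerate-case consistency check the abstract reports for its own formula.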


DOI
https://doi.org/10.1007/978-1-4614-4981-2_243