On unreliable computing systems when heavy-tails appear as a result of the recovery procedure

Authors:
Pierre M. Fiorini, University of Southern Maine, Portland, ME
Robert Sheahan, University of Connecticut, Storrs, CT
Lester Lipsky, University of Connecticut, Storrs, CT

ACM SIGMETRICS Performance Evaluation Review, Volume 33, Issue 2, September 2005, pp. 15–17. https://doi.org/10.1145/1101892.1101898

Published: 01 September 2005

Abstract

For some computing systems, failure is rare enough that it can be ignored. In other systems, failure is so common that how it is handled can have a significant impact on system performance. There are many different recovery schemes for tasks; however, they can be classified into three broad categories: 1) Resume: when a task fails, it knows exactly where it stopped and can continue from that point when allowed to resume (i.e., preemptive resume - prs); 2) Replace: when a task fails, then later, when the processor continues, it begins with a brand new task (i.e., preemptive repeat different - prd); and 3) Restart: when a task fails, it loses all work done to that point and must start anew upon continuing later (i.e., preemptive repeat identical - pri).

In this paper, assuming a computing system is unreliable, we discuss how heavy-tail (hereafter referred to as power-tail - PT) distributions can appear in a job's task stream given the Restart recovery procedure. This is an important consideration since it is known that power-tails can lead to unstable systems [4]. We then demonstrate how to obtain performance and dependability measures for a class of computing systems comprised of P unreliable processors and a finite number of tasks N given the above recovery procedures.
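The three recovery categories can be compared with a small Monte Carlo sketch. This is an illustration only, not the analytical method of the paper. It assumes exponentially distributed task lengths (rate `mu`) and exponentially distributed times between failures (rate `beta`), and it ignores repair time entirely. All function and parameter names (`resume_time`, `replace_time`, `restart_time`, `simulate`, `mu`, `beta`) are hypothetical and chosen for this example.

```python
import random


def resume_time(task_len):
    """Resume (prs): no work is lost, so processing time equals the task length."""
    return task_len


def replace_time(next_task, next_gap):
    """Replace (prd): after each failure a NEW task length is drawn."""
    total = 0.0
    while True:
        task_len = next_task()
        gap = next_gap()          # time until the next failure
        if gap >= task_len:       # task finishes before the failure
            return total + task_len
        total += gap              # work done before the failure is lost


def restart_time(task_len, next_gap):
    """Restart (pri): the SAME task is retried from scratch after each failure."""
    total = 0.0
    while True:
        gap = next_gap()
        if gap >= task_len:
            return total + task_len
        total += gap


def simulate(n, mu=1.0, beta=0.5, seed=0):
    """Draw n total processing times per scheme (assumed exponential inputs)."""
    rng = random.Random(seed)
    draw_task = lambda: rng.expovariate(mu)
    draw_gap = lambda: rng.expovariate(beta)
    out = {"resume": [], "replace": [], "restart": []}
    for _ in range(n):
        out["resume"].append(resume_time(draw_task()))
        out["replace"].append(replace_time(draw_task, draw_gap))
        out["restart"].append(restart_time(draw_task(), draw_gap))
    return out


if __name__ == "__main__":
    samples = simulate(20000)
    for scheme, times in samples.items():
        times.sort()
        # Compare the median with the extreme upper quantile of each scheme.
        print(scheme, "median:", times[len(times) // 2],
              "99.9th pct:", times[int(0.999 * len(times))])
```

Under these assumptions, comparing the upper quantiles of the three schemes gives a rough empirical view of how much heavier the Restart tail is than the Resume and Replace tails. The only difference between `replace_time` and `restart_time` is whether the task length is redrawn after a failure, which mirrors the distinction between prd and pri in the abstract.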

References

A. Bobbio and K. Trivedi, "Computation of the Distribution of the Completion Time When the Work Requirement is a PH Random Variable," Communications in Statistics - Stochastic Models, 1990.
M. Greiner, M. Jobmann, and L. Lipsky, "The Importance of Power-Tail Distributions for Modeling Queueing Systems," Operations Research, 47(2), 1999.
V. Kulkarni, V. Nicola, and K. Trivedi, "The Completion Time of a Job on a Multimode System," Advances in Applied Probability, 19:932–954, 1987.
L. Lipsky, Queueing Theory: A Linear Algebraic Approach, MacMillan and Company, New York, 1992.

Published in

ACM SIGMETRICS Performance Evaluation Review Volume 33, Issue 2
Special issue on the workshop on MAthematical performance Modeling And Analysis (MAMA 2005)
September 2005
43 pages
ISSN:0163-5999
DOI:10.1145/1101892
Copyright © 2005 Authors
Publisher
Association for Computing Machinery
New York, NY, United States