
On unreliable computing systems when heavy-tails appear as a result of the recovery procedure

Published: 01 September 2005

Abstract

For some computing systems, failure is rare enough that it can be ignored. In other systems, failure is so common that the way it is handled can have a significant impact on system performance. There are many different recovery schemes for tasks, but they can be classified into three broad categories: 1) Resume: when a task fails, it knows exactly where it stopped and can continue from that point when allowed to resume (i.e., preemptive resume - prs); 2) Replace: when a task fails, the processor begins a brand new task when it later continues (i.e., preemptive repeat different - prd); and 3) Restart: when a task fails, it loses all work done to that point and must start anew upon continuing later (i.e., preemptive repeat identical - pri).

In this paper, assuming a computing system is unreliable, we discuss how heavy-tail (hereafter referred to as power-tail - PT) distributions can appear in a job's task stream given the Restart recovery procedure. This is an important consideration since it is known that power-tails can lead to unstable systems [4]. We then demonstrate how to obtain performance and dependability measures for a class of computing systems comprising P unreliable processors and a finite number of tasks N under the above recovery procedures.
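The three recovery schemes can be compared with a small Monte Carlo sketch. This is an illustration only, not the analytic method of the paper. It assumes that failures arrive as a Poisson process, that task sizes are exponentially distributed, and that repair time is ignored. The function names and the parameters `task_rate` and `fail_rate` are hypothetical.

```python
import random


def completion_time(work, fail_rate, scheme, rng, draw_work=None):
    """Processing time until one task completes on an unreliable processor.

    Assumptions (not from the paper): failures form a Poisson process with
    rate `fail_rate`, and repair time is ignored.
    """
    elapsed = 0.0
    remaining = work
    while True:
        uptime = rng.expovariate(fail_rate)  # time until the next failure
        if uptime >= remaining:              # task finishes before failing
            return elapsed + remaining
        elapsed += uptime
        if scheme == "resume":               # prs: keep the work done so far
            remaining -= uptime
        elif scheme == "replace":            # prd: begin a brand new task
            remaining = draw_work()
        elif scheme == "restart":            # pri: same task, from scratch
            remaining = work
        else:
            raise ValueError(f"unknown scheme: {scheme}")


def sample_times(scheme, n, task_rate=1.0, fail_rate=0.8, seed=1):
    """Draw n completion times for exponentially distributed task sizes."""
    rng = random.Random(seed)

    def draw_work():
        return rng.expovariate(task_rate)

    return [completion_time(draw_work(), fail_rate, scheme, rng, draw_work)
            for _ in range(n)]


def tail_fraction(samples, x):
    """Empirical estimate of P(X > x)."""
    return sum(s > x for s in samples) / len(samples)


if __name__ == "__main__":
    for scheme in ("resume", "replace", "restart"):
        times = sample_times(scheme, 20_000)
        print(scheme, [tail_fraction(times, x) for x in (10, 100, 1000)])
```

Under these assumptions, Resume and Replace both leave the completion time exponentially distributed: with Resume it equals the task size, and with Replace the same follows from memorylessness. With Restart, a task of fixed size t has mean completion time (e^{βt} - 1)/β for failure rate β, which grows exponentially in t. Averaging over exponential task sizes with rate μ gives a mean of 1/(μ - β) when μ > β and an infinite mean otherwise, which is the kind of behavior a power tail produces.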

References

  1. A. Bobbio and K. Trivedi, "Computation of the Distribution of the Completion Time When the Work Requirement is a PH Random Variable," Communications in Statistics - Stochastic Models, 1990.
  2. M. Greiner, M. Jobmann, and L. Lipsky, "The Importance of Power-Tail Distributions for Modeling Queueing Systems," Operations Research, 47(2), 1999.
  3. V. Kulkarni, V. Nicola, and K. Trivedi, "The Completion Time of a Job on a Multimode System," Advances in Applied Probability, 19:932-954, 1987.
  4. L. Lipsky, Queueing Theory: A Linear Algebraic Approach, MacMillan and Company, New York, 1992.


Published in

ACM SIGMETRICS Performance Evaluation Review, Volume 33, Issue 2
Special issue on the workshop on MAthematical performance Modeling And Analysis (MAMA 2005)
September 2005, 43 pages
ISSN: 0163-5999
DOI: 10.1145/1101892

Copyright © 2005 Authors

Publisher: Association for Computing Machinery, New York, NY, United States
