On checkpointing and heavy-tails in unreliable computing environments

Published: 01 September 2006

Abstract

In this paper, we discuss checkpointing issues that should be considered whenever jobs execute in unreliable computing environments. Specifically, we show that if checkpointing procedures are not properly implemented, then under certain conditions, job completion time distributions exhibit properties of heavy-tail or power-tail distributions (hereafter referred to as power-tail (PT) distributions), which can lead to highly variable and long completion times.
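The effect described in the abstract can be illustrated with a small Monte Carlo sketch. This is not the authors' model. It assumes that failures arrive as a Poisson process, that job sizes are exponentially distributed, that a failure without checkpointing discards all progress, and that checkpoint and recovery costs are zero. The function names and parameter values are hypothetical and chosen only for illustration.

```python
import random


def completion_time(work, failure_rate, checkpoint_interval=None, rng=random):
    """Wall-clock time to finish `work` units of computation.

    Assumption: failures form a Poisson process with rate `failure_rate`.
    checkpoint_interval=None means a failure discards all progress
    (restart from scratch). Otherwise progress is saved after every
    `checkpoint_interval` units, so a failure only loses the current segment.
    Checkpoint and recovery overheads are ignored (assumption).
    """
    segment = work if checkpoint_interval is None else checkpoint_interval
    elapsed = 0.0
    remaining = work
    while remaining > 0:
        target = min(segment, remaining)
        time_to_failure = rng.expovariate(failure_rate)
        if time_to_failure >= target:
            # Segment finished before the next failure: progress is kept.
            elapsed += target
            remaining -= target
        else:
            # Failure mid-segment: the time is spent, the segment is retried.
            elapsed += time_to_failure
    return elapsed


def simulate(n_jobs, mean_work, failure_rate, checkpoint_interval, seed):
    """Return sorted completion times for jobs with exponential sizes."""
    rng = random.Random(seed)
    times = []
    for _ in range(n_jobs):
        work = rng.expovariate(1.0 / mean_work)
        times.append(completion_time(work, failure_rate, checkpoint_interval, rng))
    return sorted(times)


if __name__ == "__main__":
    n = 2000
    restart = simulate(n, mean_work=1.0, failure_rate=0.5,
                       checkpoint_interval=None, seed=42)
    checkpointed = simulate(n, mean_work=1.0, failure_rate=0.5,
                            checkpoint_interval=0.25, seed=42)
    for name, times in (("restart from scratch", restart),
                        ("checkpoint every 0.25", checkpointed)):
        print(name, "median:", times[n // 2],
              "99th percentile:", times[int(0.99 * n)],
              "max:", times[-1])
```

Under these assumptions, the upper percentiles of the restart-from-scratch runs should sit well above those of the checkpointed runs, while the medians stay comparatively close. That gap in the tail is the kind of highly variable completion time the abstract refers to.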

Published in

ACM SIGMETRICS Performance Evaluation Review, Volume 34, Issue 2
September 2006
30 pages
ISSN: 0163-5999
DOI: 10.1145/1168134

Copyright © 2006 Authors

Publisher

Association for Computing Machinery
New York, NY, United States
