Abstract
In this paper, we discuss checkpointing issues that should be considered whenever jobs execute in unreliable computing environments. Specifically, we show that if proper checkpointing procedures are not implemented, then under certain conditions, job completion time distributions exhibit properties of heavy-tail or power-tail distributions (hereafter referred to as power-tail (PT) distributions), which can lead to highly variable and long completion times.
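The effect described in the abstract can be illustrated with a small Monte Carlo sketch. It is not the authors' model; it rests on assumptions of our own. Task times are exponential with rate `mu`, failures arrive as a Poisson process with rate `lam`, repair is instantaneous, and checkpoints cost nothing. Under the restart policy a failure discards all completed work. Under the checkpointing policy, progress is saved every `interval` units of work, so only the work since the last checkpoint is lost. All function and parameter names are hypothetical.

```python
import random
from typing import List, Optional


def restart_completion_time(task_time: float, fail_rate: float,
                            rng: random.Random) -> float:
    """Wall-clock time to finish `task_time` units of work when every
    failure forces a restart from the beginning (no checkpointing).

    Assumptions (illustrative only): failures form a Poisson process with
    rate `fail_rate`, and recovery after a failure takes zero time.
    """
    elapsed = 0.0
    while True:
        time_to_failure = rng.expovariate(fail_rate)
        if time_to_failure >= task_time:
            # This attempt runs to completion before the next failure.
            return elapsed + task_time
        # The failure wipes out all progress; the time spent is lost.
        elapsed += time_to_failure


def checkpointed_completion_time(task_time: float, fail_rate: float,
                                 interval: float,
                                 rng: random.Random) -> float:
    """Same job, but progress is saved every `interval` units of work,
    so a failure only loses the work done since the last checkpoint.

    Assumes checkpoints are free. Because Poisson failures are memoryless,
    each segment behaves like an independent restart problem.
    """
    elapsed = 0.0
    done = 0.0
    while done < task_time:
        segment = min(interval, task_time - done)
        elapsed += restart_completion_time(segment, fail_rate, rng)
        done += segment
    return elapsed


def simulate(n: int, mu: float, lam: float,
             interval: Optional[float] = None,
             seed: int = 0) -> List[float]:
    """Draw n completion times; task times ~ Exp(mu) is an assumption."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        task_time = rng.expovariate(mu)
        if interval is None:
            samples.append(restart_completion_time(task_time, lam, rng))
        else:
            samples.append(
                checkpointed_completion_time(task_time, lam, interval, rng))
    return samples


def tail_probability(samples: List[float], x: float) -> float:
    """Empirical estimate of P(X > x)."""
    return sum(s > x for s in samples) / len(samples)


if __name__ == "__main__":
    no_ckpt = simulate(20_000, mu=1.0, lam=0.5)
    ckpt = simulate(20_000, mu=1.0, lam=0.5, interval=0.5)
    for x in (5, 10, 20, 40):
        print(f"P(X > {x:>2}): restart={tail_probability(no_ckpt, x):.5f} "
              f"checkpointed={tail_probability(ckpt, x):.5f}")
```

With these assumed parameters, the restart column should decay much more slowly as the threshold grows than the checkpointed column. The exact figures depend on the seed and on the parameter choices. Reusing `restart_completion_time` for each checkpoint segment keeps the two policies directly comparable. This reuse is valid only because Poisson failures are memoryless.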