
2019 | Book

Fair Scheduling in High Performance Computing Environments


About this book

This book introduces a new scheduler to fairly and efficiently distribute system resources among many users with varying usage patterns who compete for them in large shared computing environments. The Rawlsian Fair scheduler developed for this effort is shown to boost performance while reducing delay for certain types of high performance computing workloads, including the following four classes examined in this book:

i. Class A – similar but complementary workloads

ii. Class B – similar but steady vs. intermittent workloads

iii. Class C – large vs. small workloads

iv. Class D – large vs. noise-like workloads

This new scheduler achieves short-term fairness on the small timescales that demand rapid response to varying workloads and usage profiles. The Rawlsian Fair scheduler is shown to consistently benefit Class C and Class D workloads, while it benefits Class A and Class B workloads only when they become disproportionate as the number of users increases.

A simulation framework, dSim, was built to simulate the new Rawlsian Fair scheduling mechanism. dSim demonstrates how the Rawlsian Fair scheduler achieves instantaneous fairness in High Performance Computing environments, effective utilization of computing resources, and user satisfaction.

Table of Contents

Frontmatter
Chapter 1. Introduction
Abstract
While some industries use High Performance Computing (HPC) to increase profits, other industries require HPC to do business at all. In the oil and gas industries, HPC is used to conduct seismic simulations. In the pharmaceutical industry, HPC is used to discover new drugs.
Art Sedighi, Milton Smith
Chapter 2. Financial Market Risk
Abstract
In the first 5 min (9:30–9:35 AM EST) after the market opened on Friday, June 24, 2016, the trading volume of the Dow Jones Industrial Average reached 5.71 million shares; by the closing minute (4:00 PM), the volume was over 63 million shares (Table 2.1). Over the course of the day, a total of 5.2 million trades were processed by the New York Stock Exchange (NYSE), and over five million of these were small trades of 1–2000 shares.
Art Sedighi, Milton Smith
Chapter 3. Scheduling in High Performance Computing
Abstract
This chapter provides background information relevant to our objectives:
Art Sedighi, Milton Smith
Chapter 4. Fairshare Scheduling
Abstract
Many scheduling systems use fair-share or proportional-fair-share algorithms (Kay and Lauder 1988). Fair-share schedulers were initially designed to manage the time allocations of processors in uniprocessor systems with workloads consisting of long-running, compute-bound processes (Kleban and Clearwater 2003). Each user was assigned a time slot on a machine (i.e., a mainframe), and within that time slot, the user's job had the highest priority. Any other jobs were stopped and restarted at a later time.
Art Sedighi, Milton Smith
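The book presents no code at this point; as a rough, invented sketch (the decay constant, data structures, and names below are assumptions, not the book's implementation), classic fair-share accounting decays each user's historical usage over time and hands the next slot to the least-served user with pending work:

```python
# Hypothetical fair-share sketch (illustrative only, not from the book).
# Each user's accounted usage decays per tick; the next task is taken
# from the user with the lowest decayed usage.
from dataclasses import dataclass, field
from collections import deque

DECAY = 0.9  # per-tick decay of historical usage (assumed value)

@dataclass
class User:
    name: str
    usage: float = 0.0                       # decayed usage accounting
    queue: deque = field(default_factory=deque)

def tick(users):
    # Age everyone's accounted usage so old consumption matters less.
    for u in users:
        u.usage *= DECAY

def pick_next(users):
    # Among users with pending tasks, favor the least-served user.
    candidates = [u for u in users if u.queue]
    if not candidates:
        return None
    u = min(candidates, key=lambda usr: usr.usage)
    cost = u.queue.popleft()
    u.usage += cost  # charge the task's cost to the user
    return u.name

alice = User("alice", queue=deque([1.0] * 3))
bob = User("bob", queue=deque([1.0] * 3))
order = []
for _ in range(6):
    tick([alice, bob])
    order.append(pick_next([alice, bob]))
print(order)  # → ['alice', 'bob', 'alice', 'bob', 'alice', 'bob']
```

With equal queues the decayed-usage rule simply alternates users; the disproportionate-workload cases studied in later chapters are where this scheme starts to strain.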
Chapter 5. Multi-Criteria Scheduling: A Mathematical Model
Abstract
This chapter details a mathematical model that further explains the multi-criteria scheduling mechanism outlined in the main thesis. The traditional scheduling systems covered in Chap. 3 typically use two primary variables: load and priority. To avoid the limitations faced by such schedulers, the scheduler presented in our work uses a third independent variable, seniority, to determine the order in which tasks are scheduled for execution. Unlike load and priority, seniority is calculated by the system and changes over time; this time-varying behavior is not modeled in traditional scheduling systems.
Art Sedighi, Milton Smith
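As a hedged illustration of the three-variable idea (the scoring function, weight, and the convention that a lower score schedules first are all invented here, not taken from the book's model), a system-computed seniority that grows with waiting time can offset priority and load when ordering tasks:

```python
# Invented sketch of multi-criteria ordering: user-supplied priority,
# current load, and a system-calculated seniority that grows with how
# long a task has waited. Lower score schedules first (assumed convention).
def seniority(wait_ticks):
    # Assumed form: seniority grows linearly with waiting time.
    return float(wait_ticks)

def score(priority, load, wait_ticks, w_s=1.0):
    # Seniority is subtracted so long-waiting tasks move to the front.
    return priority + load - w_s * seniority(wait_ticks)

# (name, priority, load, ticks waited) -- lower priority value = better
tasks = [
    ("t1", 5, 0.2, 0),   # low-priority, just arrived
    ("t2", 5, 0.2, 10),  # same priority/load, but waited 10 ticks
    ("t3", 1, 0.2, 0),   # best priority, just arrived
]
ranked = sorted(tasks, key=lambda t: score(t[1], t[2], t[3]))
print([t[0] for t in ranked])  # → ['t2', 't3', 't1']
```

The point of the sketch: because seniority is time-varying and system-calculated, a long-waiting low-priority task (t2) can overtake a fresh high-priority one (t3), which static two-variable schedulers cannot express.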
Chapter 6. Simulation and Methodology
Abstract
As outlined in Sect. 1.3, our work assesses the performance characteristics of a multi-criteria scheduler that uses seniority, in addition to priority and load, to make decisions. It does so by using various simulation models to test the scheduling algorithm. Later in this book, the simulator used to conduct these simulations (dSim) is introduced.
Art Sedighi, Milton Smith
Chapter 7. dSim
Abstract
dSim improves upon Alea 3.0 (Klusáček 2014; Klusáček and Rudová 2010) and GridSim (Albin et al. 2007; Buyya and Murshed 2002). GridSim is a discrete, event-based toolkit written in the Java language (McGill 2008, pp. 56–75). It allows components to be modelled and simulated in parallel environments, and it can be used to evaluate scheduling algorithms. It can also create different classes of heterogeneous resources and enables such resources to be managed using a user-defined scheduling algorithm. Although GridSim allows environments to be managed via a graphical user interface, its framework was of greatest interest because it allowed us to customize our interactions with the environment. Alea expands upon the GridSim framework, enabling the evaluation of various scheduling techniques. dSim extends the framework created by Alea 3.0, enabling scenarios to be automated and simulated while varying the number of users, the types of job profiles, and the number of resources. Most importantly, it includes a simulator clock: dSim uses a (bucket) Placement Counter (PC) to keep all tasks synchronized in buckets. Each tick of the PC opens a new bucket, in which events such as task submissions, result gathering, and scheduling are synchronized. The duration of a single PC tick was set at 1000 ms (1 s).
Art Sedighi, Milton Smith
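The Placement Counter mechanism can be sketched roughly as follows. This is a simplified reconstruction, not dSim's actual Java code: the 1000 ms tick comes from the text, while the class shape and method names are assumptions.

```python
# Simplified illustration of a bucketed simulator clock in the spirit
# of dSim's Placement Counter (PC): each tick is a bucket in which
# submissions, scheduling, and result gathering are synchronized.
from collections import defaultdict

PC_TICK_MS = 1000  # one PC tick = 1000 ms (1 s), per the text

class PlacementCounter:
    def __init__(self):
        self.tick = 0
        self.buckets = defaultdict(list)   # tick -> list of events

    def schedule(self, tick, event):
        # Place an event into the bucket for a given tick.
        self.buckets[tick].append(event)

    def advance(self):
        # Drain every event belonging to the current bucket, then
        # move the simulated clock forward by one tick.
        events = self.buckets.pop(self.tick, [])
        self.tick += 1
        return events

pc = PlacementCounter()
pc.schedule(0, "submit:user1")
pc.schedule(1, "schedule:user1")
pc.schedule(1, "gather:user1")
print(pc.advance())  # → ['submit:user1']
print(pc.advance())  # → ['schedule:user1', 'gather:user1']
```

Keeping all events quantized to tick boundaries is what lets dSim compare schedulers on identical, reproducible event sequences.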
Chapter 8. Simulation Scenarios
Abstract
Table 8.1 outlines the seventeen scenarios run in our work. The goal of these simulations was to evaluate the model proposed in Sect. 8.2. They cover the four classes of job profiles discussed in Sect. 1.2.
Art Sedighi, Milton Smith
Chapter 9. Overview of Results
Abstract
Chapter 8 outlined 17 cases simulated using dSim. Both the traditional, fair-share scheduling algorithm and the Rawlsian Fair scheduling algorithm developed for our work were used in these simulations. Table 9.1 outlines the different classes of simulations and their purposes.
Art Sedighi, Milton Smith
Chapter 10. Class A Results and Analysis
Abstract
Class A simulations describe situations featuring intermittent workloads. In Simulation 2, two users submit equal but complementary workloads. As the number of users increases (to 6 in Simulation 6, 11 in Simulation 10, and 21 in Simulation 14), one user's submission rate stays constant while the second user's workload (5000 tasks) is distributed among the rest of the users in the system. In Simulation 6, users 1–5 submit 1000 tasks each (5000 in total), while user 6 submits 5000 tasks alone. In Simulation 10, users 1–10 submit 500 tasks each, while user 11 submits 5000 tasks alone. In Simulation 14, users 1–20 submit 250 tasks each, while user 21 submits 5000 tasks alone.
Art Sedighi, Milton Smith
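The task splits above follow one simple pattern: one user always submits 5000 tasks, and a second 5000-task workload is divided evenly among the remaining users. A small reconstruction of that arithmetic (the function name is invented; the numbers come from the text):

```python
# Reconstruction of the Class A scaling pattern described in the text:
# one user submits the full 5000-task workload; the remaining users
# split an equal-sized 5000-task workload evenly among themselves.
def class_a_split(n_users, total=5000):
    small_users = n_users - 1
    split = {f"user{i}": total // small_users for i in range(1, n_users)}
    split[f"user{n_users}"] = total   # the lone "large" user
    return split

print(class_a_split(6))   # users 1-5: 1000 each, user 6: 5000
print(class_a_split(11))  # users 1-10: 500 each, user 11: 5000
print(class_a_split(21))  # users 1-20: 250 each, user 21: 5000
```

The same even-division pattern recurs (with different small-user profiles) in the Class B, C, and D scenarios of the following chapters.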
Chapter 11. Class B Results and Analysis
Abstract
Class B simulations describe situations in which one user submits an intermittent workload while the other user or users submit a steady workload. In Simulation 3, two users submit workloads equal in the number of tasks: one steady and one intermittent. As the number of users increases (to 6 in Simulation 7, 11 in Simulation 11, and 21 in Simulation 15), one user's workload remains constant at an intermittent rate of 200 tasks every two PCs, up to 5000 tasks.
Art Sedighi, Milton Smith
Chapter 12. Class C Results and Analysis
Abstract
Class C simulations describe cases featuring a small user and one or more disproportionately large users. In Simulation 4, one user submits a large workload (5000 tasks), and another user submits a one-time, 50-task workload at a later time. As the number of users increases (to 6 in Simulation 8, 11 in Simulation 12, and 21 in Simulation 16), the smaller user's workload remains constant while the larger users' workloads are distributed among one or more other users. In Simulation 8, users 1–5 submit 1000 tasks each (5000 in total). In Simulation 12, users 1–10 submit 500 tasks each, and user 11 submits 5000 tasks on their own. In Simulation 16, users 1–20 submit 250 tasks each.
Art Sedighi, Milton Smith
Chapter 13. Class D Results and Simulations
Abstract
Class D workloads represent cases in which one user submits a small, noise-like workload, and one or more disproportionately larger users submit much larger workloads. In Simulation 5, one user submits a large workload of 5000 tasks, and another user submits 1 task every two PCs, up to a total of 10 tasks. This pattern continues with 6 users in Simulation 9, 11 users in Simulation 13, and 21 users in Simulation 17. In Simulation 9, users 1–5 submit 1000 tasks each (5000 in total). In Simulation 13, users 1–10 submit 500 tasks each, while user 11 submits 5000 tasks on their own. In Simulation 17, users 1–20 submit 250 tasks each.
Art Sedighi, Milton Smith
Chapter 14. Conclusion
Abstract
Our work introduced a new scheduler, Rawlsian Fair, which applies Rawls's theory of fairness to scheduling malleable tasks in High Performance Computing environments. This scheduler assigns precedence to the least-well-off user via a new parameter: seniority. Rawlsian Fair performed up to 17× better than traditional fair-share scheduling in scenarios featuring users with disproportionate task-submission profiles.
Art Sedighi, Milton Smith
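The "precedence to the least-well-off user" idea can be illustrated with a toy maximin choice. Note the hedge: the book's scheduler realizes this through its seniority parameter, whereas the sketch below simply counts service received, so every name and detail here is an invented simplification.

```python
# Toy maximin (Rawlsian difference-principle) sketch, not the book's
# code: each free slot goes to the least-well-off user, i.e. the one
# who has received the least service so far and still has pending work.
def rawlsian_pick(served, pending):
    # served:  user -> tasks completed so far
    # pending: user -> tasks still waiting
    waiting = {u for u, n in pending.items() if n > 0}
    if not waiting:
        return None
    return min(waiting, key=lambda u: (served[u], u))

served = {"big": 3, "small": 0}          # "big" has already been served
pending = {"big": 4997, "small": 10}
order = []
for _ in range(4):
    u = rawlsian_pick(served, pending)
    order.append(u)
    served[u] += 1
    pending[u] -= 1
print(order)  # → ['small', 'small', 'small', 'big']
```

The disadvantaged user is served until it catches up, which mirrors why the gains reported above concentrate in the disproportionate Class C and D scenarios.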
Backmatter
Metadata
Title
Fair Scheduling in High Performance Computing Environments
Authors
Art Sedighi
Milton Smith
Copyright Year
2019
Electronic ISBN
978-3-030-14568-2
Print ISBN
978-3-030-14567-5
DOI
https://doi.org/10.1007/978-3-030-14568-2
