Multi-objective energy-efficient workflow scheduling using list-based heuristics
Introduction
Precedence-constrained parallel applications, also known as workflows, are a popular paradigm used by scientists for modelling large applications. Most of these applications have to deliver results as fast as possible and thus require parallel or distributed computation for decreasing their execution time. A major challenge in this case is to achieve the “best” schedule of the workflow tasks onto the available resources that minimises its execution time—a well-known NP-complete problem. Nowadays, high-performance parallel computing is available to almost any scientist or researcher. Along with their many advantages, these facilities introduce new challenges such as the considerable amount of energy required in today’s data centres. Besides the green implications of energy savings, Hamilton [1] reported that the financial expenditure for the energy consumption of Amazon’s data centres in a period of fifteen years accounts for 19% of its total budget and that 23% of the same budget represents the cost of maintaining the cooling infrastructure. Under these circumstances, energy efficiency is becoming an important objective for computing infrastructure providers due to its environmental and financial implications.
Reducing the energy consumption while maintaining the quality of the service in terms of performance has therefore become a major research challenge. The existing works mainly address this issue through the use of Dynamic Voltage and Frequency Scaling (DVFS), which enables online adjustment to voltage and frequency in CMOS circuits [2], [3], [4]. By applying this technique to different CPUs in a heterogeneous parallel system, it is possible to have resources working at their highest speed, and implicitly at peak energy consumption, while executing time-critical tasks. Conversely, they will be tuned to work at a lower speed with lower energy consumption while executing non-critical tasks. In such systems, the user is confronted with a set of resources working at different speeds and with different characteristic energy consumptions.
In these circumstances, workflow scheduling must be formulated as a multi-objective optimisation problem (MOP) aiming at optimising two possibly conflicting criteria: makespan and energy consumption for executing the workflow. The main characteristic of MOPs is that no single solution exists that is optimal with respect to all objectives, but a set of tradeoff solutions known as Pareto set or Pareto front, depending on if we refer to the domain or co-domain of the functions to be optimised. The property of the solutions on the Pareto set/front is that they cannot be simultaneously improved with respect to all objectives. In our particular case, it is likely that no single schedule simultaneously minimises makespan and energy consumption. Instead, there will be a set of solutions with a different tradeoff between makespan and energy consumption.
In this paper, we present a holistic approach for an energy-efficient workflow scheduler for heterogeneous systems, capable of computing a set of tradeoff solutions in a single run. Our method called MOHEFT is an extension of the Heterogeneous Earliest Finish Time (HEFT) algorithm [5], one of the most popular algorithms for workflow scheduling. HEFT is based on the assumption that the execution time of each workflow task is known on each resource, which is not always realistic for highly dynamic distributed systems consisting of a large number of multi-core processors sharing resources (e.g. bus, caches) and affected by the external load caused by other concurrent tasks. To overcome this drawback, MOHEFT relies on empirical models for execution time of workflow tasks and their entailed energy consumption. These models are based on the knowledge extracted from historical executions, reflecting the behaviour of real multi-core CPUs with different levels of energy consumption depending on the number of cores used and their individual level of utilisation, and predicting it for new unseen resources. While we do look into the external load generated by different workflow tasks, we do not consider multi-tenancy in our approach, as scientific applications, including workflows, are typically highly specialised and customised applications owned by individual scientists and tuned for their particular temporary research needs.
The contributions of this paper in the area of energy-efficient scheduling are as follows.
- 1.
Identification of fine-grained levels of energy consumption in a multi-core CPU. Thorough extensive experimentation, we measure the energy consumption and performance of different multi-core CPUs with different number of cores used. The experiments show that the performance of individual cores decreases with the concurrent number of cores used, while the energy efficiency increases.
- 2.
Use of empirical models for energy consumption and performance based on real data. We build neural network-based empirical models for time and energy consumption of CPU-intensive tasks based on historical executions on heterogeneous sets of machines. Our approach considers external load coming from tasks of the same as the one being modelled; however, we plan in future work to extend it with training information about the types of tasks defining the external load.
- 3.
A multi-objective energy-efficient scheduling algorithm. We present an algorithm capable of computing workflow schedules representing tradeoffs between energy consumption and makespan. We prove thorough an extensive experimentation that our algorithm computes solutions with the same or shorter makespan than HEFT and with lower energy consumption than HEFT and greenHEFT, an ad-hoc greedy algorithm for energy optimisation. MOHEFT also outperforms the most popular multi-objective optimisation algorithm, NSGA-II [6], computing solutions with shorter makespan and lower energy consumption.
- 4.
Analysis of the impact of different workflow characteristics and different resources on the tradeoff solutions. We carry out a thorough evaluation of our proposal under different circumstances: number of tasks, homogeneous resources in terms of speed and energy consumption, or heterogeneous resources. Our aim is to analyse the performance of MOHEFT versus other algorithms and the impact of different scenarios on the computed tradeoff solutions.
The rest of this paper is structured as follows. The next section formally describes our problem and introduces some background on multi-objective optimisation. Related works on energy modelling, multi-objective workflow scheduling and energy-efficient scheduling are analysed in Section 4. After that, we describe the modelling approach followed in this paper. Section 6 introduces MOHEFT. The evaluation of our algorithm and the analysis of the obtained results is included in Section 7. Finally, we present the main conclusions and future work in Section 8.
Section snippets
Formalism
Our problem consists in scheduling a workflow application’s tasks on a set of available heterogeneous resources in such a way that the makespan and the energy consumption of its execution are minimised. We introduce in the remainder of this section a simple but realistic formalism that defines the workflow, resource environment, and the metrics targeted.
Multi-objective optimisation
We introduce in this section a few concepts from the multi-objective optimisation theory for a better understanding of this work. We assume without loss of generality that minimisation is the goal for all objectives.
A multi-objective optimisation problem can be formally defined as finding all vectors which minimise the vector function . For our problem, represents the cardinality of the task set , and the -th component of a solution the
Related work
We review the related work in three fields touched by our approach: energy modelling, multi-objective workflow scheduling, and energy-efficient scheduling.
Applicability of existing models
In this section we show that the existing models reviewed in Section 4.1 are not applicable in the context of modern multi-core architectures. For this purpose, we conduct an extensive experiment aiming to prove that: (1) the performance of individual cores is impacted by the workload of the other cores; (2) the energy consumption of a multi-core CPU may vary among a multitude of different levels at any instant ; and (3) the energy consumption overhead for powering on a core decreases with
MOHEFT: multi-objective heterogeneous earliest finish time
In this section we describe our proposed multi-objective scheduling algorithm for computing a set of tradeoff solutions (instead of a single one) as an extension to the HEFT list scheduling algorithm. For a better understanding, we start by describing the mono-objective version of the algorithm and extend it afterwards for dealing with multiple objectives. Finally, we present also greenHEFT, a heft-like heuristic for minimising the energy consumption.
Evaluation
We run experiments for evaluating the MOHEFT solutions, aiming at
- •
demonstrating that they are at least as good as those delivered by the mono-objective HEFT for makespan and greenHEFT for energy;
- •
demonstrating that MOHEFT computes higher quality tradeoff solutions than classical multi-objective optimisation algorithms;
- •
analysing the results of MOHEFT for
- –
workflows with different shapes and number of activities;
- –
reduced number of resources;
- –
different types of resources in terms of clock frequency
- –
Conclusions and future work
We tackled in this paper the problem of energy-efficient workflow scheduling in heterogeneous parallel systems. We proposed a novel multi-objective list-based workflow scheduling heuristic called MOHEFT, capable of computing tradeoff solutions between makespan and energy consumption. This algorithm relies on empirical models for predicting the energy consumption and execution time of workflow activities, designed based on historical data collected from real workflow task executions. We compared
Juan J. Durillo received the M.S. and Ph.D. degrees in computer science from the University of Málaga, Spain, in 2006 and 2011, respectively. Currently, he is working as post-doctoral researcher in the Department of Computer Science, University of Innsbruck, Austria. Dr. Durillo has authored more than ten science papers in international journals and has contributed to more than 20 papers in international conferences or book chapters. His research interests include multi-objective optimisation,
References (30)
- et al.
A parallel bi-objective hybrid metaheuristic for energy-aware scheduling for cloud computing systems
Journal of Parallel and Distributed Computing
(2011) - et al.
Power efficient rate monotici scheduling for multi-core systems
Journal of Parallel and Distributed Computing
(2012) - et al.
jMetal: a java framework for multi-objective optimization
Advances in Engineering Software
(2011) - James Hamilton, Cooperative expendable micro-slice servers (CEMS): low cost, low power servers for Internet-scale...
- et al.
Impact of process variations on multicore performance symmetry
- T. Burd, T. Pering, A. Stratakos, R. Brodersen, A dynamic voltage scaled microprocessor system, in: Solid-State...
- et al.
Low-power CMOS digital design
IEEE Journal of Solid-State Circuits
(1992) - et al.
Performance-effective and low-complexity task scheduling for heterogeneous computing
IEEE Transactions on Parallel and Distributed Systems
(2002) - et al.
A fast elitist multi-objective genetic algorithm: NSGA-II
IEEE Transactions on Evolutionary Computation
(2000) - J. Knowles, L. Thiele, E. Zitzler, A tutorial on the performance assessment of stochastic multiobjective optimizers,...
Scheduling parallel applications on utility grids: time and cost trade-off management
Low complexity performance effective task scheduling algorithm for heterogeneous computing environments
Journal of Computer Science
Reliability-oriented genetic algorithm for workflow applications using max–min strategy
A bi-criteria scheduling heuristics for distributed embedded systems under reliability and real-time constraints
Multi-objective planning for workflow execution on grids
Cited by (104)
A two-stage preference driven multi-objective evolutionary algorithm for workflow scheduling in the Cloud
2024, Expert Systems with ApplicationsKnowledge-driven adaptive evolutionary multi-objective scheduling algorithm for cloud workflows
2023, Applied Soft ComputingScheduling energy consumption-constrained workflows in heterogeneous multi-processor embedded systems
2023, Journal of Systems ArchitectureAn Analytical Review and Performance Measures of State-of-Art Scheduling Algorithms in Heterogenous Computing Enviornment
2024, Archives of Computational Methods in EngineeringA critical review on energy efficient scheduling techniques in cloud computing
2023, AIP Conference ProceedingsDecision variable contribution based adaptive mechanism for evolutionary multi-objective cloud workflow scheduling
2023, Complex and Intelligent Systems
Juan J. Durillo received the M.S. and Ph.D. degrees in computer science from the University of Málaga, Spain, in 2006 and 2011, respectively. Currently, he is working as post-doctoral researcher in the Department of Computer Science, University of Innsbruck, Austria. Dr. Durillo has authored more than ten science papers in international journals and has contributed to more than 20 papers in international conferences or book chapters. His research interests include multi-objective optimisation, parallel and distributed computing, and search-based software engineering.
Vlad Nae is currently a Post-doctoral Fellow at the Institute of Computer Science, University of Innsbruck. He received his Diploma Engineer degree in 2006 from the Politehnica University of Bucharest, Romania. In December 2011 he received a Ph.D. in Computer Science from the University of Innsbruck. His research interests are energy efficiency and resource management in distributed systems, particularly in the area of highly interactive applications such as massively multiplayer online games. He is the author of 19 publications including four journal articles and three book chapters.
Radu Prodan received the Ph.D. degree from the Vienna University of Technology, Vienna, Austria, in 2004. He is currently associate professor at the Institute of Computer Science, University of Innsbruck, Austria. His research in the area of parallel and distributed systems comprises programming methods, compiler technology, performance analysis and scheduling. He participated in several national and European projects. He is currently coordinating the Austrian projects and was workpackage Leader in the IST-034601 (edutain@grid) and 26185 (SHIWA) projects. He is the author of more than 70 journal and conference publications and one book. He was the recipient of an IEEE Best Paper Award. He is a member of the ACM.