Multi-objective energy-efficient workflow scheduling using list-based heuristics

https://doi.org/10.1016/j.future.2013.07.005Get rights and content

Highlights

  • Mono-objective greedy schedulers perform poorly in terms of energy efficiency.

  • MOHEFT outperforms HEFT and greenHEFT in both makespan and energy consumption.

  • MOHEFT increases energy efficiency with small hits on makespan.

  • Energy consumption reduced by up to 34.5% with only 2% makespan overhead.

Abstract

Workflow applications are a popular paradigm used by scientists for modelling applications to be run on heterogeneous high-performance parallel and distributed computing systems. Today, the increase in the number and heterogeneity of multi-core parallel systems facilitates the access to high-performance computing to almost every scientist, yet entailing additional challenges to be addressed. One of the critical problems today is the power required for operating these systems for both environmental and financial reasons. To decrease the energy consumption in heterogeneous systems, different methods such as energy-efficient scheduling are receiving increasing attention. Current schedulers are, however, based on simplistic energy models not matching the reality, use techniques like DVFS not available on all types of systems, or do not approach the problem as a multi-objective optimisation considering both performance and energy as simultaneous objectives. In this paper, we present a new Pareto-based multi-objective workflow scheduling algorithm as an extension to an existing state-of-the-art heuristic capable of computing a set of tradeoff optimal solutions in terms of makespan and energy efficiency. Our approach is based on empirical models which capture the real behaviour of energy consumption in heterogeneous parallel systems. We compare our new approach with a classical mono-objective scheduling heuristic and state-of-the-art multi-objective optimisation algorithm and demonstrate that it computes better or similar results in different scenarios. We analyse the different tradeoff solutions computed by our algorithm under different experimental configurations and we observe that in some cases it finds solutions which reduce the energy consumption by up to 34.5% with a slight increase of 2% in the makespan.

Introduction

Precedence-constrained parallel applications, also known as workflows, are a popular paradigm used by scientists for modelling large applications. Most of these applications have to deliver results as fast as possible and thus require parallel or distributed computation for decreasing their execution time. A major challenge in this case is to achieve the “best” schedule of the workflow tasks onto the available resources that minimises its execution time—a well-known NP-complete problem. Nowadays, high-performance parallel computing is available to almost any scientist or researcher. Along with their many advantages, these facilities introduce new challenges such as the considerable amount of energy required in today’s data centres. Besides the green implications of energy savings, Hamilton  [1] reported that the financial expenditure for the energy consumption of Amazon’s data centres in a period of fifteen years accounts for 19% of its total budget and that 23% of the same budget represents the cost of maintaining the cooling infrastructure. Under these circumstances, energy efficiency is becoming an important objective for computing infrastructure providers due to its environmental and financial implications.

Reducing the energy consumption while maintaining the quality of the service in terms of performance has therefore become a major research challenge. The existing works mainly address this issue through the use of Dynamic Voltage and Frequency Scaling (DVFS), which enables online adjustment to voltage and frequency in CMOS circuits  [2], [3], [4]. By applying this technique to different CPUs in a heterogeneous parallel system, it is possible to have resources working at their highest speed, and implicitly at peak energy consumption, while executing time-critical tasks. Conversely, they will be tuned to work at a lower speed with lower energy consumption while executing non-critical tasks. In such systems, the user is confronted with a set of resources working at different speeds and with different characteristic energy consumptions.

In these circumstances, workflow scheduling must be formulated as a multi-objective optimisation problem (MOP) aiming at optimising two possibly conflicting criteria: makespan and energy consumption for executing the workflow. The main characteristic of MOPs is that no single solution exists that is optimal with respect to all objectives, but a set of tradeoff solutions known as Pareto set or Pareto front, depending on if we refer to the domain or co-domain of the functions to be optimised. The property of the solutions on the Pareto set/front is that they cannot be simultaneously improved with respect to all objectives. In our particular case, it is likely that no single schedule simultaneously minimises makespan and energy consumption. Instead, there will be a set of solutions with a different tradeoff between makespan and energy consumption.

In this paper, we present a holistic approach for an energy-efficient workflow scheduler for heterogeneous systems, capable of computing a set of tradeoff solutions in a single run. Our method called MOHEFT is an extension of the Heterogeneous Earliest Finish Time (HEFT) algorithm  [5], one of the most popular algorithms for workflow scheduling. HEFT is based on the assumption that the execution time of each workflow task is known on each resource, which is not always realistic for highly dynamic distributed systems consisting of a large number of multi-core processors sharing resources (e.g. bus, caches) and affected by the external load caused by other concurrent tasks. To overcome this drawback, MOHEFT relies on empirical models for execution time of workflow tasks and their entailed energy consumption. These models are based on the knowledge extracted from historical executions, reflecting the behaviour of real multi-core CPUs with different levels of energy consumption depending on the number of cores used and their individual level of utilisation, and predicting it for new unseen resources. While we do look into the external load generated by different workflow tasks, we do not consider multi-tenancy in our approach, as scientific applications, including workflows, are typically highly specialised and customised applications owned by individual scientists and tuned for their particular temporary research needs.

The contributions of this paper in the area of energy-efficient scheduling are as follows.

  • 1.

    Identification of fine-grained levels of energy consumption in a multi-core CPU. Thorough extensive experimentation, we measure the energy consumption and performance of different multi-core CPUs with different number of cores used. The experiments show that the performance of individual cores decreases with the concurrent number of cores used, while the energy efficiency increases.

  • 2.

    Use of empirical models for energy consumption and performance based on real data. We build neural network-based empirical models for time and energy consumption of CPU-intensive tasks based on historical executions on heterogeneous sets of machines. Our approach considers external load coming from tasks of the same as the one being modelled; however, we plan in future work to extend it with training information about the types of tasks defining the external load.

  • 3.

    A multi-objective energy-efficient scheduling algorithm. We present an algorithm capable of computing workflow schedules representing tradeoffs between energy consumption and makespan. We prove thorough an extensive experimentation that our algorithm computes solutions with the same or shorter makespan than HEFT and with lower energy consumption than HEFT and greenHEFT, an ad-hoc greedy algorithm for energy optimisation. MOHEFT also outperforms the most popular multi-objective optimisation algorithm, NSGA-II  [6], computing solutions with shorter makespan and lower energy consumption.

  • 4.

    Analysis of the impact of different workflow characteristics and different resources on the tradeoff solutions. We carry out a thorough evaluation of our proposal under different circumstances: number of tasks, homogeneous resources in terms of speed and energy consumption, or heterogeneous resources. Our aim is to analyse the performance of MOHEFT versus other algorithms and the impact of different scenarios on the computed tradeoff solutions.

The rest of this paper is structured as follows. The next section formally describes our problem and introduces some background on multi-objective optimisation. Related works on energy modelling, multi-objective workflow scheduling and energy-efficient scheduling are analysed in Section  4. After that, we describe the modelling approach followed in this paper. Section  6 introduces MOHEFT. The evaluation of our algorithm and the analysis of the obtained results is included in Section  7. Finally, we present the main conclusions and future work in Section  8.

Section snippets

Formalism

Our problem consists in scheduling a workflow application’s tasks on a set of available heterogeneous resources in such a way that the makespan and the energy consumption of its execution are minimised. We introduce in the remainder of this section a simple but realistic formalism that defines the workflow, resource environment, and the metrics targeted.

Multi-objective optimisation

We introduce in this section a few concepts from the multi-objective optimisation theory for a better understanding of this work. We assume without loss of generality that minimisation is the goal for all objectives.

A multi-objective optimisation problem can be formally defined as finding all vectors x=[x1,x2,,xn] which minimise the vector function f(x)=[f1(x),f2(x),,fo(x)]. For our problem, n=|A| represents the cardinality of the task set A, and the i-th component of a solution x the

Related work

We review the related work in three fields touched by our approach: energy modelling, multi-objective workflow scheduling, and energy-efficient scheduling.

Applicability of existing models

In this section we show that the existing models reviewed in Section  4.1 are not applicable in the context of modern multi-core architectures. For this purpose, we conduct an extensive experiment aiming to prove that: (1) the performance of individual cores is impacted by the workload of the other cores; (2) the energy consumption of a multi-core CPU may vary among a multitude of different levels at any instant t; and (3) the energy consumption overhead for powering on a core decreases with

MOHEFT: multi-objective heterogeneous earliest finish time

In this section we describe our proposed multi-objective scheduling algorithm for computing a set of tradeoff solutions (instead of a single one) as an extension to the HEFT list scheduling algorithm. For a better understanding, we start by describing the mono-objective version of the algorithm and extend it afterwards for dealing with multiple objectives. Finally, we present also greenHEFT, a heft-like heuristic for minimising the energy consumption.

Evaluation

We run experiments for evaluating the MOHEFT solutions, aiming at

  • demonstrating that they are at least as good as those delivered by the mono-objective HEFT for makespan and greenHEFT for energy;

  • demonstrating that MOHEFT computes higher quality tradeoff solutions than classical multi-objective optimisation algorithms;

  • analysing the results of MOHEFT for

    • workflows with different shapes and number of activities;

    • reduced number of resources;

    • different types of resources in terms of clock frequency

Conclusions and future work

We tackled in this paper the problem of energy-efficient workflow scheduling in heterogeneous parallel systems. We proposed a novel multi-objective list-based workflow scheduling heuristic called MOHEFT, capable of computing tradeoff solutions between makespan and energy consumption. This algorithm relies on empirical models for predicting the energy consumption and execution time of workflow activities, designed based on historical data collected from real workflow task executions. We compared

Juan J. Durillo received the M.S. and Ph.D. degrees in computer science from the University of Málaga, Spain, in 2006 and 2011, respectively. Currently, he is working as post-doctoral researcher in the Department of Computer Science, University of Innsbruck, Austria. Dr. Durillo has authored more than ten science papers in international journals and has contributed to more than 20 papers in international conferences or book chapters. His research interests include multi-objective optimisation,

References (30)

  • M. Mezmaz et al.

    A parallel bi-objective hybrid metaheuristic for energy-aware scheduling for cloud computing systems

    Journal of Parallel and Distributed Computing

    (2011)
  • N. Min-Allah et al.

    Power efficient rate monotici scheduling for multi-core systems

    Journal of Parallel and Distributed Computing

    (2012)
  • Juan J. Durillo et al.

    jMetal: a java framework for multi-objective optimization

    Advances in Engineering Software

    (2011)
  • James Hamilton, Cooperative expendable micro-slice servers (CEMS): low cost, low power servers for Internet-scale...
  • Eric Humenay et al.

    Impact of process variations on multicore performance symmetry

  • T. Burd, T. Pering, A. Stratakos, R. Brodersen, A dynamic voltage scaled microprocessor system, in: Solid-State...
  • A.P. Chandrakasan et al.

    Low-power CMOS digital design

    IEEE Journal of Solid-State Circuits

    (1992)
  • H. Topcuoglu et al.

    Performance-effective and low-complexity task scheduling for heterogeneous computing

    IEEE Transactions on Parallel and Distributed Systems

    (2002)
  • Kalyanmoy Deb et al.

    A fast elitist multi-objective genetic algorithm: NSGA-II

    IEEE Transactions on Evolutionary Computation

    (2000)
  • J. Knowles, L. Thiele, E. Zitzler, A tutorial on the performance assessment of stochastic multiobjective optimizers,...
  • Saurabh Kumar Garg et al.

    Scheduling parallel applications on utility grids: time and cost trade-off management

  • E. Ilavarsan et al.

    Low complexity performance effective task scheduling algorithm for heterogeneous computing environments

    Journal of Computer Science

    (2007)
  • Xiaofeng Wang et al.

    Reliability-oriented genetic algorithm for workflow applications using max–min strategy

  • I. Assayad et al.

    A bi-criteria scheduling heuristics for distributed embedded systems under reliability and real-time constraints

  • Jia Yu et al.

    Multi-objective planning for workflow execution on grids

  • Cited by (104)

    View all citing articles on Scopus

    Juan J. Durillo received the M.S. and Ph.D. degrees in computer science from the University of Málaga, Spain, in 2006 and 2011, respectively. Currently, he is working as post-doctoral researcher in the Department of Computer Science, University of Innsbruck, Austria. Dr. Durillo has authored more than ten science papers in international journals and has contributed to more than 20 papers in international conferences or book chapters. His research interests include multi-objective optimisation, parallel and distributed computing, and search-based software engineering.

    Vlad Nae is currently a Post-doctoral Fellow at the Institute of Computer Science, University of Innsbruck. He received his Diploma Engineer degree in 2006 from the Politehnica University of Bucharest, Romania. In December 2011 he received a Ph.D. in Computer Science from the University of Innsbruck. His research interests are energy efficiency and resource management in distributed systems, particularly in the area of highly interactive applications such as massively multiplayer online games. He is the author of 19 publications including four journal articles and three book chapters.

    Radu Prodan received the Ph.D. degree from the Vienna University of Technology, Vienna, Austria, in 2004. He is currently associate professor at the Institute of Computer Science, University of Innsbruck, Austria. His research in the area of parallel and distributed systems comprises programming methods, compiler technology, performance analysis and scheduling. He participated in several national and European projects. He is currently coordinating the Austrian projects and was workpackage Leader in the IST-034601 (edutain@grid) and 26185 (SHIWA) projects. He is the author of more than 70 journal and conference publications and one book. He was the recipient of an IEEE Best Paper Award. He is a member of the ACM.

    View full text