
Open Access 27-06-2023

Carbon emission-aware job scheduling for Kubernetes deployments

Authors: Tobias Piontek, Kawsar Haghshenas, Marco Aiello

Published in: The Journal of Supercomputing | Issue 1/2024

Abstract

Decreasing carbon emissions of data centers while guaranteeing Quality of Service (QoS) is one of the major challenges for efficient resource management of large-scale cloud infrastructures and societal sustainability. Previous works in the area of carbon reduction mostly focus on decreasing overall energy consumption, replacing energy sources with renewable ones, and migrating workloads to locations where lower emissions are expected. These measures do not consider the energy mix of the power used for the data center. In other words, all kWh of energy are considered the same from the point of view of emissions, which is rarely the case in practice. In this paper, we overcome this deficit by proposing a novel practical CO2-aware workload scheduling algorithm, implemented in the Kubernetes orchestrator, that shifts non-critical jobs in time. The proposed algorithm predicts future CO2 emissions by using historical data of energy generation, selects time-shiftable jobs, and creates job schedules utilizing greedy sub-optimal CO2 decisions. The algorithm is implemented using Kubernetes' scheduler extender solution due to its ease of deployment and little overhead. It is evaluated with real-world workload traces and compared to the default Kubernetes scheduling implementation in several realistic scenarios.

1 Introduction

In the past decade, the scale and number of data centers have been continuously increasing to meet the demand for computing resources and services. This growth comes with a significant consumption of electricity. In fact, data centers are estimated to account for about 1% of global energy usage [1], and such usage is projected to grow to 3%–13% by 2030 [2]. This trend is putting pressure on data center owners to take all possible measures to reduce energy consumption and the related emissions.
Traditional approaches to reducing environmental effects consist of changing the energy source and, when possible, reducing consumption. The progress in renewable energy generation has brought a variation to this: the idea of shifting load to when and where low-carbon energy is available, e.g., postponing load to the middle of a sunny day if photovoltaic panels are available. This idea is based on the fact that the relation between energy consumption and carbon emission is not strictly proportional. The emissions depend on the sources used for electricity generation. In other terms, the very same kWh of energy can have a low or high impact in terms of emissions. In a country like Germany, for instance, the emissions per kWh can vary by as much as a factor of four within the same day [3].
The challenge then becomes the prediction of the best moment and place at which to utilize data center power, and the identification of the load that can be shifted. The prediction of environmental impact depends on the weather for renewables like solar and wind, but for other sources it is not strictly weather-dependent; in fact, forecasting is more complex than simply looking at the expected weather [4]. However, as Germany has a high average share of renewable energy sources, the CO2 intensity of its power grid is strongly influenced by environmental factors such as hours of sunshine. Scheduling jobs around sunshine hours alone can already enable a scheduler to reduce CO2 emissions [5]. Given energy mix forecasts, one can then predict the CO2 emission relative to power consumption over time [6]. The idea we put forward here is to use such a signal to drive the scheduling of jobs for data centers.
It should be noted that while a precise measurement of CO2 emissions is not feasible, one can produce very good estimates. The national energy providers of most countries publish information about the energy mix used at any given moment in time. This energy mix can be used to deduce the average CO2 emissions, which provides a signal for the power coming from the grid. Our implementation in the present paper is based on our previous work for smart homes [3] and for minimizing the CO2 impact of cooking pasta [7]. If, additionally, there are local renewable sources and storage, their CO2 intensity signal can also be estimated using appropriate modeling of the local installations [8].
The present work is a novel approach with respect to the state of the art. In fact, most previous works consider decreasing energy consumption by packing jobs on as few computing nodes as possible [9, 10] or increasing renewable energy usage [11, 12]. A few works propose exploiting CO2 emission intensities by moving jobs between geographical locations to take advantage of large carbon-intensity differences between regions [5, 9, 13]. However, to exploit such fluctuations, jobs must be migrated between data centers, often across country borders. This has important overheads, especially for data-intensive tasks. To the best of our knowledge, only one other work considers load shifting based on expected CO2 intensities [14]. However, the reported experiments are based on simulations, and no actual implementation in a real environment, such as Kubernetes, is defined.
Kubernetes, released as open source by Google in 2014, is one of the largest orchestrators for automating software deployment, scaling, and management, and is available on almost any public cloud platform in the market as a Platform-as-a-Service (PaaS) [15, 16]. Containers form the fundamental technology within Kubernetes, enabling efficient utilization and execution of applications. In large data centers, cluster schedulers manage job queues and schedule/allocate computing resources to the jobs. As reported by SlashData, at the time of writing, 5.6 million developers were using Kubernetes, and overall usage had increased by 4% from 2020 to 2021 [17]. In addition, more and more enterprises are deploying their applications on the cloud using Kubernetes, and further growth can be expected.
In this paper, we introduce a new CO2 signal-aware job scheduling algorithm that manages job queues to shift non-critical jobs in time, with the ultimate goal of decreasing total carbon dioxide emissions. We also provide a practical, detailed implementation of the job scheduler for Kubernetes. The proposed approach utilizes the CO2 signal history to predict future intensities and schedules the jobs accordingly. There are several scheduling modification approaches for Kubernetes, each with pros and cons. In this work, we used the scheduler extender solution: it is powerful enough to implement the workload time-shifting approach, it needs less implementation effort than the other solutions, and its lightweight structure simplifies the deployment process. In summary, the main contributions of our work are as follows:
1.
We propose a CO2 signal-aware job scheduling algorithm to dynamically shift non-critical jobs in time, reducing total carbon dioxide emissions while satisfying the Service Level Agreement (SLA) for all jobs.
 
2.
The proposed solution utilizes the Weighted Moving Average (WMA) approach to predict future carbon emissions using available historical data. The experimental results using real-world data show that WMA is able to predict CO2 signal trends in various scenarios.
 
3.
The proposed algorithm is implemented in practice for Kubernetes using the scheduler extender solution, which comes with the least deployment overheads compared to other possible solutions. Several benchmark scenarios are used to evaluate the implementation and to compare the proposed algorithm with the default scheduler.
 
The remainder of the paper is organized as follows. Section 2 provides basic background information on Kubernetes, its architecture, the Minikube default scheduler, and possible scheduler extension approaches. The problem formulation is presented in Sect. 3. The proposed algorithm and its implementation details are the object of Sect. 4. Experimental setup and results are the objects of Sect. 5. Section 6 discusses and compares the presented approach to related work. Finally, conclusions are drawn in Sect. 7.

2 Background

Kubernetes is the current leading container-orchestration system with an estimated 4 million developers worldwide and more than 2.8 million contributions made to its open source code base [18].

2.1 Kubernetes

Kubernetes, also known as k8s, is an orchestrator for deploying, scaling, and managing containerized applications. It was originally developed by Google and made open source in 2014 [16]. It is very versatile software in terms of its deployment capabilities. With Kubernetes, multiple nodes form a cluster, which can be easily managed across the cloud or in hybrid local–remote deployments. Figure 1 illustrates the architectural overview of Kubernetes and its main components.
Every Kubernetes cluster must have at least one master node, which is responsible for managing all deployments in the distributed platform [16]. The main component of a master node is the kube-apiserver, which connects the other components internally and is attached directly to the other cluster nodes. The kube-apiserver also provides the backend for the kubectl Command Line Interface (CLI) tool, which accesses the master node to retrieve information about pods (the smallest deployable Kubernetes objects), nodes, the overall cluster state, and all metrics that are collected by the cluster [16].
The kube-controller-manager is another part of the master node which is responsible for multiple actions including [19]:
  • Node controller, which initializes other nodes and determines their network addresses.
  • Route controller, which is an optional component that can be used on Google Compute Engine (GCE) clusters to enable communication between containers on different nodes.
  • Service controller, which is responsible for generating, deleting, and updating services. It also listens to events on the cluster and logs them.
The cloud-controller-manager is an optional component that enables cloud vendors to introduce new code on a plugin basis without modifying the core code of Kubernetes. This component is connected to the cloud connector, which establishes and maintains the connection to a vendor-specific cloud [19]. The etcd server is a persistent storage where all Application Programming Interface (API) objects are stored. Entities need to store information externally because, in Kubernetes, containers are completely wiped when they are deleted or rebuilt [16]. Finally, the kube-scheduler decides when and where each deployment gets deployed. The master node uses this component to perform the allocation decisions for the whole cluster.
The Kubernetes minions, also known as worker nodes, are responsible for hosting the deployments on the cluster. While the master node manages the cluster, worker nodes get work assigned by the kube-apiserver and are accessible through it. Each worker node has two main components: kubelet and kube-proxy. The kubelet is the node agent which is responsible for managing and registering the worker node at the kube-apiserver. The kube-proxy component is responsible for managing the network traffic of a worker node as well as load balancing over it [16].

2.2 Containerization

Containers are the core technology that enables users to accomplish encapsulation between components and solves dependency management. There are two distinct standards for containerized applications: Docker and the Open Container Initiative (OCI). Since this work relies on Docker, the explanations primarily focus on this container format. A container image is a binary package that contains all the necessary files to run a program within an Operating System (OS) container [16]. A containerized application is fundamentally different in structure from a Virtual Machine (VM). While containerized applications share a host OS and the OS kernel, VMs each have their own separate OS. As a result, container images are smaller in size, and fewer OS instances are required. Containers are the core technology in Kubernetes, utilized by pods, which act as units comprising one or multiple containers.

2.3 Minikube

Minikube is a simple single-node Kubernetes cluster that is used for development, testing, and experimentation, but not for actual production environments. There is an official Minikube tool on GitHub that automates the installation, setup, and starting/stopping/deleting of the minimal cluster setup. An advantage of Minikube is that it is easy to set up. In addition, multiple versions can be tested easily, as the desired version of Kubernetes can be passed as a console parameter. However, Minikube has some limitations; most notably, it can only be run inside a VM.

2.4 Kube-scheduler and possible modification approaches

The kube-scheduler assigns pods to the worker nodes of the cluster. In its basic implementation, the kube-scheduler first picks a pod from its priority queue and then assigns it to a worker node. The priority queue is sorted by priority classes: the higher the priority value, the more privileged the pod. Pod priority values roughly range between zero and one billion, with the highest values reserved for system-critical Kubernetes jobs. Using too large priority values can potentially decrease cluster reliability. The second sorting parameter is the timestamp of the pods, where the oldest pods are favored by a First In First Out (FIFO) policy [20].
Once a pod is picked, the scheduler determines the possible host candidates based on resource and property constraints; this is known as the filtering step. Property constraints are expressed as labels attached to nodes, which can indicate available features such as a specific availability zone or specific hardware [21]. In the second stage, known as the ranking step, the candidate nodes are prioritized by a rank function, which in the ideal case returns at least one node. Finally, the best node is picked, and the pod is bound by the cluster to this node.
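To make the two phases concrete, the following Go sketch mirrors the filter-then-rank selection just described. The types, the label-based property check, and the free-CPU ranking heuristic are illustrative assumptions, not Kubernetes' actual implementation (the real scheduler scores nodes with many weighted plugins).

```go
package scheduling

// Conceptual sketch of the two-phase node selection described above.

type Node struct {
	Name          string
	FreeMillicore int64
	Labels        map[string]string
}

type Pod struct {
	Name             string
	MillicoreRequest int64
	NodeSelector     map[string]string // property constraints
}

// filter keeps only the nodes that satisfy the pod's resource and
// property constraints (the filtering step).
func filter(pod Pod, nodes []Node) []Node {
	var feasible []Node
	for _, n := range nodes {
		if n.FreeMillicore < pod.MillicoreRequest {
			continue // resource constraint violated
		}
		matches := true
		for k, v := range pod.NodeSelector {
			if n.Labels[k] != v {
				matches = false // required node label missing
				break
			}
		}
		if matches {
			feasible = append(feasible, n)
		}
	}
	return feasible
}

// rank picks the feasible node with the most free CPU (the ranking step,
// with one possible scoring function).
func rank(feasible []Node) (Node, bool) {
	if len(feasible) == 0 {
		return Node{}, false
	}
	best := feasible[0]
	for _, n := range feasible[1:] {
		if n.FreeMillicore > best.FreeMillicore {
			best = n
		}
	}
	return best, true
}
```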
Currently, there exist four different approaches to change the default scheduling algorithm of Kubernetes [22, 23].
  • The custom reference scheduler is a direct modification of the standard Kubernetes implementation. The main benefit is the direct approach and liberty of modification. However, Kubernetes is not a static piece of code and changes from time to time. Therefore, the main drawback of the custom reference scheduler approach is that changes in the Kubernetes cluster may require major adaptions in the adapted algorithm or even a complete rewrite. In addition, this approach enforces the new scheduler to be a replacement of the standard scheduler [22].
  • The custom scheduler implementation requires a full custom implementation of the scheduler. The scheduler is deployed on the cluster as an alternative component to the standard scheduler. With this approach, the standard scheduler is not replaced, and thus, multiple schedulers are available. As an example, pods can explicitly specify with which scheduler they want to be scheduled. A potential drawback of this solution is that it produces boilerplate code that needs to be maintained and can easily become obsolete over time. However, the worst case is that the new scheduler becomes useless with an update and the environment is reconfigured to the default scheduler [22].
  • The scheduler extender is a rather quick solution, which applies a filter to the default scheduler rather than offering a complete replacement (a minimal sketch follows this list). The scheduler extender is deployed as a basic REST service communicating over HTTP or HTTPS with the standard scheduler. The standard scheduler receives a configuration file with the connectivity information. The main benefit of this approach is that the boilerplate code is reduced to a minimum, as the filters can be written directly in the scheduler extender with very few instructions. As the scheduler extender is rather small, code changes can be made very rapidly. However, as the scheduler extender is reached over the network, there is a potential threat to performance, as the delay overhead of a remote call is potentially higher than that of a local call [23]. Another limiting factor is that the scheduler extender can only extend the default scheduler at certain predefined phases, where the standard scheduler sends a request to the extender. This can be circumvented by also implementing some changes in the scheduler that calls the extender, but at the expense of higher complexity [22].
  • The scheduler framework is the most recent solution for extending the scheduler and addresses the scalability issues of the scheduler extender approach. The main code of the scheduler stays untouched, and the changes are applied by implementing plugins and linking them to the standard scheduler. This requires a recompilation of the whole scheduler. The main benefit of this approach, compared to rewriting the whole scheduler code, is that scheduler plugins are, in theory, not affected by future updates. For a production environment, this approach is more scalable, while rapid prototyping is easier with the scheduler extender [24].
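To illustrate the extender mechanism referenced above, here is a minimal sketch of an extender "filter" endpoint in Go. The JSON shapes mirror the extender's request/response conceptually but are defined locally to keep the sketch self-contained; the /filter route and port are assumptions that would be set in the scheduler's extender configuration.

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

type extenderArgs struct {
	Pod       map[string]interface{} `json:"pod"`
	NodeNames []string               `json:"nodenames"`
}

type extenderFilterResult struct {
	NodeNames   []string          `json:"nodenames"`
	FailedNodes map[string]string `json:"failedNodes"`
	Error       string            `json:"error,omitempty"`
}

// filterHandler accepts every candidate node; a CO2-aware admission test
// (see Sect. 4) would replace the accept-all body.
func filterHandler(w http.ResponseWriter, r *http.Request) {
	var args extenderArgs
	if err := json.NewDecoder(r.Body).Decode(&args); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	result := extenderFilterResult{
		NodeNames:   args.NodeNames, // pass all candidates through
		FailedNodes: map[string]string{},
	}
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(result)
}

func main() {
	http.HandleFunc("/filter", filterHandler)
	log.Fatal(http.ListenAndServe(":8888", nil))
}
```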

3 System model and problem formulation

Kubernetes is a successful platform for automating and easing the deployments of containerized jobs on distributed systems. The goal of this paper is to use it as an underlying infrastructure and to add the ability to make data centers more sustainable by following CO2 emission signals. In this section, we give a precise formulation of the optimization problem we are after.
The task of a workload scheduler is to decide where and when the incoming data center jobs get deployed. If we assume, as is the case, that CO2 emissions vary over time for the same unit of energy, the scheduling policy for a given workload will affect the total emissions of running the workload. The workload consists of two types of jobs: critical/service and non-critical/shiftable jobs. The critical jobs need to be executed as fast as possible, ideally immediately. The non-critical jobs can be queued and shifted in time, possibly based on some given long-term deadlines.
Our goal is to move temporally shiftable jobs with the overall objective of minimizing total CO2 emissions for a given workload, while guaranteeing the execution deadlines per job defined in the Service Level Agreement (SLA). Therefore, SLA violations can only occur if the resources are insufficient for a given workload, and not because of the scheduling policy. We add the constraint that shiftable jobs must be executed immediately once they have waited for 24 h.
Formally, the SLA for job j is defined as [25, 26]:
$$SLA_j= \begin{cases} \text{Job must be allocated within 24 hours} & \text{if } j\in S_{NC} \wedge d_j\le 24\,h\\ \text{Job must be allocated as soon as possible} & \text{otherwise} \end{cases}$$
(1)
where \(S_{NC}\) represents the set of non-critical jobs and the parameter \(d_j\) stands for the waiting time of job j. In other terms, if a job is critical (\(j\in S_C\)), or if it is non-critical but has been stalled for more than 24 h, it should be run as soon as possible, ideally immediately. Otherwise, it can be delayed up to 24 h. As a point of notation, we define each job as a triple \(<s_j,u_j,d_j>\), i.e., start time (\(s_j\)), resource utilization (\(u_j\)), and waiting time (\(d_j\)).
To minimize the total carbon emission of a cluster, the total load of the cluster has to be shaped to reduce power usage during high CO\(_2\) intensities, measured in CO\(_2\)/kWh, and to increase it during low intensities. We formulate our optimization problem in Eqs. (3)–(7). Let Emiss(t) represent the total CO\(_2\) emission of a cluster at time slot t, which can be calculated as follows:
$$\begin{aligned} Emiss(t) = Power(t) * CO_2(t) \end{aligned}$$
(2)
where Power(t) and \(CO_2(t)\) stand for the total power consumption of the cluster at time t and average CO\(_2\) intensity at time t, respectively.
Based on the above explanations, we can formulate the optimization problem for the time period \((t_1, t_2)\) as follows:
$$\text{Minimize} \qquad \text{Total CO}_2\text{ Emission} = \sum_{t=t_1}^{t_2} Emiss(t)$$
(3)
            Subject to
$$U_s(t) < U_{upper} \qquad \forall s \in S_H \text{ and } \forall t \in [t_1,t_2]$$
(4)
$$\forall j \in \{S_C \cup S_{NC}\}: \; SLA_j \text{ (Eq. (1)) must be met}$$
(5)
$$\sum_{t=t_1}^{t_2}U(t) = \sum_{t=t_1}^{t_2}U'(t)$$
(6)
The first constraint (4) ensures that the utilization of each physical server s in the set of all servers (\(S_H\)) stays below a predefined threshold to prevent server overload. The second constraint (5) enforces the SLA defined in Eq. (1) for all service and batch jobs. Finally, the third constraint (6) guarantees that any scheduling plan keeps the sum of the utilization of all nodes during the time interval \((t_1, t_2)\) the same as under the default scheduler, where \(U'(t)\) denotes the total utilization under the default scheduling plan. This constraint prevents shifting jobs out of the considered time window. U(t) in constraint (6) is defined as follows:
$$U(t) = \sum_{j=0}^{N}{u_j}\cdot \alpha_{j,t}$$
(7)
where N stands for the total number of jobs arriving at the cluster during the time period \((t_1, t_2)\). The binary decision variable \(\alpha_{j,t}\) indicates whether job j is running on a node during time slot t. Finally, \(u_j\) represents the resource utilization parameter in the defined triple of job j.
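As a minimal illustration of Eqs. (2) and (3), the following Go sketch sums the per-slot emissions over a time window; the hourly slot length and the unit conventions are assumptions for the example.

```go
package emissions

// Sketch of Eqs. (2) and (3): the total emission over slots t1..t2 is the
// sum of per-slot power draw times per-slot carbon intensity. With
// one-hour slots, kW * 1 h = kWh, so the result is in grams of CO2eq.
func totalEmission(powerKW, intensityGPerKWh []float64, t1, t2 int) float64 {
	total := 0.0
	for t := t1; t <= t2; t++ {
		total += powerKW[t] * intensityGPerKWh[t] // Emiss(t) = Power(t) * CO2(t)
	}
	return total
}
```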

4 Approach

This section introduces our proposed algorithm for the problem defined in the previous section. In addition, the implementation details, including our CO2 prediction approach, the modifications to the kube-scheduler, and the workload generation method, are described.

4.1 Proposed algorithm

The algorithm proposed to address the problem presented in Sect. 3 schedules the non-critical jobs by placing them in the time slots with lower expected CO2 intensities, and it has been designed to be implemented in Kubernetes. The pseudo-code is presented in Algorithm 1. First, the expected CO2 intensities for a predefined future time window are predicted using the techniques explained in Sect. 4.2.1.
Algorithm 1 schedules the jobs based on their SLA (Lines 4–20 of Algorithm 1). For each job, the algorithm iterates over the possible time slots for running it. The possible time slots are determined based on the job's SLA. In this way, critical jobs are assigned immediately at their arrival time, while non-critical jobs are shifted. The algorithm searches for the optimal window in terms of carbon emission (Lines 5–9 of Algorithm 1). In the allocation phase, an upper limit is set for CPU utilization (upper_threshold) to prevent over-utilization. This limit is set dynamically by a static load prediction that is given to the scheduler code at startup. At least 5% of the cluster utilization must be kept free. If the load prediction expects 100% load, an additional 10% CPU load is reserved for critical tasks. Therefore, in total, about 15% of the system resources can be reserved for critical tasks to ensure responsiveness for time-critical workloads.
In Line 2 of Algorithm 1, the job parameters are defined as a tuple. Each job/pod is associated with several features, including podname, priority_class, and milicore_reservation. The "podname" is a unique identifier of the pod, and the "priority_class" is used to differentiate prioritized and shiftable workloads. The third variable, "milicore_reservation", is the CPU consumption of the defined pod. The for loop in Line 5 determines an optimal time window within the next time_frame for allocating non-critical jobs. To determine the optimal time window, we calculate the average CO2 emission for all possible time windows within the range of \(time\_frame - window\_frame\) and then select the time window with the lowest emission value as the optimal choice.
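The window search of Lines 5–9 can be sketched as follows; the function below is a minimal Go rendering under the assumption of hourly intensity predictions, with variable names following the text.

```go
package windowsearch

import "math"

// Minimal sketch of the optimal-window search: slide a window_frame-sized
// window over the predicted hourly CO2 intensities for the next time_frame
// hours and keep the start hour with the lowest average intensity.
func bestWindowStart(predicted []float64, timeFrame, windowFrame int) int {
	bestStart, bestAvg := 0, math.MaxFloat64
	for start := 0; start+windowFrame <= timeFrame; start++ {
		sum := 0.0
		for t := start; t < start+windowFrame; t++ {
			sum += predicted[t] // predicted gCO2eq/kWh at hour t
		}
		if avg := sum / float64(windowFrame); avg < bestAvg {
			bestAvg, bestStart = avg, start
		}
	}
	return bestStart
}
```

With time_frame = 24 and window_frame = 6, this scans the 19 candidate start hours of the next day.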
In our implementation, the length of time_frame has been set to 24 h, as the workloads are 24-hour traces, and the optimal time window (window_frame) has been set to 6 h. These settings might need to be adjusted for other workload traces. The algorithm iterates over the desired time interval and looks for the best average CO2 intensity within that interval. Starting with Line 11, the scheduler conditions are checked one by one. The else conditions are checked only if all previous conditions are false. A pod is allowed to be scheduled if one of the following conditions holds (see the sketch after this list):
1.
priority_class: pods with higher priority class are preferred. A critical job is picked immediately if resources are available.
 
2.
podage: if the pod has waited for more than the defined max_stale_time (e.g., 24 h), it is immediately allocated. This prevents pods from starving indefinitely in the queue. It reduces the CO2 reduction potential; however, it ensures QoS.
 
3.
cpu_limit and inside optimal CO2 window: if the current cluster utilization of critical tasks is lower than the defined threshold and the time of the day is within the CO2 optimal time window, the non-critical job is allowed to be allocated.
 
4.
else: if none of the above conditions apply, the pod is delayed further until it reaches a pod age of 24 h. Then it is treated as a critical pod and queued immediately.
 
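A minimal Go sketch of these four conditions, in the order the algorithm checks them, is shown below; the field names follow the pod features of Sect. 4.1, while the types and the zero-based priority convention are illustrative assumptions rather than the published code.

```go
package admission

import "time"

type pod struct {
	podName             string
	priorityClass       int // > 0 marks a critical/prioritized pod here
	milicoreReservation int64
	enqueuedAt          time.Time
}

const maxStaleTime = 24 * time.Hour

// allowSchedule reports whether a pod may be bound now. criticalUtil is the
// current cluster utilization of critical tasks, upperThreshold the dynamic
// CPU limit, and insideCO2Window whether the current hour lies in the
// predicted optimal window.
func allowSchedule(p pod, criticalUtil, upperThreshold float64, insideCO2Window bool, now time.Time) bool {
	switch {
	case p.priorityClass > 0:
		return true // 1: critical jobs are picked immediately (resource check omitted)
	case now.Sub(p.enqueuedAt) >= maxStaleTime:
		return true // 2: prevent starvation after max_stale_time (24 h)
	case criticalUtil < upperThreshold && insideCO2Window:
		return true // 3: shiftable job inside the optimal CO2 window
	default:
		return false // 4: delay further; at 24 h the pod is treated as critical
	}
}
```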

4.2 Implementation

The proposed algorithm is implemented as a scheduler extender in Kubernetes. In our implementation, an empty scheduler extender dummy implementation, licensed under the Apache 2.0 license, is used as a base [27]. The dummy implementation schedules every pod on every possible node without applying any kind of filtering. The dummy code was then modified based on the proposed mathematical model and algorithm. The code of the implementation is publicly available on GitHub. Most of the new code is in the file ./predicate.go in the main project directory.

4.2.1 A) CO2 intensity prediction

The average CO2 intensity depends on the energy mix used for electricity generation. The energy mix of any region at any time consists of the resources used; typically, these are nuclear, lignite, coal, oil, gas, solar, wind, and hydroelectric. The average emission per kWh can vary significantly depending on which sources are used [3]. Consuming energy when the CO2 intensity of the energy mix is high has a worse effect than consuming that same energy when the intensity is low. However, given the variable nature of renewable energies, the future energy mix, and consequently the future average CO2 emission, is not known precisely and needs to be estimated.
We use the WMA approach to predict future carbon intensities [28]. The moving average is a well-known technical indicator widely used in time series analysis for forecasting future data [28]. The WMA incorporates historical data points, assigning descending weights around a specified central point. Typically, the central data point carries the highest weight, while the weight decreases for points further away. The number of data points used for calculating the WMA is usually limited to a fixed amount. The technique is typically used for datasets with periodic patterns that require occasional smoothing of infrequent spikes. The WMA can be applied to any time series and is widely used, as it is reliable and easy to apply. In most real-world applications, such as stock market prediction, the WMA is used for smoothing out curves. Similar to the stock market, the average CO2 intensity follows periodic patterns [29]. The main formula for calculating future values using the previous n points is [30]:
$$WMA=\frac{nP_M+(n-1)P_{M-1}+\dots+2P_{M-n+2}+P_{M-n+1}}{n+(n-1)+\dots+2+1}$$
(8)
where M stands for the reference spot in time and is the most heavily weighted point.
In the implementation, we use the hourly historical data of Germany (2020 and 2021) provided by electricitymap.org/research, measured in gCO2eq/kWh. We weight the current day at 40% of the final value, the immediately preceding and following days at 20% each, and the days two positions away at 10% each. When moving the implementation to production, one can use a predictive model, as we do here, or use a live signal from the energy service provider (typically the national Transmission Service Operator (TSO)), as we have done in previous work on scheduling jobs for smart homes [3, 7].
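A direct Go rendering of Eq. (8) is sketched below; it computes the generic linearly weighted average, of which the day-based 40/20/10% weighting described above is a variation.

```go
package prediction

// Generic WMA of Eq. (8): linearly descending weights over the previous n
// points, with the reference point M weighted highest. Assumes m >= n-1.
func wma(series []float64, m, n int) float64 {
	num, den := 0.0, 0.0
	for i := 0; i < n; i++ {
		w := float64(n - i)    // weight n for P_M, n-1 for P_{M-1}, ...
		num += w * series[m-i] // P_{M-i}
		den += w               // accumulates n + (n-1) + ... + 1
	}
	return num / den
}
```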

4.2.2 B) Kube-scheduler modification

There are four modification approaches for the kube-scheduler, see Sect. 2.4. In Table 1, a comparison in terms of the benefits and drawbacks of these four approaches is presented.
Table 1
Kube-scheduler modification approaches

| | Custom reference scheduler | Custom scheduler implementation | Scheduler extender | Scheduler framework |
|---|---|---|---|---|
| Programming language | Go | Free choice | Free choice | Go |
| Feature set | Very large | Very large | Small | Large |
| Implementation effort | Very high | Very high | Very low | High |
| Maintenance effort | Very high | Very high | Very low | High |
| Rapid prototyping | Very slow | Very slow | Very fast | Slow |
| Potential performance | Very fast | Very fast | Slow | Very fast |
The scheduler extender exposes two extension stages, known as "filter" and "prioritize", which can adapt the scheduling algorithm of the Kubernetes cluster [23]. In addition, the scheduler extender approach requires less implementation effort than the other solutions, and its lightweight structure simplifies the deployment process. Furthermore, it is powerful enough to implement the load-shifting algorithm. Therefore, the scheduler extender is the best fit for a straightforward implementation of our proposed algorithm.
With the scheduler extender approach, multiple extension points are exposed to the scheduler via a REST API [23]. The architecture of the scheduler extender approach is shown in Fig. 2. The three extension points provided to the scheduler extender can be seen in the figure, including the internal communication between the kube-scheduler and the scheduler extender deployment. The kube-scheduler and the scheduler extender are deployed in one pod in the Kubernetes cluster, so they represent one atomic unit.
Figure 2b shows a diagram of the deployment of the scheduler extender. At the top, the master node is shown with all its deployments. The scheduler extender as well as the kube-scheduler are deployed on the master node; since in Kubernetes pods are atomic units, the two components need to be deployed on the same node [16]. The scheduler communicates with the scheduler extender via HTTPS. Hosting these two components on different nodes in the same data center, or distributed around the world, would impose a large performance penalty and is not recommended.

4.2.3 C) Workload generation

The real workload traces GWA-T-10 SHARCNet and GWA-T-4 AuverGrid, available at [32], have been used for the evaluation. These workloads contain varying characteristics for each job, including run-time, number of processors, and required memory. The GWA-T-10 SHARCNet workload contains about one million jobs gathered over a period of about 1.5 years, and the GWA-T-4 AuverGrid workload includes about 400,000 jobs gathered over one year. The workload traces were analyzed with respect to job density, CPU utilization, and job duration. These characteristics were then applied to a workload generator script, so that the generated workload follows the statistics acquired in the analysis of the real-world traces. Abstraction was necessary here, as the real job definitions cannot be translated to Minikube workloads one by one.
The real workload traces were recorded on large clusters with multiple nodes. Therefore, statistical analysis and abstracted models are required to run the traces in a Minikube cluster. By constraining these factors with a new model and only applying variations of the analyzed cluster, it can be ensured that the benchmark scenarios stay within reasonable resource constraints. The CO2 data were sampled randomly by the day on which the benchmark runs. The statistical analysis and abstracted models should neither favor nor disfavor the benchmark results.
For a fair comparison, we have considered four different scenarios of workload traces. For each of these scenarios, the average run-time, average CPU utilization, average job count per hour, and non-critical job rate are calculated and presented in Table 2. In these scenarios, the maximum job length is set to 2 h.
Finally, the workload trace of each scenario is passed as a file to a workload generator, which parses it to create corresponding pods to be run by Minikube; a generation sketch follows Table 2.
Table 2
Defined workload scenarios

| | Scenario #1 | Scenario #2 | Scenario #3 | Scenario #4 |
|---|---|---|---|---|
| Avg. job duration | 10 min 54 s | 32 min 45 s | 10 min 2 s | 5 min |
| Avg. utilization | 51.24% | 49.81% | 44.00% | 49.70% |
| Avg. jobs/hour | 184.2 | 60.5 | 185.2 | 378.3 |
| Non-critical job rate | 20% | 40% | 40% | 40% |
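The following Go sketch illustrates the kind of abstracted generation described above: jobs are drawn so that their statistics match a scenario from Table 2. The exponential duration spread and the uniform per-job utilization are assumptions for illustration, not the published generator.

```go
package workload

import (
	"math/rand"
	"time"
)

type job struct {
	duration    time.Duration
	utilization float64 // fraction of one core reserved by the pod
	nonCritical bool
}

// generateHour draws one hour's worth of jobs for a scenario from Table 2.
func generateHour(rng *rand.Rand, jobsPerHour int, avgDuration time.Duration, nonCriticalRate float64) []job {
	jobs := make([]job, 0, jobsPerHour)
	for i := 0; i < jobsPerHour; i++ {
		// exponential spread around the scenario's mean duration
		d := time.Duration(rng.ExpFloat64() * float64(avgDuration))
		if d > 2*time.Hour {
			d = 2 * time.Hour // maximum job length used in the scenarios
		}
		jobs = append(jobs, job{
			duration:    d,
			utilization: rng.Float64(),
			nonCritical: rng.Float64() < nonCriticalRate,
		})
	}
	return jobs
}
```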

5 Experimental setup and evaluation

This section compares our algorithm with the default scheduling approach of Kubernetes. Section 5.1 describes the experimental setup, and Sect. 5.2 presents the results of the experiments.

5.1 Experimental setup

In the experiments, we simulate a server model with 2x Intel Xeon Gold 6240 (36 cores total @ 2.60 GHz) and 768 GB RAM. The server's idle and maximum power consumption are 212 W and 597 W, respectively. To simulate the server rack, the Minikube cluster was defined with realistic resource constraints and runs dummy jobs that mimic the real-world cluster. The Minikube instance runs on Windows 10 (21H2) on a computer with an Intel Core i5-6200U CPU and 16 GB RAM. The workload behavior of the cluster is then fed into the power model of the cluster to get a realistic estimation of its real-world consumption.
As a server node has several power-consuming components, including CPU, RAM, and storage, the CPU utilization is a good proxy for the power consumption [33]. Therefore, we utilize the most widely used power model [26, 34], which is:
$$\begin{aligned} P_{(server, t)} = P_{idle} + (P_{max}-P_{idle})*U_t \end{aligned}$$
(9)
where \(P_{(server, t)}\) and \(U_t\) stand for the server's power and CPU utilization at time t, respectively. \(P_{max}\) represents the server's power while operating at maximum utilization, and \(P_{idle}\) accounts for the server's power in the idle state. Given the power consumption of the cluster and the CO2 intensity at time t, the carbon dioxide emission of the cluster at time t is calculated by Eq. (2). Then, the total emission of the cluster for running a benchmark over the time period \((t_1,t_2)\) is calculated using Eq. (3).
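A one-function Go sketch of Eq. (9), using the experimental server parameters above, makes the model concrete:

```go
package power

// Eq. (9): linear interpolation between idle and peak power as a function
// of CPU utilization, with the simulated server's parameters.

const (
	pIdle = 212.0 // watts, idle power of the simulated server
	pMax  = 597.0 // watts, power at full utilization
)

// serverPower returns the modeled power draw in watts for a CPU
// utilization u in [0, 1].
func serverPower(u float64) float64 {
	return pIdle + (pMax-pIdle)*u
}
```

At 50% utilization, for instance, the model yields 212 + (597 − 212) × 0.5 ≈ 404.5 W.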
The proposed algorithm has been evaluated using the real workload traces available from [32]. These workloads are provided as .gwf files, with each row representing one job. Therefore, modifications are required to create pods runnable by Minikube, as explained in Sect. 4.2.3.

5.2 CPU utilization and CO2 emission evaluation

In order to evaluate our proposed scheduler, we compare the CPU utilization and total CO2 emission of a simulated cluster running the default scheduler, where no optimization is applied, with the same cluster running the optimized CO2-aware scheduler.
The CPU utilization of the simulated cluster running the default scheduler, compared with the proposed algorithm, for Scenario #1 is shown in Fig. 3a. The algorithm has moved the non-critical jobs to the hours 9–11 AM. Based on the data of Fig. 3b, this time interval is the optimal time window in terms of carbon emission. In addition, Fig. 3b shows the actual CO2 emissions in comparison with the values predicted by the WMA. Despite some value disparities, the predicted curve maintains a consistent pattern with the actual data. The primary objective of predicting future CO2 intensities in this work is to identify the optimal time window, a task that is effectively accomplished with the WMA method. In addition, the results show that the WMA is capable of calculating an hourly prediction for an entire year in approximately 200 milliseconds, which makes it an efficient choice for our implementation in Kubernetes.
When extending the Kubernetes scheduler, it is crucial to minimize the runtime of the scheduling algorithm. This is particularly important because the scheduler extender operates within the cluster through a web protocol. Overall, our proposed algorithm and implementation lead to minimal overhead (similar to the default implementation) and are applicable to real-world applications.
Figure 3c–h presents the CPU utilization and CO2 intensity curves for Scenarios #2, #3, and #4, respectively. For all scenarios, the proposed algorithm successfully shifts the non-critical load to the times with lower CO2 emissions. In addition, the WMA predictions follow the same trend as the real curves, which consequently leads to correct identification of the optimal time windows.
Table 3 summarizes the carbon emissions and reductions for running the scenarios. The proposed algorithm decreases the total carbon dioxide emissions of the cluster for Scenarios #1, #2, #3, and #4 by 2.41%, 0.4%, 1.81%, and 0.37%, respectively. In Scenario #1, only 20% of the jobs are non-critical, half the share of all other scenarios, yet the gained improvement is higher than in all the other scenarios. This is due to the higher CO2 signal variation in Scenario #1. In Scenario #2, the load falls mostly in low-emission times, and thus there is less potential for emission reduction. The load in Scenario #3 is rather monotone (with only slight changes within the day) compared with Scenario #2; therefore, the improvement potential is higher. Comparing Scenario #3 with Scenario #1, the cluster load naturally tends to be higher during emission-efficient times, which decreases the improvement potential. The load trend of Scenario #4 is similar to Scenario #3; however, its average load is about 5% higher, which leads to more saturated usage in the optimal time window. Together with the low CO\(_2\) variations in Scenario #4, its improvement is smaller than in all other scenarios.
Theoretically, it is possible to construct scenarios with higher gains. To ensure our experiments closely resemble real-world situations, we considered various scenarios with different effective factors such as load trend, variations in CO2 levels, average cluster utilization, and percentages of non-critical jobs. Scenarios with higher CO2 variation, a higher number of non-critical jobs, and lower cluster utilization present a higher potential for reducing carbon emissions. The results indicate that the effect of CO2 intensity variation on the optimization potential is higher than the effect of the load characteristics.
In addition, in this work the scheduler is only able to start jobs, not to freeze or un-deploy them. Therefore, short-running non-critical jobs give a good granularity for applying the load-shaping approach, while long-running jobs (runtime > 24 h) show no effect, as they would apply load to the cluster during the entire considered time period.
Table 3
Experimental results on carbon emission and emission reductions

| | Scenario #1 | Scenario #2 | Scenario #3 | Scenario #4 |
|---|---|---|---|---|
| CO\(_2\) window start | 9:00 | 10:00 | 9:00 | 10:00 |
| CO\(_2\) window end | 16:00 | 17:00 | 16:00 | 17:00 |
| Total CO\(_2\) emission (default scheduler) | 3174.22 g | 1489.48 g | 2939.83 g | 3725.28 g |
| Total CO\(_2\) emission (CO\(_2\)-aware scheduler) | 3097.74 g | 1483.48 g | 2886.67 g | 3711.64 g |
| CO\(_2\) reduction | 2.41% | 0.40% | 1.81% | 0.37% |

6 Related work

Reducing the carbon emissions of data centers can be achieved by changing the source of electricity, by reducing energy consumption, and by load management. The few previous works on load management either shift the non-critical jobs of the data center to times with lower CO2 intensities or migrate them between different geographical locations.
Van Damme et al. propose a job scheduling algorithm that regulates the system to the optimal thermal state to minimize total energy consumption [35]. ePower is a heterogeneous job scheduling approach proposed by Cheng et al. for sustainable data centers that rely completely on renewable energy [36]. The core of ePower is a power-aware simulated annealing algorithm with fuzzy performance modeling for an efficient search for optimal schedules. Jena et al. also proposed an algorithm to match power consumption with server capacity demand [37]. This increases resource utilization, which consequently decreases total energy consumption and, finally, total carbon emission. A similar temporal job scheduling approach has been proposed by Aksanli et al., which utilizes short-term prediction of solar and wind energy production [38]. These approaches try to decrease energy consumption in order to consequently decrease total carbon dioxide emissions. However, they fall short in utilizing load-shaping opportunities to decrease carbon emissions further.
James and Schien proposed and implemented a carbon-aware scheduling policy for the Kubernetes container orchestrator which migrates jobs to geographical locations with lower emissions [5]. The modified scheduler uses solar radiation as a scheduling metric, assuming that data centers located in regions of higher solar radiation should be preferred. The focus of that work is mainly on the implementation rather than on efficiency. Lin et al. follow a similar approach and explore the carbon emission reduction potential of distributed data centers with renewable generation, considering three uncertainties: electricity price, fuel mix, and renewable generation [39]. The approach proposed in [39] is evaluated by simulations and shows the potential for both economic and environmental benefits. In a similar work by Narayanan et al., focused on Machine Learning (ML) training workloads, the potential cost reduction achievable by leveraging the distributed cloud instance market was explored [40]. Finally, Wang et al. propose an integer linear programming formulation of the task scheduling problem for geo-distributed data centers [41].
Radovanovic et al. propose the Carbon-Intelligent Computing System (CICS), a carbon-aware load-shifting approach [13]. CICS itself is not a scheduling algorithm; rather, it uses Virtual Capacity Curves (VCC) to constrain the maximum load during carbon-intense times. The VCC artificially limits the cluster's CPU usage by reducing the total computing capacity available at specific hours of the day. Similarly, the potential carbon reduction by load shifting has been examined by Wiesner et al. [14]. The assessments and evaluations presented in [14] and [13] rely on simulations and on the Borg orchestrator, respectively.
The above-mentioned approaches try to manage the load of the data center by shifting it in time or migrating it between geographical locations in order to decrease carbon emissions. However, to the best of our knowledge, our paper is the first work that explores the potential carbon reduction of temporal load shifting in the Kubernetes container orchestrator in particular.

7 Conclusion

In this paper, a CO2-aware scheduling algorithm has been proposed to decrease the total carbon dioxide emissions of data centers utilizing the flexibility of non-critical jobs and historical CO2 emission data. To achieve this, the future CO2 intensities are predicted and the non-critical jobs are shifted accordingly. The proposed algorithm has been implemented and evaluated in Kubernetes, one of the largest open-source orchestrators for automating software deployment. For the implementation, the scheduler extender solution has been used as it comes with the least deployment overheads compared to other possible solutions.
The proposed approach has been evaluated using real-world workloads in four different scenarios, in terms of cluster CPU utilization and total carbon dioxide emission. The results show that the proposed approach decreases total CO2 emissions by an average of 1.3% compared to the default scheduler of Kubernetes, while satisfying the SLA of all jobs. In addition, the proposed algorithm is able to decrease emissions in all examined benchmark scenarios.
The improvement gained by shifting the load in time depends on several factors, including the accuracy of the CO2 window prediction and the workload type of the cluster. Higher variation in the CO2 signal and more non-critical jobs provide more flexibility and opportunity to decrease total carbon emissions.
The next steps for continuing the presented research include studying how to extend the approach to geographically distributed data centers, which enables the migration of jobs between data centers in order to decrease total carbon dioxide emissions.

Declarations

Conflict of interest

The authors declare that they have no competing interests.

Ethical approval

Not applicable.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Literature
1. Masanet E, Shehabi A, Lei N, Smith S, Koomey J (2020) Recalibrating global data center energy-use estimates. Science 367(6481):984–986
2. Andrae AS, Edler T (2015) On global electricity usage of communication technology: trends to 2030. Challenges 6(1):117–157
3. Fiorini L, Aiello M (2020) Predictive multi-objective scheduling with dynamic prices and marginal CO\(_2\)-emission intensities. In: Proceedings of the Eleventh ACM International Conference on Future Energy Systems, pp 196–207
5. James A, Schien D (2019) A low carbon kubernetes scheduler. In: ICT4S
7. Fiorini L, Steg L, Aiello M (2020) Sustainability choices when cooking pasta. In: E-Energy '20: The Eleventh ACM International Conference on Future Energy Systems, Virtual Event, Australia, June 22–26, 2020. ACM, pp 161–166
8. Pagani GA, Aiello M (2015) Generating realistic dynamic prices and services for the smart grid. IEEE Syst J 9(1):191–198
9. Rawas S, Zekri A, El-Zaart A (2022) LECC: location, energy, carbon and cost-aware VM placement model in geo-distributed DCs. Sustain Comput Inf Syst 33:100649
10. Haghshenas K, Pahlevan A, Zapater M, Mohammadi S, Atienza D (2019) Magnetic: multi-agent machine learning-based approach for energy efficient dynamic consolidation in data centers. IEEE Trans Serv Comput 15(1):30–44
11. Renugadevi T, Geetha K (2021) Task aware optimized energy cost and carbon emission-based virtual machine placement in sustainable data centers. J Intell Fuzzy Syst 41(5):5677–5689
12. Zhao D, Zhou J (2022) An energy and carbon-aware algorithm for renewable energy usage maximization in distributed cloud data centers. J Parallel Distrib Comput 165:156–166
13. Radovanovic A, Koningstein R, Schneider I, Chen B, Duarte A, Roy B, Xiao D, Haridasan M, Hung P, Care N, et al (2021) Carbon-aware computing for datacenters. arXiv preprint arXiv:2106.11750
14. Wiesner P, Behnke I, Scheinert D, Gontarska K, Thamsen L (2021) Let's wait awhile: how temporal workload shifting can reduce carbon emissions in the cloud. arXiv preprint arXiv:2110.13234
16. Burns B, Beda J, Hightower K (2018) Kubernetes. Dpunkt
25. Li J, Bao Z, Li Z (2014) Modeling demand response capability by internet data centers processing batch computing jobs. IEEE Trans Smart Grid 6(2):737–747
26. Haghshenas K, Taheri S, Goudarzi M, Mohammadi S (2020) Infrastructure aware heterogeneous-workloads scheduling for data center energy cost minimization. IEEE Trans Cloud Comput
28. Hansun S (2013) A new approach of moving average method in time series analysis. In: Conference on New Media Studies (CoNMedia). IEEE, pp 1–4
29.
30. Dash S (2012) A comparative study of moving averages: simple, weighted, and exponential. Appl Tech Anal
33. Fan X, Weber W-D, Barroso LA (2007) Power provisioning for a warehouse-sized computer. ACM SIGARCH Comput Archit News 35(2):13–23
34. Wu W, Wang W, Fang X, Luo J, Vasilakos AV (2019) Electricity price-aware consolidation algorithms for time-sensitive VM services in cloud systems. IEEE Trans Serv Comput 14(6):1726–1738
35. Van Damme T, De Persis C, Tesi P (2018) Optimized thermal-aware job scheduling and control of data centers. IEEE Trans Control Syst Technol 27(2):760–771
36. Cheng D, Rao J, Jiang C, Zhou X (2015) Elastic power-aware resource provisioning of heterogeneous workloads in self-sustainable datacenters. IEEE Trans Comput 65(2):508–521
37. Jena SR, Padhy S, Ima B (2014) Minimizing CO\(_2\) emissions on cloud data centers. Int J Sci Eng Res 5(4):1119–1122
38. Aksanli B, Venkatesh J, Zhang L, Rosing T (2011) Utilizing green energy prediction to schedule mixed batch and service jobs in data centers. In: Proceedings of the 4th Workshop on Power-Aware Computing and Systems, pp 1–5
39. Lin W-T, Chen G, Li H (2022) Carbon-aware load balance control of data centers with renewable generations. IEEE Trans Cloud Comput
40. Narayanan D, Santhanam K, Kazhamiaka F, Phanishayee A, Zaharia M (2020) Analysis and exploitation of dynamic pricing in the public cloud for ML training. In: VLDB DISPA Workshop 2020
41. Wang P, Liu W, Cheng M, Ding Z, Wang Y (2022) Electricity and carbon-aware task scheduling in geo-distributed internet data centers. In: IEEE/IAS Industrial and Commercial Power System Asia (I&CPS Asia). IEEE, pp 1416–1421