Online cost-efficient scheduling of deadline-constrained workloads on hybrid clouds

https://doi.org/10.1016/j.future.2012.12.012Get rights and content

Abstract

Cloud computing has found broad acceptance in both industry and research, with public cloud offerings now often used in conjunction with privately owned infrastructure. Technical aspects such as the impact of network latency, bandwidth constraints, data confidentiality and security, as well as economic aspects such as sunk costs and price uncertainty are key drivers towards the adoption of such a hybrid cloud model. The use of hybrid clouds introduces the need to determine which workloads are to be outsourced, and to what cloud provider. These decisions should minimize the cost of running a partition of the total workload on one or multiple public cloud providers while taking into account the application requirements such as deadline constraints and data requirements. The variety of cost factors, pricing models and cloud provider offerings to consider, further calls for an automated scheduling approach in hybrid clouds. In this work, we tackle this problem by proposing a set of algorithms to cost-efficiently schedule the deadline-constrained bag-of-tasks applications on both public cloud providers and private infrastructure. Our algorithms take into account both computational and data transfer costs as well as network bandwidth constraints. We evaluate their performance in a realistic setting with respect to cost savings, deadlines met and computational efficiency, and investigate the impact of errors in runtime estimates on these performance metrics.

Highlights

► This contribution focuses on scheduling algorithms for deadline-based workloads in a hybrid cloud setting. ► Computational and data transfer costs as well as data transfer times are considered. ► We propose cost-aware scheduling algorithms, which manage to reduce costs and are shown to be resistant to inaccuracies in the user provided runtimes.

Introduction

The IT outsourcing model in which access to scalable IT services hosted by external providers can flexibly be acquired and released in a pay-as-you-go manner, has seen an increasing adoption recently under the term cloud computing [1], [2]. A generally adopted definition by the National Institute of Standards and Technology (NIST) describes the cloud computing model as characterized by an on-demand self-service, available through broad network access, in which providers use resource pooling and provide rapidly scalable (elastic) resources, through a measured service [3]. Cloud services are further categorized as Infrastructure as a Service (IaaS), Platform as a Service (PaaS), or Software as a Service (SaaS), depending on their focus of delivering respectively IT infrastructure, frameworks for software development and deployment, or a finished software product.

Our work focuses on the IaaS service class in which a consumer can acquire access to compute, storage, and networking capacity at the provider’s site. To acquire compute resources, a consumer launches a server instance on the cloud provider’s infrastructure, thereby specifying the instance’s characteristics such as the available processing power, main memory and I/O capacity. Most commonly, the notion of an instance materializes into a virtual machine that is launched on the provider’s physical IT infrastructure. Virtualization technology enables the provider to increase infrastructure utilization by multiplexing the execution of multiple instances on a single physical server, and allows one to flexibly tune the individual characteristics of an instance. Nevertheless, most providers predefine the combinations of instance characteristics in a fixed number of instance types.

In the last few years, the IaaS market has quickly matured, with providers rapidly diversifying their product offerings in terms of pricing plans, instance types and services. Amazon’s Elastic Compute Cloud (EC2) offering for example, has increased the number of instance types from one to sixteen in less than six years, and moved from a single on-demand pricing model to three different pricing models with the addition of reserved instances and dynamically priced spot instances. Different types of reserved instance contracts have been introduced depending on the expected utilization of the instance, and an open market for selling such contracts was launched [4]. Likewise, it has expanded the number of geographical regions in which instances can be launched from one to eight. The choice of the instance’s region both influences the network characteristics in terms of available bandwidth and latency, as well as the cost of running the instance. Finally, the number of IaaS providers has increased significantly as well, with each provider differentiating itself in terms of services offered, prices charged for different resources (ingress and egress network bandwidth, memory, CPU capacity), and performance delivered.

Consequently, consumers face an increasing complexity in making cost-optimal decisions when matching their infrastructure requirements to the IaaS services currently available. Continuing standardization efforts in virtualization technology and IaaS offerings, such as OGF’s Open Cloud Computing Interface,1 are expected to further increase the options available to a consumer when acquiring resources “in the cloud”. This issue is exacerbated by the fact that consumers often also own internal IT infrastructure. Through the creation of hybrid clouds [5], [6], one can use this internal infrastructure in tandem with public cloud resources, thereby capitalizing on investments made, and maintaining a high quality-of-service level by employing the public cloud to deal with peak loads.

Although tools exist that allow a consumer to deal with some of the technical issues that arise when allocating resources at multiple public cloud providers [7], [8], [9], [10], [11], they do not tackle the optimization problem of allocating resources in a cost-optimal manner and with support for application-level quality of service constraints such as completion deadlines for the application’s execution. This lack of tool support combined with the inherent complexity of cost-optimal resource allocation within a hybrid cloud setting, renders this process error-prone and time-consuming. Despite current research efforts [12], [13], [14], [15], [16], [17], [18], [19], [20], the cost-optimal resource scheduling and procurement of external resources while taking into account the availability of a local IT infrastructure remains an open problem.2 Moreover, a structured approach is required that can optimize resource allocations in a multi-consumer context. Indeed, the addition of volume or reservation-based price reductions in the pricing options of the public cloud providers allows for the further reduction of costs if an organization collectively engages in resource allocations for its entire user base.

Within the HICCAM (Hybrid Cloud Construction and Management) project, we are investigating the design of software components and algorithms for building and deploying hybrid clouds efficiently, and for automated resource provisioning and management at the level of an entire organization. Within the project’s software architecture outlined in Fig. 1, the organization’s Cloud Procurement Endpoint (CPE) is responsible for managing the procurement of resources from public cloud providers. Requests for executing applications are sent to the CPE by the different decision support systems (DSS) that assist the users in making a trade-off between quality of service levels and costs.

In this paper, we focus on the optimization problem of allocating resources from both a private cloud and multiple public cloud providers in a cost-optimal manner, with support for application-level quality of service constraints. We analyze how this optimization problem can be tackled in the context of resource provisioning for batch workloads through the use of different scheduling algorithms, and investigate the impact of workload model parameters on the cost reductions that can be achieved. Our workload model considers non-preemptible and non-migratable workloads with a hard deadline that are characterized by CPU, memory and data transmission requirements. To our knowledge, we are the first to address the resource allocation problem in hybrid clouds in this form. In previous work [21], we presented a linear programming formulation of the optimization problem for a static set of applications. We experienced large solve time variances, undermining the feasibility of this approach for more complex and large-scale scenarios. In [22], initial steps were taken to tackle this issue by developing custom algorithms for cost-efficient scheduling. We extended our workload model to include applications that are associated with a data set, which allows us to take data transmission speeds and data locality into account during the scheduling process, and to support the online arrival of applications. Our results showed that a cost-oriented approach pays off with regard to the number of deadlines met and that there is potential for significant cost reductions. There was however still room for improvement in terms of the scheduling algorithm’s sensitivity to estimation errors in the user provided runtimes and the efficiency of the private cloud scheduling component. This contribution introduces more advanced algorithms that deal with these issues.

Section 2 outlines the problem domain and workload model used in this paper. Section 3 introduces a public cloud scheduling algorithm and hybrid cloud scheduling algorithm. In Section 4 the experimental settings for the evaluation of the proposed algorithms are explained, after which the evaluation results for the scheduling components and corresponding algorithms can be found in Section 5. Finally, we present related works in the field in Section 6 and summarize the most important conclusions of our work in Section 7.

Section snippets

Cloud setting

In a hybrid cloud setting, the execution of applications occurs through the deployment of virtual machine instances on resources internal to the consumer organization, as well as resources provided by the public cloud providers. The characteristics of the virtual hardware on which these instances are executed are determined by the — mostly provider-specific — definition of instance types. An instance type fixes the number and speed of CPUs, the amount of memory, local storage and the

Algorithm design

We introduce a hybrid scheduling approach for the problem domain described in Section 2. The proposed solution consists of two loosely coupled components:

  • The public cloud scheduler decides for an incoming application — with given task runtimes for each of the available instance types and with a given data set size — on which of the public cloud providers to execute that application. It takes into account the cost for execution and transferring the data, as well as differences in data transfer

Experimental setup

We evaluate the proposed scheduling algorithms using a Java-based discrete time simulator.3 Simulations are executed on an AMD Opteron 6274-based system with 64 CPU cores and 196 GB of memory.

Simulation runtimes depend on the scheduling techniques and the private cloud size used in the simulation setup, as well as the queue scanning interval N. In Section 5, we evaluate the influence of N on the simulation

Results

In this Section, the scheduling algorithms proposed in Section 3 are evaluated using experiments according to the workload model and cloud setting described in Section 4.

Related work

In Tordsson et al. [12], a binary integer program is used to tackle the problem of selecting resources among different cloud providers in a federated environment. They focus on a static approach in which online applications — non-finite applications without deadline constraints — are scheduled on cloud providers in such a way that the total infrastructure capacity is maximized, given budget and load balancing constraints. Li et al. [13] investigate VM migration across multiple clouds while

Conclusion

Supplementing the private infrastructure of an organization with resources from public cloud providers introduces the problem of cost-efficiently and automatically managing the application workloads within such a hybrid cloud environment. This paper presents scheduling algorithms to deal with this optimization problem for deadline-constrained bag-of-task type applications while taking into account data constraints, data locality and inaccuracies in task runtime estimates. In the previous work 

Ruben Van den Bossche is a Ph.D. student in the Department of Mathematics and Computer Science at the University of Antwerp (UA), Belgium. His research interests include scheduling mechanisms in grid and cloud computing.

References (47)

  • D. Petcu et al.

    Portable cloud applications — from theory to practice

    Future Generation Computer Systems

    (2012)
  • A. Edmonds et al.

    Toward an open cloud standard

    IEEE Internet Computing

    (2012)
  • The Apache Software Foundation, libcloud, a unified interface to the cloud, 2012....
  • Apache Software Foundation, Deltacloud API, 2012....
  • Rightscale Inc., Rightscale multi-cloud engine, 2012....
  • W. Li, J. Tordsson, E. Elmroth, Modeling for dynamic cloud scheduling via migration of virtual machines, in: IEEE Third...
  • J.L. Lucas-Simarro et al.

    Scheduling strategies for optimal service deployment across multiple clouds

    Future Generation Computer Systems

    (2012)
  • D. Breitgand, A. Maraschini, J. Tordsson, Policy-driven service placement optimization in federated clouds, Technical...
  • J. Strebel, A. Stage, An economic decision model for business software application deployment on hybrid cloud...
  • A. Andrzejak, D. Kondo, S. Yi, Decision model for cloud computing under sla constraints, in: 2010 IEEE International...
  • S. Kailasam et al.

    Optimizing service level agreements for autonomic cloud bursting schedulers

  • U. Lampe, M. Siebenhaar, D. Schuller, R. Steinmetz, A cloud-oriented broker for cost-minimal software service...
  • R. Van den Bossche, K. Vanmechelen, J. Broeckhove, Cost-optimal scheduling in hybrid IaaS clouds for deadline...
  • Cited by (152)

    • Efficient scientific workflow scheduling for deadline-constrained parallel tasks in cloud computing environments

      2020, Information Sciences
      Citation Excerpt :

      To attain the high-quality security needed in security-sensitive applications, such as bank systems, Tang et al. [31] developed a security-driven scheduling architecture to guarantee high quality of security and high performance while processing such parallel applications. Considering the important influence of bandwidth constraint, security network latency, and other key economic aspects on the cloud model, a set of cost-efficiency algorithms has been proposed for deadline-constrained parallel task applications on public and private cloud centers [32]. Xiao et al. [33] developed a CASpMV framework for a supercomputer center to promote the parallel efficiency of SpMV in heterogeneous computing systems.

    View all citing articles on Scopus

    Ruben Van den Bossche is a Ph.D. student in the Department of Mathematics and Computer Science at the University of Antwerp (UA), Belgium. His research interests include scheduling mechanisms in grid and cloud computing.

    Kurt Vanmechelen is a post-doctoral fellow in the Department of Mathematics and Computer Science at the University of Antwerp (UA), Belgium. His research interests include resource management in grid and cloud environments in general, and the adoption of market mechanisms in such systems in particular. In 2009 he received his Ph.D., in Computer Science, at the University of Antwerp (UA), Belgium.

    Jan Broeckhove is a professor in the Department of Mathematics and Computer Science at the University of Antwerp (UA), Belgium. His research interests include computational science and distributed computing. He received his Ph.D., in Physics, in 1982 at the Free University of Brussels (VUB), Belgium.

    View full text