Online cost-efficient scheduling of deadline-constrained workloads on hybrid clouds
Highlights
► This contribution focuses on scheduling algorithms for deadline-based workloads in a hybrid cloud setting. ► Computational and data transfer costs as well as data transfer times are considered. ► We propose cost-aware scheduling algorithms, which manage to reduce costs and are shown to be resistant to inaccuracies in the user provided runtimes.
Introduction
The IT outsourcing model in which access to scalable IT services hosted by external providers can flexibly be acquired and released in a pay-as-you-go manner, has seen an increasing adoption recently under the term cloud computing [1], [2]. A generally adopted definition by the National Institute of Standards and Technology (NIST) describes the cloud computing model as characterized by an on-demand self-service, available through broad network access, in which providers use resource pooling and provide rapidly scalable (elastic) resources, through a measured service [3]. Cloud services are further categorized as Infrastructure as a Service (IaaS), Platform as a Service (PaaS), or Software as a Service (SaaS), depending on their focus of delivering respectively IT infrastructure, frameworks for software development and deployment, or a finished software product.
Our work focuses on the IaaS service class in which a consumer can acquire access to compute, storage, and networking capacity at the provider’s site. To acquire compute resources, a consumer launches a server instance on the cloud provider’s infrastructure, thereby specifying the instance’s characteristics such as the available processing power, main memory and I/O capacity. Most commonly, the notion of an instance materializes into a virtual machine that is launched on the provider’s physical IT infrastructure. Virtualization technology enables the provider to increase infrastructure utilization by multiplexing the execution of multiple instances on a single physical server, and allows one to flexibly tune the individual characteristics of an instance. Nevertheless, most providers predefine the combinations of instance characteristics in a fixed number of instance types.
In the last few years, the IaaS market has quickly matured, with providers rapidly diversifying their product offerings in terms of pricing plans, instance types and services. Amazon’s Elastic Compute Cloud (EC2) offering for example, has increased the number of instance types from one to sixteen in less than six years, and moved from a single on-demand pricing model to three different pricing models with the addition of reserved instances and dynamically priced spot instances. Different types of reserved instance contracts have been introduced depending on the expected utilization of the instance, and an open market for selling such contracts was launched [4]. Likewise, it has expanded the number of geographical regions in which instances can be launched from one to eight. The choice of the instance’s region both influences the network characteristics in terms of available bandwidth and latency, as well as the cost of running the instance. Finally, the number of IaaS providers has increased significantly as well, with each provider differentiating itself in terms of services offered, prices charged for different resources (ingress and egress network bandwidth, memory, CPU capacity), and performance delivered.
Consequently, consumers face an increasing complexity in making cost-optimal decisions when matching their infrastructure requirements to the IaaS services currently available. Continuing standardization efforts in virtualization technology and IaaS offerings, such as OGF’s Open Cloud Computing Interface,1 are expected to further increase the options available to a consumer when acquiring resources “in the cloud”. This issue is exacerbated by the fact that consumers often also own internal IT infrastructure. Through the creation of hybrid clouds [5], [6], one can use this internal infrastructure in tandem with public cloud resources, thereby capitalizing on investments made, and maintaining a high quality-of-service level by employing the public cloud to deal with peak loads.
Although tools exist that allow a consumer to deal with some of the technical issues that arise when allocating resources at multiple public cloud providers [7], [8], [9], [10], [11], they do not tackle the optimization problem of allocating resources in a cost-optimal manner and with support for application-level quality of service constraints such as completion deadlines for the application’s execution. This lack of tool support combined with the inherent complexity of cost-optimal resource allocation within a hybrid cloud setting, renders this process error-prone and time-consuming. Despite current research efforts [12], [13], [14], [15], [16], [17], [18], [19], [20], the cost-optimal resource scheduling and procurement of external resources while taking into account the availability of a local IT infrastructure remains an open problem.2 Moreover, a structured approach is required that can optimize resource allocations in a multi-consumer context. Indeed, the addition of volume or reservation-based price reductions in the pricing options of the public cloud providers allows for the further reduction of costs if an organization collectively engages in resource allocations for its entire user base.
Within the HICCAM (Hybrid Cloud Construction and Management) project, we are investigating the design of software components and algorithms for building and deploying hybrid clouds efficiently, and for automated resource provisioning and management at the level of an entire organization. Within the project’s software architecture outlined in Fig. 1, the organization’s Cloud Procurement Endpoint (CPE) is responsible for managing the procurement of resources from public cloud providers. Requests for executing applications are sent to the CPE by the different decision support systems (DSS) that assist the users in making a trade-off between quality of service levels and costs.
In this paper, we focus on the optimization problem of allocating resources from both a private cloud and multiple public cloud providers in a cost-optimal manner, with support for application-level quality of service constraints. We analyze how this optimization problem can be tackled in the context of resource provisioning for batch workloads through the use of different scheduling algorithms, and investigate the impact of workload model parameters on the cost reductions that can be achieved. Our workload model considers non-preemptible and non-migratable workloads with a hard deadline that are characterized by CPU, memory and data transmission requirements. To our knowledge, we are the first to address the resource allocation problem in hybrid clouds in this form. In previous work [21], we presented a linear programming formulation of the optimization problem for a static set of applications. We experienced large solve time variances, undermining the feasibility of this approach for more complex and large-scale scenarios. In [22], initial steps were taken to tackle this issue by developing custom algorithms for cost-efficient scheduling. We extended our workload model to include applications that are associated with a data set, which allows us to take data transmission speeds and data locality into account during the scheduling process, and to support the online arrival of applications. Our results showed that a cost-oriented approach pays off with regard to the number of deadlines met and that there is potential for significant cost reductions. There was however still room for improvement in terms of the scheduling algorithm’s sensitivity to estimation errors in the user provided runtimes and the efficiency of the private cloud scheduling component. This contribution introduces more advanced algorithms that deal with these issues.
Section 2 outlines the problem domain and workload model used in this paper. Section 3 introduces a public cloud scheduling algorithm and hybrid cloud scheduling algorithm. In Section 4 the experimental settings for the evaluation of the proposed algorithms are explained, after which the evaluation results for the scheduling components and corresponding algorithms can be found in Section 5. Finally, we present related works in the field in Section 6 and summarize the most important conclusions of our work in Section 7.
Section snippets
Cloud setting
In a hybrid cloud setting, the execution of applications occurs through the deployment of virtual machine instances on resources internal to the consumer organization, as well as resources provided by the public cloud providers. The characteristics of the virtual hardware on which these instances are executed are determined by the — mostly provider-specific — definition of instance types. An instance type fixes the number and speed of CPUs, the amount of memory, local storage and the
Algorithm design
We introduce a hybrid scheduling approach for the problem domain described in Section 2. The proposed solution consists of two loosely coupled components:
- •
The public cloud scheduler decides for an incoming application — with given task runtimes for each of the available instance types and with a given data set size — on which of the public cloud providers to execute that application. It takes into account the cost for execution and transferring the data, as well as differences in data transfer
Experimental setup
We evaluate the proposed scheduling algorithms using a Java-based discrete time simulator.3 Simulations are executed on an AMD Opteron 6274-based system with 64 CPU cores and 196 GB of memory.
Simulation runtimes depend on the scheduling techniques and the private cloud size used in the simulation setup, as well as the queue scanning interval . In Section 5, we evaluate the influence of on the simulation
Results
In this Section, the scheduling algorithms proposed in Section 3 are evaluated using experiments according to the workload model and cloud setting described in Section 4.
Related work
In Tordsson et al. [12], a binary integer program is used to tackle the problem of selecting resources among different cloud providers in a federated environment. They focus on a static approach in which online applications — non-finite applications without deadline constraints — are scheduled on cloud providers in such a way that the total infrastructure capacity is maximized, given budget and load balancing constraints. Li et al. [13] investigate VM migration across multiple clouds while
Conclusion
Supplementing the private infrastructure of an organization with resources from public cloud providers introduces the problem of cost-efficiently and automatically managing the application workloads within such a hybrid cloud environment. This paper presents scheduling algorithms to deal with this optimization problem for deadline-constrained bag-of-task type applications while taking into account data constraints, data locality and inaccuracies in task runtime estimates. In the previous work
Ruben Van den Bossche is a Ph.D. student in the Department of Mathematics and Computer Science at the University of Antwerp (UA), Belgium. His research interests include scheduling mechanisms in grid and cloud computing.
References (47)
- et al.
Cloud computing and emerging IT platforms: vision, hype, and reality for delivering computing as the 5th utility
Future Generation Computer Systems
(2009) - et al.
Cloud brokering mechanisms for optimized placement of virtual machines across multiple providers
Future Generation Computer Systems
(2012) - et al.
Failure-aware resource provisioning for hybrid cloud infrastructure
Journal of Parallel and Distributed Computing
(2012) - et al.
The impact of data replication on job scheduling performance in the data grid
Future Generation Computer Systems
(2006) - et al.
The data grid: towards an architecture for the distributed management and analysis of large scientific datasets
Journal of Network and Computer Applications
(2000) - et al.
Sky computing
IEEE Internet Computing
(2009) - P. Mell, T. Grace, The NIST definition of cloud computing, Technical Report 800-145, National Institute of Standards...
- “Amazon Web Services”, Amazon EC2 reserved instance marketplace — beta, 2012....
- et al.
Virtual infrastructure management in private and hybrid clouds
IEEE Internet Computing
(2009) - et al.
A cost-benefit analysis of using cloud computing to extend the capacity of clusters
Cluster Computing
(2010)
Portable cloud applications — from theory to practice
Future Generation Computer Systems
Toward an open cloud standard
IEEE Internet Computing
Scheduling strategies for optimal service deployment across multiple clouds
Future Generation Computer Systems
Optimizing service level agreements for autonomic cloud bursting schedulers
Cited by (152)
Cost-efficient auto-scaling of container-based elastic processes
2023, Future Generation Computer SystemsResource scheduling methods for cloud computing environment: The role of meta-heuristics and artificial intelligence
2022, Engineering Applications of Artificial IntelligenceImproving the hybrid cloud performance through disk activity-aware data access
2021, Simulation Modelling Practice and TheoryHybrid scheduling for scientific workflows on hybrid clouds
2020, Computer NetworksEfficient scientific workflow scheduling for deadline-constrained parallel tasks in cloud computing environments
2020, Information SciencesCitation Excerpt :To attain the high-quality security needed in security-sensitive applications, such as bank systems, Tang et al. [31] developed a security-driven scheduling architecture to guarantee high quality of security and high performance while processing such parallel applications. Considering the important influence of bandwidth constraint, security network latency, and other key economic aspects on the cloud model, a set of cost-efficiency algorithms has been proposed for deadline-constrained parallel task applications on public and private cloud centers [32]. Xiao et al. [33] developed a CASpMV framework for a supercomputer center to promote the parallel efficiency of SpMV in heterogeneous computing systems.
Optimal job partitioning and allocation for vehicular cloud computing
2020, Future Generation Computer Systems
Ruben Van den Bossche is a Ph.D. student in the Department of Mathematics and Computer Science at the University of Antwerp (UA), Belgium. His research interests include scheduling mechanisms in grid and cloud computing.
Kurt Vanmechelen is a post-doctoral fellow in the Department of Mathematics and Computer Science at the University of Antwerp (UA), Belgium. His research interests include resource management in grid and cloud environments in general, and the adoption of market mechanisms in such systems in particular. In 2009 he received his Ph.D., in Computer Science, at the University of Antwerp (UA), Belgium.
Jan Broeckhove is a professor in the Department of Mathematics and Computer Science at the University of Antwerp (UA), Belgium. His research interests include computational science and distributed computing. He received his Ph.D., in Physics, in 1982 at the Free University of Brussels (VUB), Belgium.