Auto-scaling web applications in clouds: A cost-aware approach

https://doi.org/10.1016/j.jnca.2017.07.012

Highlights

  • We design an auto-scaling mechanism based on the MAPE concept for Web applications.

  • We enhance the effectiveness of the execution phase of the MAPE control loop with a cost-aware approach.

  • We provide an innovative solution for overcoming the challenges of delayed VM startup.

  • We design an executor in order to mitigate oscillation and increase the stability of the mechanism.

  • We conduct experiments to evaluate the performance of our approach under real-world workload traces for different metrics.

Abstract

The elasticity feature of cloud computing and its pay-per-use pricing entice application providers to host their applications in the cloud. One of the most valuable methods an application provider can use to reduce costs is resource auto-scaling. Resource auto-scaling, which prevents both resource over-provisioning and under-provisioning, is a widely investigated topic in cloud environments. The auto-scaling process is often implemented based on the four phases of the MAPE loop: Monitoring (M), Analysis (A), Planning (P) and Execution (E). Hence, researchers seek to improve the performance of this mechanism with different solutions for each phase. However, solutions in this area generally focus on improving the three phases of monitoring, analysis, and planning, while the execution phase is considered less often. This paper provides a cost-saving "super professional" executor which shows the importance and effectiveness of this phase of the control cycle. Unlike common executors, the proposed solution executes scale-down commands via aware selection of surplus virtual machines; moreover, with its novel features, surplus virtual machines are kept quarantined for the rest of their billing period in order to maximize cost efficiency. Simulation results show that the proposed executor reduces the cost of renting virtual machines by 7% while improving the application provider's final service level agreement and controlling the mechanism's oscillation in decision-making.

Introduction

With the rapid development of cloud computing, many application providers (APs) nowadays tend to host their applications on cloud resources offered by cloud providers (CPs) instead of purchasing computing infrastructure. Cloud providers such as Amazon, with its Elastic Compute Cloud (EC2), offer resources to the AP in the form of Virtual Machines (VMs) with a scalability feature and a pay-per-use charging model (Lorido-Botran et al., 2014, Coutinho et al., 2015, Qu et al., 2016). For example, the AP, the application, and the application users could be a webmaster, an online store website, and its end users, respectively.

Since the AP, and in particular the Web application provider, is aware of the dynamics of the Web environment and of end users' requests, static resource provisioning is not efficient. The reason is that in static resource provisioning, when the rate of incoming user requests increases, resource under-provisioning occurs, which results in interrupted or delayed responses to user requests. Conversely, in periods of reduced traffic, resource over-provisioning occurs, and the AP's costs increase (Qu et al., 2016, Arani and Shamsi, 2015). Therefore, considering the various pricing models in the cloud (Qu et al., 2016, Shen et al., 2014), the AP usually prepays for a minimum number of resources covering its permanent, long-term needs in order to receive a discount for this type of rental (for example, reserved instances in EC2 receive a discount of up to 75%). Consequently, under load fluctuations, the AP uses the short-term rental model to cover its temporary needs (for example, on-demand machines billed per hour of use). However, this approach alone is not enough, as it requires a mechanism capable of automatically determining the capacity and the number of rented on-demand resources in proportion to the incoming load (Amiri and Mohammad-Khanli, 2017).

Designing an efficient auto-scaling mechanism is a research topic mainly faced with the challenge of maintaining a balance between cost reduction and the Service Level Agreement (SLA). IBM proposes a reference model for autonomous management of the auto-scaling mechanism in the form of the MAPE (Monitor-Analyze-Plan-Execute) loop (Computing, 2006). The MAPE loop model can be applied to implement a cloud Web application system which knows its state and reacts to its changes. Therefore, the majority of auto-scaling mechanisms are based on the MAPE loop (Lorido-Botran et al., 2014, Qu et al., 2016, Mohamed et al., 2014, Ghobaei-Arani et al., 2016, Weingärtner et al., 2015). MAPE-based mechanisms constantly repeat the four general processes of monitoring, analysis, planning, and execution: a monitor iteratively gathers information about resources, for example, the status of resource utilization. After monitoring, the auto-scaling mechanism initiates the analysis process (Qu et al., 2016), which can be simple or complex; simple analysis uses the raw information obtained by the monitor, while complex analysis discovers knowledge from that information using methods such as artificial intelligence or machine learning (Amiri and Mohammad-Khanli, 2017, Ghobaei-Arani et al., 2017a). Afterward, by matching the obtained analyses to a series of rules predefined by the AP, the planner makes scale-up or scale-down decisions (a rule-based planner (Qu et al., 2016)). The final phase of the MAPE cycle is the execution of the decision by the executor. This is when the auto-scaling mechanism needs to send a request for the instantiation of a new VM or the release of one of the VMs previously rented from the CP. This research focuses on improving the performance of the executor in the resource auto-scaling mechanism with a cost-aware approach.
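The four MAPE phases described above can be sketched as a minimal control loop. This is an illustrative skeleton only, not the paper's implementation: the metric, thresholds, and simple rule-based planner are assumptions, and the executor is deliberately naive (it releases the most recently added VM) to highlight the gap this paper addresses.

```python
# Hypothetical utilization thresholds for the rule-based planner
# (illustrative values only, not taken from the paper).
SCALE_UP_CPU, SCALE_DOWN_CPU = 0.80, 0.30

def monitor(vms):
    """M: gather raw utilization readings from the rented VMs."""
    return [vm["cpu"] for vm in vms]

def analyze(readings):
    """A: simple analysis -- aggregate the raw readings into one indicator."""
    return sum(readings) / len(readings)

def plan(avg_cpu):
    """P: rule-based planner mapping the analysis to a scaling decision."""
    if avg_cpu > SCALE_UP_CPU:
        return "scale-up"
    if avg_cpu < SCALE_DOWN_CPU:
        return "scale-down"
    return "no-op"

def execute(decision, vms):
    """E: the executor acts on the decision (the phase this paper improves)."""
    if decision == "scale-up":
        vms.append({"cpu": 0.0})   # request a new on-demand VM from the CP
    elif decision == "scale-down" and len(vms) > 1:
        vms.pop()                  # naive policy: release an arbitrary VM
    return vms
```

In a real deployment these four functions would run periodically; the paper's contribution replaces the naive `execute` step with a cost-aware one.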

The motivation behind improving the executor's performance lies in the following: thanks to the possibility of selecting different types of VMs with various capacities, APs can rent a large number of VMs of different types simultaneously; considering the intense workload fluctuation in the Web environment, this is highly likely to happen (Singh and Chana, 2016). That said, when the auto-scaling mechanism makes a scale-down decision, the executor must select one VM from a diverse set of rented VMs and release it. The basic question posed here is: does it matter which VM is selected? If the answer is yes, which policy is best for this selection? Unlike Amazon's auto-scaling policy, which always selects the oldest VM for release (the default executor), and given the gaps seen in related research (Ghobaei-Arani et al., 2016, Islam et al., 2012, Huang et al., 2012, Bankole and Ajila, 2013, Ajila and Bankole, 2013, Herbst et al., 2014, Qavami et al., 2014, García et al., 2014, Singh and Chana, 2015, Fallah et al., 2015, de Assunção et al., 2016), this selection should be made cautiously and rigorously. This is because, firstly, the CP charges partial billing periods as full ones (billing cycles) (Moldovan et al., 2016, Li et al., 2015). For example, in the EC2 service, billing is carried out on an hourly basis, so releasing a VM after 4 h and 1 min of use results in billing for 5 h. Therefore, a policy that minimizes the minutes wasted when releasing surplus VMs is an important economic matter for the AP. Secondly, due to unresolved load-balancing challenges (Khan and Ahmad, 2016), candidate VMs are probably processing different workloads, and the influence of releasing each VM on the SLA varies. Hence, the first purpose of the present research is to employ novel, especially cost-saving, policies in the selection of surplus VMs (the professional executor).
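The billing argument above can be made concrete with a small sketch. Assuming hourly billing cycles, a cost-aware selector would release the VM closest to the end of its current (already paid) hour, wasting the fewest prepaid minutes, whereas the default policy simply picks the oldest VM. The dictionary shape and function names here are hypothetical, not the paper's API.

```python
BILLING_CYCLE_MIN = 60  # hourly billing, as in EC2's classic pricing model

def minutes_into_cycle(uptime_min):
    """Minutes already consumed within the current (prepaid) billing hour."""
    return uptime_min % BILLING_CYCLE_MIN

def select_oldest_vm(vms):
    """Default (Amazon-style) policy: always release the oldest VM."""
    return max(vms, key=lambda vm: vm["uptime_min"])

def select_surplus_vm(vms):
    """Cost-aware policy: release the VM that wastes the fewest paid minutes,
    i.e. the one furthest into its current billing hour."""
    return max(vms, key=lambda vm: minutes_into_cycle(vm["uptime_min"]))
```

For instance, a VM running for 4 h 1 min has just started a new paid hour, so releasing it wastes 59 prepaid minutes; a VM at 59 min of uptime wastes only 1.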

A research gap can still be seen after the selection of the surplus VM and before its release. On the one hand, the selector may not manage to find a VM with exactly X hours of use; in this situation, releasing that VM imposes extra costs. On the other hand, a scale-up decision may be made immediately after the release of the surplus VM; in this case, the delayed startup of the new VM is a challenge which negatively affects the SLA (Lorido-Botran et al., 2014, Coutinho et al., 2015, Qu et al., 2016). Due to the unpredictability of the Web environment (Panneerselvam et al., 2014, Gholami and Arani, 2015) or the maladjustment of scaling rules (Qu et al., 2016), it is highly likely that the mechanism will be affected by contradictory actions when it is in an oscillating condition (Lorido-Botran et al., 2014, Qu et al., 2016). As a result, the following hypothesis is put forward: if the selected surplus VM stays rented by the AP until the last minute of its bill, it can be used to improve the scaling mechanism's performance. Therefore, the other goal of this research is to offer an executor with the ability to quarantine the surplus VM until the billed hour is completed, in order to resolve the challenge of delayed VM startup (the super professional executor, Suprex). To date, researchers have merely considered the benefits of vertical scaling or of applying a cooldown time in the execution of commands as methods for overcoming this challenge (Qu et al., 2016).
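The quarantine idea can be sketched as follows, under the same assumptions as before (hourly billing, time measured in minutes). Rather than releasing the surplus VM, it is held idle until its prepaid hour runs out; if a scale-up decision arrives in the meantime, the quarantined VM is restored instantly, avoiding the delayed startup of a fresh VM. All names and the dictionary shape are illustrative, not the paper's implementation.

```python
BILLING_CYCLE_MIN = 60

def quarantine(vm, now_min):
    """Instead of releasing the surplus VM, quarantine it (route no new
    requests to it) until its already-paid billing hour elapses."""
    remaining = BILLING_CYCLE_MIN - (vm["uptime_min"] % BILLING_CYCLE_MIN)
    vm["quarantined_until"] = now_min + remaining
    return vm

def on_scale_up(quarantined, now_min):
    """If a scale-up arrives while a quarantined VM is still paid for,
    restore it immediately -- no boot delay, no extra rental cost."""
    for vm in quarantined:
        if now_min < vm["quarantined_until"]:
            quarantined.remove(vm)
            return vm          # restored instantly
    return None                # nothing usable: rent a new VM instead

def expire_quarantine(quarantined, now_min):
    """Release quarantined VMs whose prepaid hour has fully elapsed."""
    return [vm for vm in quarantined if now_min < vm["quarantined_until"]]
```

This also dampens oscillation: a contradictory scale-up issued right after a scale-down is absorbed by the quarantine pool instead of triggering a costly new rental.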

This paper presents a scaling mechanism equipped with a super professional executor (Suprex) with a cost-aware approach. We seek to show that the execution phase of the MAPE cycle can play an effective role in cost saving. We explain all four phases, as they are required for the full implementation of the auto-scaling mechanism and for a better understanding of the paper. The auto-scaling mechanism offered in this research differs from others, whose focus is mainly on improving the mechanism's performance in the monitoring, analysis, and planning phases rather than the execution phase. The reason the execution phase has been overlooked lies in the fact that scaling actions are often considered to be under the control of the CP, and the CP is treated as a black box. However, by applying an architecture with full control (Casalicchio and Silvestri, 2013) from the AP's perspective, the AP is granted the power to execute all scaling commands. The main contributions of this research are as follows:

  • 1. We designed an auto-scaling mechanism based on the MAPE concept for Web applications;

  • 2. We enhanced the effectiveness of the execution phase of the MAPE control loop with a cost-aware approach;

  • 3. We provided an innovative solution for overcoming the challenges of delayed VM startup;

  • 4. We designed an executor to mitigate oscillation and increase the stability of the mechanism; and

  • 5. We conducted a series of experiments to evaluate the performance of the proposed approach under real-world workload traces for different metrics.

The rest of the article is organized as follows: Section 2 provides the necessary background. Section 3 reviews related work. Section 4 fully explains the proposed approach. Section 5 simulates and evaluates the performance of the Suprex executor. Section 6 discusses the experimental results in detail. Finally, Section 7 presents conclusions and future work.

Section snippets

Background

This section provides a brief overview of autonomic computing and application adaptation.

Related works

This section is an overview of related works in the field of auto-scaling for web applications. It is centered on the studies focused on each MAPE phase and applied techniques. Note that research is considered to be focused on the analysis phase if it benefits from complex analysis, e.g., neural networks; research is considered to be focused on the planning phase if planning regarding the capacity and the number of resources is conducted based on several scaling indicators. In the analysis

Proposed approach

This section details the proposed scaling mechanism equipped with a Suprex executor. This mechanism conducts its operation based on the MAPE concept with an approach to cost-saving and exploitation of surplus resources. Afterward, the problem formulation is explained and the algorithm needed for the implementation of the proposed mechanism is presented.

Performance evaluation

This section explains the experiments conducted to evaluate the performance of Suprex in the CloudSim simulator (Buyya et al., 2009). The experiment scenario is as follows: the AP rents a limited number of VMs from the CP for hosting a Web application. Afterward, the users start sending their requests to the application. Meanwhile, the scaling mechanism automatically prevents resource under-provisioning and over-provisioning by employing on-demand VMs. The purpose of the mechanism is cost saving and

Discussion

With a closer look, this section studies how the two features of aware selection and quarantining surplus VMs influence the proposed Suprex executor.

The results show that by adding cost-aware selection of surplus VMs, the Professional executor managed to decrease the imposed cost relative to the Default executor (from $5.76 to $5.52). Note that whenever the figure shows a fall, a scale-down decision was made and a VM was released. According to Fig. 17, the number and time of these decisions

Conclusion

In this paper, we proposed Suprex, an executor for the cost-aware auto-scaling mechanism. Suprex benefits from two heuristic features: (1) aware selection of surplus VMs during the execution of scale-down commands and (2) quarantining and immediate restoration of surplus VMs (as opposed to immediate release). Simulation results show that Suprex can achieve a 7% reduction in resource rental costs for the AP while improving response time by up to 5% and decreasing SLA violation and the mechanism's

Mohammad Sadegh Aslanpour received the BSc degree in Computer Engineering from Andisheh University, Jahrom, Iran, in 2012 and the MSc degree from Azad University of Sirjan, Iran, in 2016. Since 2011, he has also been working in the IT Department of the municipality of Jahrom, Iran as a Software Engineer. His current research interests include Cloud Computing (specifically resource management) and the scalability of cloud applications. He investigated the use of Service Level Agreements and Artificial Intelligence in cloud environments for scaling distributed infrastructures. He is a member of the Open Community of Cloud Computing (OCCC) research group.

References (56)

  • Amazon, 2015. Amazon Auto Scaling. Available:...
  • M. Amiri et al., 2017. Survey on prediction models of applications for resources provisioning in cloud. J. Netw. Comput. Appl.
  • V. Andrikopoulos et al., 2013. How to adapt applications for the Cloud environment. Computing.
  • A.-F. Antonescu et al., 2015. Simulation of SLA-based VM-scaling algorithms for cloud-distributed applications. Future Gener. Comput. Syst.
  • Arlitt, M.F., Williamson, C.L., 1997. Internet web servers: workload characterization and performance implications....
  • Aslanpour, M.S., Dashti, S.E., 2016. SLA-aware resource allocation for application service providers in the cloud. In:...
  • M.S. Aslanpour et al., 2017. Proactive Auto-Scaling Algorithm (PASA) for cloud application. Int. J. Grid High. Perform. Comput.
  • Bankole, A.A., 2013. Cloud client prediction models using machine learning techniques. In: Computer Software and...
  • Bankole, A.A., Ajila, S.A., 2013. Cloud client prediction models for cloud resource provisioning in a multitier web...
  • M. Beltrán, 2015. Automatic provisioning of multi-tier applications in cloud computing environments. J. Supercomput.
  • Buyya, R., Ranjan, R., Calheiros, R.N., 2009. Modeling and simulation of scalable Cloud computing environments and the...
  • Cavalcante, E., Batista, T., Lopes, F., Almeida, A., de Moura, A.L., Rodriguez, N., et al., 2013. Autonomous...
  • A. Computing, 2006. An architectural blueprint for autonomic computing. IBM White Paper.
  • Coutinho, E.F., de Carvalho, F.R. Sousa, Rego, P.A.L., Gomes, D.G., de Souza, J.N., 2015. Elasticity in cloud...
  • Dhingra, M., 2014. Elasticity in IaaS Cloud, preserving performance SLAs (Master's thesis), Indian Institute of...
  • EC2. Elastic Compute Cloud. Available:...
  • M. Fallah et al., 2015. NASLA: novel auto scaling approach based on learning automata for web application in cloud computing environment. Int. J. Comput. Appl.

Mostafa Ghobaei-Arani received the B.Sc. degree in Software Engineering from Kashan University, Iran in 2009, and the M.Sc. degree from Azad University of Tehran, Iran in 2011. He was honored with the Ph.D. degree in Software Engineering from Islamic Azad University, Science and Research Branch, Tehran, Iran in 2016. His current research interests are Distributed Systems, Cloud Computing, Pervasive Computing, Big Data, and IoT.

Adel Nadjaran Toosi is a Post-doctoral Research Fellow at the Cloud Computing and Distributed Systems (CLOUDS) Laboratory, School of Computing and Information Systems (CIS), the University of Melbourne, Australia. He received his BSc degree in 2003 and his MSc degree in 2006, both in Computer Science and Software Engineering, from Ferdowsi University of Mashhad, Iran. He completed his PhD, supported by an International Research Scholarship (MIRS) and a Melbourne International Fee Remission Scholarship (MIFRS), at the CIS department of the University of Melbourne. Adel's thesis was nominated for the CORE John Makepeace Bennett Award for the Australasian Distinguished Doctoral Dissertation and the John Melvin Memorial Scholarship for the Best PhD Thesis in Engineering. His current h-index is 14 based on Google Scholar Citations. His research interests include scheduling and resource provisioning mechanisms for distributed systems. Currently he is working on data-intensive application resource provisioning and scheduling in cloud environments. Please visit his homepage: http://adelnadjarantoosi.info
