Towards autonomic detection of SLA violations in Cloud infrastructures

doi:10.1016/j.future.2011.08.018

Future Generation Computer Systems

Volume 28, Issue 7, July 2012, Pages 1017-1029

https://doi.org/10.1016/j.future.2011.08.018

Abstract

Cloud computing has become a popular paradigm for implementing scalable computing infrastructures provided on demand on a case-by-case basis. Self-manageable Cloud infrastructures are required in order to comply with users’ requirements defined by Service Level Agreements (SLAs) and to minimize user interactions with the computing environment. Thus, adequate SLA monitoring strategies and timely detection of possible SLA violations represent challenging research issues. This paper presents the Detecting SLA Violation infrastructure (DeSVi) architecture, which senses SLA violations through sophisticated resource monitoring. Based on user requests, DeSVi allocates computing resources for a requested service and arranges its deployment on a virtualized environment. Resources are monitored using a novel framework capable of mapping low-level resource metrics (e.g., host up and down time) to user-defined SLAs (e.g., service availability). The detection of possible SLA violations relies on predefined service level objectives and on knowledge databases used to manage and prevent such violations. We evaluate the DeSVi architecture using two application scenarios: (i) image rendering applications based on ray-tracing, and (ii) transactional web applications based on the well-known TPC-W benchmark. These applications exhibit heterogeneous workloads, which we use to investigate the optimal monitoring interval of SLA parameters. The results show that our architecture is able to monitor and detect SLA violations. The architecture's output also provides guidelines on appropriate monitoring intervals for applications, depending on their resource consumption behavior.

Highlights

► We design the Detecting SLA Violation infrastructure (DeSVi) architecture for Clouds.
► It monitors SLAs at runtime to detect violation situations in order to avert them.
► It autonomously manages the whole application provisioning lifecycle in Clouds.
► We evaluated the architecture with transactional and HPC application workloads.
► Based on the achieved results, we can determine optimal application monitoring intervals.

Introduction

Cloud computing represents a novel paradigm for the implementation of scalable computing infrastructures, combining concepts from virtualization, distributed application design, Grid, and enterprise IT management [1], [2], [3]. Service provisioning in the Cloud relies on Service Level Agreements (SLAs): contracts signed between the customer and the service provider that include the non-functional requirements of the service, specified as Quality of Service (QoS) [4], [5]. An SLA specifies obligations, service pricing, and penalties in case of agreement violations.
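To make the SLA elements named above concrete, the following minimal Python sketch models an SLA as a price plus a set of service level objectives, each carrying a penalty. The class names, fields, and the flat per-violation penalty are hypothetical choices for illustration, not the SLA format used by DeSVi.

```python
from dataclasses import dataclass, field


@dataclass
class SLO:
    """One service level objective, e.g. availability >= 99.0 (%)."""
    parameter: str                 # hypothetical SLA parameter name
    threshold: float               # agreed bound for the parameter
    higher_is_better: bool = True  # True: the value must not drop below the threshold
    penalty: float = 0.0           # hypothetical flat penalty per violation

    def is_violated(self, measured: float) -> bool:
        # The objective is violated when the measured value lies on the wrong side of the bound.
        if self.higher_is_better:
            return measured < self.threshold
        return measured > self.threshold


@dataclass
class SLA:
    customer: str
    provider: str
    price_per_hour: float                      # service pricing
    objectives: list[SLO] = field(default_factory=list)

    def penalties_due(self, measurements: dict[str, float]) -> float:
        """Sum the penalties of all objectives violated by the given measurements."""
        return sum(
            slo.penalty
            for slo in self.objectives
            if slo.parameter in measurements and slo.is_violated(measurements[slo.parameter])
        )


if __name__ == "__main__":
    sla = SLA(
        customer="customer-A",
        provider="provider-X",
        price_per_hour=0.50,
        objectives=[
            SLO("availability", 99.0, higher_is_better=True, penalty=10.0),
            SLO("response_time_ms", 500.0, higher_is_better=False, penalty=5.0),
        ],
    )
    # Availability is below 99.0 (violated); response time is within 500 ms.
    print(sla.penalties_due({"availability": 98.7, "response_time_ms": 320.0}))  # 10.0
```

Keeping the direction of each bound (`higher_is_better`) on the objective lets "at least" parameters such as availability and "at most" parameters such as response time share one violation check.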

Flexible and reliable management of SLA agreements is of paramount importance for both Cloud providers and consumers. On the one hand, preventing SLA violations avoids the penalties that providers have to pay; on the other hand, flexible and timely reactions to possible SLA violations minimize user interaction with the system, which enables Cloud computing to take root as a flexible and reliable form of on-demand computing.

Although there is a large body of work considering the development of flexible and self-manageable Cloud computing infrastructures [6], [7], [8], there is still a lack of adequate monitoring infrastructures able to predict possible SLA violations. Most of the available monitoring systems rely either on Grid [9], [10] or service-oriented infrastructures [11], which are not directly compatible with Clouds because of differences in the resource usage model, or they are heavily network-oriented [12]. In Grids [13], resources are mostly owned by different individuals or enterprises, and in some cases, such as desktop Grids, resources are only available for use when their owners are not using them [14]. Therefore, resource availability varies considerably, which affects their use for application provisioning. In Cloud computing, by contrast, resources are owned by an enterprise (the Cloud provider), which provisions them to customers in a pay-as-you-go manner. As a result, the availability of resources is more stable and resources can be provisioned on demand. Hence, the monitoring strategies used for detecting SLA violations in Grids cannot be directly applied to Clouds.

Furthermore, another important aspect of SLA usage is the required elasticity of Cloud infrastructures. SLAs are not only used to provide guarantees to end users; providers also use them to manage Cloud infrastructures efficiently, considering competing priorities such as energy efficiency and attainment of SLA agreements [15], [16] while delivering sufficient elasticity. Moreover, SLAs have recently been used as part of novel Cloud engineering models such as Cloud federation [17], [18], where providers can in- or outsource their infrastructure depending on the current load. Since SLA parameters are usually defined by Cloud providers and can comprise various user-defined attributes, current monitoring infrastructures lack appropriate solutions for adequate SLA monitoring. The first challenge is to facilitate the mapping of metrics measured by low-level tools to application-based SLAs. The second challenge is to determine appropriate monitoring intervals at the application level, balancing the early detection of possible SLA violations against the intrusiveness of the monitoring tools.
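The trade-off behind the second challenge can be illustrated with a deliberately simple model: if a metric is sampled every Δ seconds and a violation that begins at time t_v persists until observed, it is first seen at the next sample, so the detection delay is at most Δ, while the number of samples (a rough proxy for monitoring intrusiveness) grows as 1/Δ. The Python sketch below computes both quantities; the function names and numbers are hypothetical, and this toy model is not the evaluation method used in the paper.

```python
import math


def detection_delay(violation_start: float, interval: float) -> float:
    """Delay between the onset of a violation and the first periodic sample that observes it.

    Assumes samples at t = 0, interval, 2*interval, ... and that the
    violation persists until it is observed.
    """
    first_sample = math.ceil(violation_start / interval) * interval
    return first_sample - violation_start


def measurements_per_horizon(horizon: float, interval: float) -> int:
    """Number of samples taken in [0, horizon], a proxy for monitoring overhead."""
    return int(horizon // interval) + 1


if __name__ == "__main__":
    # Hypothetical violation starting at t = 125 s, observed over one hour.
    for interval in (5, 10, 60):
        print(interval, detection_delay(125, interval), measurements_per_horizon(3600, interval))
    # 5 0 721
    # 10 5 361
    # 60 55 61
```

Shorter intervals catch this violation sooner but take roughly twelve times as many samples at 5 s as at 60 s, which is the balance the appropriate monitoring interval has to strike.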

In this paper, we present a novel concept for mapping low-level resource metrics to high-level SLAs, LoM2HiS [19], in which system metrics (e.g., system up and down time) are translated to high-level SLAs (e.g., system availability). LoM2HiS thus facilitates efficient monitoring of Cloud infrastructures and early detection of possible SLA violations. Furthermore, the LoM2HiS framework enables user-driven mappings between resource metrics and SLA parameters by utilizing mapping rules defined with Domain Specific Languages (DSLs). However, determining optimal measurement intervals for low-level metrics and their translation to SLAs is still an open research issue. Short measurement intervals may negatively affect overall system performance, whereas long measurement intervals may delay detection and thus lead to severe SLA violations.
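LoM2HiS expresses its mapping rules in a DSL; purely as a language-neutral illustration, the Python sketch below encodes one rule as a plain function that derives availability (%) from uptime and downtime, availability = 100 · uptime / (uptime + downtime), and then checks the mapped value against an agreed minimum. The rule table, metric names, and thresholds are hypothetical and do not reproduce the LoM2HiS DSL.

```python
from typing import Callable

# A mapping rule turns a dict of raw low-level metrics into one SLA parameter value.
MappingRule = Callable[[dict[str, float]], float]

# Hypothetical rule table standing in for DSL-defined mapping rules.
MAPPING_RULES: dict[str, MappingRule] = {
    # Availability (%) = uptime / (uptime + downtime) * 100
    "availability": lambda m: 100.0 * m["uptime"] / (m["uptime"] + m["downtime"]),
}


def map_metrics(raw: dict[str, float]) -> dict[str, float]:
    """Translate raw low-level metrics into high-level SLA parameter values."""
    return {name: rule(raw) for name, rule in MAPPING_RULES.items()}


def violated(sla_params: dict[str, float], minimums: dict[str, float]) -> list[str]:
    """Return the SLA parameters whose mapped value is below the agreed minimum.

    Parameters without a mapped value are not reported.
    """
    return [p for p, minimum in minimums.items() if sla_params.get(p, float("inf")) < minimum]


if __name__ == "__main__":
    # One hour of hypothetical host monitoring: 3564 s up, 36 s down.
    params = map_metrics({"uptime": 3564.0, "downtime": 36.0})
    print(params)                                   # {'availability': 99.0}
    print(violated(params, {"availability": 99.5}))  # ['availability']
```

Keeping the rules in a table separate from the monitoring code mirrors the idea of user-driven mappings: a new SLA parameter needs a new rule entry, not a change to the collector.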

In order to assist Cloud providers in detecting SLA violations through resource monitoring, we developed the DeSVi architecture [20]. This architecture represents a core step towards achieving flexible and autonomic SLA management. The main components of the DeSVi architecture are: (i) the automatic VM deployer, (ii) the application deployer, and (iii) the LoM2HiS framework. Based on user requests, the automatic VM deployer allocates the necessary resources for the requested service and arranges its deployment on a virtual machine (VM). After service deployment, the LoM2HiS framework monitors the VMs and translates the low-level metrics into high-level SLAs using the specified mapping rules. To realize autonomic SLA management, DeSVi utilizes a knowledge database to evaluate the monitored information and propose reactive actions in case of SLA violation situations.
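The evaluation step performed with the knowledge database can be sketched, under strong simplifying assumptions, as a lookup from violated SLA parameters to candidate reactive actions. The rule entries, parameter names, and actions below are hypothetical examples, not the contents of DeSVi's knowledge database; a fixed rule table simply stands in for whatever reasoning the knowledge database actually performs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical knowledge-base entry: an SLO plus the action proposed when it is violated."""
    parameter: str  # high-level SLA parameter produced by the metric mapping
    minimum: float  # agreed lower bound for the parameter
    action: str     # reactive action proposed on violation


# Hypothetical rule table standing in for the knowledge database.
KNOWLEDGE_BASE: tuple[Rule, ...] = (
    Rule("availability", 99.0, "migrate VM to another host"),
    Rule("free_storage_gb", 10.0, "attach additional storage"),
    Rule("free_memory_mb", 512.0, "increase VM memory allocation"),
)


def propose_actions(sla_values: dict[str, float],
                    kb: tuple[Rule, ...] = KNOWLEDGE_BASE) -> list[str]:
    """Return the action of every rule whose parameter is present and below its minimum."""
    return [
        rule.action
        for rule in kb
        if rule.parameter in sla_values and sla_values[rule.parameter] < rule.minimum
    ]


if __name__ == "__main__":
    # Values as they might arrive from the monitoring framework after metric mapping.
    current = {"availability": 98.2, "free_storage_gb": 42.0, "free_memory_mb": 256.0}
    for action in propose_actions(current):
        print(action)
```

With these sample values, availability and free memory fall below their minimums, so two actions are proposed while storage needs none.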

The main contributions of the paper are: (i) definition of the motivating scenario for the development of an architecture aimed at detecting SLA violations, (ii) conceptual design of the DeSVi architecture for the prediction of SLA violations, (iii) discussion of the implementation choices for DeSVi, and (iv) extensive evaluation of the architecture on a real computing infrastructure using various SLA parameters and two Cloud applications: an image rendering service based on POV-Ray¹ and the TPC-W transactional web e-Commerce benchmark.²

The rest of this paper is organized as follows: Section 2 presents the related work. Section 3 presents the architecture for the autonomic management of Cloud services and the motivating scenario for the development of the DeSVi architecture. Section 4 introduces the DeSVi architecture; in particular, we discuss the automatic VM deployer, the application deployer, and the monitoring components. Section 5 discusses our implementation choices, whereas Section 6 discusses the experimental evaluation of the DeSVi architecture. Section 7 presents our conclusions and describes future work.


Related work

We classify related work into (i) resource monitoring [21], [12], [22], (ii) SLA management including violation detection [23], [24], [25], [26], [27], and (iii) mapping techniques of monitored metrics to SLA parameters [28], [11]. Currently, there is little work in the area of resource monitoring, low-level metric mapping, and SLA violation detection in Cloud computing. Therefore, we also look into the related areas of Grid and Service-Oriented Architecture (SOA) based systems.

Fu et al. [21]

Background and motivation

The processes of service provisioning based on SLAs and efficient management of resources in an autonomic manner have been identified as major research challenges in Cloud environments [30], [1]. The FoSII project (Foundations of Self-governing ICT Infrastructures) is developing models and concepts for autonomic SLA management and enforcement in Clouds. FoSII components manage the whole lifecycle of self-adaptable Cloud services [6], as explained next.

SLAs are used to guarantee customers a certain level

DeSVi architecture

This section describes in detail the Detecting SLA Violation infrastructure (DeSVi) architecture, its components, and how the components interact with one another (Fig. 3). The proposed architecture is designed to handle the complete service provisioning management lifecycle in Cloud environments. The service provisioning lifecycle includes activities such as service deployment, resource allocation to tasks, resource monitoring, and SLA violation detection.

The topmost layer represents the users

Implementation issues

In this section, we describe the implementation choices for each DeSVi component. The implementation of the DeSVi components targets the fulfillment of fundamental Cloud requirements such as scalability, efficiency, and reliability. To achieve these goals, we incorporated well-established and tested open-source tools into the implementation whenever possible. The results presented in Section 6 were obtained using the components presented in this section.

Evaluation

This section discusses the evaluation of our approach using two use-case scenarios. These scenarios represent the dominant application domains provisioned in Clouds today, namely (i) high performance computing applications, which include image processing and scientific simulations, and (ii) transactional applications, which include web applications, social network sites, and media sites. The first use-case scenario comprises three types of ray-tracing applications based on POV-Ray,

Conclusion and future work

Flexible and reliable management of SLA agreements represents an open research issue in Cloud computing. Advantages of flexible and reliable Cloud infrastructures are manifold. For example, prevention of SLA violations avoids unnecessary penalties providers have to pay in case of violations. Moreover, based on flexible and timely reactions to possible SLA violations, interactions with users can be minimized. In this paper we presented DeSVi—the novel architecture for monitoring and detecting

Acknowledgments

This work is supported by the Vienna Science and Technology Fund (WWTF) under grant agreement ICT08-018 Foundations of Self-governing ICT Infrastructures (FoSII) and by the Australian Research Council. This paper is a substantially extended version of a CloudComp 2010 paper [20]. The experiments were performed in the High Performance Computing Lab at the Catholic University of Rio Grande do Sul (LAD-PUCRS), Brazil.

References (49)

R. Buyya et al., Cloud computing and emerging IT platforms: vision, hype, and reality for delivering computing as the 5th utility, Future Generation Computer Systems (2009).

P. Balakrishnan et al., SLA enabled CARE resource broker, Future Generation Computer Systems (2011).

A. Litke et al., Managing service level agreement contracts in OGSA-based grids, Future Generation Computer Systems (2008).

W.-C. Chung et al., A new mechanism for resource monitoring in grid computing, Future Generation Computer Systems (2009).

S. Reyes et al., Monitoring and steering grid applications with grid superscalar, Future Generation Computer Systems (2010).

D. Kondo et al., Characterizing resource availability in enterprise desktop grids, Future Generation Computer Systems (2007).

C. Li et al., Competitive proportional resource allocation policy for computational grid, Future Generation Computer Systems (2004).

T. Wood et al., Sandpiper: black-box and gray-box resource management for virtual machines, Computer Networks (2009).

M.L. Massie et al., The Ganglia distributed monitoring system: design, implementation and experience, Parallel Computing (2004).

D. Kyriazis et al., An innovative workflow mapping mechanism for grids in the frame of quality of service, Future Generation Computer Systems (2008).

D. Abramson et al., A computational economy for grid computing and its implementation in the Nimrod-G resource broker, Future Generation Computer Systems (2002).

S. Seneviratne et al., Task profiling model for load profile prediction, Future Generation Computer Systems (2011).

B. Rochwerger et al., The RESERVOIR model and architecture for open federated cloud computing, IBM Journal of Research and Development (2009).

D. Nurmi, R. Wolski, C. Grzegorczyk, G. Obertelli, S. Soman, L. Youseff, D. Zagorodnov, The Eucalyptus open-source...

I. Brandic, Towards self-manageable cloud services, in: Proceedings of the 33rd Annual IEEE International Computer...

FoSII, Foundations of self-governing infrastructures,...

R.N. Calheiros et al., Building an automated and self-configurable emulation testbed for grid applications, Software: Practice and Experience (2010).

A. D’Ambrogio, P. Bocciarelli, A model-driven approach to describe and predict the performance of composite services,...

D. Gunter, B. Tierney, B. Crowley, M. Holding, J. Lee, Netlogger: a toolkit for distributed system performance...

J.L. Berral, I. Goiri, R. Nou, F. Juliá, J. Guitart, R. Gavaldá, J. Torres, Towards energy-aware scheduling in data...

A. Beloglazov et al., Energy efficient resource management in virtualized cloud data centers.

A. Celesti, F. Tusa, M. Villari, A. Puliafito, How to enhance cloud architectures to enable cross-federation, in: IEEE...

A. Celesti, F. Tusa, M. Villari, A. Puliafito, Three-phase cross-cloud federation model: the cloud sso authentication,...

V.C. Emeakaroha, I. Brandic, M. Maurer, S. Dustdar, Low level metrics to high level SLAs — LoM2HiS framework: bridging...


Vincent C. Emeakaroha (M.Sc., B.Sc.) is a Research Assistant at the Distributed Systems Group, Information Systems Institute, Vienna University of Technology (TU Wien). He received his Bachelor’s degree in Computer Engineering in 2006 and earned two Master’s degrees, in Software Engineering & Internet Computing in 2008 and in Computer Science Management in 2009, all at the Vienna University of Technology. He is currently involved in the Austrian national FoSII (Foundations of Self-governing ICT Infrastructures) project funded by the Vienna Science and Technology Fund (WWTF) while pursuing his Ph.D. studies. His research interests include Cloud computing, autonomic computing, energy efficiency in Clouds, and SLA and QoS management.

Marco A.S. Netto received his Ph.D. in Computer Science from the University of Melbourne, Australia (2010), and his Bachelor’s (2002) and Master’s (2004) degrees in Computer Science, both from the Pontifical Catholic University of Rio Grande do Sul (PUCRS), Brazil. He has been working on resource management and job scheduling for high performance computing environments since 2000. Marco’s current research focuses on performance evaluation of systems in virtualized environments and SLA management policies.

Rodrigo N. Calheiros is a Research Fellow in the Cloud Computing and Distributed Systems Laboratory (CLOUDS Lab) in the Dept. of Computer Science and Software Engineering, University of Melbourne, Australia. He completed his Ph.D. degree in Computer Science in 2010 at PUCRS, Brazil, and his M.Sc. degree in 2006 at the same university. His research interests include Cloud Computing and simulation and emulation of distributed systems, with emphasis on Grids and Clouds.

Ivona Brandic is Assistant Professor at the Distributed Systems Group, Information Systems Institute, Vienna University of Technology (TU Wien). Prior to that, she was Assistant Professor at the Department of Scientific Computing, Vienna University. She received her Ph.D. degree from Vienna University of Technology in 2007. From 2003 to 2007, she participated in the special research project AURORA (Advanced Models, Applications and Software Systems for High Performance Computing) and the European Union’s GEMSS (Grid-Enabled Medical Simulation Services) project. She is involved in the European Union’s SCube project, and she is leading the Austrian national FoSII (Foundations of Self-governing ICT Infrastructures) project funded by the Vienna Science and Technology Fund (WWTF). She is a Management Committee member of the European Commission’s COST Action on Energy Efficient Large Scale Distributed Systems. From June to August 2008, she was a visiting researcher at the University of Melbourne. Her interests comprise SLA and QoS management, service-oriented architectures, autonomic computing, workflow management, and large scale distributed systems (Cloud, Grid, and Cluster).

Rajkumar Buyya is Professor of Computer Science and Software Engineering and Director of the Cloud Computing and Distributed Systems (CLOUDS) Laboratory at the University of Melbourne, Australia. He is also serving as the founding CEO of Manjrasoft Pty Ltd., a spin-off company of the University, commercializing its innovations in Grid and Cloud Computing. He received B.E. and M.E. degrees in Computer Science and Engineering from Mysore and Bangalore Universities in 1992 and 1995, respectively, and a Doctor of Philosophy (Ph.D.) in Computer Science and Software Engineering from Monash University, Melbourne, Australia, in 2002. He received the Chris Wallace Award for Outstanding Research Contribution 2008 from the Computing Research and Education Association of Australasia (CORE), an association of university departments of computer science in Australia and New Zealand. Dr. Buyya recently received the “2009 IEEE Medal for Excellence in Scalable Computing” for pioneering the economic paradigm for utility-oriented distributed computing platforms such as Grids and Clouds.

César A.F. De Rose is an Associate Professor in the Computer Science Department at the Pontifical Catholic University of Rio Grande do Sul (PUCRS), Porto Alegre, Brazil. His primary research interests are parallel and distributed computing and parallel architectures. He is currently conducting research on a variety of topics applied to clusters and Grids, including resource management, resource monitoring, distributed allocation strategies, and virtualization. Dr. De Rose received his doctoral degree in Computer Science from the University of Karlsruhe, Germany, in 1998.
