Towards autonomic detection of SLA violations in Cloud infrastructures

https://doi.org/10.1016/j.future.2011.08.018

Abstract

Cloud computing has become a popular paradigm for implementing scalable computing infrastructures provided on-demand on a case-by-case basis. Self-manageable Cloud infrastructures are required in order to comply with users’ requirements defined by Service Level Agreements (SLAs) and to minimize user interactions with the computing environment. Thus, adequate SLA monitoring strategies and timely detection of possible SLA violations represent challenging research issues. This paper presents the Detecting SLA Violation infrastructure (DeSVi) architecture, which senses SLA violations through sophisticated resource monitoring. Based on user requests, DeSVi allocates computing resources for a requested service and arranges its deployment on a virtualized environment. Resources are monitored using a novel framework capable of mapping low-level resource metrics (e.g., host up and down time) to user-defined SLAs (e.g., service availability). The detection of possible SLA violations relies on the predefined service level objectives and on the use of knowledge databases to manage and prevent such violations. We evaluate the DeSVi architecture using two application scenarios: (i) image rendering applications based on ray-tracing, and (ii) transactional web applications based on the well-known TPC-W benchmark. These applications exhibit heterogeneous workloads, which we use to investigate the optimal monitoring interval for SLA parameters. The achieved results show that our architecture is able to monitor and detect SLA violations. The architecture output also provides a guideline on the appropriate monitoring intervals for applications depending on their resource consumption behavior.

Highlights

► We design the Detecting SLA Violation infrastructure (DeSVi) architecture for Clouds.
► It monitors SLAs at runtime to detect violation situations in order to avert them.
► It autonomously manages the whole application provisioning lifecycle in Clouds.
► We evaluated the architecture with transactional and HPC application workloads.
► Based on the achieved results, we can determine optimal application monitoring intervals.

Introduction

Cloud computing represents a novel paradigm for the implementation of scalable computing infrastructures combining concepts from virtualization, distributed application design, Grid, and enterprise IT management [1], [2], [3]. Service provisioning in the Cloud relies on Service Level Agreements (SLAs), which represent a contract signed between the customer and the service provider and include the non-functional requirements of the service, specified as Quality of Service (QoS) [4], [5]. An SLA covers obligations, service pricing, and penalties in case of agreement violations.

Flexible and reliable management of SLAs is of paramount importance for both Cloud providers and consumers. On the one hand, preventing SLA violations spares providers the penalties they would otherwise have to pay. On the other hand, flexible and timely reactions to possible SLA violations minimize user interaction with the system, which enables Cloud computing to take root as a flexible and reliable form of on-demand computing.

Although there is a large body of work on the development of flexible and self-manageable Cloud computing infrastructures [6], [7], [8], there is still a lack of adequate monitoring infrastructures able to predict possible SLA violations. Most of the available monitoring systems rely either on Grid [9], [10] or service-oriented infrastructures [11], which are not directly compatible with Clouds because of the different resource usage model, or on heavily network-oriented monitoring infrastructures [12]. In Grids [13], resources are mostly owned by different individuals or enterprises, and in some cases, such as desktop Grids, resources are only available when their owners are not using them [14]. Resource availability therefore varies considerably, which affects its use for application provisioning. In Cloud computing, by contrast, resources are owned by an enterprise (the Cloud provider), which provisions them to customers in a pay-as-you-go manner. Availability of resources is therefore more stable, and resources can be provisioned on-demand. Hence, the monitoring strategies used for detection of SLA violations in Grids cannot be directly applied to Clouds.

Another important aspect of SLA usage is the required elasticity of Cloud infrastructures. SLAs are not only used to provide guarantees to end users; providers also use them to manage Cloud infrastructures efficiently, balancing competing priorities such as energy efficiency and SLA attainment [15], [16] while delivering sufficient elasticity. Moreover, SLAs have recently been used as part of novel Cloud engineering models such as Cloud federation [17], [18], where providers can in- or outsource their infrastructure depending on the current load. Since SLA parameters are usually defined by Cloud providers and can comprise various user-defined attributes, current monitoring infrastructures lack appropriate solutions for adequate SLA monitoring. The first challenge is to facilitate the mapping of metrics measured by low-level tools to application-based SLAs. The second challenge is to determine appropriate monitoring intervals at the application level, keeping the balance between early detection of possible SLA violations and the intrusiveness of the monitoring tools.

In this paper we present a novel concept for mapping low-level resource metrics to high-level SLAs, LoM2HiS [19], in which system metrics (e.g., system up and down time) are translated to high-level SLAs (e.g., system availability). LoM2HiS thus facilitates efficient monitoring of Cloud infrastructures and early detection of possible SLA violations. Furthermore, the LoM2HiS framework enables user-driven mappings between resource metrics and SLA parameters by means of mapping rules defined with Domain Specific Languages (DSLs). However, determining optimal measurement intervals for low-level metrics and their translation to SLAs is still an open research issue. Short measurement intervals may negatively affect overall system performance, whereas long measurement intervals may allow severe SLA violations to go undetected.
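To make the idea of a mapping rule concrete, the following is a minimal Python sketch of translating host up and down time into an availability value. It is an illustration under stated assumptions, not the LoM2HiS implementation: the metric names `uptime_s` and `downtime_s`, the `MAPPING_RULES` table, the `map_metrics` helper, and the availability definition (uptime as a percentage of total observed time) are all hypothetical choices made for this example, and plain Python functions stand in for the DSL-based rules mentioned above.

```python
from typing import Callable, Dict

# A mapping rule turns a dictionary of low-level metrics into one SLA value.
MappingRule = Callable[[Dict[str, float]], float]


def availability(metrics: Dict[str, float]) -> float:
    """Availability in percent: uptime over total observed time (assumed definition)."""
    up = metrics["uptime_s"]
    down = metrics["downtime_s"]
    total = up + down
    if total <= 0:
        raise ValueError("no observation time recorded")
    return 100.0 * up / total


# Hypothetical rule table: SLA parameter name -> mapping rule.
MAPPING_RULES: Dict[str, MappingRule] = {"availability": availability}


def map_metrics(metrics: Dict[str, float],
                rules: Dict[str, MappingRule] = MAPPING_RULES) -> Dict[str, float]:
    """Apply every mapping rule to one sample of low-level metrics."""
    return {name: rule(metrics) for name, rule in rules.items()}


# One monitoring sample: 3540 s up and 60 s down within an hour.
sample = {"uptime_s": 3540.0, "downtime_s": 60.0}
sla_values = map_metrics(sample)
print(sla_values)
```

Keeping the rules in a table keyed by SLA parameter name mirrors the user-driven character of the mappings: adding a parameter means adding one rule rather than changing the monitoring code.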

In order to assist Cloud providers in detecting SLA violations through resource monitoring, we developed the DeSVi architecture [20]. This architecture represents a core step towards achieving flexible and autonomic SLA management. The main components of the DeSVi architecture are: (i) the automatic VM deployer, (ii) the application deployer, and (iii) the LoM2HiS framework. Based on user requests, the automatic VM deployer allocates the necessary resources for the requested service and arranges its deployment on a virtual machine (VM). After service deployment, the LoM2HiS framework monitors the VMs and translates the low-level metrics into high-level SLAs using the specified mapping rules. To realize autonomic SLA management, DeSVi uses a knowledge database to evaluate the monitored information and propose reactive actions in case of SLA violation situations.
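The detection step that follows the mapping can be sketched as a comparison of mapped SLA values against predefined service level objectives. The Python snippet below is a simplified illustration only: the `Objective` class, the `detect_violations` function, the parameter names, and the threshold values are hypothetical, and the knowledge database that proposes reactive actions is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Objective:
    """One service level objective (hypothetical representation)."""
    parameter: str          # SLA parameter name, e.g. "availability"
    threshold: float        # agreed limit
    higher_is_better: bool  # True: value must stay at or above the threshold


def detect_violations(sla_values: Dict[str, float],
                      objectives: List[Objective]) -> List[str]:
    """Return the names of SLA parameters whose objective is not met."""
    violated = []
    for obj in objectives:
        value = sla_values.get(obj.parameter)
        if value is None:
            continue  # parameter not measured in this monitoring interval
        if obj.higher_is_better:
            ok = value >= obj.threshold
        else:
            ok = value <= obj.threshold
        if not ok:
            violated.append(obj.parameter)
    return violated


# Illustrative objectives and one set of mapped SLA values.
objectives = [
    Objective("availability", 99.0, higher_is_better=True),
    Objective("response_time_ms", 500.0, higher_is_better=False),
]
measured = {"availability": 98.3, "response_time_ms": 420.0}
violations = detect_violations(measured, objectives)
print(violations)
```

In such a scheme the check would run once per monitoring interval, which is why the choice of interval trades detection delay against monitoring overhead.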

The main contributions of the paper are: (i) definition of the motivation scenario for the development of the architecture aimed at detecting SLA violations, (ii) conceptual design of the DeSVi architecture for the prediction of SLA violations, (iii) discussion of the implementation choices for DeSVi, and (iv) extensive evaluation of the architecture in a real computing infrastructure using various SLA parameters and two Cloud applications: an image rendering service based on POV-Ray and the TPC-W transactional web e-Commerce benchmark.

The rest of this paper is organized as follows: Section 2 presents the related work. Section 3 presents the architecture for the autonomic management of Cloud services and the motivating scenario for the development of the DeSVi architecture. Section 4 introduces the DeSVi architecture. In particular we discuss the automatic VM deployer, application deployer, and the monitoring components. Section 5 discusses our implementation choices, whereas Section 6 discusses experimental evaluation of the DeSVi architecture. Section 7 presents our conclusions and describes future work.

Related work

We classify related work into (i) resource monitoring [21], [12], [22], (ii) SLA management including violation detection [23], [24], [25], [26], [27], and (iii) mapping techniques of monitored metrics to SLA parameters [28], [11]. Currently, there is little work in the area of resource monitoring, low-level metrics mapping, and SLA violation detection in Cloud computing. Because of that, we look into the related areas of Grid and Service-Oriented Architecture (SOA) based systems.

Fu et al. [21]

Background and motivation

Service provisioning based on SLAs and efficient, autonomic management of resources have been identified as major research challenges in Cloud environments [30], [1]. The FoSII project (Foundations of Self-governing Infrastructures) is developing models and concepts for autonomic SLA management and enforcement in Clouds. FoSII components manage the whole lifecycle of self-adaptable Cloud services [6] as explained next.

SLAs are used to guarantee customers a certain level

DeSVi architecture

This section describes in detail the Detecting SLA Violation infrastructure (DeSVi) architecture, its components, and how the components interact with one another (Fig. 3). The proposed architecture is designed to handle the complete service provisioning management lifecycle in Cloud environments. The service provisioning lifecycle includes activities such as service deployment, resource allocation to tasks, resource monitoring, and SLA violation detection.

The topmost layer represents the users

Implementation issues

In this section, we describe the implementation choices for each DeSVi component. The implementation of the DeSVi components targets the fulfillment of some fundamental Cloud requirements such as scalability, efficiency, and reliability. To achieve these goals, we incorporated, whenever possible, well-established and tested open source tools in the implementation. The results presented in Section 6 were obtained using the components presented in this section.

Evaluation

This section discusses the evaluation of our approach using two use-case scenarios. The use-case scenarios represent the most dominant application domains provisioned in Clouds today, namely (i) high performance computing applications, which include image processing and scientific simulations; and (ii) transactional applications, which include web applications, social network sites, and media sites. The first use-case scenario comprises three types of ray-tracing applications based on POV-Ray,

Conclusion and future work

Flexible and reliable management of SLA agreements represents an open research issue in Cloud computing. Advantages of flexible and reliable Cloud infrastructures are manifold. For example, prevention of SLA violations avoids unnecessary penalties providers have to pay in case of violations. Moreover, based on flexible and timely reactions to possible SLA violations, interactions with users can be minimized. In this paper we presented DeSVi—the novel architecture for monitoring and detecting

Acknowledgments

This work is supported by the Vienna Science and Technology Fund (WWTF) under grant agreement ICT08-018 Foundations of Self-governing ICT Infrastructures (FoSII) and by the Australian Research Council. This paper is a substantially extended version of a CloudComp 2010 paper [20]. The experiments were performed in the High Performance Computing Lab at Catholic University of Rio Grande do Sul (LAD-PUCRS), Brazil.

References (49)

  • D. Abramson et al., A computational economy for grid computing and its implementation in the Nimrod-G resource broker, Future Generation Computer Systems (2002)
  • S. Seneviratne et al., Task profiling model for load profile prediction, Future Generation Computer Systems (2011)
  • B. Rochwerger et al., The RESERVOIR model and architecture for open federated cloud computing, IBM Journal of Research and Development (2009)
  • D. Nurmi, R. Wolski, C. Grzegorczyk, G. Obertelli, S. Soman, L. Youseff, D. Zagorodnov, The Eucalyptus open-source...
  • I. Brandic, Towards self-manageable cloud services, in: Proceedings of the 33rd Annual IEEE International Computer...
  • FoSII, Foundations of self-governing infrastructures,...
  • R.N. Calheiros et al., Building an automated and self-configurable emulation testbed for grid applications, Software: Practice and Experience (2010)
  • A. D’Ambrogio, P. Bocciarelli, A model-driven approach to describe and predict the performance of composite services,...
  • D. Gunter, B. Tierney, B. Crowley, M. Holding, J. Lee, Netlogger: a toolkit for distributed system performance...
  • J.L. Berral, I. Goiri, R. Nou, F. Juliá, J. Guitart, R. Gavaldá, J. Torres, Towards energy-aware scheduling in data...
  • A. Beloglazov et al., Energy efficient resource management in virtualized cloud data centers
  • A. Celesti, F. Tusa, M. Villari, A. Puliafito, How to enhance cloud architectures to enable cross-federation, in: IEEE...
  • A. Celesti, F. Tusa, M. Villari, A. Puliafito, Three-phase cross-cloud federation model: the cloud sso authentication,...
  • V.C. Emeakaroha, I. Brandic, M. Maurer, S. Dustdar, Low level metrics to high level SLAs — LoM2HiS framework: bridging...

    Vincent C. Emeakaroha (M.Sc. B.Sc.) is a Research Assistant at the Distributed Systems Group, Information Systems Institute, Vienna University of Technology (TU Wien). He received his Bachelor’s degree in Computer Engineering in 2006 and a double Master’s in Software Engineering & Internet Computing in 2008 and in Computer Science Management in 2009, all at Vienna University of Technology. He is currently involved in the Austrian national FoSII (Foundations of Self-governing ICT Infrastructures) project funded by the Vienna Science and Technology Fund (WWTF) while pursuing his Ph.D. studies. His research areas of interest include Cloud computing, autonomic computing, energy efficiency in Cloud, SLA and QoS management.

    Marco A.S. Netto received his Ph.D. in Computer Science from the University of Melbourne, Australia (2010), and Bachelor’s (2002) and Master’s degree (2004) in Computer Science, both from the Pontifical Catholic University of Rio Grande do Sul (PUCRS), Brazil. He has been working with resource management and job scheduling for high performance computing environments since 2000. Marco’s current research effort is on performance evaluation of systems on virtualized environments and SLA management policies.

    Rodrigo N. Calheiros is a Research Fellow in the Cloud Computing and Distributed Systems Laboratory (CLOUDS Lab) in the Dept. of Computer Science and Software Engineering, University of Melbourne, Australia. He completed his Ph.D. degree in Computer Science in 2010 at PUCRS, Brazil, and his M.Sc. degree in 2006 at the same University. His research interests include Cloud Computing and simulation and emulation of distributed systems, with emphasis on Grids and Clouds.

    Ivona Brandic is Assistant Professor at the Distributed Systems Group, Information Systems Institute, Vienna University of Technology (TU Wien). Prior to that, she was Assistant Professor at the Department of Scientific Computing, Vienna University. She received her Ph.D. degree from Vienna University of Technology in 2007. From 2003 to 2007 she participated in the special research project AURORA (Advanced Models, Applications and Software Systems for High Performance Computing) and the European Union’s GEMSS (Grid-Enabled Medical Simulation Services) project. She is involved in the European Union’s SCube project and she is leading the Austrian national FoSII (Foundations of Self-governing ICT Infrastructures) project funded by the Vienna Science and Technology Fund (WWTF). She is a Management Committee member of the European Commission’s COST Action on Energy Efficient Large Scale Distributed Systems. From June to August 2008 she was a visiting researcher at the University of Melbourne. Her interests comprise SLA and QoS management, service-oriented architectures, autonomic computing, workflow management, and large scale distributed systems (Cloud, Grid, and Cluster).

    Rajkumar Buyya is Professor of Computer Science and Software Engineering; and Director of the Cloud Computing and Distributed Systems (CLOUDS) Laboratory at the University of Melbourne, Australia. He is also serving as the founding CEO of Manjrasoft Pty Ltd., a spin-off company of the University, commercializing its innovations in Grid and Cloud Computing. He received B.E and M.E in Computer Science and Engineering from Mysore and Bangalore Universities in 1992 and 1995 respectively; and Doctor of Philosophy (Ph.D.) in Computer Science and Software Engineering from Monash University, Melbourne, Australia in 2002. He received the Chris Wallace Award for Outstanding Research Contribution 2008 from the Computing Research and Education Association of Australasia, CORE, which is an association of university departments of computer science in Australia and New Zealand. Dr. Buyya recently received the “2009 IEEE Medal for Excellence in Scalable Computing” for pioneering the economic paradigm for utility-oriented distributed computing platforms such as Grids and Clouds.

    César A.F. De Rose is an Associate Professor in the Computer Science Department at the Pontifical Catholic University of Rio Grande do Sul (PUCRS), Porto Alegre, Brazil. His primary research interests are parallel and distributed computing and parallel architectures. He is currently conducting research on a variety of topics applied to clusters and Grids, including resource management, resource monitoring, distributed allocation strategies and virtualization. Dr. De Rose received his doctoral degree in Computer Science from the University Karlsruhe, Germany, in 1998.
