Workload modeling for resource usage analysis and simulation in cloud computing

https://doi.org/10.1016/j.compeleceng.2015.08.016

Highlights

  • We model a web application to support analysis and simulation of cloud environments.

  • We implement the model as an extension of the CloudSim simulator.

  • We found that user behavior has a strong influence on resource utilization.

  • We found that the Generalized Extreme Value and Generalized Lambda distributions, rather than the Exponential distribution, represent session time.

Abstract

Workload modeling enables performance analysis and simulation of cloud resource management policies, which allows cloud providers to improve their systems’ Quality of Service (QoS) and researchers to evaluate new policies without deploying expensive large-scale environments. However, workload modeling is challenging in the context of cloud computing due to the virtualization layer overhead, insufficient tracelogs available for analysis, and complex workloads. These factors contribute to a lack of methodologies and models to characterize applications hosted in the cloud. To tackle these issues, we propose a web application model that captures the behavioral patterns of different user profiles and supports analysis and simulation of resource utilization in cloud environments. The model was validated using graphical methods and statistical hypothesis tests. An implementation of our model is provided as an extension of the CloudSim simulator.

Introduction

Clouds are being used as a platform for various types of applications with different Quality of Service (QoS) aspects, such as performance, availability and reliability. These aspects are specified in a Service Level Agreement (SLA) negotiated between cloud providers and customers. Failure to comply with QoS aspects can compromise the responsiveness and availability of the service and incur SLA violations, resulting in penalties to the cloud provider. The development of resource management policies that support QoS is challenging, and the evaluation of these policies is even more challenging because clouds experience varying demand, their physical infrastructure has different sizes, software stacks, and physical resource configurations, and users have different profiles and QoS requirements [1]. In addition, reproducing the conditions under which the policies are evaluated and controlling those conditions are difficult tasks.

In this context, workload modeling enables performance analysis and simulation, which brings benefits to cloud providers and researchers. Thereby, the evaluation and adjustment of policies can be performed without deployment of expensive large scale environments. Workload models have the advantage of allowing workload adjustment to fit particular situations, controlled modification of parameters, repetition of evaluation conditions, inclusion of additional features, and generalization of patterns found in the application [2], providing a controlled input for researchers. For cloud providers, the evaluation and simulation of resource management policies allow the improvement of their systems’ QoS. Finally, the simulation of workloads based on realistic scenarios enables the production of tracelogs, scarce in cloud environments because of business and confidentiality concerns [3], [4].

Workload modeling and characterization is especially challenging when applied in a highly dynamic environment, such as cloud data centers, for two reasons: (i) heterogeneous hardware is present in a single data center and the virtualization layer incurs overhead caused by I/O processing and interactions with the Virtual Machine Monitor (VMM); and (ii) complex workloads are composed of a wide variety of applications submitted at any time, with different characteristics and user profiles. These factors contribute to a lack of methodologies to characterize the different behavioral patterns of cloud applications.

To tackle the above issues, a web application model able to capture the behavioral patterns of different user profiles is proposed to support analysis and simulation of resource utilization in cloud environments. The proposed model supports the construction of performance models used by several research domains. Performance models improve resource management because they allow the prediction of how application patterns will change. Thus, resources can be dynamically scaled to meet the expected demand. This is critical to cloud providers that need to provision resources quickly to meet a growing resource demand by their applications.

In this context, the main contribution of this paper is a model capable of representing the resource demand of a Web application under different user profiles in a cloud environment. The workload patterns are modeled as statistical distributions, so the patterns fluctuate according to realistic parameters and can represent dynamic environments. The model is validated through graphical and analytical methods to show that it effectively represents the observed patterns. A secondary contribution of this paper is the implementation and validation of the proposed model as an extension of the CloudSim simulator [1], making the model available to the cloud research community.
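To illustrate what modeling a workload pattern as a statistical distribution can look like, the following minimal Python sketch draws synthetic session times from a Generalized Extreme Value (GEV) distribution by inverse-transform sampling and then computes a Kolmogorov-Smirnov distance between the sample and the distribution. The parameter values (`mu`, `sigma`, `xi`), the function names and the choice of the Kolmogorov-Smirnov statistic are assumptions made for illustration only; they are not the fitted values or the validation procedure reported in the paper, and this is not the CloudSim extension.

```python
import math
import random


def gev_quantile(p, mu, sigma, xi):
    """Inverse CDF of the GEV distribution for 0 < p < 1."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if xi == 0.0:
        # Gumbel limit of the GEV family.
        return mu - sigma * math.log(-math.log(p))
    return mu + sigma / xi * ((-math.log(p)) ** (-xi) - 1.0)


def gev_cdf(x, mu, sigma, xi):
    """CDF of the GEV distribution, including points outside the support."""
    z = (x - mu) / sigma
    if xi == 0.0:
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:
        # Below the lower bound when xi > 0, above the upper bound when xi < 0.
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


def sample_session_times(n, mu, sigma, xi, seed=0):
    """Draw n synthetic session times by inverse-transform sampling."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        p = rng.random()
        while p == 0.0:  # random() may return 0.0, and log(0) is undefined
            p = rng.random()
        samples.append(gev_quantile(p, mu, sigma, xi))
    return samples


def ks_statistic(samples, cdf):
    """Largest gap between the empirical CDF of samples and a model CDF."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


# Hypothetical parameters (seconds); NOT the values fitted in the paper.
MU, SIGMA, XI = 60.0, 10.0, 0.2

session_times = sample_session_times(2000, MU, SIGMA, XI, seed=42)
distance = ks_statistic(session_times, lambda x: gev_cdf(x, MU, SIGMA, XI))
print(f"KS distance between sample and model: {distance:.4f}")
```

A small distance indicates that the synthetic sample follows the model distribution. On measured data, the same statistic could be compared across candidate distributions, although a formal test would also need critical values or p-values.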

The rest of the paper is organized as follows: Section 2 presents the challenges and importance of workload modeling in clouds. Section 3 describes related works. Section 4 details the adopted methodology and how it was achieved. Section 5 presents and discusses the modeling and simulation results. Section 6 concludes and defines future research directions.

Problem statement and motivation

Workload characterization and modeling problems have been addressed over the last years, resulting in models for the generation of synthetic workloads similar to those observed on real systems [2]. The main objective of such models is to enable the detection of behavior patterns in the collected data.

Related work

The conflict of priorities between cloud providers, who aim for high resource utilization with low operating costs, and users of MapReduce applications, who seek short execution times, has led to the characterization and modeling of this application type [4], [7], [8]. Chen et al. [4] developed a tool for generating realistic workloads with the goal of analyzing the tradeoff between latency and resource utilization. In an attempt to obtain resource utilization data, statistical models were used to generate

Methodology

Fig. 1 presents the methodology [2] used to create the proposed models. The methodology begins with users submitting requests to the cloud environment while operational metrics are measured and stored in data logs.

After monitoring and tracing, user activity and performance modeling is carried out in three steps: (i) statistical analysis, which analyzes the data characteristics, determines whether any data transformation is necessary and defines the candidate distributions to

Statistical analysis

Table 4 presents the descriptive statistics for the sum of the number of instructions consumed by the Apache and MySQL services, for CPU, memory and disk utilization for both user profiles, and for response time for the Bidding profile. Regarding the number of instructions and memory, the negative skewness is corroborated by the median being greater than the mean. This characteristic is clearly observed in the histograms of the number of instructions consumed by Apache and MySQL services (
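The relationship between negative skewness and a median that exceeds the mean can be checked on a small sample. The sketch below uses invented values with a long left tail, not measurements from Table 4, and the helper name `sample_skewness` is hypothetical. A median above the mean is consistent with left skew but does not prove it on its own, which is why both quantities are computed.

```python
import statistics


def sample_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5, using population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5


# Invented illustrative data with a long left tail (NOT from the paper).
data = [1, 7, 8, 9, 9, 10, 10, 10]

mean = statistics.mean(data)
median = statistics.median(data)
skew = sample_skewness(data)

print(f"mean={mean}, median={median}, skewness={skew:.3f}")
print("left-skewed:", skew < 0, "| median > mean:", median > mean)
```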

Conclusion

We applied a well-defined methodology to generate a Web application model for a cloud data center workload. It contains steps and justifications to achieve the distributions and parameters derived from application analyses and it can be extrapolated to other workload categories. Thereby, our model can be easily reproduced by researchers and cloud providers to support different research domains. It was implemented as an extension of the CloudSim simulator and its validation demonstrated that the

Acknowledgments

The authors thank Nikolay Grozev, Ph.D. candidate, for his valuable suggestions on the manuscript. Deborah M.V. Magalhães thanks the financial support from CAPES (Ph.D. scholarship) and CNPq (Doctorate Sandwich Abroad – SWE). This research was funded by the Australian Research Council through Future Fellowship program. This is also a partial result of the National Institute of Science and Technology – Medicine Assisted by Scientific Computing (INCT-MACC) and the SLA4Cloud project (STIC-AmSud

Deborah Magalhães is a Ph.D. student in the Department of Teleinformatics Engineering at the Federal University of Ceará, Brazil, and was previously a visiting student in the Cloud Computing and Distributed Systems (CLOUDS) Laboratory, The University of Melbourne, Australia. Her research interests include energy-efficient provisioning, distributed systems simulation, and workload modeling and characterization.

References (32)

  • Menascé D.A. TPC-W: a benchmark for e-commerce. Internet Comput (2002)
  • Jenkinson A.F. The frequency distribution of the annual maximum (or minimum) values of meteorological elements. Q J Royal Meteorol Soc (1955)
  • Calheiros R.N. et al. CloudSim: a toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms. Softw: Pract Exp (2011)
  • Feitelson D.G. Workload modeling for computer systems performance evaluation (2015)
  • Moreno I. et al. An approach for characterizing workloads in Google cloud to derive realistic resource utilization models. Proceedings of 7th international symposium on service oriented system engineering (SOSE) (2013)
  • Chen Y. et al. Towards understanding cloud performance tradeoffs using statistical workload analysis and replay. Technical Report (2010)
  • Mulia W.D. et al. Cloud workload characterization. IETE Tech Rev (2013)
  • Reiss C. et al. Google cluster-usage traces: format + schema. Technical Report (2011)
  • Kavulya S. et al. An analysis of traces from a production MapReduce cluster. Proceedings of 10th IEEE/ACM international conference on cluster, cloud and grid computing (CCGrid) (2010)
  • Ganapathi A. et al. Statistics-driven workload modeling for the cloud. Proceedings of international conference on the data engineering workshops (2010)
  • Grozev N. et al. Performance modelling and simulation of three-tier applications in cloud and multi-cloud environments. Comput J (2013)
  • Rubis: Rice university bidding system. 2013. URL:...
  • Tchana A. et al. A self-scalable and auto-regulated request injection benchmarking tool for automatic saturation detection. IEEE Trans Cloud Comput (2014)
  • Openstack cloud software. 2013. URL:...
  • Paxson V. Bro: a system for detecting network intruders in real-time. Comput Netw: Int J Comput Telecommun (1999)
  • Hashemian R. et al. Web workload generation challenges – an empirical investigation. Softw: Pract Exp (2012)

Cited by (93)

    • A GAN-based method for time-dependent cloud workload generation

      2022, Journal of Parallel and Distributed Computing
      Citation excerpt:

      A promising way to solve these problems is generating synthetic workloads, but it is challenging work due to the strong time dependency, increasingly complex system and unpredictable user behaviors. Previous works are mainly model-based methods [1,19,23,29] that explicitly assume the distribution and fit the distribution parameters based on the real workloads. These methods assume that each workload is independent and identically distributed, and ignore the time dependency.

    • Characterizing Distributed Machine Learning Workloads on Apache Spark: (Experimentation and Deployment Paper)

      2023, Middleware 2023 - Proceedings of the 24th ACM/IFIP International Middleware Conference
    • Deterministic and stochastic approaches in computer modeling and simulation

      2023, Deterministic and Stochastic Approaches in Computer Modeling and Simulation

    Rodrigo N. Calheiros is a Research Fellow in the Department of Computing and Information Systems, The University of Melbourne, Australia. His research interests also include virtualization, grid computing, and simulation and emulation of distributed systems.

    Rajkumar Buyya is a Fellow of IEEE, Professor of Computer Science and Software Engineering; Future Fellow of the Australian Research Council; and Director of the Cloud Computing and Distributed Systems (CLOUDS) Laboratory at the University of Melbourne, Australia. He is also the founding CEO of Manjrasoft, a spin-off company of the University, commercializing its innovations in cloud computing.

    Danielo G. Gomes is currently an assistant professor at Universidade Federal do Ceará, Brazil. He received his Ph.D. in Réseaux et Télécoms from the University of Evry, France. His research interests include Internet of Things, green computing, performance evaluation, mobile cloud computing, and Cloud-IoT integration. Danielo is an editorial board member of Computers and Electrical Engineering, Computer Communications and Sustainable Computing.