Modeling virtualized applications using machine learning techniques

Authors:
Sajib Kundu, Florida International University, Miami, FL, USA
Raju Rangaswami, Florida International University, Miami, FL, USA
Ajay Gulati, VMware, Inc., Palo Alto, CA, USA
Ming Zhao, Florida International University, Miami, FL, USA
Kaushik Dutta, National University of Singapore, Singapore, Singapore

ACM SIGPLAN Notices, Volume 47, Issue 7, July 2012, pp. 3–14. https://doi.org/10.1145/2365864.2151028

Published: 03 March 2012

Abstract

With the growing adoption of virtualized datacenters and cloud hosting services, the allocation and sizing of resources such as CPU, memory, and I/O bandwidth for virtual machines (VMs) are becoming increasingly important. Accurate performance modeling of an application would help users size VMs better, thus reducing costs. It can also benefit cloud service providers, who could offer a new charging model based on the VMs' performance instead of their configured sizes. In this paper, we present techniques to model the performance of a VM-hosted application as a function of the resources allocated to the VM and the resource contention it experiences. To address this multi-dimensional modeling problem, we propose and refine the use of two machine learning techniques: artificial neural networks (ANN) and support vector machines (SVM). We evaluate these modeling techniques using five virtualized applications from the RUBiS and Filebench benchmark suites and demonstrate that their median and 90th percentile prediction errors are within 4.36% and 29.17%, respectively. These results are substantially better than those of regression-based approaches as well as direct applications of machine learning techniques without our refinements. We also present a simple and effective approach to VM sizing and empirically demonstrate that it delivers optimal results for 65% of the sizing problems that we studied and produces close-to-optimal sizes for the remaining 35%.
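As a rough illustration of the two ideas in the abstract (reporting median and 90th percentile prediction errors, and using a performance model to size a VM), the following Python sketch may help. It is not the authors' method: `toy_model` is a hypothetical stand-in for a trained ANN or SVM model, and the resource units, candidate sizes, and cost function are assumptions chosen only for this example.

```python
import math
import statistics
from itertools import product


def relative_errors(predicted, measured):
    """Absolute relative error, in percent, for each (prediction, measurement) pair."""
    return [abs(p - m) * 100.0 / abs(m) for p, m in zip(predicted, measured)]


def percentile(values, q):
    """q-th percentile (0 < q <= 100) using the nearest-rank method."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100.0 * len(ordered)))
    return ordered[rank - 1]


def toy_model(cpu, mem):
    """Hypothetical performance model (assumption): stands in for a trained ANN/SVM."""
    return min(cpu / 10.0, mem / 4.0)


def smallest_vm(model, cpu_options, mem_options, target, cost):
    """Cheapest (cpu, mem) whose predicted performance meets the target, else None."""
    feasible = [
        (cpu, mem)
        for cpu, mem in product(cpu_options, mem_options)
        if model(cpu, mem) >= target
    ]
    return min(feasible, key=cost) if feasible else None


if __name__ == "__main__":
    # Made-up predictions and measurements, for illustration only.
    errors = relative_errors([102.0, 95.0, 130.0, 80.0], [100.0] * 4)
    print("median error (%):", statistics.median(errors))
    print("90th percentile error (%):", percentile(errors, 90))

    # Exhaustive search over a small, hypothetical grid of VM sizes.
    best = smallest_vm(
        toy_model,
        cpu_options=[500, 1000, 1500, 2000],
        mem_options=[256, 512, 1024],
        target=100.0,
        cost=lambda cfg: cfg[0] * 0.01 + cfg[1] * 0.02,
    )
    print("smallest VM meeting the target:", best)
```

The exhaustive grid search is used here only because the candidate space is tiny; with a real model and finer-grained resource settings, a smarter search would likely be needed.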

Index Terms

Modeling virtualized applications using machine learning techniques
1. Computing methodologies
  1. Modeling and simulation
    1. Model development and analysis
      1. Modeling methodologies
2. Software and its engineering
  1. Software creation and management
    1. Software development process management
      1. Software development methods

Published in
ACM SIGPLAN Notices, Volume 47, Issue 7 (VEE '12)
July 2012, 229 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2365864

VEE '12: Proceedings of the 8th ACM SIGPLAN/SIGOPS conference on Virtual Execution Environments
March 2012, 248 pages
ISBN: 9781450311762
DOI: 10.1145/2151024
General Chair: Steven Hand, University of Cambridge, UK
Program Chair: Dilma da Silva, IBM T. J. Watson Research Center, USA
Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
VM sizing
cloud data centers
machine learning
performance modeling
virtualization
Qualifiers
- research-article