research-article

Combining batch execution and leasing using virtual machines

Authors:
Borja Sotomayor (University of Chicago, Chicago, IL, USA)
Kate Keahey (Argonne National Laboratory, Argonne, IL, USA)
Ian Foster (Argonne National Laboratory, Argonne, IL, USA)

HPDC '08: Proceedings of the 17th international symposium on High performance distributed computing, June 2008, Pages 87–96. https://doi.org/10.1145/1383422.1383434

Published: 23 June 2008

ABSTRACT

As cluster computers are used for a wider range of applications, we encounter the need to deliver resources at particular times, to meet particular deadlines, and/or at the same time as other resources are provided elsewhere. To address such requirements, we describe a scheduling approach in which users request resource leases, where leases can request either as-soon-as-possible ("best-effort") or reservation start times. We present the design of a lease management architecture, Haizea, that implements leases as virtual machines (VMs), leveraging their ability to suspend, migrate, and resume computations and to provide leased resources with customized application environments. We discuss methods to minimize the overhead introduced by having to deploy VM images before the start of a lease. We also present the results of simulation studies that compare alternative approaches. Using workloads with various mixes of best-effort and advance reservation requests, we compare the performance of our VM-based approach with that of non-VM-based schedulers. We find that a VM-based approach can provide better performance (measured in terms of both total execution time and average delay incurred by best-effort requests) than a scheduler that does not support task pre-emption, and only slightly worse performance than a scheduler that does support task pre-emption. We also compare the impact of different VM image popularity distributions and VM image caching strategies on performance. These results emphasize the importance of VM image caching for the workloads studied and quantify the sensitivity of scheduling performance to VM image popularity distribution.
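To make the comparison in the abstract concrete, the following is a minimal sketch of why suspend/resume can reduce the delay of best-effort requests when advance reservations are present. It is not Haizea's scheduling algorithm: the `Lease` class, the `simulate` function, and the time-stepped model are hypothetical names and simplifications introduced here for illustration. In particular, the sketch assumes that suspending and resuming a lease costs nothing and that VM images are already deployed, whereas the abstract identifies image deployment overhead as something that must be managed.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Lease:
    """Hypothetical lease request (illustrative, not Haizea's data model)."""
    name: str
    nodes: int                   # nodes requested
    duration: int                # time units of computation needed
    start: Optional[int] = None  # None = best-effort; int = reservation start


def simulate(leases: List[Lease], total_nodes: int,
             preemptible: bool, horizon: int = 1000) -> Dict[str, int]:
    """Return each lease's finish time under a toy time-stepped scheduler.

    Assumes zero suspend/resume overhead and no image transfer time.
    """
    reservations = [x for x in leases if x.start is not None]
    best_effort = [x for x in leases if x.start is None]  # served in list order

    # Reservations always run exactly when requested.
    finish = {r.name: r.start + r.duration for r in reservations}
    remaining = {x.name: x.duration for x in best_effort}
    intervals = []  # (start, end, nodes) held by non-preemptible leases

    def reserved(t: int) -> int:
        return sum(r.nodes for r in reservations
                   if r.start <= t < r.start + r.duration)

    def held(t: int) -> int:
        return sum(n for s, e, n in intervals if s <= t < e)

    for t in range(horizon):
        if len(finish) == len(leases):
            break
        free = total_nodes - reserved(t)  # nodes left for this time step
        for lease in best_effort:
            if lease.name in finish:
                continue
            if preemptible:
                # Run for one step when nodes are free; otherwise the lease
                # stays suspended until a later step.
                if lease.nodes <= free:
                    free -= lease.nodes
                    remaining[lease.name] -= 1
                    if remaining[lease.name] == 0:
                        finish[lease.name] = t + 1
            else:
                # Without preemption the lease may start only if it can run
                # to completion without colliding with any reservation.
                end = t + lease.duration
                if all(total_nodes - reserved(u) - held(u) >= lease.nodes
                       for u in range(t, end)):
                    intervals.append((t, end, lease.nodes))
                    finish[lease.name] = end
    return finish


# A 4-node cluster with one best-effort lease and one reservation at t=3.
be = Lease("be", nodes=2, duration=5)
ar = Lease("ar", nodes=4, duration=2, start=3)
print(simulate([be, ar], 4, preemptible=True)["be"])   # 7
print(simulate([be, ar], 4, preemptible=False)["be"])  # 10
```

In this toy case the preemptible best-effort lease uses the three time units before the reservation, is suspended while the reservation runs, and finishes at time 7. The non-preemptible lease cannot start until the reservation ends and finishes at time 10. A realistic model would also charge time for suspending, resuming, and transferring VM images.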


Index Terms

Combining batch execution and leasing using virtual machines
1. Computer systems organization
  1. Architectures
    1. Distributed architectures
2. Software and its engineering
  1. Software organization and properties
    1. Extra-functional properties
      1. Software fault tolerance
        Checkpoint / restart
    2. Software system structures
      1. Distributed systems organizing principles

Published in
HPDC '08: Proceedings of the 17th international symposium on High performance distributed computing
June 2008
252 pages
ISBN: 9781595939975
DOI: 10.1145/1383422
General Chairs: Manish Parashar (Rutgers University, USA), Karsten Schwan (Georgia Institute of Technology, USA)
Program Chairs: Jon Weissman (National e-Science Center, Edinburgh, University of Minnesota, USA), Domenico Laforenza (Information Science and Technology Institute, CNR, Italy)
Copyright © 2008 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
advance reservations
backfilling
batch processing
checkpoint/restart
resource leasing
resource management
virtual machine overhead
virtual machines
virtual workspaces
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 166 of 966 submissions, 17%