ABSTRACT
As cluster computers are used for a wider range of applications, we encounter the need to deliver resources at particular times, to meet particular deadlines, and/or at the same time as other resources are provided elsewhere. To address such requirements, we describe a scheduling approach in which users request resource leases, where leases can request either as-soon-as-possible ("best-effort") or reservation start times. We present the design of a lease management architecture, Haizea, that implements leases as virtual machines (VMs), leveraging their ability to suspend, migrate, and resume computations and to provide leased resources with customized application environments. We discuss methods to minimize the overhead introduced by having to deploy VM images before the start of a lease. We also present the results of simulation studies that compare alternative approaches. Using workloads with various mixes of best-effort and advance reservation requests, we compare the performance of our VM-based approach with that of non-VM-based schedulers. We find that a VM-based approach can provide better performance (measured in terms of both total execution time and average delay incurred by best-effort requests) than a scheduler that does not support task pre-emption, and only slightly worse performance than a scheduler that does support task pre-emption. We also compare the impact of different VM image popularity distributions and VM image caching strategies on performance. These results emphasize the importance of VM image caching for the workloads studied and quantify the sensitivity of scheduling performance to VM image popularity distribution.
Index Terms
- Combining batch execution and leasing using virtual machines