DOI: 10.1145/1183401.1183421
Article

A case for high performance computing with virtual machines

Published: 28 June 2006

ABSTRACT

Virtual machine (VM) technologies are experiencing a resurgence in both industry and research communities. VMs offer many desirable features such as security, ease of management, OS customization, performance isolation, check-pointing, and migration, which can be very beneficial to the performance and the manageability of high performance computing (HPC) applications. However, very few HPC applications are currently running in a virtualized environment due to the performance overhead of virtualization. Further, using VMs for HPC also introduces additional challenges such as management and distribution of OS images.

In this paper we present a case for HPC with virtual machines by introducing a framework which addresses the performance and management overhead associated with VM-based computing. Two key ideas in our design are: Virtual Machine Monitor (VMM) bypass I/O and scalable VM image management. VMM-bypass I/O achieves high communication performance for VMs by exploiting the OS-bypass feature of modern high speed interconnects such as InfiniBand. Scalable VM image management significantly reduces the overhead of distributing and managing VMs in large scale clusters. Our current implementation is based on the Xen VM environment and InfiniBand. However, many of our ideas are readily applicable to other VM environments and high speed interconnects.

We carry out detailed analysis on the performance and management overhead of our VM-based HPC framework. Our evaluation shows that HPC applications can achieve almost the same performance as those running in a native, non-virtualized environment. Therefore, our approach holds promise to bring the benefits of VMs to HPC applications with very little degradation in performance.

        • Published in

          ACM Conferences
          ICS '06: Proceedings of the 20th annual international conference on Supercomputing
          June 2006
          385 pages
          ISBN: 1595932828
          DOI: 10.1145/1183401

          Copyright © 2006 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Qualifiers

          • Article

          Acceptance Rates

          ICS '06 paper acceptance rate: 37 of 141 submissions (26%). Overall acceptance rate: 584 of 2,055 submissions (28%).
