Abstract
Current trends point to a rapid increase in the number of cores per chip, with hardware exhibiting various performance and functional asymmetries in order to scale while balancing power against performance. This poses new challenges for platform resource management, which are further exacerbated by the need for runtime power budgeting and by the increasingly dynamic workload behavior observed in consolidated datacenter and cloud computing systems. This paper considers the implications of these challenges for the virtualization layer, which is the base layer for resource management on such heterogeneous multicore platforms. Specifically, while existing and upcoming management methods routinely leverage system-level information available to the hypervisor about current and global platform state, we argue that future systems will increasingly need additional information about applications and their needs. This 'end-to-end' argument leads us to propose 'performance points' as a general interface between the virtualization system and higher layers, such as the guest operating systems that run application workloads. Building on concrete examples from past work on APIs with which applications can inform systems of phase or workload changes and, conversely, with which systems can indicate to applications desired changes in power consumption, we show that performance points are an effective way to better exploit asymmetries and achieve the power/performance improvements promised by heterogeneous multicore systems.
Index Terms
- Attaining system performance points: revisiting the end-to-end argument in system design for heterogeneous many-core systems