research-article

Attaining system performance points: revisiting the end-to-end argument in system design for heterogeneous many-core systems

Authors:
Vishakha Gupta, Intel Labs, Hillsboro, OR, USA
Rob Knauerhase, Intel Labs, Hillsboro, OR, USA
Karsten Schwan, Georgia Institute of Technology, Atlanta, USA

ACM SIGOPS Operating Systems Review, Volume 45, Issue 1, January 2011, pp. 3–10. https://doi.org/10.1145/1945023.1945026

Published: 18 February 2011

Abstract

Trends indicate a rapid increase in the number of cores on chip, with hardware exhibiting various types of performance and functional asymmetries in order to gain scalability while balancing power and performance requirements. This poses new challenges in platform resource management, which are further exacerbated by the need for runtime power budgeting and by the increased dynamics in workload behavior observed in consolidated datacenter and cloud computing systems. This paper considers the implications of these challenges for the virtualization layer of abstraction, which is the base layer for resource management in such heterogeneous multicore platforms. Specifically, while existing and upcoming management methods routinely leverage system-level information available to the hypervisor about current and global platform state, we argue that future systems will increasingly need additional information about applications and their needs. This 'end-to-end' argument leads us to propose 'performance points' as a general interface between the virtualization system and higher layers, such as the guest operating systems that run application workloads. Building on concrete examples from past work on APIs with which applications can inform systems of phase or workload changes and, conversely, with which systems can indicate to applications desired changes in power consumption, performance points are shown to be an effective way to better exploit asymmetries and gain the power/performance improvements promised by heterogeneous multicore systems.

Index Terms

1. Software and its engineering
  1. Software creation and management
    1. Designing software
  2. Software organization and properties
    1. Contextual software domains
      1. Operating systems

Published in

ACM SIGOPS Operating Systems Review Volume 45, Issue 1
January 2011
160 pages
ISSN: 0163-5980
DOI: 10.1145/1945023

Copyright © 2011 Authors
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
asymmetry
heterogeneous multicore
virtualization
