research-article

Attaining system performance points: revisiting the end-to-end argument in system design for heterogeneous many-core systems

Published: 18 February 2011

Abstract

Trends indicate a rapid increase in the number of cores on chip, exhibiting various types of performance and functional asymmetries present in hardware to gain scalability with balanced power vs. performance requirements. This poses new challenges in platform resource management, which are further exacerbated by the need for runtime power budgeting and by the increased dynamics in workload behavior observed in consolidated datacenter and cloud computing systems. This paper considers the implications of these challenges for the virtualization layer of abstraction, which is the base layer for resource management in such heterogeneous multicore platforms. Specifically, while existing and upcoming management methods routinely leverage system-level information available to the hypervisor about current and global platform state, we argue that for future systems there will be an increased necessity for additional information about applications and their needs. This 'end-to-end' argument leads us to propose 'performance points' as a general interface between the virtualization system and higher layers like the guest operating systems that run application workloads. Building on concrete examples from past work on APIs with which applications can inform systems of phase or workload changes and, conversely, with which systems can indicate to applications desired changes in power consumption, performance points are shown to be an effective way to better exploit asymmetries and gain the power/performance improvements promised by heterogeneous multicore systems.


Published in

ACM SIGOPS Operating Systems Review, Volume 45, Issue 1
January 2011
160 pages
ISSN: 0163-5980
DOI: 10.1145/1945023

Copyright © 2011 Authors

Publisher

Association for Computing Machinery
New York, NY, United States
