Abstract
Heterogeneous processors that mix big, high-performance cores with small, low-power cores promise excellent single-threaded performance coupled with high multi-threaded throughput and higher performance per watt. A significant portion of commercial heterogeneous multicore processors is likely to have a common instruction set architecture (ISA). However, due to limited design resources and differing design goals, each core is likely to contain ISA extensions not yet implemented in the other core. Such heterogeneous processors will therefore have inherent functional asymmetry at the ISA level and face significant software challenges. This paper analyzes the challenges that this asymmetry poses to the operating system and to application-layer software on a heterogeneous system in which the ISAs of the small and big cores overlap. We look at the widely deployed Intel® Architecture and propose solutions to the software challenges that arise when a heterogeneous processor is designed around it. We broadly categorize functional asymmetries into those that can be exposed to application software and those that should be handled by system software. While one can argue that newly written software should be heterogeneity-aware, it is important to find ways in which legacy software can extract the best performance from heterogeneous multicore systems.
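The notion of overlapping ISAs can be illustrated with a minimal sketch. It is not the paper's method: the core names, the feature labels, and both helper functions are hypothetical and chosen only for illustration. The sketch models each core type as a set of ISA features, computes the subset common to all cores, and lists the cores on which a thread with given feature requirements could run.

```python
from typing import Dict, FrozenSet, List

# Hypothetical feature sets (assumption): labels are illustrative only and
# do not describe any real processor or the design studied in the paper.
CORE_FEATURES: Dict[str, FrozenSet[str]] = {
    "big": frozenset({"base", "sse4", "avx"}),
    "small": frozenset({"base", "sse4", "movbe"}),
}


def common_isa(cores: Dict[str, FrozenSet[str]]) -> FrozenSet[str]:
    """Return the ISA features implemented by every core type."""
    if not cores:
        return frozenset()
    return frozenset.intersection(*cores.values())


def eligible_cores(required: FrozenSet[str],
                   cores: Dict[str, FrozenSet[str]]) -> List[str]:
    """Return the names of core types that implement all required features."""
    return sorted(name for name, feats in cores.items() if required <= feats)


if __name__ == "__main__":
    print(sorted(common_isa(CORE_FEATURES)))
    print(eligible_cores(frozenset({"avx"}), CORE_FEATURES))
```

In this toy model, software that stays within the common subset can run on any core, while a thread that needs an extension outside it is restricted to the cores that implement that extension. A thread requiring extensions from both cores has no eligible core at all, which is the kind of case that system software would have to handle.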
Index Terms
- Bridging functional heterogeneity in multicore architectures