ABSTRACT
Previous studies have demonstrated the advantages of single-ISA heterogeneous multi-core architectures for power and performance. However, none of those studies examined how to design such a processor; instead, they started with an assumed combination of pre-existing cores. This work assumes the flexibility to design a multi-core architecture from the ground up and seeks to address the following question: what should the characteristics of the cores in a heterogeneous multiprocessor be to achieve the highest area or power efficiency? The study is done for varying degrees of thread-level parallelism and for different area and power budgets. The most efficient chip multiprocessors are shown to be heterogeneous, with each core customized to a different subset of application characteristics; no single core is necessarily well suited to all applications. The performance ordering of cores on such processors differs from application to application; there is only a partial ordering among cores in terms of resources and complexity. This methodology produces performance gains as high as 40%. The performance improvements come with the added cost of customization.
Index Terms
- Core architecture optimization for heterogeneous chip multiprocessors