ABSTRACT
A growing body of work has compiled a strong case for the single-ISA heterogeneous multi-core paradigm. A single-ISA heterogeneous multi-core provides multiple, differently designed superscalar core types that can streamline the execution of diverse programs and program phases. No prior research has addressed the Achilles' heel of this paradigm: design and verification effort is multiplied by the number of different core types.
This work frames superscalar processors in a canonical form, so that it becomes feasible to quickly design many cores that differ in the three major superscalar dimensions: superscalar width, pipeline depth, and sizes of structures for extracting instruction-level parallelism (ILP). From this idea, we develop a toolset, called FabScalar, for automatically composing the synthesizable register-transfer-level (RTL) designs of arbitrary cores within a canonical superscalar template. The template defines canonical pipeline stages and the interfaces among them. A Canonical Pipeline Stage Library (CPSL) provides many implementations of each canonical pipeline stage that differ in their superscalar width and depth of sub-pipelining. An RTL generation tool uses the template and CPSL to automatically generate an overall core of the desired configuration. Validation experiments are performed along three fronts to evaluate the quality of RTL designs generated by FabScalar: functional and performance (instructions-per-cycle (IPC)) validation, timing validation (cycle time), and confirmation of suitability for standard ASIC flows. With FabScalar, a chip with many different superscalar core types is conceivable.
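The composition idea can be illustrated with a minimal Python sketch. It is not FabScalar's actual tool or interface: the stage names, the `StageVariant` class, and the `build_library` and `compose_core` helpers are all hypothetical, and the sketch assumes a single width shared by every stage. It only shows the general pattern of selecting one library variant per canonical stage to form a core.

```python
from dataclasses import dataclass

# Illustrative stage names only; the abstract does not enumerate
# the stages of the canonical template.
CANONICAL_STAGES = ("fetch", "decode", "rename", "dispatch", "issue",
                    "regread", "execute", "writeback", "retire")


@dataclass(frozen=True)
class StageVariant:
    """One hypothetical library entry: a stage at a given width and depth."""
    stage: str
    width: int   # superscalar width of this implementation
    depth: int   # number of sub-pipeline stages

    @property
    def module_name(self) -> str:
        # Hypothetical naming scheme for the RTL module of this variant.
        return f"{self.stage}_w{self.width}_d{self.depth}"


def build_library(widths, depths):
    """Build a toy stage library holding every (stage, width, depth) variant."""
    return {(s, w, d): StageVariant(s, w, d)
            for s in CANONICAL_STAGES for w in widths for d in depths}


def compose_core(library, width, depths):
    """Pick one variant per canonical stage; unspecified depths default to 1."""
    core = []
    for stage in CANONICAL_STAGES:
        key = (stage, width, depths.get(stage, 1))
        if key not in library:
            raise KeyError(f"no library variant for {key}")
        core.append(library[key])
    return core


library = build_library(widths=(1, 2, 4), depths=(1, 2))
core = compose_core(library, width=4, depths={"fetch": 2, "issue": 2})
print([variant.module_name for variant in core])
```

Because every variant of a stage is assumed to honor the same interface, swapping `width` or a per-stage depth yields a different core without touching the other stages, which is the property the canonical template is meant to provide.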
FabScalar: composing synthesizable RTL designs of arbitrary cores within a canonical superscalar template (ISCA '11)