research-article

FabScalar: composing synthesizable RTL designs of arbitrary cores within a canonical superscalar template

Authors:
Niket K. Choudhary, North Carolina State University, Raleigh, NC, USA
Salil V. Wadhavkar, North Carolina State University, Raleigh, NC, USA
Tanmay A. Shah, Intel Corporation, Hillsboro, OR, USA
Hiran Mayukh, University of Wisconsin - Madison, Madison, WI, USA
Jayneel Gandhi, University of Wisconsin - Madison, Madison, WI, USA
Brandon H. Dwiel, North Carolina State University, Raleigh, NC, USA
Sandeep Navada, North Carolina State University, Raleigh, NC, USA
Hashem H. Najaf-abadi, Intel Corporation, Folsom, CA, USA
Eric Rotenberg, North Carolina State University, Raleigh, NC, USA

ISCA '11: Proceedings of the 38th annual international symposium on Computer architecture, June 2011, Pages 11–22, https://doi.org/10.1145/2000064.2000067

Published: 04 June 2011

ABSTRACT

A growing body of work has compiled a strong case for the single-ISA heterogeneous multi-core paradigm. A single-ISA heterogeneous multi-core provides multiple, differently-designed superscalar core types that can streamline the execution of diverse programs and program phases. No prior research has addressed the Achilles' heel of this paradigm: design and verification effort is multiplied by the number of different core types.

This work frames superscalar processors in a canonical form, so that it becomes feasible to quickly design many cores that differ in the three major superscalar dimensions: superscalar width, pipeline depth, and sizes of structures for extracting instruction-level parallelism (ILP). From this idea, we develop a toolset, called FabScalar, for automatically composing the synthesizable register-transfer-level (RTL) designs of arbitrary cores within a canonical superscalar template. The template defines canonical pipeline stages and the interfaces among them. A Canonical Pipeline Stage Library (CPSL) provides many implementations of each canonical pipeline stage that differ in their superscalar width and depth of sub-pipelining. An RTL generation tool uses the template and CPSL to automatically generate an overall core of the desired configuration. Validation experiments are performed along three fronts to evaluate the quality of RTL designs generated by FabScalar: functional and performance (instructions-per-cycle (IPC)) validation, timing validation (cycle time), and confirmation of suitability for standard ASIC flows. With FabScalar, a chip with many different superscalar core types is conceivable.
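
The composition idea in the abstract can be illustrated with a small sketch. It is not FabScalar's actual tool or file format: the stage names, the `StageImpl` record, the module naming scheme, and the assumption that one superscalar width is shared by every stage are all hypothetical, chosen only to show how a template plus a stage library could yield a core of a requested configuration. Structure sizes (the third dimension) are left out for brevity.

```python
from dataclasses import dataclass

# Hypothetical stage names; the abstract does not enumerate the template's stages.
CANONICAL_STAGES = (
    "fetch", "decode", "rename", "dispatch", "issue",
    "regread", "execute", "writeback", "retire",
)


@dataclass(frozen=True)
class StageImpl:
    """One library entry: a stage implemented at a given width and sub-pipeline depth."""
    stage: str
    width: int
    depth: int
    module: str  # hypothetical RTL module name


def build_cpsl(widths, depths):
    """Populate a toy stage library with one entry per (stage, width, depth)."""
    return {
        (s, w, d): StageImpl(s, w, d, f"{s}_w{w}_d{d}")
        for s in CANONICAL_STAGES
        for w in widths
        for d in depths
    }


def compose_core(cpsl, width, stage_depths):
    """Select one implementation per canonical stage, in template order.

    width        -- superscalar width applied to every stage (simplifying assumption)
    stage_depths -- dict of stage name to sub-pipelining depth; unlisted stages use 1
    """
    core = []
    for stage in CANONICAL_STAGES:
        key = (stage, width, stage_depths.get(stage, 1))
        if key not in cpsl:
            raise KeyError(f"no library implementation for {key}")
        core.append(cpsl[key])
    return core


def pipeline_depth(core):
    """Total pipeline depth is the sum of the chosen sub-pipeline depths."""
    return sum(impl.depth for impl in core)


cpsl = build_cpsl(widths=(1, 2, 4), depths=(1, 2))
core = compose_core(cpsl, width=4, stage_depths={"fetch": 2, "issue": 2})
print([impl.module for impl in core], pipeline_depth(core))
```

In this sketch, requesting a configuration the library does not contain (for example a width of 8) raises a `KeyError`, which mirrors the idea that a core can only be composed from stage implementations the library actually provides.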

Index Terms

1. Computer systems organization
  1. Architectures
    1. Other architectures
      1. Heterogeneous (hybrid) systems
    2. Serial architectures
      1. Pipeline computing
2. Hardware
  1. Electronic design automation
    1. High-level and register-transfer level synthesis

Published in
ISCA '11: Proceedings of the 38th annual international symposium on Computer architecture
June 2011
488 pages
ISBN: 9781450304726
DOI: 10.1145/2000064
General Chairs: Ravi Iyer (Intel), Qing Yang (University of Rhode Island)
Program Chair: Antonio González (Intel and UPC)

ACM SIGARCH Computer Architecture News, Volume 39, Issue 3
ISCA '11
June 2011
462 pages
ISSN: 0163-5964
DOI: 10.1145/2024723
Copyright © 2011 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
custom processors
heterogeneous (asymmetric) multi-core
instruction-level parallelism (ILP)
superscalar processors