DOI: 10.1145/2000064.2000067

FabScalar: composing synthesizable RTL designs of arbitrary cores within a canonical superscalar template

Published: 04 June 2011

ABSTRACT

A growing body of work has compiled a strong case for the single-ISA heterogeneous multi-core paradigm. A single-ISA heterogeneous multi-core provides multiple, differently-designed superscalar core types that can streamline the execution of diverse programs and program phases. No prior research has addressed the "Achilles' heel" of this paradigm: design and verification effort is multiplied by the number of different core types.

This work frames superscalar processors in a canonical form, so that it becomes feasible to quickly design many cores that differ in the three major superscalar dimensions: superscalar width, pipeline depth, and sizes of structures for extracting instruction-level parallelism (ILP). From this idea, we develop a toolset, called FabScalar, for automatically composing the synthesizable register-transfer-level (RTL) designs of arbitrary cores within a canonical superscalar template. The template defines canonical pipeline stages and interfaces among them. A Canonical Pipeline Stage Library (CPSL) provides many implementations of each canonical pipeline stage that differ in their superscalar width and depth of sub-pipelining. An RTL generation tool uses the template and CPSL to automatically generate an overall core of the desired configuration. Validation experiments are performed along three fronts to evaluate the quality of RTL designs generated by FabScalar: functional and performance (instructions-per-cycle (IPC)) validation, timing validation (cycle time), and confirmation of suitability for standard ASIC flows. With FabScalar, a chip with many different superscalar core types is conceivable.
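
The composition idea in the abstract (a fixed template of canonical stages, a library with several implementations per stage, and a generator that picks one implementation per stage) can be illustrated with a minimal Python sketch. This is not FabScalar's actual tool or file format: the stage names, the `StageImpl` record, the module naming scheme, and the functions `build_cpsl`, `compose_core`, and `pipeline_depth` are all hypothetical, chosen only to show the selection step.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

# Assumption: illustrative stage names. The abstract only states that the
# template defines canonical pipeline stages; it does not list them.
CANONICAL_STAGES = [
    "fetch", "decode", "rename", "dispatch", "issue",
    "regread", "execute", "writeback", "retire",
]


@dataclass(frozen=True)
class StageImpl:
    """One library entry: a stage at a given width and sub-pipeline depth."""
    stage: str
    width: int    # superscalar width this implementation supports
    depth: int    # number of sub-pipeline stages
    module: str   # hypothetical RTL module name


Key = Tuple[str, int, int]


def build_cpsl(widths: Iterable[int], depths: Iterable[int]) -> Dict[Key, StageImpl]:
    """Build a toy stage library with one entry per (stage, width, depth)."""
    depths = list(depths)
    library: Dict[Key, StageImpl] = {}
    for stage in CANONICAL_STAGES:
        for w in widths:
            for d in depths:
                library[(stage, w, d)] = StageImpl(stage, w, d, f"{stage}_w{w}_d{d}")
    return library


def compose_core(cpsl: Dict[Key, StageImpl], width: int,
                 depth_per_stage: Dict[str, int]) -> List[StageImpl]:
    """Pick one implementation per canonical stage, in template order."""
    core = []
    for stage in CANONICAL_STAGES:
        key = (stage, width, depth_per_stage.get(stage, 1))
        if key not in cpsl:
            raise KeyError(f"no library entry for {key}")
        core.append(cpsl[key])
    return core


def pipeline_depth(core: List[StageImpl]) -> int:
    """Total pipeline depth is the sum of the chosen sub-pipeline depths."""
    return sum(impl.depth for impl in core)
```

For example, `compose_core(build_cpsl([1, 2, 4], [1, 2]), 4, {"fetch": 2, "issue": 2})` selects a 4-wide implementation of every stage, with the fetch and issue stages sub-pipelined to depth 2. A request for a width that the toy library lacks raises `KeyError`, which mirrors the constraint that a core can only be composed from implementations the library actually provides.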

Published in

ISCA '11: Proceedings of the 38th annual international symposium on Computer architecture
June 2011
488 pages
ISBN: 9781450304726
DOI: 10.1145/2000064

ACM SIGARCH Computer Architecture News, Volume 39, Issue 3 (ISCA '11)
June 2011
462 pages
ISSN: 0163-5964
DOI: 10.1145/2024723

Copyright © 2011 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
