Evaluation of the Raw Microprocessor: An Exposed-Wire-Delay Architecture for ILP and Streams

Authors:
Michael Bedford Taylor, Walter Lee, Jason Miller, David Wentzlaff, Ian Bratt, Ben Greenwald, Henry Hoffmann, Paul Johnson, Jason Kim, James Psota, Arvind Saraf, Nathan Shnidman, Volker Strumpen, Matt Frank, Saman Amarasinghe, Anant Agarwal

All authors: CSAIL, Massachusetts Institute of Technology

ACM SIGARCH Computer Architecture News, Volume 32, Issue 2, March 2004
https://doi.org/10.1145/1028176.1006733

Published: 02 March 2004

Abstract

This paper evaluates the Raw microprocessor. Raw addresses the challenge of building a general-purpose architecture that performs well on a larger class of stream and embedded computing applications than existing microprocessors, while still running existing ILP-based sequential programs with reasonable performance in the face of increasing wire delays. Raw approaches this challenge by implementing plenty of on-chip resources - including logic, wires, and pins - in a tiled arrangement, and exposing them through a new ISA, so that the software can take advantage of these resources for parallel applications. Raw supports both ILP and streams by routing operands between architecturally-exposed functional units over a point-to-point scalar operand network. This network offers low latency for scalar data transport. Raw manages the effect of wire delays by exposing the interconnect and using software to orchestrate both scalar and stream data transport.

We have implemented a prototype Raw microprocessor in IBM's 180 nm, 6-layer copper, CMOS 7SF standard-cell ASIC process. We have also implemented ILP and stream compilers. Our evaluation attempts to determine the extent to which Raw succeeds in meeting its goal of serving as a more versatile, general-purpose processor. Central to achieving this goal is Raw's ability to exploit all forms of parallelism, including ILP, DLP, TLP, and Stream parallelism. Specifically, we evaluate the performance of Raw on a diverse set of codes including traditional sequential programs, streaming applications, server workloads and bit-level embedded computation. Our experimental methodology makes use of a cycle-accurate simulator validated against our real hardware. Compared to a 180 nm Pentium-III, using commodity PC memory system components, Raw performs within a factor of 2x for sequential applications with a very low degree of ILP, about 2x to 9x better for higher levels of ILP, and 10x-100x better when highly parallel applications are coded in a stream language or optimized by hand. The paper also proposes a new versatility metric and uses it to discuss the generality of Raw.
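
To make the idea of an exposed wire delay concrete, the following is a minimal toy model of operand transport between tiles on a 2D mesh. It is a sketch under stated assumptions, not the Raw design: the grid sizes, the one-cycle injection, per-hop, and receive costs, and the function names `hops`, `operand_latency`, and `worst_case_latency` are all hypothetical and chosen only for illustration. It shows that transport cost grows with the distance between the communicating tiles, which is the quantity software would want to see when placing dependent operations.

```python
from itertools import product


def hops(src, dst):
    """Manhattan distance between two (row, col) tile coordinates."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def operand_latency(src, dst, inject=1, per_hop=1, receive=1):
    """Toy latency in cycles for sending one scalar operand.

    All cost parameters are illustrative assumptions, not measured values.
    An operand consumed on the tile that produced it costs nothing here.
    """
    if src == dst:
        return 0
    return inject + per_hop * hops(src, dst) + receive


def worst_case_latency(rows, cols, **costs):
    """Largest tile-to-tile latency on a rows x cols mesh."""
    tiles = list(product(range(rows), range(cols)))
    return max(operand_latency(a, b, **costs) for a in tiles for b in tiles)


if __name__ == "__main__":
    # Neighbouring tiles versus opposite corners of a hypothetical 4x4 grid.
    print(operand_latency((0, 0), (0, 1)))
    print(operand_latency((0, 0), (3, 3)))
    # Worst case grows as the (hypothetical) grid gets larger.
    print(worst_case_latency(4, 4), worst_case_latency(8, 8))
```

In this model a scheduler that keeps a producer and its consumer on adjacent tiles pays a small fixed cost, while one that scatters them across the chip pays in proportion to the distance travelled.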

Published in

ACM SIGARCH Computer Architecture News, Volume 32, Issue 2 (ISCA 2004)
March 2004, 373 pages
ISSN: 0163-5964
DOI: 10.1145/1028176

ISCA '04: Proceedings of the 31st annual international symposium on Computer architecture
June 2004, 373 pages
ISBN: 0769521436

Copyright © 2004 Authors

Publisher: Association for Computing Machinery, New York, NY, United States