Clock rate versus IPC: the end of the road for conventional microarchitectures

Authors: Vikas Agarwal, M. S. Hrishikesh, Stephen W. Keckler, Doug Burger

Computer Architecture and Technology Laboratory, Department of Computer Sciences, The University of Texas at Austin


ACM SIGARCH Computer Architecture News, Volume 28, Issue 2, May 2000, pp. 248–259. https://doi.org/10.1145/342001.339691

Published: 01 May 2000


Abstract

The doubling of microprocessor performance every three years has been the result of two factors: more transistors per chip and superlinear scaling of the processor clock with technology generation. Our results show that, due to both diminishing improvements in clock rates and poor wire scaling as semiconductor devices shrink, the achievable performance growth of conventional microarchitectures will slow substantially. In this paper, we describe technology-driven models for wire capacitance, wire delay, and microarchitectural component delay. Using the results of these models, we measure the simulated performance (estimating both clock rate and IPC) of an aggressive out-of-order microarchitecture as it is scaled from a 250nm technology to a 35nm technology. We perform this analysis for three clock scaling targets and two microarchitecture scaling strategies: pipeline scaling and capacity scaling. We find that no scaling strategy permits annual performance improvements of better than 12.5%, which is far worse than the annual 50-60% to which we have grown accustomed.
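To make the gap between these growth rates concrete, the following is a minimal sketch, not taken from the paper. It assumes performance can be approximated as clock rate multiplied by IPC (the two quantities the abstract says are estimated) and that annual improvements compound. The function names and the 10-year horizon are hypothetical choices for illustration only.

```python
def performance(clock_hz: float, ipc: float) -> float:
    """Instructions per second, approximated as clock rate times IPC."""
    return clock_hz * ipc


def compound_growth(annual_rate: float, years: int) -> float:
    """Cumulative speedup after `years` of compounding at `annual_rate`."""
    return (1.0 + annual_rate) ** years


if __name__ == "__main__":
    YEARS = 10  # hypothetical horizon, not a figure from the abstract
    # Rates quoted in the abstract: the 12.5% ceiling and the 50-60% range.
    for label, rate in [("12.5% per year", 0.125),
                        ("50% per year", 0.50),
                        ("60% per year", 0.60)]:
        speedup = compound_growth(rate, YEARS)
        print(f"{label}: {speedup:.1f}x after {YEARS} years")
```

Because the rates compound, a modest difference in the annual figure grows into a difference of one to two orders of magnitude in cumulative speedup over a decade, which is the sense in which the abstract calls 12.5% "far worse".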


Index Terms

1. Computer systems organization



Published in

ACM SIGARCH Computer Architecture News, Volume 28, Issue 2
Special Issue: Proceedings of the 27th annual international symposium on Computer architecture (ISCA '00)
May 2000, 325 pages
ISSN: 0163-5964
DOI: 10.1145/342001
Chairmen: Alan Berenbaum (Lucent Technologies, Berkeley Heights, NJ), Joel Emer (Compaq Computer Corp., Palo Alto, CA)

ISCA '00: Proceedings of the 27th annual international symposium on Computer architecture
June 2000, 327 pages
ISBN: 1581132328
DOI: 10.1145/339647
Chairmen: Alan Berenbaum (Lucent Technologies), Joel Emer (Compaq Computer Corp.)

Copyright © 2000 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
