Abstract
The doubling of microprocessor performance every three years has been the result of two factors: more transistors per chip and superlinear scaling of the processor clock with technology generation. Our results show that, due to both diminishing improvements in clock rates and poor wire scaling as semiconductor devices shrink, the achievable performance growth of conventional microarchitectures will slow substantially. In this paper, we describe technology-driven models for wire capacitance, wire delay, and microarchitectural component delay. Using the results of these models, we measure the simulated performance (estimating both clock rate and IPC) of an aggressive out-of-order microarchitecture as it is scaled from a 250nm technology to a 35nm technology. We perform this analysis for three clock scaling targets and two microarchitecture scaling strategies: pipeline scaling and capacity scaling. We find that no scaling strategy permits annual performance improvements of better than 12.5%, which is far worse than the annual 50-60% to which we have grown accustomed.
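To make the arithmetic behind these figures concrete, the following is a minimal sketch, not the paper's model. It assumes performance can be summarized as clock rate times IPC, and it compounds the annual improvement rates quoted in the abstract over a horizon chosen purely for illustration. The function names, the example clock and IPC values, and the ten-year horizon are all hypothetical.

```python
def instructions_per_second(clock_hz: float, ipc: float) -> float:
    """Performance summarized as clock rate times instructions per cycle."""
    return clock_hz * ipc


def cumulative_speedup(annual_rate: float, years: int) -> float:
    """Total speedup after compounding a fixed annual improvement."""
    return (1.0 + annual_rate) ** years


if __name__ == "__main__":
    # Made-up numbers: doubling the clock gains nothing if IPC halves.
    base = instructions_per_second(clock_hz=1.0e9, ipc=2.0)
    deeper = instructions_per_second(clock_hz=2.0e9, ipc=1.0)
    print(f"baseline: {base:.2e} instr/s, faster clock: {deeper:.2e} instr/s")

    horizon_years = 10  # assumed horizon, not taken from the paper
    for rate in (0.125, 0.50, 0.60):  # annual rates quoted in the abstract
        speedup = cumulative_speedup(rate, horizon_years)
        print(f"{rate:.1%} per year -> {speedup:.1f}x after {horizon_years} years")
```

Under these assumptions, the gap between 12.5% and 50-60% per year widens quickly with the length of the horizon, which is why the abstract calls the lower rate far worse.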
Clock rate versus IPC: the end of the road for conventional microarchitectures
ISCA '00: Proceedings of the 27th Annual International Symposium on Computer Architecture