Abstract
A large register set can be exploited by keeping variables and constants in registers instead of in memory. Hardware register windows and compile-time or link-time global register allocation are ways to do this. A measure of the effectiveness of any of these register management schemes is how thoroughly it removes loads and stores. This measure must also count the extra loads and stores executed because of window overflow or conflicts between procedures.
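One plausible way to express this measure is as the net fraction of baseline memory references that a scheme eliminates. The sketch below is illustrative only: the function name, its parameters, and the exact formula are assumptions and are not taken from the paper.

```python
def fraction_removed(baseline_refs, remaining_refs, overhead_refs):
    """Net fraction of loads/stores removed by a register management scheme.

    All three arguments are hypothetical dynamic counts:
      baseline_refs  - loads/stores executed with no scheme applied
      remaining_refs - loads/stores of variables the scheme left in memory
      overhead_refs  - extra loads/stores the scheme itself introduces
                       (window overflow, conflicts between procedures)
    """
    if baseline_refs <= 0:
        raise ValueError("baseline_refs must be positive")
    return (baseline_refs - remaining_refs - overhead_refs) / baseline_refs


if __name__ == "__main__":
    # Made-up counts, purely to show how overhead lowers the score.
    print(fraction_removed(1000, 300, 0))
    print(fraction_removed(1000, 300, 100))
```

Charging the overhead against the scheme is what keeps the comparison fair: a scheme that removes many references but spills often can score worse than a more modest one.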
By combining profiling, instrumentation, and in-line simulation, we measured the effectiveness of several register management schemes. These included compile-time and link-time schemes for allocating registers, and register window schemes using fixed-size or variable-size windows. Link-time allocation based on profile information was the clear winner in some cases and did about as well as windows in the rest. Even link-time allocation based on an estimated profile was about as good as windows. Variable-size windows sometimes did better than fixed-size windows, but the difference was usually small.
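To make the window overhead concrete, here is a minimal sketch of how overflow and underflow traffic for fixed-size windows could be counted from a call trace. It is not the paper's simulator. The trace format, the function name, and the policy of spilling or reloading exactly one whole window per trap are all assumptions.

```python
def window_spill_traffic(call_trace, num_windows, regs_per_window):
    """Count extra stores and loads caused by window overflow/underflow.

    call_trace      - iterable of "call" / "return" events (hypothetical format)
    num_windows     - windows that fit in the register file
    regs_per_window - registers saved or restored per trap (assumed fixed)
    Returns a (stores, loads) tuple.
    """
    resident = 1   # windows currently held in the register file
    spilled = 0    # caller windows currently saved in memory
    stores = 0
    loads = 0
    for event in call_trace:
        if event == "call":
            if resident == num_windows:
                # Overflow: evict the oldest window to make room.
                stores += regs_per_window
                spilled += 1
            else:
                resident += 1
        elif event == "return":
            if resident > 1:
                resident -= 1
            elif spilled > 0:
                # Underflow: the caller's window must be reloaded.
                loads += regs_per_window
                spilled -= 1
            else:
                raise ValueError("return without a matching call")
        else:
            raise ValueError(f"unknown event: {event!r}")
    return stores, loads


if __name__ == "__main__":
    trace = ["call"] * 5 + ["return"] * 5
    print(window_spill_traffic(trace, num_windows=4, regs_per_window=8))
```

A variable-size scheme could be modeled by replacing the fixed `regs_per_window` with a per-call register count, so that each trap moves only the registers a procedure actually uses.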
Register windows require extra logic in the data path, which may slow the machine cycle slightly, and often use more chip real estate for additional registers. Proponents of windows suppose that these drawbacks are repaid by a reduction in the number of memory references a program must make. Our results show that this tradeoff should be made the other way. Keep the hardware simple, because a link-time register allocator can nearly duplicate the improvement in memory reference frequency. Then the cycle time can be as small as possible, resulting in faster programs overall.
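The argument can be written as a simple execution-time comparison. The symbols below are introduced only for illustration and do not appear in the paper: $C$ is the number of cycles a program executes and $t$ is the cycle time, with subscripts for the window machine and for the simpler machine that relies on link-time allocation.

```latex
T = C \cdot t,
\qquad
T_{\text{alloc}} < T_{\text{win}}
\iff
C_{\text{alloc}} \, t_{\text{alloc}} < C_{\text{win}} \, t_{\text{win}}
```

If link-time allocation removes nearly as many memory references as windows do, so that $C_{\text{alloc}} \approx C_{\text{win}}$, then any reduction in cycle time from the simpler data path, $t_{\text{alloc}} < t_{\text{win}}$, translates into a shorter running time.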