ABSTRACT
In this paper we propose a technique that places an additional mini cache between the I-Cache and the CPU core to buffer instructions that are nested within loops and would otherwise be fetched repeatedly from the I-Cache. This mechanism is combined with compiler-driven code modifications that greatly simplify the required hardware, eliminate unnecessary instruction fetching, and consequently reduce signal switching activity and dissipated energy.
We show that the additional cache, dubbed the L-Cache, is much smaller and simpler than the I-Cache when the compiler assumes the role of allocating instructions to it. Through simulation, we show that, for the SPECfp95 benchmarks, the I-Cache remains disabled most of the time and the “cheaper” extra cache is used instead. We present experimental results that validate the effectiveness of this technique and report the energy gains for most of the SPEC95 benchmarks.
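As a rough illustration of the idea summarized above, the following Python sketch counts how many instruction fetches a small loop buffer could serve in place of the I-Cache. It is a toy model under stated assumptions, not the hardware or compiler pass described in the paper: the function names, the address trace, the set of compiler-marked addresses, and the per-access energy units are all hypothetical.

```python
def simulate_fetches(trace, marked, capacity):
    """Count fetches served by a toy L-Cache versus the I-Cache.

    trace    -- instruction addresses in execution order
    marked   -- addresses a (hypothetical) compiler pass placed in loops
    capacity -- number of instructions the toy L-Cache can hold
    """
    if len(marked) > capacity:
        raise ValueError("marked region does not fit in the L-Cache")
    resident = set()                       # addresses already buffered
    counts = {"l_cache": 0, "i_cache": 0}
    for addr in trace:
        if addr in marked and addr in resident:
            counts["l_cache"] += 1         # served by the small buffer
        else:
            counts["i_cache"] += 1         # normal I-Cache fetch
            if addr in marked:
                resident.add(addr)         # fill the buffer on first touch
    return counts


def relative_energy(counts, e_l, e_i):
    """Total fetch energy in arbitrary units (assumed per-access costs)."""
    return counts["l_cache"] * e_l + counts["i_cache"] * e_i


# Hypothetical trace: two prologue instructions, a four-instruction loop
# body executed ten times, and one epilogue instruction.
loop_body = [2, 3, 4, 5]
trace = [0, 1] + loop_body * 10 + [6]

counts = simulate_fetches(trace, marked=set(loop_body), capacity=8)
with_l_cache = relative_energy(counts, e_l=1, e_i=5)   # assumed unit costs
baseline = len(trace) * 5                              # every fetch from the I-Cache
print(counts, with_l_cache, baseline)
```

In this toy trace only the first loop iteration touches the I-Cache; the remaining iterations are served from the buffer. The energy figures depend entirely on the assumed unit costs and say nothing about the results reported in the paper.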
Index Terms
- Architectural and compiler support for energy reduction in the memory hierarchy of high performance microprocessors