ABSTRACT
Power dissipation is increasingly important in CPUs ranging from those intended for mobile use, all the way up to high-performance processors for high-end servers. While the bulk of the power dissipated is dynamic switching power, leakage power is also beginning to be a concern. Chipmakers expect that in future chip generations, leakage's proportion of total chip power will increase significantly.
This paper examines methods for reducing leakage power within the cache memories of the CPU. Because caches comprise much of a CPU chip's area and transistor count, they are reasonable targets for attacking leakage. We discuss policies and implementations for reducing cache leakage by invalidating and “turning off” cache lines when they hold data not likely to be reused. In particular, our approach targets the generational nature of cache line usage. That is, cache lines typically have a flurry of frequent use when first brought into the cache, and then have a period of “dead time” before they are evicted. Using effective, low-power ways of deducing dead time, we show that in many cases L1 cache leakage energy can be reduced by 4x in SPEC2000 applications without impacting performance. Because our decay-based techniques have notions of competitive on-line algorithms at their roots, their energy usage can be theoretically bounded to within a factor of two of the optimal oracle-based policy. We also examine adaptive decay-based policies that make energy-minimizing policy choices on a per-application basis by choosing appropriate decay intervals individually for each cache line. Our proposed adaptive policies reduce L1 cache leakage energy by 5x for the SPEC2000 applications with only negligible degradation in performance.
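The factor-of-two bound mentioned above can be illustrated with a small sketch. It uses a deliberately simplified energy model that is an assumption of this example, not the paper's simulator: a powered-on line leaks one energy unit per idle cycle, re-fetching a line that was turned off costs a fixed `MISS_COST`, and every idle gap ends with a re-access. The names `MISS_COST`, `decay_energy`, `oracle_energy`, and `worst_ratio` are hypothetical.

```python
# Simplified, hypothetical energy model for a single cache line.
# Assumptions (not taken from the paper's evaluation):
#   - a powered-on line leaks 1 energy unit per idle cycle
#   - re-fetching a line that was turned off costs MISS_COST units
#   - every idle gap ends with another access to the same line
MISS_COST = 100  # illustrative value only


def decay_energy(gap: int, decay_interval: int) -> int:
    """Energy spent over one idle gap by a line that is turned off
    after decay_interval idle cycles."""
    if gap < decay_interval:
        # The line was still on when the next access arrived: leakage only.
        return gap
    # The line leaked until it decayed, then the next access missed.
    return decay_interval + MISS_COST


def oracle_energy(gap: int) -> int:
    """Energy spent by an oracle that knows the gap length in advance:
    keep the line on for short gaps, turn it off at once for long ones."""
    return min(gap, MISS_COST)


def worst_ratio(max_gap: int, decay_interval: int) -> float:
    """Worst-case ratio of decay energy to oracle energy over gaps 1..max_gap."""
    return max(
        decay_energy(gap, decay_interval) / oracle_energy(gap)
        for gap in range(1, max_gap + 1)
    )


if __name__ == "__main__":
    # With the decay interval set equal to the miss cost, the ratio
    # never exceeds 2 under this model.
    print(worst_ratio(1000, MISS_COST))
```

Under these assumptions, choosing the decay interval equal to the miss cost is the same break-even choice as in the classic rent-or-buy argument from competitive on-line algorithms: short gaps cost the same as the oracle, and long gaps cost at most twice as much.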
Index Terms
- Cache decay: exploiting generational behavior to reduce cache leakage power
Published in: Proceedings of the 28th Annual International Symposium on Computer Architecture (ISCA '01)