Let caches decay: reducing leakage energy via exploitation of cache generational behavior

Authors:
Zhigang Hu, Princeton University, Princeton, NJ
Stefanos Kaxiras, Agere Systems, Murray Hill, NJ
Margaret Martonosi, Princeton University, Princeton, NJ

ACM Transactions on Computer Systems, Volume 20, Issue 2, pp. 161–190. https://doi.org/10.1145/507052.507055

Published: 01 May 2002

Abstract

Power dissipation is increasingly important in CPUs ranging from those intended for mobile use all the way up to high-performance processors for high-end servers. Although the bulk of the power dissipated is dynamic switching power, leakage power is also beginning to be a concern. Chipmakers expect that in future chip generations, leakage's proportion of total chip power will increase significantly. This article examines methods for reducing leakage power within the cache memories of the CPU. Because caches comprise much of a CPU chip's area and transistor count, they are reasonable targets for attacking leakage. We discuss policies and implementations for reducing cache leakage by invalidating and "turning off" cache lines when they hold data not likely to be reused. In particular, our approach targets the generational nature of cache line usage. That is, cache lines typically have a flurry of frequent use when first brought into the cache, and then have a period of "dead time" before they are evicted. Using effective, low-power ways of deducing dead time, we show that in many cases L1 cache leakage energy can be reduced by 4x in SPEC2000 applications without affecting performance. Because our decay-based techniques have notions of competitive online algorithms at their roots, their energy usage can be theoretically bounded to within a factor of two of that of the optimal oracle-based policy. We also examine adaptive decay-based policies that make energy-minimizing policy choices on a per-application basis by choosing appropriate decay intervals individually for each cache line. Our proposed adaptive policies reduce L1 cache leakage energy by 5x for SPEC2000 with only negligible performance degradation.
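One way to see how a decay policy can stay within a factor of two of an oracle is a ski-rental style accounting. The Python sketch below is a toy model of a single cache line, and its cost model is an assumption made for illustration, not the article's: a powered-on line leaks `leak` energy units per idle cycle, re-fetching a line that was switched off too early costs `miss` units, and a dead line's generation ends with an eviction miss that every policy pays. With the decay interval set to `miss / leak` (the idle time over which a line leaks as much energy as one miss costs), any idle gap shorter than the interval costs exactly what the oracle pays, and any longer gap costs at most `2 * miss` against the oracle's `miss`. The names `Gap`, `decay_energy`, `oracle_energy` and `total_energy` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Gap:
    """One idle period of a single cache line (hypothetical toy model)."""
    cycles: int   # idle cycles after the line's last access
    reused: bool  # True: the line is accessed again; False: it is dead until evicted


def decay_energy(gap: Gap, decay_interval: float, leak: float, miss: float) -> float:
    """Energy spent by a line that is switched off after `decay_interval` idle cycles."""
    # The line leaks until the next event or until the decay interval expires.
    energy = leak * min(gap.cycles, decay_interval)
    if gap.reused and gap.cycles > decay_interval:
        energy += miss  # decayed too early: the re-access becomes an induced miss
    elif not gap.reused:
        energy += miss  # dead line: the eviction miss is paid by every policy
    return energy


def oracle_energy(gap: Gap, leak: float, miss: float) -> float:
    """Energy of an oracle that knows the gap length and whether the line is reused."""
    if gap.reused:
        # Keep the line on, or switch it off at once and re-fetch, whichever is cheaper.
        return min(leak * gap.cycles, miss)
    return miss  # switch off immediately; only the unavoidable eviction miss remains


def total_energy(gaps: List[Gap], leak: float, miss: float) -> Tuple[float, float]:
    """Return (decay, oracle) energy with the decay interval set to miss / leak."""
    decay_interval = miss / leak  # leakage over this interval equals one miss
    decay = sum(decay_energy(g, decay_interval, leak, miss) for g in gaps)
    oracle = sum(oracle_energy(g, leak, miss) for g in gaps)
    return decay, oracle


if __name__ == "__main__":
    LEAK, MISS = 1.0, 100.0  # illustrative units: one miss costs 100 cycles of leakage
    gaps = [Gap(5, True), Gap(80, True), Gap(150, True), Gap(1000, False), Gap(20, False)]
    decay, oracle = total_energy(gaps, LEAK, MISS)
    print(f"decay={decay:.0f} oracle={oracle:.0f} ratio={decay / oracle:.2f}")
```

Running the script prints `decay=605 oracle=385 ratio=1.57`. The extra energy comes from the 150-cycle reused gap and the two dead gaps; none of them costs more than twice the oracle's share, so the total stays under the bound.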

Index Terms

1. Hardware
  1. Integrated circuits
    1. Semiconductor memory
      1. Dynamic memory

Published in

ACM Transactions on Computer Systems Volume 20, Issue 2
May 2002
106 pages
ISSN: 0734-2071
EISSN: 1557-7333
DOI: 10.1145/507052

Copyright © 2002 ACM
Publisher
Association for Computing Machinery
New York, NY, United States

Author Tags
Cache memories
cache decay
generational behavior
leakage power