Let caches decay: reducing leakage energy via exploitation of cache generational behavior

Published: 01 May 2002

Abstract

Power dissipation is increasingly important in CPUs ranging from those intended for mobile use all the way up to high-performance processors for high-end servers. Although the bulk of the power dissipated is dynamic switching power, leakage power is also becoming a concern. Chipmakers expect that in future chip generations, leakage's proportion of total chip power will increase significantly. This article examines methods for reducing leakage power within the cache memories of the CPU. Because caches account for much of a CPU chip's area and transistor count, they are reasonable targets for attacking leakage. We discuss policies and implementations for reducing cache leakage by invalidating and "turning off" cache lines when they hold data not likely to be reused. In particular, our approach targets the generational nature of cache line usage: cache lines typically see a flurry of frequent use when first brought into the cache, followed by a period of "dead time" before they are evicted. By devising effective, low-power ways of deducing dead time, we show that in many cases L1 cache leakage energy can be reduced by 4x in SPEC2000 applications without an impact on performance. Because our decay-based techniques are rooted in competitive online algorithms, their energy usage can be theoretically bounded to within a factor of two of the optimal oracle-based policy. We also examine adaptive decay-based policies that make energy-minimizing policy choices on a per-application basis by choosing appropriate decay intervals individually for each cache line. Our proposed adaptive policies reduce L1 cache leakage energy by 5x for the SPEC2000 benchmarks with only negligible degradation in performance.
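
The decay idea described in the abstract can be pictured with a small software model. The sketch below is only an illustration under stated assumptions, not the authors' hardware design: the class names `DecayLine` and `DecayCache`, the direct-mapped organization, the use of abstract "ticks" for time, and the `powered_line_ticks` counter (a rough proxy for leakage energy) are all hypothetical. Each line keeps an idle counter that is reset on every access; once the counter reaches the decay interval, the line is switched off and its contents are lost.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DecayLine:
    """One cache line with a hypothetical idle counter and power state."""
    tag: Optional[int] = None
    powered: bool = False
    idle_ticks: int = 0


class DecayCache:
    """Toy direct-mapped cache whose lines switch off after idling too long."""

    def __init__(self, num_lines: int, decay_interval: int) -> None:
        self.lines = [DecayLine() for _ in range(num_lines)]
        self.decay_interval = decay_interval
        # Total (line, tick) pairs spent powered on: a crude leakage proxy.
        self.powered_line_ticks = 0

    def access(self, address: int) -> bool:
        """Return True on a hit. Any access powers the line and resets its counter."""
        line = self.lines[address % len(self.lines)]
        hit = line.powered and line.tag == address
        line.tag, line.powered, line.idle_ticks = address, True, 0
        return hit

    def tick(self) -> None:
        """Advance time by one tick and switch off lines idle for a full interval."""
        for line in self.lines:
            if not line.powered:
                continue
            self.powered_line_ticks += 1
            line.idle_ticks += 1
            if line.idle_ticks >= self.decay_interval:
                # Turning the line off discards its data, so a later access misses.
                line.powered = False
                line.tag = None
```

In this model a short decay interval saves more powered-on time but risks extra misses when a line is reused after being switched off, while a long interval does the opposite; that is the trade-off the per-line adaptive policies in the abstract are tuning.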
