
Combining recency of information with selective random and a victim cache in last-level caches

Published: 5 October 2012

Abstract

Memory latency has become an important performance bottleneck in current microprocessors, and the problem worsens as the number of cores sharing the same memory controller increases. A common way to mitigate it is to implement cache hierarchies with large Last-Level Cache (LLC) organizations.

LLC memories are implemented with a high number of ways (e.g., 16) to reduce conflict misses. Caches have typically implemented the LRU algorithm to exploit temporal locality, but its performance diverges from the optimal as the number of ways increases. In addition, implementing a strict LRU algorithm is costly in terms of area and power.
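To make the scaling cost concrete, the following is a minimal behavioral model of one set of a W-way cache under strict LRU. The class name and interface are illustrative, not from the article; in hardware, the full recency ordering this model keeps costs roughly W·log2(W) state bits per set plus an update on every access, which is why strict LRU becomes expensive at high associativity.

```python
from collections import OrderedDict

class LRUSet:
    """Behavioral model of one set of a W-way cache under strict LRU.

    The OrderedDict keeps all W blocks in full recency order (oldest
    first), mirroring the per-set ordering state a hardware LRU needs.
    """

    def __init__(self, ways=16):
        self.ways = ways
        self.blocks = OrderedDict()  # tag -> None, least-recently-used first

    def access(self, tag):
        """Return True on a hit; on a miss, insert tag, evicting LRU if full."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)  # promote to most-recently-used
            return True
        if len(self.blocks) >= self.ways:
            self.blocks.popitem(last=False)  # evict the LRU block
        self.blocks[tag] = None
        return False
```

For example, in a 2-way set the sequence a, b, a, c evicts b (the least recently used block), so a subsequent access to b misses.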

This article focuses on a family of low-cost replacement strategies whose implementation scales with the number of ways while maintaining performance. The proposed strategies track the access order of just a few blocks, which cannot be replaced; the victim is randomly selected among the remaining blocks, which exhibit poor locality. Although the random policy generally improves performance, in some applications the scheme falls behind the LRU policy, degrading performance. This drawback can be overcome by adding a small victim cache to the large LLC.
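The mechanism described above can be sketched as follows. This is an illustrative model only: the abstract does not specify the exact policies in the family, so the number of protected MRU blocks (`protected`), the victim-cache size (`vc_entries`), and the tie-breaking are assumptions introduced here, not the article's parameters.

```python
import random
from collections import OrderedDict, deque

class SelectiveRandomSet:
    """Sketch of one cache set: recency is tracked only for the few
    most-recently-used (protected) blocks, the victim is picked at
    random among the rest, and evicted blocks get a second chance in
    a small victim cache."""

    def __init__(self, ways=16, protected=4, vc_entries=2, seed=0):
        self.ways = ways
        self.protected = protected               # MRU blocks that cannot be replaced
        self.blocks = OrderedDict()              # tag -> None, oldest first
        self.victims = deque(maxlen=vc_entries)  # small victim cache
        self.rng = random.Random(seed)

    def access(self, tag):
        if tag in self.blocks:
            self.blocks.move_to_end(tag)         # keep MRU ordering up to date
            return True
        if tag in self.victims:                  # hit in the victim cache:
            self.victims.remove(tag)             # swap the block back into the set
            self._insert(tag)
            return True
        self._insert(tag)
        return False

    def _insert(self, tag):
        if len(self.blocks) >= self.ways:
            tags = list(self.blocks)
            # Candidates: every block outside the protected MRU group.
            candidates = tags[:len(tags) - self.protected] or tags
            victim = self.rng.choice(candidates)
            del self.blocks[victim]
            self.victims.append(victim)          # second chance in the victim cache
        self.blocks[tag] = None
```

The victim cache is what covers the failure case mentioned in the text: when random selection evicts a block that is still live, a subsequent access can recover it from the victim cache instead of going to memory.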

Experimental results show that the best version of the family without a victim cache reduces MPKI by between 10% and 11% compared to a set of the most representative state-of-the-art algorithms, and by up to 22% with respect to LRU. The proposal with a victim cache improves speedup by 4% on average compared to LRU, and reduces dynamic energy by up to 8% on average. Finally, compared to the studied algorithms, the baseline algorithm of the family largely reduces hardware complexity.



• Published in

  ACM Transactions on Architecture and Code Optimization, Volume 9, Issue 3
  September 2012
  313 pages
  ISSN: 1544-3566
  EISSN: 1544-3973
  DOI: 10.1145/2355585

      Copyright © 2012 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 5 October 2012
      • Accepted: 1 March 2012
      • Revised: 1 October 2011
      • Received: 1 July 2011


      Qualifiers

      • research-article
      • Research
      • Refereed
