Abstract
Memory latency has become a major performance bottleneck in current microprocessors, and the problem worsens as the number of cores sharing the same memory controller grows. A common way to mitigate this problem is to implement cache hierarchies with large Last-Level Cache (LLC) organizations.
LLC memories are implemented with a high number of ways (e.g., 16) to reduce conflict misses. Caches have typically implemented the LRU algorithm to exploit temporal locality, but its performance diverges from the optimal as the number of ways increases. In addition, implementing a strict LRU algorithm is costly in terms of area and power.
This article focuses on a family of low-cost replacement strategies whose implementation cost scales with the number of ways while maintaining performance. The proposed strategies track the access order of only a few blocks, which cannot be replaced; the victim is selected at random among the remaining blocks, which exhibit poor locality. Although random selection generally improves performance, in some applications the scheme falls behind the LRU policy and degrades performance. This drawback can be overcome by adding a small victim cache to the large LLC.
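The core idea above (exact recency tracking for only a few blocks, random victim selection among the rest, and a small victim cache as a safety net) can be illustrated with a minimal sketch. This is not the paper's exact design: the class name, the `tracked` and `victim_slots` parameters, and the FIFO victim buffer are illustrative assumptions.

```python
import random

class SelectiveRandomSet:
    """Sketch of one cache set: recency order is tracked only for the
    first `tracked` positions (these blocks are protected); the victim
    is chosen at random among the remaining, poor-locality blocks.
    Illustrative only; names/parameters are not from the paper."""

    def __init__(self, ways=16, tracked=4, victim_slots=2):
        self.ways = ways
        self.tracked = tracked        # blocks with tracked recency (protected)
        self.victim_slots = victim_slots
        self.blocks = []              # tags; index 0 = most recently used
        self.victim_cache = []        # small FIFO victim buffer

    def access(self, tag):
        """Return True on a hit (set or victim cache), False on a miss."""
        if tag in self.blocks:
            # Hit: promote to the MRU position so it stays protected.
            self.blocks.remove(tag)
            self.blocks.insert(0, tag)
            return True
        if tag in self.victim_cache:
            # Victim-cache hit: bring the block back into the set.
            self.victim_cache.remove(tag)
            self._insert(tag)
            return True
        self._insert(tag)
        return False

    def _insert(self, tag):
        if len(self.blocks) >= self.ways:
            # Replacement candidates are the blocks outside the tracked
            # (recently used) region; pick one of them at random.
            candidates = self.blocks[self.tracked:] or self.blocks
            victim = random.choice(candidates)
            self.blocks.remove(victim)
            # Spill the victim into the small victim cache (FIFO).
            self.victim_cache.insert(0, victim)
            del self.victim_cache[self.victim_slots:]
        self.blocks.insert(0, tag)
```

In this sketch, only the `tracked` MRU positions need ordering state, so the replacement logic does not grow with associativity the way a full LRU stack does, and a block evicted by an unlucky random choice gets a second chance through the victim cache.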
Experimental results show that the best version of the family without a victim cache reduces MPKI by between 10% and 11% compared to a set of the most representative state-of-the-art algorithms, and by up to 22% with respect to LRU. The proposal with a victim cache improves speedup by 4% on average compared to LRU, and reduces dynamic energy by up to 8% on average. Finally, compared to the studied algorithms, the baseline algorithm of the family largely reduces hardware complexity.