ABSTRACT
Caches are used to bridge the gap between main memory and the significantly faster processor cores. In multi-core architectures, the last-level cache is often shared between cores. However, sharing a cache causes inter-core interference to emerge. Concurrently running tasks will experience additional cache misses as the competing tasks issue interfering accesses and trigger the eviction of data contained in the shared cache. Thus, to compute a task’s worst-case execution time (WCET), a safe bound on the effects of inter-core cache interference has to be determined. In this paper, we propose a novel analysis approach for shared caches using the least recently used (LRU) replacement policy. The presented analysis leverages timing information to produce tight bounds on the worst-case interference. We describe how inter-core cache interference may be expressed as a function of time using event-arrival curves. Thus, by determining the maximal duration between subsequent accesses to a cache block, it is possible to bound the inter-core interference. This enables us to classify accesses as cache hits or potential misses. We implemented the analysis in a WCET analyzer and evaluated its performance for multi-core systems containing 2, 4, and 8 cores using shared caches from 4 KB to 32 KB. The analysis achieves significant improvements compared to a standard interference analysis with WCET reductions of up to 60%. The average WCET reduction is 9% for dual-core, 15% for quad-core, and 11% for octa-core systems. The analysis runtime overhead ranges from a factor of 4 × to 7 × compared to the baseline analysis.
- Patrick Cousot and Radhia Cousot. 1977. Abstract Interpretation: A Unified Lattice Model for Static Analysis of Programs by Construction or Approximation of Fixpoints. In Proceedings of the 4th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages(POPL 1977). 238–252. https://doi.org/10.1145/512950.512973Google ScholarDigital Library
- P. Padma Priya Dharishini and P. V. R. Murthy. 2021. Precise Shared Instruction Cache Analysis to Estimate WCET of Multi-threaded Programs. In 2021 IEEE 18th India Council International Conference (INDICON). 1–7. https://doi.org/10.1109/INDICON52576.2021.9691620Google Scholar
- Heiko Falk and Paul Lokuciejewski. 2010. A Compiler Framework for the Reduction of Worst-Case Execution Times. Real-Time Systems 46, 2 (2010), 251–300. https://doi.org/10.1007/s11241-010-9101-xGoogle ScholarDigital Library
- Christian Ferdinand and Reinhard Wilhelm. 1999. Efficient and Precise Cache Behavior Prediction for Real-Time Systems. Real-Time Systems 17, 2 (1999), 131–181. https://doi.org/10.1023/A:1008186323068Google ScholarDigital Library
- Jan Gustafsson, Adam Betts, Andreas Ermedahl, and Björn Lisper. 2010. The Mälardalen WCET Benchmarks – Past, Present and Future. In 10th International Workshop on Worst-Case Execution Time Analysis (WCET 2010), Björn Lisper (Ed.). OCG, Brussels, Belgium, 136–146. https://doi.org/10.4230/OASIcs.WCET.2010.136Google Scholar
- Damien Hardy, Thomas Piquet, and Isabelle Puaut. 2009. Using Bypass to Tighten WCET Estimates for Multi-Core Processors with Shared Instruction Caches. In 30th IEEE Real-Time Systems Symposium (RTSS). 68–77. https://doi.org/10.1109/RTSS.2009.34Google ScholarDigital Library
- Yau-Tsun Steven Li and Sharad Malik. 1997. Performance Analysis of Embedded Software Using Implicit Path Enumeration. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD) 16, 12 (1997), 1477–1487. https://doi.org/10.1109/43.664229Google ScholarDigital Library
- Yun Liang, Huping Ding, Tulika Mitra, Abhik Roychoudhury, Yan Li, and Vivy Suhendra. 2012. Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores. Real-Time Systems 48, 6 (2012), 638–680. https://doi.org/10.1007/s11241-012-9160-2Google ScholarDigital Library
- Mingsong Lv, Nan Guan, Jan Reineke, Reinhard Wilhelm, and Wang Yi. 2016. A Survey on Static Cache Analysis for Real-Time Systems. Leibniz Transactions on Embedded Systems (LITES) 3, 1 (2016), 05:1–05:48. https://doi.org/10.4230/LITES-v003-i001-a005Google Scholar
- Claire Maiza, Hamza Rihani, Juan M. Rivas, Joël Goossens, Sebastian Altmeyer, and Robert I. Davis. 2019. A Survey of Timing Verification Techniques for Multi-Core Real-Time Systems. ACM Computing Surveys (CSUR) 52, 3 (2019), 1–38. https://doi.org/10.1145/3323212Google ScholarDigital Library
- Kartik Nagar. 2016. Precise analysis of Private and Shared Caches for tight WCET Estimates. Ph. D. Dissertation. Indian Institute of Science Bangalore.Google Scholar
- Kartik Nagar and Y. N. Srikant. 2014. Precise Shared Cache Analysis using Optimal Interference Placement. In IEEE 19th Real-Time and Embedded Technology and Applications Symposium (RTAS). 125–134. https://doi.org/10.1109/RTAS.2014.6925996Google ScholarCross Ref
- Dominic Oehlert. 2021. Worst Case Execution Time Oriented Code Optimization of Hard Real-Time Multicore Systems. Ph. D. Dissertation. Technische Universität Hamburg.Google Scholar
- Dominic Oehlert, Selma Saidi, and Heiko Falk. 2018. Compiler-Based Extraction of Event Arrival Functions for Real-Time Systems Analysis. In 30th Euromicro Conference on Real-Time Systems (ECRTS). 4:1–4:22. https://doi.org/10.4230/LIPIcs.ECRTS.2018.4Google Scholar
- The Embedded Microprocessor Benchmark Consortium. 2023. About the EEMBC AutoBench™ Performance Benchmark Suite. EEMBC. Retrieved 2023-01-20 from https://www.eembc.org/autobench/Google Scholar
- Valentin Touzeau, Claire Maïza, David Monniaux, and Jan Reineke. 2019. Fast and Exact Analysis for LRU Caches. Proceedings of the ACM on Programming Languages (POPL) 3 (2019), 54:1–54:29. https://doi.org/10.1145/3290367Google ScholarDigital Library
- Wei Zhang, Mingsong Lv, Wanli Chang, and Lei Ju. 2022. Precise and Scalable Shared Cache Contention Analysis for WCET Estimation. In Proceedings of the 59th ACM/IEEE Design Automation Conference (DAC). 1267–1272. https://doi.org/10.1145/3489517.3530613Google ScholarDigital Library
Index Terms
- Analysis of Shared Cache Interference in Multi-Core Systems using Event-Arrival Curves
Recommendations
An Improved Multi-core Shared Cache Replacement Algorithm
DCABES '12: Proceedings of the 2012 11th International Symposium on Distributed Computing and Applications to Business, Engineering & ScienceMany multi-core processors employ a large last-level cache (LLC) shared among the multiple cores. Past research has demonstrated that traditional LRU and its approximation can lead to poor performance and unfairness when the multiple cores compete for ...
High performance cache replacement using re-reference interval prediction (RRIP)
ISCA '10Practical cache replacement policies attempt to emulate optimal replacement by predicting the re-reference interval of a cache block. The commonly used LRU replacement policy always predicts a near-immediate re-reference interval on cache hits and ...
High performance cache replacement using re-reference interval prediction (RRIP)
ISCA '10: Proceedings of the 37th annual international symposium on Computer architecturePractical cache replacement policies attempt to emulate optimal replacement by predicting the re-reference interval of a cache block. The commonly used LRU replacement policy always predicts a near-immediate re-reference interval on cache hits and ...
Comments