ABSTRACT
This paper proposes an approach to safely and tightly bounding data cache performance by computing the worst-case stack distance of data cache accesses. Our approach can not only be applied to direct-mapped caches, but also be used for set-associative or even fully-associative caches without increasing the complexity of analysis. Moreover, the proposed approach can statically categorize worst-case data cache misses into cold, conflict, and capacity misses, which can provide useful insights for designers to enhance the worst-case data cache performance. Our evaluation shows that the proposed data cache timing analysis technique can safely and accurately estimate the worst-case data cache performance, and the overestimation as compared to the observed worst-case data cache misses is within 1% on average.
- K. Beyls and E. D'Hollander. Reuse Distance as a Metric for Cache Behavior. In Proc. of PDCS'01, Aug 2001.Google Scholar
- C. Cascaval and D. A. Padua. Estimating cache misses and locality using stack distance. In Proc. of ICS'03, June 2003. Google ScholarDigital Library
- Y. H. Kim, M. D. Hills and D. A. Wood. Implementing stack simulation for highly-associative memories. Technical Report #997, Univ. of Wisconsin, February 1991.Google ScholarDigital Library
- R. White, F. Muller, C. Healy, D. Whalley, and M. Harmon. Timing analysis for data caches and set-associative caches. In Proc. of RTAS, 1997. Google ScholarDigital Library
- Y. S. Li, S. Malik and A. Wolfe. Cache modeling for real-time software: beyond direct mapped instruction caches. In Proc. of the IEEE Real-Time Systems Symposium, 1996. Google ScholarDigital Library
- C. Ferdinand and R. Wilhelm. On predicting data cache behavior for real-time systems. In Proc. of LCTES, 1998. Google ScholarDigital Library
- H. Ramaprasad and F. Mueller. Bounding worst-case data cache behavior by analytically deriving cache reference patterns. In Proc. of RTAS, 2005. Google ScholarDigital Library
- J. Staschulat and R. Ernst. Worst case timing analysis of input dependent data cache behavior. In Proc. of ECRTS, 2006. Google ScholarDigital Library
- S. S. Muchnick. Advanced compiler design and implementation. Morgan Kaufmann Publishers, 1997. Google ScholarDigital Library
- C. Berg, J. Engblom and R. Wilhelm. Requirements for an design of a processor with predictable timing. In Proc. of the Dagstuhl Perspectives Workshop on Design of Systems with Predictable Behavior, 2004.Google Scholar
- C. Rochange and P. Sainrat: Difficulties in computing the WCET for processors with speculative execution. In Proc. of WCET, 2002.Google Scholar
- R. Arnold, F. Muller, D. Whalley, and M. Harmon. Bounding worst-case instruction cache performance. In Proc. of the Real-Time Systems Symposium, 1994.Google ScholarCross Ref
- C. A. Healy, D. B. Whalley, and M. G. Harmon. Integrating the timing analysis of pipelining and instruction caching. In Proc. of the Real-Time Systems Symposium, 1995. Google ScholarDigital Library
- Y. S. Li, S. Malik and A. Wolfe. Performance estimation of embedded software with instruction cache modeling. In Proc. of ICCAD, 1995. Google ScholarDigital Library
- Y. S. Li, S. Malik and A. Wolfe. Efficient microarchitecture modeling and path analysis for real-time software. In Proc. of the 16th IEEE Real-Time Systems Symposium, Dec 1995. Google ScholarDigital Library
- S. Lim, et al. An accurate worst case timing analysis technique for RISC processors. In Proc. of the 15th IEEE Real-Time Systems Symposium, 1994.Google Scholar
- Trimaran homepage, http://www.trimaran.org.Google Scholar
- http://archi.snu.ac.kr/realtime/benchmark/.Google Scholar
- V. Zivojnovic, J. Martinez and C. Schl. DSPstone: A DSP-oriented benchmarking methodology. In Proc. of ICSPAT'94, Oct. 1994.Google Scholar
Index Terms
- Exploiting stack distance to estimate worst-case data cache performance
Recommendations
Stack distance based worst-case instruction cache performance analysis
SAC '11: Proceedings of the 2011 ACM Symposium on Applied ComputingThe worst-case execution time (WCET) analysis is critical to ensure the schedulability and correctness of hard real-time systems. Modern microprocessors, however, make the WCET analysis complicated, mainly because of their performance acceleration ...
Increasing hardware data prefetching performance using the second-level cache
Techniques to reduce or tolerate large memory latencies are critical for achieving high processor performance. Hardware data prefetching is one of the most heavily studied solutions, but it is essentially applied to first-level caches where it can ...
A new cache replacement algorithm for last-level caches by exploiting tag-distance correlation of cache lines
Cache memory plays a crucial role in determining the performance of processors, especially for embedded processors where area and power are tightly constrained. It is necessary to have effective management mechanisms, such as cache replacement policies, ...
Comments