ABSTRACT
The performance of most embedded systems is critically dependent on the average memory access latency. Improving the cache hit rate can have a significant positive impact on the performance of an application. Modern embedded processors often feature cache locking mechanisms that allow memory blocks to be locked in the cache under software control. Cache locking was primarily designed to offer timing predictability for hard real-time applications. Hence, existing compiler optimization techniques focus on employing cache locking to improve worst-case execution time. However, cache locking can also be quite effective in improving the average-case execution time of general embedded applications. In this paper, we explore static instruction cache locking to improve average-case program performance. We introduce the temporal reuse profile to accurately and efficiently model the cost and benefit of locking memory blocks in the cache. We propose an optimal algorithm and a heuristic approach that use the temporal reuse profile to determine the most beneficial memory blocks to lock in the cache. Experimental results show that our locking heuristic achieves close to optimal results and can improve the cache miss rate by up to 24% across a suite of real-world benchmarks. Moreover, our heuristic provides significant improvement over the state-of-the-art locking algorithm in terms of both performance and efficiency.
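As a rough illustration of the trade-off described above, the sketch below simulates a set-associative LRU instruction cache in which some blocks are locked, and greedily picks blocks to lock by re-simulating the trace. This is not the paper's temporal reuse profile or its algorithms: the brute-force re-simulation stands in for the profile, and the names `count_misses` and `greedy_lock`, the modulo set mapping, and the assumption that locked blocks are preloaded at no cost are all hypothetical choices made for this example.

```python
from collections import Counter, OrderedDict


def count_misses(trace, num_sets, assoc, locked=frozenset()):
    """Count misses of a set-associative LRU cache with some blocks locked.

    Assumption: locked blocks are preloaded, always hit, and each one
    permanently removes one way from the set it maps to.
    """
    free_ways = [assoc] * num_sets
    for block in locked:
        free_ways[block % num_sets] -= 1
    if min(free_ways) < 0:
        raise ValueError("more locked blocks than ways in a set")

    sets = [OrderedDict() for _ in range(num_sets)]  # per-set LRU order
    misses = 0
    for block in trace:
        if block in locked:
            continue  # locked block: guaranteed hit
        idx = block % num_sets
        lru = sets[idx]
        if block in lru:
            lru.move_to_end(block)  # hit: mark as most recently used
            continue
        misses += 1
        if free_ways[idx] == 0:
            continue  # set is fully locked: the block cannot be cached
        if len(lru) >= free_ways[idx]:
            lru.popitem(last=False)  # evict the least recently used block
        lru[block] = True
    return misses


def greedy_lock(trace, num_sets, assoc):
    """Lock blocks, most frequently accessed first, while misses decrease."""
    locked = set()
    best = count_misses(trace, num_sets, assoc, locked)
    for block, _ in Counter(trace).most_common():
        same_set = sum(1 for b in locked if b % num_sets == block % num_sets)
        if same_set >= assoc:
            continue  # no way left to lock in this set
        trial = count_misses(trace, num_sets, assoc, locked | {block})
        if trial < best:  # benefit of locking outweighs the lost way
            locked.add(block)
            best = trial
    return locked, best


if __name__ == "__main__":
    # Three blocks cycling through one 2-way set thrash under plain LRU.
    trace = [0, 1, 2] * 10
    print(count_misses(trace, num_sets=1, assoc=2))
    print(greedy_lock(trace, num_sets=1, assoc=2))
```

On the toy trace in the `__main__` block, plain LRU misses on all 30 accesses, while locking blocks 0 and 1 leaves only the 10 misses on block 2. Re-simulating the whole trace for each candidate is expensive, which is the kind of cost a compact profile is meant to avoid.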