ABSTRACT
Caches have become increasingly important with the widening gap between main memory and processor speeds. However, they are a source of unpredictability due to their characteristics, resulting in programs behaving in a different way than expected.Cache locking mechanisms adapt caches to the needs of real-time systems. Locking the cache is a solution that trades performance for predictability: at a cost of generally lower performance, the time of accessing the memory becomes predictable.This paper combines compile-time cache analysis with data cache locking to estimate the worst-case memory performance (WCMP) in a safe, tight and fast way. In order to get predictable cache behavior, we first lock the cache for those parts of the code where the static analysis fails. To minimize the performance degradation, our method loads the cache, if necessary, with data likely to be accessed.Experimental results show that this scheme is fully predictable, without compromising the performance of the transformed program. When compared to an algorithm that assumes compulsory misses when the state of the cache is unknown, our approach eliminates all overestimation for the set of benchmarks, giving an exact WCMP of the transformed program without any significant decrease in performance.
- M. Alt, C. Ferdinand, F. Martin, and R. Wilhelm. Cache behaviour prediction by abstract interpretation. In Proceedings of Static Analysis Symposium (SAS'96), Lecture Notes in Computer Science (LNCS) 1145, pages 52--66. Springer-Verlag, September 1996.]] Google ScholarDigital Library
- R. Arnold, F. Mueller, D. Whalley, and M. Harmon. Bounding worst-case instruction cache performance. In Proceedings of 15th Real-Time Systems Symposium (RTSS'94), pages 172--181, 1994.]]Google ScholarCross Ref
- M. Campoy, A. P. Ivars, and J. V. Busquets-Mataix. Static use of locking caches in multitask preemptive real-time systems. In Proceedings of IEEE/IEE Real-Time Embedded Systems Workshop (Satellite of the IEEE Real-Time Systems Symposium), 2001.]]Google Scholar
- S. Carr and K. Kennedy. Compiler blockability of numerical algorithms. In Proceedings of Supercomputing (SC'92), pages 114--124, Nov. 1992.]] Google ScholarDigital Library
- S. Chatterjee, V. V. Jain, A. R. Lebeck, S. Mundhra, and M. Thottethodi. Nonlinear array layout for hierarchical memory systems. In Proceedings of ACM International Conference on Supercomputing (ICS'99), pages 444--453, Rhodes, Greece, Jun. 1999.]] Google ScholarDigital Library
- S. Chatterjee, E. Parker, P. J. Hanlon, and A. R. Lebeck. Exact analysis of the cache behavior of nested loops. In ACM SIGPLAN '01 Conference on Programming Language Design and Implementation (PLDI'01), pages 286--297, 2001.]] Google ScholarDigital Library
- S. Coleman and K. S. McKinley. Tile size selection using cache organization and data layout. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'95), pages 279--290, Jun. 1995.]] Google ScholarDigital Library
- J. Engblom and A. Ermedhal. Modeling complex flows for worst-case execution time analysis. In Proceedings of the IEEE Real-Time Systems Symposium (RTSS 2000), Dec. 2000.]]Google ScholarCross Ref
- C. Ferdinand and R. Wilhelm. Efficient and precise cache behavior prediction for real-time systems. Real-Time Systems, 17:131--181, 1999.]] Google ScholarDigital Library
- B. B. Fraguela, R. Doallo, and E. L. Zapata. Automatic analytical modeling for the estimation of cache misses. In Proceedings of International Conference on Parallel Architectures and Compilation Techniques (PACT'99), 1999.]] Google ScholarDigital Library
- D. Gannon, W. Jalby, and K. Gallivan. Strategies for cache and local memory management by global program transformations. Journal of Parallel and Distributed Computing, 5:587--616, 1988.]] Google ScholarDigital Library
- S. Ghosh, M. Martonosi, and S. Malik. Cache miss equations: a compiler framework for analyzing and tuning memory behavior. ACM Transactions on Programming Languages and Systems, 21(4):703--746, 1999.]] Google ScholarDigital Library
- C. Healey, D. Whalley, and M. Harmon. Integrating the timing analysis of pipelining and instruction caching. In Proceedings of 16th Real-Time Systems Symposium (RTSS'95), pages 288--297, 1995.]] Google ScholarDigital Library
- M. Hill. DineroIII: a uniprocessor cache simulator (http://www.cs.wisc.edu/ larus/warts.html).]]Google Scholar
- IBM Microelectronics Division. The PowerPC 440 core, 1999.]]Google Scholar
- M. Kandemir, A. Choudhary, P. Banerjee, and J. Ramanujam. A linear algebra framework for automatic determination of optimal data layouts. IEEE Transactions on Parallel and Distributed Systems, 10(2):115--135, Feb. 1999.]] Google ScholarDigital Library
- S. Kim, S. Min, and R. Ha. Efficient worst case timing analysis of data caching. In Proceedings of IEEE Real-Time Technology and Applications Symposium (RTAS'96), 1996.]] Google ScholarDigital Library
- D. Kirk. SMART (strategic memory allocation for real-time) cache design. In Proceedings of 10th Real-Time Systems Symposium (RTSS'89), Dec. 1989.]]Google ScholarCross Ref
- M. Lam, E. E. Rothberg, and M. E. Wolf. The cache performance of blocked algorithms. In Proceedings of IV International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'91), Apr. 1991.]] Google ScholarDigital Library
- Y. Li, S. Malik, and A. Wolfe. Efficient microarchitecture modeling and path analysis for real-time software. In Proceedings of 16th Real-Time Systems Symposium (RTSS'95), pages 298--307, 1995.]] Google ScholarDigital Library
- Y. Li, S. Malik, and A. Wolfe. Cache modeling and path analysis for real-time software. In Proceedings of 17th Real-Time Systems Symposium (RTSS'96), 1996.]]Google ScholarDigital Library
- S. Lim, Y. Bae, G. Jang, B. Rhee, S. Min, C. Park, H. Shin, K. Park, and C. Kim. An accurate worst case timing analysis technique for RISC processors. In Proceedings of 15th Real-Time Systems Symposium (RTSS'94), pages 97--108, 1994.]]Google Scholar
- T. Lundqvist and P. Stenström. A method to improve the estimated worst-case performance of data caching. In Proceedings of the 6th International Conference on Real-Time Computing Systems and Applications (RTCSA'99), pages 255--262, Dec. 1999.]] Google ScholarDigital Library
- T. Lundqvist and P. Stenström. Timing anomalies in dynamically scheduled microprocessors. In Proceedings of 20th Real-Time Systems Symposium (RTSS'99), Dec. 1999.]] Google ScholarDigital Library
- Motorola Inc. PowerPC 604e RISC Microprocessor Technical Summary, 1996.]]Google Scholar
- T. Mowry, M. Lam, and A. Gupta. Design and evaluation of a compiler algorithm for prefetching. In Proceedings of V Int. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS'92), pages 62--73, Oct. 1992.]] Google ScholarDigital Library
- G. Rivera and C.-W. Tseng. Data transformations for eliminating conflict misses. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'98), pages 38--49, 1998.]] Google ScholarDigital Library
- F. Sánchez, A. González, and M. Valero. Static locality analysis for cache management. In Proceedings of International Conference on Parallel Architectures and Compilation Techniques (PACT'97), November 1997.]] Google ScholarDigital Library
- Sun Microelectronics. microSPARC-IIep User's Manual, 1997.]]Google Scholar
- O. Temam, E. Granston, and W. Jalby. To copy or not to copy: A compile-time technique for accessing when data copying should be used to eliminate cache conflicts. In Proceedings of Supercomputing (SC'93), pages 410--419, 1993.]] Google ScholarDigital Library
- X. Vera, J. Llosa, A. González, and N. Bermudo. A fast and accurate approach to analyze cache memory behavior. In Proceedings of European Conference on Parallel Computing (Europar'00), 2000.]] Google ScholarDigital Library
- X. Vera and J. Xue. Let's study whole program cache behaviour analytically. In Proceedings of International Symposium on High-Performance Computer Architecture (HPCA 8), Cambridge, Feb. 2002.]] Google ScholarDigital Library
- R. T. White, F. Müller, C. Healy, D. Whalley, and M. Harmon. Timing analysis for data caches and set-associative caches. In Proceedings of Third IEEE Real-Time Technology and Applications Symposium (RTAS'97), pages 192--202, 1997.]] Google ScholarDigital Library
- M. Wolf and M. Lam. A data locality optimizing algorithm. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI`91), pages 30--44, Jun. 1991.]] Google ScholarDigital Library
- J. Xue and X. Vera. Efficient and accurate analytical modeling of whole-program data cache behavior. To Appear in IEEE Transactions of Computers]] Google ScholarDigital Library
Index Terms
- Data cache locking for higher program predictability
Recommendations
Data cache locking for higher program predictability
Caches have become increasingly important with the widening gap between main memory and processor speeds. However, they are a source of unpredictability due to their characteristics, resulting in programs behaving in a different way than expected.Cache ...
Data cache locking for tight timing calculations
Caches have become increasingly important with the widening gap between main memory and processor speeds. Small and fast cache memories are designed to bridge this discrepancy. However, they are only effective when programs exhibit sufficient data ...
Dynamic Data-Cache Locking for Minimizing the WCET of a Single Task
Special Issue on LCETES 2015, Special Issue on ACSD 2015 and Special Issue on Embedded Devise Forensics and SecurityCaches have been widely used in modern embedded processors to bridge the increasing speed gap between processors and off-chip memory. In real-time embedded systems, computing the Worst-Case Execution Time (WCET) of a task is essential for the task ...
Comments