Abstract
Caches have become increasingly important with the widening gap between main memory and processor speeds. Small and fast cache memories are designed to bridge this discrepancy. However, they are only effective when programs exhibit sufficient data locality. In addition, caches are a source of unpredictability, resulting in programs sometimes behaving in a different way than expected. Detailed information about the number of cache misses and their causes allows us to predict cache behavior and to detect bottlenecks. Small modifications in the source code may change memory patterns, thereby altering the cache behavior. Code transformations, which take the cache behavior into account, might result in a high cache performance improvement. However, cache memory behavior is very hard to predict, thus making the task of optimizing and timing cache behavior very difficult. This article proposes and evaluates a new compiler framework that times cache behavior for multitasking systems. Our method explores the use of cache partitioning and dynamic cache locking to provide worst-case performance estimates in a safe and tight way for multitasking systems. We use cache partitioning, which divides the cache among tasks to eliminate intertask cache interferences. We combine static cache analysis and cache-locking mechanisms to ensure that all intratask conflicts, and consequently, memory access times, are exactly predictable. The results of our experiments demonstrate the capability of our framework to describe cache behavior at compile time. We compare our timing approach with a system equipped with a nonpartitioned, but statically, locked data cache. Our method outperforms static cache locking for all analyzed task sets under various cache architectures, demonstrating that our fully predictable scheme does not compromise the performance of the transformed programs.
- Alt, M., Ferdinand, C., Martin, F., and Wilhelm, R. 1996. Cache behaviour prediction by abstract interpretation. In Proceedings of Static Analysis Symposium (SAS'96). Lecture Notes in Computer Science (LNCS) 1145. Springer-Verlag, New York, 52--66. Google ScholarDigital Library
- Arnold, R., Müeller, F., Whalley, D., and Harmon, M. 1994. Bounding worst-case instruction cache performance. In Proceedings of 15th Real-Time Systems Symposium (RTSS'94). 172--181.Google Scholar
- Basumallick, S. and Nielsen, K. 1994. Cache issues in real-time systems. In Proceedings ACM Workshop on Languages, Compilers and Tools for Real-Time Systems (LCTES'94).Google Scholar
- Burns, A. and Wellings, A. 1993. The impact of an Ada run-time system's performance characteristics on scheduling models. In Proceedings of 12th Ada-Europe International Conference. 240--248. Google ScholarDigital Library
- Burns, A., Tindell, K., and Wellings, A. 1995. Effective analysis for engineering real-time fixed priority schedulers. IEEE Transactions on Software Engineering 21, 475--480. Google ScholarDigital Library
- Busquets-Mataix, J. V., Serrano, J. J., .Ors, R., Gil, P., and Wellings, A. 1996. Adding instruction cache effect to schedulability analysis of preemptive real-time systems. In Proceedings of 2nd Real-Time Technology and Applications Symposium (RTAS'96). Google ScholarDigital Library
- Busquets-Mataix, J. V., Serrano, J. J., and Wellings, A. 1997. Hybrid instruction cache partitioning for preemptive real-time systems. In Proceedings of 9th Euromicro Workshop on Real-Time Systems (EUROMICRO-RTS'97).Google Scholar
- Campoy, M., Ivars, A. P., and Busquets-Mataix, J. V. 2001. Static use of locking caches in multitask preemptive real-time systems. In Proceedings of IEEE/IEE Real-Time Embedded Systems Workshop (Satellite of the IEEE Real-Time Systems Symposium).Google Scholar
- Engblom, J. and Ermedahl, A. 1999. Pipeline timing analysis using a trace-driven simulator. In Proceedings of 6th International Conference on Real-Time Computing Systems and Applications (RTCSA'99). Google ScholarDigital Library
- Engblom, J. and Ermedahl, A. 2000. Modeling complex flows for worst-case execution time analysis. In Proceedings of the 21st IEEE Real-Time Systems Symposium (RTSS'00). Google ScholarDigital Library
- Ermedahl, A. and Gustafsson, J. 1997. Deriving annotations for tight calculation of execution time. In Proceedings of Euro-Par (EUROPAR'97). 1298--1307. Google ScholarDigital Library
- Feautrier, P. 1996. Automatic parallelization in the polytope model. In The Data Parallel Programming Model, G. R. Perrin and A. Darte, Eds. Lecture Notes in Computer Science 1132. Springer Verlag, New York, 79--103. Google ScholarDigital Library
- Ferdinand, C. and Wilhelm, R. 1999. Efficient and precise cache behavior prediction for real-time systems. Real-Time Systems 17, 131--181. Google ScholarDigital Library
- Gannon, D., Jalby, W., and Gallivan, K. 1988. Strategies for cache and local memory management by global program transformations. Journal of Parallel and Distributed Computing 5, 587--616. Google ScholarDigital Library
- Ghosh, S., Martonosi, M., and Malik, S. 1999. Cache miss equations: a compiler framework for analyzing and tuning memory behavior. ACM Transactions on Programming Languages and Systems (TOPLAS) 21, 4, 703--746. Google ScholarDigital Library
- Gustafsson, J. 2000. Analyzing execution time of object-oriented programs using abstract interpretation. Ph.D. thesis, Uppsala University.Google Scholar
- Healey, C. A., Whalley, D., and Harmon, M. 1995. Integrating the timing analysis of pipelining and instruction caching. In Proceedings of 16th Real-Time Systems Symposium (RTSS'95). 288--297. Google ScholarDigital Library
- Hennessy, J. L. and Patterson, D. A. 1996. Computer architecture: a quantitative approach. Morgan Kaufman Pub., San Francisco, CA. Google ScholarDigital Library
- IBM Microelectronics Division. 1999. The PowerPC 440 core.Google Scholar
- Integrated Device Technologies. 2001. 79RC64574/RC64575 Data Sheet.Google Scholar
- Jeffay, K. and Stone, D. L. 1993. Accounting for interrupt handling costs in dynamic priority task systems. In Proceedings of 14th Real-Time Systems Symposium (RTSS'93). 212--221.Google Scholar
- Joseph, M. and Pandya, P. 1986. Finding response times in a real-time system. The Computer Journal 29, 5, 390--395.Google ScholarCross Ref
- Katcher, A. I., Arakawa, H., and Strosnider, J. K. 1993. Engineering and analysis of fixed priority schedulers. IEEE Transactions on Software Engineering 19, 920--934. Google ScholarDigital Library
- Kim, S. K., Min, S. L., and Ha, R. 1996. Efficient worst case timing analysis of data caching. In Proceedings of IEEE Real-Time Technology and Applications Symposium (RTAS'96). Google ScholarDigital Library
- Kirk, D. B. 1989. SMART (strategic memory allocation for real-time) cache design. In Proceedings of 10th Real-Time Systems Symposium (RTSS'89).Google ScholarCross Ref
- Lam, M., Rothberg, E. E., and Wolf, M. E. 1991. The cache performance of blocked algorithms. In Proceedings of IV International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'91). Google ScholarDigital Library
- Lee, C. G., Hahn, J., Seo, Y. M., Min, S. L., Ha, R., Hong, S., Park, C. Y., Lee, M., and Kim, C. S. 1998. Analysis of cache-related preemption delay in fixed-priority preemptive scheduling. IEEE Transaction on Computers 47. Google ScholarDigital Library
- Li, Y. T. S., Malik, S., and Wolfe, A. 1995. Efficient microarchitecture modeling and path analysis for real-time software. In Proceedings of 16th Real-Time Systems Symposium (RTSS'95). 298--307. Google ScholarDigital Library
- Li, Y. T. S., Malik, S., and Wolfe, A. 1996. Cache modeling and path analysis for real-time software. In Proceedings of 17th Real-Time Systems Symposium (RTSS'96).Google Scholar
- Liedtke, J., Härtig, H., and Hohmuth, M. 1997. OS-controlled cache predictability for real-time systems. In Proceedings of 3rd IEEE Real-Time Technology and Applications Symposium (RTAS'97). Google ScholarDigital Library
- Lim, S. S., Bae, Y. H., Jang, G. T., Rhee, B. D., Min, S. L., Park, C. Y., Shin, H., Park, K., and Kim, C. S. 1994. An accurate worst case timing analysis technique for RISC processors. In Proceedings of 15th Real-Time Systems Symposium (RTSS'94). 97--108.Google Scholar
- Lundqvist, T. and Stenström, P. 1998. Integrating path and timing analysis using instruction-level simulation techniqes. In Proceedings of ACM SIGPLAN Workshop on Languages, Compilers, and Tools for Embedded Systems (LCTES'98). 1--15. Google ScholarDigital Library
- Lundqvist, T. and Stenström, P. 1999a. A method to improve the estimated worst-case performance of data caching. In Proceedings of the 6th International Conference on Real-Time Computing Systems and Applications (RTCSA'99). 255--262. Google ScholarDigital Library
- Lundqvist, T. and Stenström, P. 1999b. Timing anomalies in dynamically scheduled microprocessors. In Proceedings of 20th Real-Time Systems Symposium (RTSS'99). Google ScholarDigital Library
- MIPS Technologies. 2001. MIPS32 4Kp- Embedded, MIPS Processor Core.Google Scholar
- Motorola Inc. 1996. PowerPC 604e RISC Microprocessor Technical Summary.Google Scholar
- Müeller, F. 1995. Compiler support for software-based cache partitioning. In Proceedings ACM Workshop on Languages, Compilers and Tools for Real-Time Systems (LCTES'95). Google ScholarDigital Library
- Puaut, I. and Decotigny, D. 2002. Low-complexity algorithms for static cache locking in multitasking hard real-time systems. In Proceedings of 23th Real-Time Systems Symposium (RTSS'02). Google ScholarDigital Library
- Rivera, G. and Tseng, C.-W. 1998. Data transformations for eliminating conflict misses. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'98). 38--49. Google ScholarDigital Library
- Sánchez, F., González, A., and Valero, M. 1997. Static locality analysis for cache management. In Proceedings of International Conference on Parallel Architectures and Compilation Techniques (PACT'97). Google ScholarDigital Library
- Sun Microelectronics. 1997. microSPARC-IIep User's Manual.Google Scholar
- Tindell, K., Burns, A., and Wellings, A. 1994. An extendible approach for analysing fixed priority hard real-time tasks. Real-Time Systems 6, 1, 133--151. Google ScholarDigital Library
- Vera, X. and Xue, J. 2002. Let's study whole program cache behaviour analytically. In Proceedings of International Symposium on High-Performance Computer Architecture (HPCA 8). Cambridge. Google ScholarDigital Library
- Vera, X., Abella, J., González, A., and Llosa, J. 2003a. Optimizing program locality through CMEs and GAs. In Proceedings of 12th International Conference on Parallel Architectures and Compilation Techniques (PACT'03). New Orleans. Google ScholarDigital Library
- Vera, X., Lisper, B., and Xue, J. 2003b. Data cache locking for higher program predictability. In Proceedings of International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS'03). 272--282. Google ScholarDigital Library
- Vera, X., Lisper, B., and Xue, J. 2003c. Data caches in multitasking hard real-time systems. In Proceedings of International Real-Time Systems Symposium (RTSS03). Google ScholarDigital Library
- Vera, X., Bermudo, N., Llosa, J., and González, A. 2004. A fast and accurate framework to analyze and optimize cache memory behavior. ACM Transactions on Programming Languages and Systems (TOPLAS) 26, 2. Google ScholarDigital Library
- White, R. T., Müeller, F., Healy, C., Whalley, D., and Harmon, M. 1997. Timing analysis for data caches and set-associative caches. In Proceedings of Third IEEE Real-Time Technology and Applications Symposium (RTAS'97). 192--202. Google ScholarDigital Library
- Wilson, R. P. 1997. Efficient context-sensitive pointer analysis for C programs. Ph.D. thesis, Stanford University. Google ScholarDigital Library
- Wolf, M. and Lam, M. 1991. A data locality optimizing algorithm. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'91). 30--44. Google ScholarDigital Library
- Wolfe, A. 1993. Software-based cache partitioning for real-time applications. In Proceedings of the 3rd International Workshop on Responsive Computer Systems.Google Scholar
- Xue, J. 2000. Loop Tiling for Parallelism. Kluwer Academic Publ., Boston, MA. Google ScholarDigital Library
- Xue, J. and Huang, C.-H. 1998. Reuse-driven tiling for data locality. International Journal of Parallel Programming 26, 6, 671--696. Google ScholarDigital Library
Index Terms
- Data cache locking for tight timing calculations
Recommendations
Data cache locking for higher program predictability
Caches have become increasingly important with the widening gap between main memory and processor speeds. However, they are a source of unpredictability due to their characteristics, resulting in programs behaving in a different way than expected.Cache ...
Data cache locking for higher program predictability
SIGMETRICS '03: Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systemsCaches have become increasingly important with the widening gap between main memory and processor speeds. However, they are a source of unpredictability due to their characteristics, resulting in programs behaving in a different way than expected.Cache ...
A reusability-aware cache memory sharing technique for high-performance low-power CMPs with private L2 caches
ISLPED '07: Proceedings of the 2007 international symposium on Low power electronics and designChip multiprocessors (CMPs) emerge as a dominant architectural alternative in high-end embedded systems. Since off-chip accesses require a long latency and consume a large amount of power, CMPs are typically based on multiple levels of on-chip cache ...
Comments