skip to main content
research-article

Data cache locking for tight timing calculations

Published:12 December 2007Publication History
Skip Abstract Section

Abstract

Caches have become increasingly important with the widening gap between main memory and processor speeds. Small and fast cache memories are designed to bridge this discrepancy. However, they are only effective when programs exhibit sufficient data locality. In addition, caches are a source of unpredictability, resulting in programs sometimes behaving in a different way than expected. Detailed information about the number of cache misses and their causes allows us to predict cache behavior and to detect bottlenecks. Small modifications in the source code may change memory patterns, thereby altering the cache behavior. Code transformations, which take the cache behavior into account, might result in a high cache performance improvement. However, cache memory behavior is very hard to predict, thus making the task of optimizing and timing cache behavior very difficult. This article proposes and evaluates a new compiler framework that times cache behavior for multitasking systems. Our method explores the use of cache partitioning and dynamic cache locking to provide worst-case performance estimates in a safe and tight way for multitasking systems. We use cache partitioning, which divides the cache among tasks to eliminate intertask cache interferences. We combine static cache analysis and cache-locking mechanisms to ensure that all intratask conflicts, and consequently, memory access times, are exactly predictable. The results of our experiments demonstrate the capability of our framework to describe cache behavior at compile time. We compare our timing approach with a system equipped with a nonpartitioned, but statically, locked data cache. Our method outperforms static cache locking for all analyzed task sets under various cache architectures, demonstrating that our fully predictable scheme does not compromise the performance of the transformed programs.

References

  1. Alt, M., Ferdinand, C., Martin, F., and Wilhelm, R. 1996. Cache behaviour prediction by abstract interpretation. In Proceedings of Static Analysis Symposium (SAS'96). Lecture Notes in Computer Science (LNCS) 1145. Springer-Verlag, New York, 52--66. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Arnold, R., Müeller, F., Whalley, D., and Harmon, M. 1994. Bounding worst-case instruction cache performance. In Proceedings of 15th Real-Time Systems Symposium (RTSS'94). 172--181.Google ScholarGoogle Scholar
  3. Basumallick, S. and Nielsen, K. 1994. Cache issues in real-time systems. In Proceedings ACM Workshop on Languages, Compilers and Tools for Real-Time Systems (LCTES'94).Google ScholarGoogle Scholar
  4. Burns, A. and Wellings, A. 1993. The impact of an Ada run-time system's performance characteristics on scheduling models. In Proceedings of 12th Ada-Europe International Conference. 240--248. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Burns, A., Tindell, K., and Wellings, A. 1995. Effective analysis for engineering real-time fixed priority schedulers. IEEE Transactions on Software Engineering 21, 475--480. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Busquets-Mataix, J. V., Serrano, J. J., .Ors, R., Gil, P., and Wellings, A. 1996. Adding instruction cache effect to schedulability analysis of preemptive real-time systems. In Proceedings of 2nd Real-Time Technology and Applications Symposium (RTAS'96). Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Busquets-Mataix, J. V., Serrano, J. J., and Wellings, A. 1997. Hybrid instruction cache partitioning for preemptive real-time systems. In Proceedings of 9th Euromicro Workshop on Real-Time Systems (EUROMICRO-RTS'97).Google ScholarGoogle Scholar
  8. Campoy, M., Ivars, A. P., and Busquets-Mataix, J. V. 2001. Static use of locking caches in multitask preemptive real-time systems. In Proceedings of IEEE/IEE Real-Time Embedded Systems Workshop (Satellite of the IEEE Real-Time Systems Symposium).Google ScholarGoogle Scholar
  9. Engblom, J. and Ermedahl, A. 1999. Pipeline timing analysis using a trace-driven simulator. In Proceedings of 6th International Conference on Real-Time Computing Systems and Applications (RTCSA'99). Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Engblom, J. and Ermedahl, A. 2000. Modeling complex flows for worst-case execution time analysis. In Proceedings of the 21st IEEE Real-Time Systems Symposium (RTSS'00). Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Ermedahl, A. and Gustafsson, J. 1997. Deriving annotations for tight calculation of execution time. In Proceedings of Euro-Par (EUROPAR'97). 1298--1307. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Feautrier, P. 1996. Automatic parallelization in the polytope model. In The Data Parallel Programming Model, G. R. Perrin and A. Darte, Eds. Lecture Notes in Computer Science 1132. Springer Verlag, New York, 79--103. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Ferdinand, C. and Wilhelm, R. 1999. Efficient and precise cache behavior prediction for real-time systems. Real-Time Systems 17, 131--181. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Gannon, D., Jalby, W., and Gallivan, K. 1988. Strategies for cache and local memory management by global program transformations. Journal of Parallel and Distributed Computing 5, 587--616. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Ghosh, S., Martonosi, M., and Malik, S. 1999. Cache miss equations: a compiler framework for analyzing and tuning memory behavior. ACM Transactions on Programming Languages and Systems (TOPLAS) 21, 4, 703--746. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Gustafsson, J. 2000. Analyzing execution time of object-oriented programs using abstract interpretation. Ph.D. thesis, Uppsala University.Google ScholarGoogle Scholar
  17. Healey, C. A., Whalley, D., and Harmon, M. 1995. Integrating the timing analysis of pipelining and instruction caching. In Proceedings of 16th Real-Time Systems Symposium (RTSS'95). 288--297. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Hennessy, J. L. and Patterson, D. A. 1996. Computer architecture: a quantitative approach. Morgan Kaufman Pub., San Francisco, CA. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. IBM Microelectronics Division. 1999. The PowerPC 440 core.Google ScholarGoogle Scholar
  20. Integrated Device Technologies. 2001. 79RC64574/RC64575 Data Sheet.Google ScholarGoogle Scholar
  21. Jeffay, K. and Stone, D. L. 1993. Accounting for interrupt handling costs in dynamic priority task systems. In Proceedings of 14th Real-Time Systems Symposium (RTSS'93). 212--221.Google ScholarGoogle Scholar
  22. Joseph, M. and Pandya, P. 1986. Finding response times in a real-time system. The Computer Journal 29, 5, 390--395.Google ScholarGoogle ScholarCross RefCross Ref
  23. Katcher, A. I., Arakawa, H., and Strosnider, J. K. 1993. Engineering and analysis of fixed priority schedulers. IEEE Transactions on Software Engineering 19, 920--934. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Kim, S. K., Min, S. L., and Ha, R. 1996. Efficient worst case timing analysis of data caching. In Proceedings of IEEE Real-Time Technology and Applications Symposium (RTAS'96). Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Kirk, D. B. 1989. SMART (strategic memory allocation for real-time) cache design. In Proceedings of 10th Real-Time Systems Symposium (RTSS'89).Google ScholarGoogle ScholarCross RefCross Ref
  26. Lam, M., Rothberg, E. E., and Wolf, M. E. 1991. The cache performance of blocked algorithms. In Proceedings of IV International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'91). Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Lee, C. G., Hahn, J., Seo, Y. M., Min, S. L., Ha, R., Hong, S., Park, C. Y., Lee, M., and Kim, C. S. 1998. Analysis of cache-related preemption delay in fixed-priority preemptive scheduling. IEEE Transaction on Computers 47. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Li, Y. T. S., Malik, S., and Wolfe, A. 1995. Efficient microarchitecture modeling and path analysis for real-time software. In Proceedings of 16th Real-Time Systems Symposium (RTSS'95). 298--307. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Li, Y. T. S., Malik, S., and Wolfe, A. 1996. Cache modeling and path analysis for real-time software. In Proceedings of 17th Real-Time Systems Symposium (RTSS'96).Google ScholarGoogle Scholar
  30. Liedtke, J., Härtig, H., and Hohmuth, M. 1997. OS-controlled cache predictability for real-time systems. In Proceedings of 3rd IEEE Real-Time Technology and Applications Symposium (RTAS'97). Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Lim, S. S., Bae, Y. H., Jang, G. T., Rhee, B. D., Min, S. L., Park, C. Y., Shin, H., Park, K., and Kim, C. S. 1994. An accurate worst case timing analysis technique for RISC processors. In Proceedings of 15th Real-Time Systems Symposium (RTSS'94). 97--108.Google ScholarGoogle Scholar
  32. Lundqvist, T. and Stenström, P. 1998. Integrating path and timing analysis using instruction-level simulation techniqes. In Proceedings of ACM SIGPLAN Workshop on Languages, Compilers, and Tools for Embedded Systems (LCTES'98). 1--15. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Lundqvist, T. and Stenström, P. 1999a. A method to improve the estimated worst-case performance of data caching. In Proceedings of the 6th International Conference on Real-Time Computing Systems and Applications (RTCSA'99). 255--262. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Lundqvist, T. and Stenström, P. 1999b. Timing anomalies in dynamically scheduled microprocessors. In Proceedings of 20th Real-Time Systems Symposium (RTSS'99). Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. MIPS Technologies. 2001. MIPS32 4Kp- Embedded, MIPS Processor Core.Google ScholarGoogle Scholar
  36. Motorola Inc. 1996. PowerPC 604e RISC Microprocessor Technical Summary.Google ScholarGoogle Scholar
  37. Müeller, F. 1995. Compiler support for software-based cache partitioning. In Proceedings ACM Workshop on Languages, Compilers and Tools for Real-Time Systems (LCTES'95). Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Puaut, I. and Decotigny, D. 2002. Low-complexity algorithms for static cache locking in multitasking hard real-time systems. In Proceedings of 23th Real-Time Systems Symposium (RTSS'02). Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Rivera, G. and Tseng, C.-W. 1998. Data transformations for eliminating conflict misses. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'98). 38--49. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Sánchez, F., González, A., and Valero, M. 1997. Static locality analysis for cache management. In Proceedings of International Conference on Parallel Architectures and Compilation Techniques (PACT'97). Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Sun Microelectronics. 1997. microSPARC-IIep User's Manual.Google ScholarGoogle Scholar
  42. Tindell, K., Burns, A., and Wellings, A. 1994. An extendible approach for analysing fixed priority hard real-time tasks. Real-Time Systems 6, 1, 133--151. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Vera, X. and Xue, J. 2002. Let's study whole program cache behaviour analytically. In Proceedings of International Symposium on High-Performance Computer Architecture (HPCA 8). Cambridge. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Vera, X., Abella, J., González, A., and Llosa, J. 2003a. Optimizing program locality through CMEs and GAs. In Proceedings of 12th International Conference on Parallel Architectures and Compilation Techniques (PACT'03). New Orleans. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Vera, X., Lisper, B., and Xue, J. 2003b. Data cache locking for higher program predictability. In Proceedings of International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS'03). 272--282. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Vera, X., Lisper, B., and Xue, J. 2003c. Data caches in multitasking hard real-time systems. In Proceedings of International Real-Time Systems Symposium (RTSS03). Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Vera, X., Bermudo, N., Llosa, J., and González, A. 2004. A fast and accurate framework to analyze and optimize cache memory behavior. ACM Transactions on Programming Languages and Systems (TOPLAS) 26, 2. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. White, R. T., Müeller, F., Healy, C., Whalley, D., and Harmon, M. 1997. Timing analysis for data caches and set-associative caches. In Proceedings of Third IEEE Real-Time Technology and Applications Symposium (RTAS'97). 192--202. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. Wilson, R. P. 1997. Efficient context-sensitive pointer analysis for C programs. Ph.D. thesis, Stanford University. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. Wolf, M. and Lam, M. 1991. A data locality optimizing algorithm. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'91). 30--44. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. Wolfe, A. 1993. Software-based cache partitioning for real-time applications. In Proceedings of the 3rd International Workshop on Responsive Computer Systems.Google ScholarGoogle Scholar
  52. Xue, J. 2000. Loop Tiling for Parallelism. Kluwer Academic Publ., Boston, MA. Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. Xue, J. and Huang, C.-H. 1998. Reuse-driven tiling for data locality. International Journal of Parallel Programming 26, 6, 671--696. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Data cache locking for tight timing calculations

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in

          Full Access

          • Published in

            cover image ACM Transactions on Embedded Computing Systems
            ACM Transactions on Embedded Computing Systems  Volume 7, Issue 1
            December 2007
            234 pages
            ISSN:1539-9087
            EISSN:1558-3465
            DOI:10.1145/1324969
            Issue’s Table of Contents

            Copyright © 2007 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 12 December 2007
            • Accepted: 1 April 2006
            • Revised: 1 December 2004
            • Received: 1 April 2003
            Published in tecs Volume 7, Issue 1

            Permissions

            Request permissions about this article.

            Request Permissions

            Check for updates

            Qualifiers

            • research-article
            • Research
            • Refereed

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader