ABSTRACT
We propose a technique that leverages configurable data caches to address cache interference in multitasking embedded systems. Data caches are often necessary to provide the required memory bandwidth, but they introduce two important problems for embedded systems. First, cache behavior in multitasking environments is notoriously difficult, if not impossible, to predict, resulting in poor real-time guarantees. Second, caches account for a significant share of power consumption. We study the effect of multiple tasks on the data cache and propose a technique that leverages configurable cache architectures to eliminate inter-task cache interference. By mapping tasks to disjoint cache partitions, interference is completely eliminated with only minimal impact on performance. Furthermore, dynamic and leakage power are significantly reduced, since only a subset of the cache is active at any moment. We introduce a profile-based static analysis algorithm that identifies a beneficial cache partitioning; the OS then configures the data cache at each context switch by activating the corresponding partition. Our experiments on a large set of multitasking benchmarks demonstrate that our technique not only eliminates inter-task interference but also significantly reduces both dynamic and leakage power.
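The partitioning mechanism described above can be illustrated with a small simulation. The sketch below is our own illustration, not the paper's implementation: each task owns a disjoint, statically assigned subset of the cache ways (as a profile-based partitioning algorithm might produce), and a context switch activates only the incoming task's partition. Because replacement is confined to a task's own ways, one task can never evict another's lines, which is exactly the inter-task interference being eliminated. All class and method names here are hypothetical.

```python
class PartitionedCache:
    """Toy way-partitioned set-associative cache with per-task partitions."""

    def __init__(self, num_sets, ways_per_task):
        # ways_per_task: {task_id: number_of_ways}, a static partitioning
        # such as the paper's profile-based analysis would supply.
        self.num_sets = num_sets
        self.ways_per_task = ways_per_task
        # Per task, per set: resident tags kept in LRU order (index 0 = LRU).
        self.lines = {t: [[] for _ in range(num_sets)] for t in ways_per_task}
        self.active = None

    def context_switch(self, task_id):
        # The OS activates only the incoming task's partition; the other
        # partitions stay untouched (and could be power-gated).
        self.active = task_id

    def access(self, addr):
        """Return True on a hit, False on a miss (which allocates a line)."""
        ways = self.ways_per_task[self.active]
        s = addr % self.num_sets
        tag = addr // self.num_sets
        lru = self.lines[self.active][s]
        if tag in lru:            # hit: promote to MRU position
            lru.remove(tag)
            lru.append(tag)
            return True
        if len(lru) == ways:      # miss: evict LRU, but only within
            lru.pop(0)            # this task's own partition
        lru.append(tag)
        return False
```

With this model, a task's hit pattern when interleaved with another task is identical to its hit pattern when running alone, regardless of how much data the other task streams through the cache, since evictions never cross partition boundaries.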
Index Terms
- Eliminating inter-process cache interference through cache reconfigurability for real-time and low-power embedded multi-tasking systems