ABSTRACT
Scratchpad memory has been introduced as a replacement for cache memory as it improves the performance of certain embedded systems. Additionally, it has also been demonstrated that scratchpad memory can significantly reduce the energy consumption of the memory hierarchy of embedded systems. This is significant, as the memory hierarchy consumes a substantial proportion of the total energy of an embedded system. This paper deals with optimization of the instruction memory scratchpad based on a novel methodology that uses a metric which we call the concomitance. This metric is used to find basic blocks which are executed frequently and in close proximity in time. Once such blocks are found, they are copied into the scratchpad memory at appropriate times; this is achieved using a special instruction inserted into the code at appropriate places. For a set of benchmarks taken from Mediabench, our scratchpad system consumed just 59% (avg) of the energy of the cache system, and 73% (avg) of the energy of the state of the art scratchpad system, while improving the overall performance. Compared to the state of the art method, the number of instructions copied into the scratchpad memory from the main memory is reduced by 88%.
- J. Montanaro et al., "A 160MHz, 32b, 0.5W CMOS RISC microprocessor," JSSC, vol.31 (11), pp. 1703--1712, 1996.Google Scholar
- R. Banakar et.al., "Scratchpad Memory: A Design Alternative for Cache On-chip Memory in Embedded Systems," CODES, 2002. Google ScholarDigital Library
- O. Avissar and R Barua, "An Optimal Memory Allocation Scheme for Scratch-Pad-Based Embedded Systems," ACM Trans. on Embedded Computing Systems, vol. 1, pp. 6--26, 2002. Google ScholarDigital Library
- P.R. Panda, "Efficient Utilization of Scratch-Pad Memory in Embedded Processor Applications," European Design and Test Conference, Proceedings of, 1997. Google ScholarDigital Library
- M. Kandemir et.al., "Dynamic Management of Scratch-Pad Memory Space," DAC, 2001. Google ScholarDigital Library
- M. Kandemir and A. Choudhary, "Compiler-Directed Scratch Pad Memory Hierarchy Design and Management," DAC, 2002. Google ScholarDigital Library
- F. Angiolini et.al., "Polynomial-Time Algorithm for On-Chip Scratchpad Memory Partitioning," CASES, 2003. Google ScholarDigital Library
- S. Udayakumaran and R. Barua, "Compiler-Decided Dynamic Memory Allocation for Scratch-Pad Based Embedded Systems," CASES, 2003. Google ScholarDigital Library
- S. Steinke et.al., "Assigning Program and Data Objects to Scratchpad for Energy Reduction," DATE, 2002. Google ScholarDigital Library
- S. Steinke et.al., "Reducing Energy Consumption by Dynamic Copying of Instructions onto Onchip Memory," ISSS, 2002. Google ScholarDigital Library
- A. Janapsatya et.al., "Hardware/Software Managed Scratchpad Memory for Embedded System," ICCAD, 2004. Google ScholarDigital Library
- G. Ramalingam, "On Loops, Dominators, and Dominance Frontier," PLDI, 2000. Google ScholarDigital Library
- P. Havlak, "Nesting of Reducible and Irreducible Loops," ACM Transactions on Programming Languages and Systems, 1997. Google ScholarDigital Library
- V. C. Sreedhar et. al., "Identifying Loops using DJ Graphs," ACM Transactions on Programming Languages and Systems, 1996. Google ScholarDigital Library
- B. Steensgaard, "Sequentializing Program Dependence Graphs for Irreducible Programs," Technical Report MSR_TR-93-14, Microsoft Research, Redmond, Washington, October 1993.Google Scholar
- N. Gloy and M. D. Smith, "Procedure Placement Using Temporal-Ordering Information," Programming Languages and Systems. ACM Transactions on, Vol. 32, No. 5, Pages 977--1027, September 1999. Google ScholarDigital Library
- P. Grun et. al., "Access Pattern-Based Memory and Connectivity Architecture Exploration," Embedded Computing Systems. ACM Transactions on, Vol. 2, No. 1, Pages 33.73, February 2003. Google ScholarDigital Library
- A. Ramachandran and M. F. Jacome, "Xtream-Fit: An Energy-Delay Efficient Data Memory Subsystem for Embedded Media Processing," DAC, 2003. Google ScholarDigital Library
- M. Verma et, al., "Dynamic Overlay of Scratchpad Memory for Energy Minimization," CODES+ISSS, 2004. Google ScholarDigital Library
- M. Kandemir et. al., "Exploiting Scratch-Pad Memory Using Presburger Formulas," ISSS, 2001. Google ScholarDigital Library
- S. Parameswaran and J. Henkel, "I-CoPES: fast instruction code placement for embedded systems to improve performance and energy efficiency," ICCAD, 2001. Google ScholarDigital Library
- P. P. Chang and et.al., "IMPACT: an architectural framework for multiple-instruction-issue processors," Computer Architecture News, vol. 19, no. 3, 1991. Google ScholarDigital Library
- S. McFarling, "Program optimization for instruction caches," ASPLOS, 1989. Google ScholarDigital Library
- S.McFarling, "Procedure merging with instruction caches," SIGPLAN Notices, vol. 26, no. 6, 1991. Google ScholarDigital Library
- P. Panda and et.al., "Memory Organization for Improved Data Cache Performance in Embedded Processors, ISSS, 1996. Google ScholarDigital Library
- P. Panda and et.al., "A Data Alignment Technique for Improving Cache Performance," ICCD, 1997.Google Scholar
- H. Tomiyama and H. Yasuura, "Optimal code Placement of Embedded Software for Instruction Cache," EDAC, 1996. Google ScholarDigital Library
- S. Bartolini and C. A. Prete, "A cache-aware program transformation technique suitable for embedded systems," Information and Software Technology 44(13), 2002.Google Scholar
- D. Burger and T. M. Austin, "The SimpleScalar Tool Set, Version 2.0," TR-CS-1342, University of Wisconsin-madison, June 1997. Google ScholarDigital Library
- J. Edler and M. D. Hill, "Dinero IV Trace-Driven Uniprocessor Cache Simulator," http://www.cs.wisc.edu/markhill/DineroIV/.Google Scholar
- D. Brooks et.al., "Wattch: A Framework for Architectural-Level Power Analysis and Optimizations," ISCA, 2000. Google ScholarDigital Library
- P. Shivakumar and N. P. Jouppi, "Cacti 3.0: An Integrated Cache Timing, Power, and Area Model," Technical Report 2001/2, Compaq Computer Corporation, August, 2001. 2001.Google Scholar
- IBM Microelectronics Division, "Embedded DRAM SA-27E," http://ibm.com/chips, 2002.Google Scholar
- C. Lee et.al., "MediaBench: A Tool for Evaluating Multimedia and Communications Systems," IEEE MICRO 30, 1997. Google ScholarDigital Library
Index Terms
- A novel instruction scratchpad memory optimization method based on concomitance metric
Recommendations
Dynamic scratchpad memory management for code in portable systems with an MMU
In this work, we present a dynamic memory allocation technique for a novel, horizontally partitioned memory subsystem targeting contemporary embedded processors with a memory management unit (MMU). We propose to replace the on-chip instruction cache ...
Scratchpad memory management for portable systems with a memory management unit
EMSOFT '06: Proceedings of the 6th ACM & IEEE International conference on Embedded softwareIn this paper,we present a dynamic scratchpad memory allocation strategy targeting a horizontally partitioned memory subsystem for contemporary embedded processors. The memory subsystem is equipped with a memory management unit (MMU), and physically ...
Management and optimization for nonvolatile memory-based hybrid scratchpad memory on multicore embedded processors
Regular PapersThe recent emergence of various Non-Volatile Memories (NVMs), with many attractive characteristics such as low leakage power and high-density, provides us with a new way of addressing the memory power consumption problem. In this article, we target ...
Comments