Abstract
This article presents a technique for the efficient compiler management of software-exposed heterogeneous memory. In many lower-end embedded chips, often used in microcontrollers and DSP processors, heterogeneous memory units such as scratch-pad SRAM, internal DRAM, external DRAM, and ROM are visible directly to the software, without automatic management by a hardware caching mechanism. Instead, the memory units are mapped to different portions of the address space. Caches are avoided due to their cost and power consumption, and because they make it difficult to guarantee real-time performance. For this important class of embedded chips, the allocation of data to different memory units to maximize performance is the responsibility of the software.Current practice typically leaves it to the programmer to partition the data among different memory units. We present a compiler strategy that automatically partitions the data among the memory units. We show that this strategy is optimal, relative to the profile run, among all static partitions for global and stack data. For the first time, our allocation scheme for stacks distributes the stack among multiple memory units. For global and stack data, the scheme is provably equal to or better than any other compiler scheme or set of programmer annotations. Results from our benchmarks show a 44.2% reduction in runtime from using our distributed stack strategy vs. using a unified stack, and a further 11.8% reduction in runtime from using a linear optimization strategy for allocation vs. a simpler greedy strategy; both in the case of the SRAM size being 20% of the total data size. For some programs, less than 5% of data in SRAM achieves a similar speedup.
- Appel, A. and George, L. 2001. Optimal spilling for CISC machines with few registers. In Proceedings of the SIGPLAN '01 Conference on Program Language Design and Implementation (Snowbird, UT, June).]] Google Scholar
- Banakar, R., Steinke, S., Lee, B.-S., Balakrishnan, M., and Marwedel, P. 2002. Scratchpad memory: A design alternative for cache on-chip memory in embedded systems. In 10th International Symposium on Hardware/Software Codesign (CODES) (Estes Park, CO, May 6--8). ACM, New York.]] Google Scholar
- Barua, R., Lee, W., Amarasinghe, S., and Agarwal, A. 2001. Compiler support for scalable and efficient memory systems. IEEE Trans. Comput., Special Issue on Advances in High Performance Memory Systems (Nov.).]] Google Scholar
- Bhattacharyya, S. S., Leupers, R., and Marwedel, P. 2000. Software synthesis and code generation for signal processing systems. IEEE Trans. Circuits Syst. 47, 9 (Sept.).]]Google Scholar
- Consortium, T. T. 1999. The Trimaran benchmark suite. Available at http://www.trimaran.org/.]]Google Scholar
- Guthaus, M. R., Ringenberg, J. S., Ernst, D., Austin, T. M., Mudge, T., and Brown, R. B. 2001. MiBench: A free, commercially representative embedded benchmark suite. Available at http://www.eecs.umich.edu/jringenb/mibench/.]]Google Scholar
- Hallnor, G., and Reinhardt, S. K. 2000. A fully associative software-managed cache design. In Proceedings of the 27th International Symposium on Computer Architecture (ISCA) (Vancouver, B.C., Canada, June).]] Google Scholar
- Hennessy, J. and Patterson, D. 1996. Computer Architecture A Quantitative Approach. 2nd ed. Morgan-Kaufmann, Palo Alto, Calif.]] Google Scholar
- Lee, T.-C., Tiwari, V., Malik, S., and Fujita, M. 1997. Power analysis and minimization techniques for embedded DSP software. IEEE Trans. VLSI Syst. (Mar.).]] Google Scholar
- Matlab 6.1. The Math Works, Inc., 2001. http://www.mathworks.com/products/matlab/.]]Google Scholar
- Moritz, C. A., Frank, M., and Amarasinghe, S. 2001. FlexCache: A framework for flexible compiler generated data caching. In The 2d Workshop on Intelligent Memory Systems (Boston, MA, Nov. 12).]] Google Scholar
- CPU12 Reference Manual. Motorola Corporation, 2000. http://e-www.motorola.com/brdata/PDFDB/MICROCONTROLLERS/16_BIT/68HC12_FAMILY/REF_MAT/CPU12RM.pdf.]]Google Scholar
- M-CORE---MMC2001 Reference Manual. Motorola Corporation, 1998. http://www.motorola. com/SPS/MCORE/info_documentation.htm.]]Google Scholar
- New York City, Office of Budget and Management. 1999. Website on frequently asked questions on linear programming. http://www.eden.rutgers.edu/∼pil/FAQ.html. New York, NY.]]Google Scholar
- University of Toronto Digital Signal Processing (UTDSP). 1992. University of Toronto Digital Signal Processing (UTDSP) Benchmark Suite. Available at http://www.eecg.toronto.edu/.]]Google Scholar
- Panda, P. R., Dutt, N. D., and Nicolau, A. 2000. On-chip vs. off-chip memory: The data partitioning problem in embedded processor-based systems. ACM Trans. Des. Autom. Electron. Syst. 5, 3 (July).]] Google Scholar
- Paulin, P., Liem, C., Cornero, M., Nacabal, F., and Goossens, G. 1997. Embedded software in real-time signal processing systems: Application and architecture trends. Invited paper, Proc. IEEE 85, 3 (Mar.).]]Google Scholar
- Rutter, P., Orost, J., and Gloistein, D. BTOA: Binary to printable ASCII converter source code. Available at http://www.bookcase.com/library/software/msdos.devel.lang.c.html.]]Google Scholar
- Sjodin, J., Froderberg, B., and Lindgren, T. 1998. Allocation of global data objects in on-chip RAM. Compiler and Architecture Support for Embedded Computing Systems. Dec.]]Google Scholar
- Sjodin, J. and von Platen, C. 2001. Storage allocation for embedded processors. 2001 Compiler and Architecture Support for Embedded Computing Systems. Nov.]] Google Scholar
- TMS370Cx7x 8-bit microcontroller. Texas Instruments, Revised Feb. 1997. http://www-s.ti.com/sc/psheets/spns034c/spns034c.pdf.]]Google Scholar
Index Terms
- An optimal memory allocation scheme for scratch-pad-based embedded systems
Recommendations
Memory allocation for embedded systems with a compile-time-unknown scratch-pad size
This article presents the first memory allocation scheme for embedded systems having a scratch-pad memory whose size is unknown at compile time. A scratch-pad memory (SPM) is a fast compiler-managed SRAM that replaces the hardware-managed cache. All ...
Heterogeneous memory management for embedded systems
CASES '01: Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systemsThis paper presents a technique for the efficient compiler management of software-exposed heterogeneous memory. In many lower-end embedded chips, often used in micro-controllers and DSP processors, heterogeneous memory units such as scratch-pad SRAM, ...
Compiler-assisted dynamic scratch-pad memory management with space overlapping for embedded systems
Scratch-pad memory (SPM), a small, fast, software-managed on-chip SRAM (Static Random Access Memory) is widely used in embedded systems. With the ever-widening performance gap between processors and main memory, it is very important to reduce the ...
Comments