An optimal memory allocation scheme for scratch-pad-based embedded systems

Published: 01 November 2002

Abstract

This article presents a technique for the efficient compiler management of software-exposed heterogeneous memory. In many lower-end embedded chips, often used in microcontrollers and DSP processors, heterogeneous memory units such as scratch-pad SRAM, internal DRAM, external DRAM, and ROM are visible directly to the software, without automatic management by a hardware caching mechanism. Instead, the memory units are mapped to different portions of the address space. Caches are avoided due to their cost and power consumption, and because they make it difficult to guarantee real-time performance. For this important class of embedded chips, the allocation of data to different memory units to maximize performance is the responsibility of the software.

Current practice typically leaves it to the programmer to partition the data among different memory units. We present a compiler strategy that automatically partitions the data among the memory units. We show that this strategy is optimal, relative to the profile run, among all static partitions for global and stack data. For the first time, our allocation scheme for stacks distributes the stack among multiple memory units. For global and stack data, the scheme is provably equal to or better than any other compiler scheme or set of programmer annotations. Results from our benchmarks show a 44.2% reduction in runtime from using our distributed stack strategy vs. using a unified stack, and a further 11.8% reduction in runtime from using a linear optimization strategy for allocation vs. a simpler greedy strategy, both for an SRAM size of 20% of the total data size. For some programs, placing less than 5% of the data in SRAM achieves a similar speedup.
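The abstract contrasts a greedy allocation with an optimal one but does not spell out either algorithm. The following toy sketch illustrates why the two can differ. Everything in it is an assumption made for illustration: the variable names, sizes, access counts, and latencies are invented, the greedy rule (most profiled accesses per byte first) is one plausible reading of "a simpler greedy strategy", and the optimal allocator is a brute-force search standing in for the article's linear optimization formulation, which is not reproduced here.

```python
from itertools import combinations

# Hypothetical profile data: (variable name, size in bytes, profiled access count).
PROFILE = [
    ("buf", 60, 600),
    ("table", 50, 450),
    ("coeffs", 50, 450),
]

SRAM_LATENCY = 1    # assumed cycles per access to scratch-pad SRAM
DRAM_LATENCY = 10   # assumed cycles per access to external DRAM


def total_cost(profile, in_sram):
    """Total access cycles when the variables named in `in_sram` live in SRAM."""
    return sum(
        count * (SRAM_LATENCY if name in in_sram else DRAM_LATENCY)
        for name, _size, count in profile
    )


def greedy_allocate(profile, sram_size):
    """Fill SRAM in decreasing order of profiled accesses per byte."""
    chosen, free = set(), sram_size
    for name, size, count in sorted(profile, key=lambda v: v[2] / v[1], reverse=True):
        if size <= free:
            chosen.add(name)
            free -= size
    return chosen


def optimal_allocate(profile, sram_size):
    """Exhaustive search over every subset that fits; only viable for tiny inputs."""
    best = set()
    for r in range(len(profile) + 1):
        for subset in combinations(profile, r):
            if sum(size for _name, size, _count in subset) <= sram_size:
                names = {name for name, _size, _count in subset}
                if total_cost(profile, names) < total_cost(profile, best):
                    best = names
    return best


if __name__ == "__main__":
    for allocate in (greedy_allocate, optimal_allocate):
        placement = allocate(PROFILE, 100)
        print(allocate.__name__, sorted(placement), total_cost(PROFILE, placement))
```

With 100 bytes of SRAM, the greedy rule takes `buf` first and then has no room for either remaining variable, while the exhaustive search places `table` and `coeffs` together and reaches a lower total cost. A real compiler would replace the brute-force search with a solver, since the number of subsets grows exponentially with the number of variables.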
