Data and memory optimization techniques for embedded systems

Published: 01 April 2001

Abstract

We present a survey of the state-of-the-art techniques used in performing data and memory-related optimizations in embedded systems. The optimizations are targeted directly or indirectly at the memory subsystem, and affect one or more of three important cost metrics: area, performance, and power dissipation of the resulting implementation.

We first examine architecture-independent optimizations in the form of code transformations. We next cover a broad spectrum of optimization techniques that address memory architectures at varying levels of granularity, ranging from register files to on-chip memory, data caches, and dynamic memory (DRAM). We end with issues related to memory addressing.

References

  1. AGARWAL, A., KRANTZ, D., AND NATARANJAN, V. 1995. Automatic partitioning of parallel loops and data arrays for distributed shared-memory multiprocessors. IEEE Trans. Parallel Distrib. Syst. 6, 9 (Sept.), 943-962.]] Google ScholarGoogle Scholar
  2. AHMAD,I.AND CHEN, C. Y. R. 1991. Post-processor for data path synthesis using multiport memories. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD '91, Santa Clara, CA, Nov. 11-14). IEEE Computer Society Press, Los Alamitos, CA, 276-279.]]Google ScholarGoogle Scholar
  3. AHO, A., SETHI, R., AND ULLMAN, J. 1986. Compilers: Principles, Techniques, and Tools. Addison-Wesley, Reading, MA.]] Google ScholarGoogle Scholar
  4. AMARASINGHE, S., ANDERSON, J., LAM, M., AND TSENG, C.-W. 1995. An overview of the suif compiler for scalable parallel machines. In Proceedings of the SIAM Conference on Parallel Processing for Scientific Computing (San Francisco, CA, Feb.). SIAM, Philadelphia, PA.]]Google ScholarGoogle Scholar
  5. BAJWA,R.S.,HIRAKI, M., KOJIMA, H., GORNY,D.J.,NITTA, K., SHRIDHAR, A., SEKI, K., AND SASAKI, K. 1997. Instruction buffering to reduce power in processors for signal processing. IEEE Trans. Very Large Scale Integr. Syst. 5, 4, 417-424.]] Google ScholarGoogle Scholar
  6. BAKSHI,S.AND GAJSKI, D. D. 1995. A memory selection algorithm for high-performance pipelines. In Proceedings of the European Conference EURO-DAC '95 with EURO-VHDL '95 on Design Automation (Brighton, UK, Sept. 18-22), G. Musgrave, Ed. IEEE Computer Society Press, Los Alamitos, CA, 124-129.]] Google ScholarGoogle Scholar
  7. BALAKRISHNAN, M., BANERJI,D.K.,MAJUMDAR,A.K.,LINDERS,J.G.,AND MAJITHIA,J. C. 1990. Allocation of multiport memories in data path synthesis. IEEE Trans. Comput.-Aided Des. 7, 4 (Apr.), 536-540.]]Google ScholarGoogle Scholar
  8. BALASA, F., CATTHOOR, F., AND DE MAN, H. 1994. Dataflow-driven memory allocation for multi-dimensional signal processing systems. In Proceedings of the 1994 IEEE/ACM International Conference on Computer-Aided Design (ICCAD '94, San Jose, CA, Nov. 6-10), J. A. G. Jess and R. Rudell, Eds. IEEE Computer Society Press, Los Alamitos, CA, 31-34.]] Google ScholarGoogle Scholar
  9. BALASA, F., CATTHOOR, F., AND DE MAN, H. 1995. Background memory area estimation for multidimensional signal processing systems. IEEE Trans. Very Large Scale Integr. Syst. 3, 2 (June), 157-172.]] Google ScholarGoogle Scholar
  10. BANERJEE, P., CHANDY, J., GUPTA, M., HODGES, E., HOLM, J., LAIN, A., PALERMO, D., RA- MASWAMY, S., AND SU, E. 1995. The paradigm compiler for distributed-memory multicomputers. IEEE Computer 28, 10 (Oct.), 37-47.]] Google ScholarGoogle Scholar
  11. BANERJEE, U. 1998. Dependence Analysis for Supercomputing. Kluwer Academic Publishers, Hingham, MA.]] Google ScholarGoogle Scholar
  12. BANERJEE, U., EIGENMANN, R., NICOLAU, A., AND PADUA, D. A. 1993. Automatic program parallelization. Proc. IEEE 81, 2 (Feb.), 211-243.]]Google ScholarGoogle Scholar
  13. BELLAS, N., HAJJ,I.N.,POLYCHRONOPOULOS,C.D.,AND STAMOULIS, G. 2000. Architectural and compiler techniques for energy reduction in high-performance microprocessors. IEEE Trans. Very Large Scale Integr. Syst. 8, 3 (June), 317-326.]] Google ScholarGoogle Scholar
  14. BENINI,L.AND DE MICHELI, G. 2000. System-level power optimization techniques and tools. ACM Trans. Des. Autom. Electron. Syst. 5, 2 (Apr.), 115-192.]] Google ScholarGoogle Scholar
  15. BENINI, L., DE MICHELI, G., MACII, E., PONCINO, M., AND QUER, S. 1998a. Power optimization of core-based systems by address bus encoding. IEEE Trans. Very Large Scale Integr. Syst. 6, 4, 554-562.]] Google ScholarGoogle Scholar
  16. BENINI, L., DE MICHELI, G., MACII, E., SCIUTO, D., AND SILVANO, C. 1998b. Address bus encoding techniques for system-level power optimization. In Proceedings of the Conference on Design, Automation and Test in Europe 98. 861-866.]] Google ScholarGoogle Scholar
  17. BENINI, L., MACII, A., AND PONCINO, M. 2000. A recursive algorithm for low-power memory partitioning. In Proceedings of the IEEE International Symposium on Low Power Design (Rapallo, Italy, Aug.). IEEE Computer Society Press, Los Alamitos, CA, 78-83.]] Google ScholarGoogle Scholar
  18. BROCKMEYER, E., VANDECAPPELLE, A., AND CATTHOOR, F. 2000a. Systematic cycle budget versus system power trade-off: a new perspective on system exploration of real-time datadominated applications. In Proceedings of the IEEE International Symposium on Low Power Design (Rapallo, Italy, Aug.). IEEE Computer Society Press, Los Alamitos, CA, 137-142.]] Google ScholarGoogle Scholar
  19. BROCKMEYER, E., WUYTACK, S., VANDECAPPELLE, A., AND CATTHOOR, F. 2000b. Low power storage cycle budget tool support for hierarchical graphs. In Proceedings of the 13th ACM/IEEE International Symposium on System-Level Synthesis (Madrid, Sept). ACM Press, New York, NY, 20-22.]] Google ScholarGoogle Scholar
  20. CATTHOOR, F., DANCKAERT, K., KULKARNI, C., AND OMNES, T. 2000. Data transfer and storage architecture issues and exploration in multimedia processors. In Programmable Digital Signal Processors: Architecture, Programming, and Applications, Y. H. Yu, Ed. Marcel Dekker, Inc., New York, NY.]]Google ScholarGoogle Scholar
  21. CATTHOOR, F., JANSSEN, M., NACHTERGAELE, L., AND MAN, H. D. 1996. System-level dataflow transformations for power reduction in image and video processing. In Proceedings of the International Conference on Electronic Circuits and Systems on Electronic Circuits and Systems (Oct.). 1025-1028.]]Google ScholarGoogle Scholar
  22. CATTHOOR, F., WUYTACK, S., DE GREEF, E., BALASA, F., NACHTERGAELE, L., AND VANDECAPPELLE, A. 1998. Custom Memory Management Methodology: Exploration of Memory Organization for Embedded Multimedia System Design. Kluwer Academic, Dordrecht, Netherlands.]] Google ScholarGoogle Scholar
  23. CATTHOOR, F., FRANSSEN, F., WUYTACK, S., NACHTERGAELE, L., AND DE MAN, H. 1994. Global communication and memory optimizing transformations for low power systems. In Proceed-ings of the International Workshop on Low Power Design. 203-208.]]Google ScholarGoogle Scholar
  24. CHAITIN, G., AUSLANDER, M., CHANDRA, A., COCKE, J., HOPKINS, M., AND MARKSTEIN, P. 1981. Register allocation via coloring. Comput. Lang. 6, 1, 47-57.]]Google ScholarGoogle Scholar
  25. CHANG, H.-K AND LIN, Y.-L. 2000. Array allocation taking into account SDRAM characteristics. In Proceedings of the Asia and South Pacific Conference on Design Automation (Yokohama, Jan.). 497-502.]] Google ScholarGoogle Scholar
  26. CHEN, T.-S. AND SHEU, J.-P. 1994. Communication-free data allocation techniques for parallelizing compilers on multicomputers. IEEE Trans. Parallel Distrib. Syst. 5, 9 (Sept.), 924-938.]] Google ScholarGoogle Scholar
  27. CIERNIAK,M.AND LI, W. 1995. Unifying data and control transformations for distributed shared-memory machines. SIGPLAN Not. 30, 6 (June), 205-217.]] Google ScholarGoogle Scholar
  28. CRUZ, J.-L., GONZALEZ, A., VALERO, M., AND TOPHAM, N. 2000. Multiple-banked register file architectures. In Proceedings of the 27th International Symposium on Computer Architecture (ISCA-27, Vancouver, B.C., June). ACM, New York, NY, 315-325.]] Google ScholarGoogle Scholar
  29. CUPPU, V., JACOB,B.L.,DAVIS, B., AND MUDGE, T. N. 1999. A performance comparison of contemporary dram architectures. In Proceedings of the International Symposium on Computer Architecture (Atlanta, GA, May). 222-233.]] Google ScholarGoogle Scholar
  30. DA SILVA,J.L.,CATTHOOR, F., VERKEST, D., AND DE MAN, H. 1998. Power exploration for dynamic data types through virtual memory management refinement. In Proceedings of the 1998 International Symposium on Low Power Electronics and Design (ISLPED '98, Monterey, CA, Aug. 10-12), A. Chandrakasan and S. Kiaei, Chairs. ACM Press, New York, NY, 311-316.]] Google ScholarGoogle Scholar
  31. DANCKAERT, K., CATTHOOR, F., AND MAN, H. D. 1996. System-level memory management for weakly parallel image processing. In Proceedings of the Conference on EuroPar'96 Parallel Processing (Lyon, France, Aug.). Springer-Verlag, New York, NY, 217-225.]] Google ScholarGoogle Scholar
  32. DANCKAERT, K., CATTHOOR, F., AND MAN, H. D. 1999. Platform independent data transfer and storage exploration illustrated on a parallel cavity detection algorithm. In Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA99). 1669-1675.]]Google ScholarGoogle Scholar
  33. DANCKAERT, K., CATTHOOR, F., AND MAN, H. D. 2000. A preprocessing step for global loop transformations for data transfer and storage optimization. In Proceedings of the International Conference on Compilers, Architecture and Synthesis for Embedded Systems (San Jose CA, Nov.).]] Google ScholarGoogle Scholar
  34. DARTE, A., RISSET, T., AND ROBERT, Y. 1993. Loop nest scheduling and transformations. In Environments and Tools for Parallel Scientific Computing, J. J. Dongarra and B. Tou-rancheau, Eds. Elsevier Advances in parallel computing series. Elsevier Sci. Pub. B. V., Amsterdam, The Netherlands, 309-332.]] Google ScholarGoogle Scholar
  35. DARTE,A.AND ROBERT, Y. 1995. Affine-by-statement scheduling of uniform and affine loop nests over parametric domains. J. Parallel Distrib. Comput. 29, 1 (Aug. 15), 43-59.]] Google ScholarGoogle Scholar
  36. DIGUET,J.PH., WUYTACK, S., CATTHOOR, F., AND DE MAN, H. 1997. Formalized methodology for data reuse exploration in hierarchical memory mappings. In Proceedings of the 1997 International Symposium on Low Power Electronics and Design (ISLPED '97, Monterey, CA, Aug. 18-20), B. Barton, M. Pedram, A. Chandrakasan, and S. Kiaei, Chairs. ACM Press, New York, NY, 30-35.]] Google ScholarGoogle Scholar
  37. DING,C.AND KENNEDY, K. 2000. The memory bandwidth bottleneck and its amelioration by a compiler. In Proceedings of the International Symposium on Parallel and Distributed Processing (Cancun, Mexico, May). 181-189.]] Google ScholarGoogle Scholar
  38. DE GREEF,E.AND CATTHOOR, F. 1996. Reducing storage size for static control programs mapped onto parallel architectures. In Proceedings of the Dagstuhl Seminar on Loop Parallelisation (Schloss Dagstuhl, Germany, Apr.).]]Google ScholarGoogle Scholar
  39. FEAUTRIER, P. 1991. Dataflow analysis of array and scalar references. Int. J. Parallel Program. 20, 1, 23-53.]]Google ScholarGoogle Scholar
  40. FEAUTRIER, P. 1995. Compiling for massively parallel architectures: A perspective. Microprocess. Microprogram. 41, 5-6 (Oct.), 425-439.]] Google ScholarGoogle Scholar
  41. FRABOULET, A., HUARD, G., AND MIGNOTTE, A. 1999. Loop alignment for memory access optimisation. In Proceedings of the 12th ACM/IEEE International Symposium on System-Level Synthesis (San Jose CA, Dec.). ACM Press, New York, NY, 70-71.]] Google ScholarGoogle Scholar
  42. FRANSSEN, F., BALASA, F., VAN SWAAIJ, M., CATTHOOR, F., AND MAN, H. D. 1993. Modeling multi-dimensional data and control flow. IEEE Trans. Very Large Scale Integr. Syst. 1,3 (Sept.), 319-327.]]Google ScholarGoogle Scholar
  43. FRANSSEN, F., NACHTERGAELE, L., SAMSOM, H., CATTHOOR, F., AND MAN, H. D. 1994. Control flow optimization for fast system simulation and storage minimization. In Proceedings of the International Conference on Design and Test (Paris, Feb.). 20-24.]]Google ScholarGoogle Scholar
  44. GAJSKI, D., DUTT, N., LIN, S., AND WU, A. 1992. High Level Synthesis: Introduction to Chip and System Design. Kluwer Academic Publishers, Hingham, MA.]] Google ScholarGoogle Scholar
  45. GAREY,M.R.AND JOHNSON, D. S. 1979. Computers and Intractibility - A Guide to the Theory of NP-Completeness. W. H. Freeman and Co., New York, NY.]] Google ScholarGoogle Scholar
  46. GHEZ, C., MIRANDA, M., VANDECAPPELLE, A., CATTHOOR, F., AND VERKEST, D. 2000. Systematic high-level address code transformations for piece-wise linear indexing: illustration on a medical imaging algorithm. In Proceedings of the IEEE Workshop on Signal Processing Systems (Lafayette, LA, Oct.). IEEE Press, Piscataway, NJ, 623-632.]]Google ScholarGoogle Scholar
  47. GONZALEZ, A., ALIAGAS, C., AND VALERO, M. 1995. A data cache with multiple caching strategies tuned to different types of locality. In Proceedings of the 9th ACM International Conference on Supercomputing (ICS '95, Barcelona, Spain, July 3-7), M. Valero, Chair. ACM Press, New York, NY, 338-347.]] Google ScholarGoogle Scholar
  48. GOOSSENS, G., VANDEWLLE, J., AND DE MAN, H. 1989. Loop optimization in register-transfer scheduling for DSP-systems. In Proceedings of the 26th ACM/IEEE Conference on Design Automation (DAC '89, Las Vegas, NV, June 25-29), D. E. Thomas, Ed. ACM Press, New York, NY, 826-831.]] Google ScholarGoogle Scholar
  49. GRANT,D.AND DENYER, P. B. 1991. Address generation for array access based on modulus m counters. In Proceedings of the European Conference on Design Automation (EDAC, Feb.). 118-123.]] Google ScholarGoogle Scholar
  50. GRANT, D., DENYER,P.B.,AND FINLAY, I. 1989. Synthesis of address generators. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD '89, Santa Clara, CA, Nov.). ACM Press, New York, NY, 116-119.]]Google ScholarGoogle Scholar
  51. GRANT,D.M.,MEERBERGEN,J.V.,AND LIPPENS, P. E. R. 1994. Optimization of address generator hardware. In Proceedings of the 1994 Conference on European Design and Test (Paris, France, Feb.). 325-329.]]Google ScholarGoogle Scholar
  52. GREEF,E.D.,CATTHOOR, F., AND MAN, H. D. 1995. Memory organization for video algorithms on programmable signal processors. In Proceedings of the IEEE International Conference on Computer Design (ICCD '95, Austin TX, Oct.). IEEE Computer Society Press, Los Alamitos, CA, 552-557.]] Google ScholarGoogle Scholar
  53. GREEF,E.D.,CATTHOOR, F., AND MAN, H. D. 1997. Array placement for storage size reduction in embedded multimedia systems. In Proceedings of the International Conference on Applic.-Spec./Array Processors (Zurich, July). 66-75.]] Google ScholarGoogle Scholar
  54. GRUN, P., BALASA, F., AND DUTT, N. 1998. Memory size estimation for multimedia applications. In Proceedings of the Sixth International Workshop on Hardware/Software Codesign (CODES/CASHE '98, Seattle, WA, Mar. 15-18), G. Borriello, A. A. Jerraya, and L. Lavagno, Chairs. IEEE Computer Society Press, Los Alamitos, CA, 145-149.]] Google ScholarGoogle Scholar
  55. GRUN, P., DUTT, N., AND NICOLAU, A. 2000a. Memory aware compilation through accurate timing extraction. In Proceedings of the Conference on Design Automation (Los Angeles, CA, June). ACM Press, New York, NY, 316-321.]] Google ScholarGoogle Scholar
  56. GRUN, P., DUTT, N., AND NICOLAU, A. 2000b. MIST: An algorithm for memory miss traffic management. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (San Jose, CA, Nov.). ACM Press, New York, NY, 431-437.]] Google ScholarGoogle Scholar
  57. GRUN, P., DUTT, N., AND NICOLAU, A. 2001. Access pattern based local memory customization for low power embedded systems. In Proceedings of the Conference on Design, Automation, and Test in Europe (Munich, Mar.).]] Google ScholarGoogle Scholar
  58. GUPTA, M., SCHONBERG, E., AND SRINIVASAN, H. 1996. A unified framework for optimizing communication in data-parallel programs. IEEE Trans. Parallel Distrib. Syst. 7,7, 689-704.]] Google ScholarGoogle Scholar
  59. GUPTA, S., MIRANDA, M., CATTHOOR, F., AND GUPTA, R. 2000. Analysis of high-level address code transformations for programmable processors. In Proceedings of the 3rd ACM/IEEE Conference on Design and Test in Europe (Mar.). ACM Press, New York, NY, 9-13.]] Google ScholarGoogle Scholar
  60. HALAMBI, A., GRUN, P., GANESH, V., KHARE, A., DUTT, N., AND NICOLAU, A. 1999a. Expression: A language for architecture exploration through compiler/simulator retargetability. In Proceedings of the Conference on DATE (Munich, Mar.).]] Google ScholarGoogle Scholar
  61. HALAMBI, A., GRUN, P., TOMIYAMA, H., DUTT, N., AND NICOLAU, A. 1999b. Automatic software toolkit generation for embedded systems-on-chip. In Proceedings of the Conference on ICVC.]]Google ScholarGoogle Scholar
  62. HALL,M.W.,HARVEY,T.J.,KENNEDY, K., MCINTOSH, N., MCKINLEY,K.S.,OLDHAM,J.D., PALECZNY,M.H.,AND ROTH, G. 1993. Experiences using the ParaScope Editor: an interactive parallel programming tool. SIGPLAN Not. 28, 7 (July), 33-43.]] Google ScholarGoogle Scholar
  63. HALL, M., ANDERSON, J., AMARASINGHE, S., MURPHY, B., LIAO, S., BUGNION, E., AND LAM,M. 1996. Maximizing multiprocessor performance with the SUIF compiler. IEEE Computer 29, 12 (Dec.), 84-89.]] Google ScholarGoogle Scholar
  64. HENNESSY,J.L.AND PATTERSON, D. A. 1996. Computer Architecture: A Quantitative Approach. 2nd ed. Morgan Kaufmann Publishers Inc., San Francisco, CA.]] Google ScholarGoogle Scholar
  65. HUANG, C.-Y., CHEN, Y.-S., LIN, Y.-L., AND HSU, Y.-C. 1990. Data path allocation based on bipartite weighted matching. In Proceedings of the 27th ACM/IEEE Conference on Design Automation (DAC '90, Orlando, FL, June 24-28), R. C. Smith, Chair. ACM Press, New York, NY, 499-504.]] Google ScholarGoogle Scholar
  66. ISO/IEC MOVING PICTURE EXPERTS GROUP. 2001. The MPEG Home Page (http://www.cselt.it/ mpeg/)11.]]Google ScholarGoogle Scholar
  67. ITOH, K., SASAKI, K., AND NAKAGOME, Y. 1995. Trends in low-power RAM circuit technologies. Proc. IEEE 83, 4 (Apr.), 524-543.]]Google ScholarGoogle Scholar
  68. JHA,P.K.AND DUTT, N. 1997. Library mapping for memories. In Proceedings of the Conference on European Design and Test (Mar.). 288-292.]] Google ScholarGoogle Scholar
  69. JOUPPI, N. 1990. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers. In Proceedings of the 17th International Symposium on Computer Architecture (ISCA '90, Seattle, WA, May). IEEE Press, Piscat-away, NJ, 364-373.]] Google ScholarGoogle Scholar
  70. KANDEMIR, M., VIJAYKRISHNAN, N., IRWIN,M.J.,AND YE, W. 2000. Influence of compiler optimisations on system power. In Proceedings of the Conference on Design Automation (Los Angeles, CA, June). ACM Press, New York, NY, 304-307.]] Google ScholarGoogle Scholar
  71. KARCHMER,D.AND ROSE, J. 1994. Definition and solution of the memory packing problem for field-programmable systems. In Proceedings of the 1994 IEEE/ACM International Conference on Computer-Aided Design (ICCAD '94, San Jose, CA, Nov. 6-10), J. A. G. Jess and R. Rudell, Eds. IEEE Computer Society Press, Los Alamitos, CA, 20-26.]] Google ScholarGoogle Scholar
  72. KELLY,W.AND PUGH, W. 1992. Generating schedules and code within a unified reordering transformation framework. UMIACS-TR-92-126. University of Maryland at College Park, College Park, MD.]] Google ScholarGoogle Scholar
  73. KHARE, A., PANDA,P.R.,DUTT,N.D.,AND NICOLAU, A. 1999. High-level synthesis with SDRAMs and RAMBUS DRAMs. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. E82-A, 11 (Nov.), 2347-2355.]]Google ScholarGoogle Scholar
  74. KIM,T.AND LIU, C. L. 1993. Utilization of multiport memories in data path synthesis. In Proceedings of the 30th ACM/IEEE International Conference on Design Automation (DAC '93, Dallas, TX, June 14-18), A. E. Dunlop, Ed. ACM Press, New York, NY, 298-302.]] Google ScholarGoogle Scholar
  75. KIROVSKI, D., LEE, C., POTKONJAK, M., AND MANGIONE-SMITH, W. 1999. Application-driven synthesis of memory-intensive systems-on-chip. IEEE Trans. Comput.-Aided Des. 18,9 (Sept.), 1316-1326.]]Google ScholarGoogle Scholar
  76. KJELDSBERG,P.G.,CATTHOOR, F., AND AAS, E. J. 2000a. Automated data dependency size estimation with a partially fixed execution ordering. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (San Jose, CA, Nov.). ACM Press, New York, NY, 44-50.]] Google ScholarGoogle Scholar
  77. KJELDSBERG,P.G.,CATTHOOR,, F., AND AAS, E. J. 2000b. Storage requirement estimation for data-intensive applications with partially fixed execution ordering. In Proceedings of the ACM/IEEE Workshop on Hardware/Software Co-Design (San Diego CA, May). ACM Press, New York, NY, 56-60.]] Google ScholarGoogle Scholar
  78. KOHAVI, Z. 1978. Switching and Finite Automata Theory. McGraw-Hill, Inc., New York, NY.]] Google ScholarGoogle Scholar
  79. KOLSON,D.J.,NICOLAU, A., AND DUTT, N. 1994. Minimization of memory traffic in high-level synthesis. In Proceedings of the 31st Annual Conference on Design Automation (DAC '94, San Diego, CA, June 6-10), M. Lorenzetti, Chair. ACM Press, New York, NY, 149-154.]] Google ScholarGoogle Scholar
  80. KRAMER,H.AND MULLER, J. 1992. Assignment of global memory elements for multi-process vhdl specifications. In Proceedings of the International Conference on Computer Aided Design. 496-501.]] Google ScholarGoogle Scholar
  81. KULKARNI, C., CATTHOOR, F., AND MAN, H. D. 1999. Cache transformations for low power caching in embedded multimedia processors. In Proceedings of the International Symposium on Parallel Processing (Orlando, FL, Apr.). 292-297.]] Google ScholarGoogle Scholar
  82. KULKARNI, C., CATTHOOR, F., AND MAN, H. D. 2000. Advanced data layout organization for multi-media applications. In Proceedings of the Workshop on Parallel and Distributed Computing in Image Processing, Video Processing, and Multimedia (PDIVM 2000, Cancun, Mexico, May).]] Google ScholarGoogle Scholar
  83. KULKARNI,D.AND STUMM, M. 1995. Linear loop transformations in optimizing compilers for parallel machines. Aust. Comput. J. 27, 2 (May), 41-50.]]Google ScholarGoogle Scholar
  84. KURDAHI,F.J.AND PARKER, A. C. 1987. REAL: A program for REgister ALlocation. In Proceedings of the 24th ACM/IEEE Conference on Design Automation (DAC '87, Miami Beach, FL, June 28-July 1), A. O'Neill and D. Thomas, Eds. ACM Press, New York, NY, 210-215.]] Google ScholarGoogle Scholar
  85. LEE, H.-D. AND HWANG, S.-Y. 1995. A scheduling algorithm for multiport memory minimization in datapath synthesis. In Proceedings of the Conference on Asia Pacific Design Automation (CD-ROM) (ASP-DAC '95, Makuhari, Japan, Aug. 29-Sept. 4), I. Shirakawa, Chair. ACM Press, New York, NY, 93-100.]] Google ScholarGoogle Scholar
  86. LEFEBVRE,V.AND FEAUTRIER, P. 1997. Optimizing storage size for static control programs in automatic parallelizers. In Proceedings of the Conference on EuroPar. Springer-Verlag, New York, NY, 356-363.]] Google ScholarGoogle Scholar
  87. LEUPERS,R.AND MARWEDEL, P. 1996. Algorithms for address assignment in DSP code generation. In Proceedings of the 1996 IEEE/ACM International Conference on Computer-Aided Design (ICCAD '96, San Jose, CA, Nov. 10-14), R. A. Rutenbar and R. H. J. M. Otten, Chairs. IEEE Computer Society Press, Los Alamitos, CA, 109-112.]] Google ScholarGoogle Scholar
  88. LI,W.AND PINGALI, K. 1994. A singular loop transformation framework based on non-singular matrices. Int. J. Parallel Program. 22, 2 (Apr.), 183-205.]] Google ScholarGoogle Scholar
  89. LI,Y.AND HENKEL, J.-R. 1998. A framework for estimation and minimizing energy dissipation of embedded HW/SW systems. In Proceedings of the 35th Annual Conference on Design Automation (DAC '98, San Francisco, CA, June 15-19), B. R. Chawla, R. E. Bryant, and J. M. Rabaey, Chairs. ACM Press, New York, NY, 188-193.]] Google ScholarGoogle Scholar
  90. LI,Y.AND WOLF, W. 1998. Hardware/software co-synthesis with memory hierarchies. In Proceedings of the 1998 IEEE/ACM International Conference on Computer-Aided Design (ICCAD '98, San Jose, CA, Nov. 8-12), H. Yasuura, Chair. ACM Press, New York, NY, 430-436.]] Google ScholarGoogle Scholar
  91. LIEM, C., PAULIN, P., AND JERRAYA, A. 1996. Address calculation for retargetable compilation and exploration of instruction-set architectures. In Proceedings of the 33rd Annual Conference on Design Automation (DAC '96, Las Vegas, NV, June 3-7), T. P. Pennino and E. J. Yoffa, Chairs. ACM Press, New York, NY, 597-600.]] Google ScholarGoogle Scholar
  92. LOVEMAN, D. B. 1977. Program improvement by source-to-source transformation. J. ACM 24, 1 (Jan.), 121-145.]] Google ScholarGoogle Scholar
  93. LY, T., KNAPP, D., MILLER, R., AND MACMILLEN, D. 1995. Scheduling using behavioral templates. In Proceedings of the 32nd ACM/IEEE Conference on Design Automation (DAC '95, San Francisco, CA, June 12-16), B. T. Preas, Ed. ACM Press, New York, NY, 101-106.]] Google ScholarGoogle Scholar
  94. MANJIAKIAN,N.AND ABDELRAHMAN, T. 1995. Fusion of loops for parallelism and locality. Tech. Rep. CSRI-315. Dept. of Computer Science, University of Toronto, Toronto, Ont., Canada.]]Google ScholarGoogle Scholar
  95. MASSELOS, K., CATTHOOR, F., GOUTIS,C.E.,AND MAN, H. D. 1999a. A performance oriented use methodology of power optimizing code transformations for multimedia applications realized on programmable multimedia processors. In Proceedings of the IEEE Workshop on Signal Processing Systems (Taipeh, Taiwan). IEEE Computer Society Press, Los Alamitos, CA, 261-270.]]Google ScholarGoogle Scholar
  96. MASSELOS, K., DANCKAERT, K., CATTHOOR, F., GOUTIS,C.E.,AND DEMAN, H. 1999b. A methodology for power efficient partitioning of data-dominated algorithm specifications within performance constraints. In Proceedings of the IEEE International Symposium on Low Power Design (San Diego CA, Aug.). IEEE Computer Society Press, Los Alamitos, CA, 270-272.]] Google ScholarGoogle Scholar
  97. MCFARLING, S. 1989. Program optimization for instruction caches. In Proceedings of the 3rd International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-III, Boston, MA, Apr. 3-6), J. Emer, Chair. ACM Press, New York, NY, 183-191.]] Google ScholarGoogle Scholar
  98. MCKINLEY, K. S. 1998. A compiler optimization algorithm for shared-memory multiprocessors. IEEE Trans. Parallel Distrib. Syst. 9, 8, 769-787.]] Google ScholarGoogle Scholar
  99. MCKINLEY,K.S.,CARR, S., AND TSENG, C.-W. 1996. Improving data locality with loop transformations. ACM Trans. Program. Lang. Syst. 18, 4 (July), 424-453.]] Google ScholarGoogle Scholar
  100. MENG, T., GORDON, B., TSENG, E., AND HUNG, A. 1995. Portable video-on-demand in wireless communication. Proc. IEEE 83, 4 (Apr.), 659-690.]]Google ScholarGoogle Scholar
  101. MIRANDA, M., CATTHOOR, F., AND MAN, H. D. 1994. Address equation optimization and hardware sharing for real-time signal processing applications. In Proceedings of the IEEE Workshop on VLSI Signal Processing VII (La Jolla, CA, Oct. 26-28). IEEE Press, Piscat-away, NJ, 208-217.]]Google ScholarGoogle Scholar
  102. MIRANDA,M.A.,CATTHOOR,F.V.M.,JANSSEN, M., AND DE MAN, H. J. 1998. High-level address optimization and synthesis techniques for data-transfer-intensive applications. IEEE Trans. Very Large Scale Integr. Syst. 6, 4, 677-686.]] Google ScholarGoogle Scholar
  103. MISHRA, P., GRUN, P., DUTT, N., AND NICOLAU, A. 2001. Processor-memory co-exploration driven by a memory-aware architecture description language. In Proceedings of the Conference on VLSIDesign (Bangalore).]] Google ScholarGoogle Scholar
  104. MOWRY,T.C.,LAM,M.S.,AND GUPTA, A. 1992. Design and evaluation of a compiler algorithm for prefetching. SIGPLAN Not. 27, 9 (Sept.), 62-73.]] Google ScholarGoogle Scholar
  105. MUSOLL, E., LANG, T., AND CORTADELLA, J. 1998. Working-zone encoding for reducing the energy in microprocessor address buses. IEEE Trans. Very Large Scale Integr. Syst. 6,4, 568-572.]] Google ScholarGoogle Scholar
  106. NEERACHER,M.AND RUHL, R. 1993. Automatic parallelization of linpack routines on distributed memory parallel processors. In Proceedings of the IEEE International Symposium on Parallel Processing (Newport Beach CA, Apr.). IEEE Computer Society Press, Los Alamitos, CA.]]Google ScholarGoogle Scholar
  107. NICOLAU,A.AND NOVACK, S. 1993. Trailblazing: A hierarchical approach to percolation scheduling. In Proceedings of the International Conference on Parallel Processing: Software (Boca Raton, FL, Aug.). CRC Press, Inc., Boca Raton, FL, 120-124.]] Google ScholarGoogle Scholar
  108. PADUA,D.A.AND WOLFE, M. J. 1986. Advanced compiler optimizations for supercomputers. Commun. ACM 29, 12 (Dec.), 1184-1201.]] Google ScholarGoogle Scholar
  109. PANDA, P. R. 1999. Memory bank customization and assignment in behavioral synthesis. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (San Jose, CA, Nov.). IEEE Computer Society Press, Los Alamitos, CA, 477-481.]] Google ScholarGoogle Scholar
  110. PANDA,P.AND DUTT, N. 1999. Low-power memory mapping through reducing address bus activity. IEEE Trans. Very Large Scale Integr. Syst. 7, 3 (Sept.), 309-320.]] Google ScholarGoogle Scholar
  111. PANDA,P.R.,DUTT,N.D.,AND NICOLAU, A. 1997. Memory data organization for improved cache performance in embedded processor applications. ACM Trans. Des. Autom. Electron. Syst. 2, 4, 384-409.]] Google ScholarGoogle Scholar
  112. PANDA,P.R.,DUTT,N.D.,AND NICOLAU, A. 1998. Incorporating DRAM access modes into high-level synthesis. IEEE Trans. Comput.-Aided Des. 17, 2 (Feb.), 96-109.]]Google ScholarGoogle Scholar
  113. PANDA,P.R.,DUTT,N.D.,AND NICOLAU, A. 1999a. Local memory exploration and optimization in embedded systems. IEEE Trans. Comput.-Aided Des. 18, 1 (Jan.), 3-13.]]Google ScholarGoogle Scholar
  114. PANDA,P.R.,DUTT,N.D.,AND NICOLAU, A. 1999b. Memory Issues in Embedded Systems-On-Chip: Optimizations and Exploration. Kluwer Academic Publishers, Hingham, MA.]] Google ScholarGoogle Scholar
  115. PANDA,P.R.,DUTT,N.D.,AND NICOLAU, A. 2000. On-chip vs. off-chip memory: The data partitioning problem in embedded processor-based systems. ACM Trans. Des. Autom. Electron. Syst. 5, 3 (July), 682-704.]] Google ScholarGoogle Scholar
  116. PARHI, K. 1989. Rate-optimal fully-static multiprocessor scheduling of data-flow signal processing programs. In Proceedings of the IEEE International Symposium on Circuits and Systems (Portland, OR, May). IEEE Press, Piscataway, NJ, 1923-1928.]]Google ScholarGoogle Scholar
  117. PASSOS,N.AND SHA, E. 1994. Full parallelism of uniform nested loops by multi-dimensional retiming. In Proceedings of the 1994 International Conference on Parallel Processing (Aug.). CRC Press, Inc., Boca Raton, FL, 130-133.]] Google ScholarGoogle Scholar
  118. PASSOS, N., SHA, E., AND CHAO, L.-F. 1995. Multi-dimensional interleaving for time-andmemory design optimization. In Proceedings of the IEEE International Conference on Computer Design (Austin TX, Oct.). IEEE Computer Society Press, Los Alamitos, CA, 440-445.]] Google ScholarGoogle Scholar
  119. PAUWELS, M., CATTHOOR, F., LANNEER, D., AND MAN, H. D. 1989. Type-handling in bit-true silicon compilation for dsp. In Proceedings of the European Conference on Circuit Theory and Design (Brighton, U.K., Sept.). 166-170.]]Google ScholarGoogle Scholar
  120. POLYCHRONOPOULOS, C. D. 1988. Compiler optimizations for enhancing parallelism and their impact in architecture design. IEEE Trans. Comput. 37, 8 (Aug.), 991-1004.]] Google ScholarGoogle Scholar
  121. PUGH,W.AND WONNACOTT, D. 1993. An evaluation of exact methods for analysis of value-based array data dependences. In Proceedings of the 6th Workshop on Programming Languages and Compilers for Parallel Computing (Portland OR). 546-566.]] Google ScholarGoogle Scholar
  122. QUILLERE, F. AND RAJOPADHYE, S. 1998. Optimizing memory usage in the polyhedral model. In Proceedings of the Conference on Massively Parallel Computer Systems (Apr.).
  123. RAMACHANDRAN, L., GAJSKI, D., AND CHAIYAKUL, V. 1993. An algorithm for array variable clustering. In Proceedings of the IEEE European Conference on Design Automation (EURO-DAC '93). IEEE Computer Society Press, Los Alamitos, CA.
  124. SAGHIR, M. A. R., CHOW, P., AND LEE, C. G. 1996. Exploiting dual data-memory banks in digital signal processors. ACM SIGOPS Oper. Syst. Rev. 30, 5, 234-243.
  125. SCHMIT, H. AND THOMAS, D. E. 1997. Synthesis of application-specific memory designs. IEEE Trans. Very Large Scale Integr. Syst. 5, 1, 101-111.
  126. SCHMIT, H. AND THOMAS, D. E. 1995. Address generation for memories containing multiple arrays. In Proceedings of the 1995 IEEE/ACM International Conference on Computer-Aided Design (ICCAD-95, San Jose, CA, Nov. 5-9), R. Rudell, Ed. IEEE Computer Society Press, Los Alamitos, CA, 510-514.
  127. SEMERIA, L., SATO, K., AND DE MICHELI, G. 2000. Resolution of dynamic memory allocation and pointers for the behavioral synthesis from C. In Proceedings of the European Conference on Design Automation and Test (DATE 2000, Paris, Mar.). 312-319.
  128. SHACKLEFORD, B., YASUDA, M., OKUSHI, E., KOIZUMI, H., TOMIYAMA, H., AND YASUURA, H. 1997. Memory-CPU size optimization for embedded system designs. In Proceedings of the 34th Conference on Design Automation (DAC '97, Anaheim, CA, June).
  129. SHANG, W., HODZIC, E., AND CHEN, Z. 1996. On uniformization of affine dependence algorithms. IEEE Trans. Comput. 45, 7 (July), 827-839.
  130. SHANG, W., O'KEEFE, M. T., AND FORTES, J. A. B. 1992. Generalized cycle shrinking. In Proceedings of the International Workshop on Algorithms and Parallel VLSI Architectures II (Gers, France, June 3-6), P. Quinton and Y. Robert, Eds. Elsevier Sci. Pub. B. V., Amsterdam, The Netherlands, 131-144.
  131. SHIUE, W. AND CHAKRABARTI, C. 1999. Memory exploration for low power, embedded systems. In Proceedings of the 36th ACM/IEEE Conference on Design Automation (New Orleans, LA, June). ACM Press, New York, NY, 140-145.
  132. SHIUE, W.-T., TADAS, S., AND CHAKRABARTI, C. 2000. Low power multi-module, multiport memory design for embedded systems. In Proceedings of the IEEE Workshop on Signal Processing Systems (Lafayette, LA, Oct.). IEEE Press, Piscataway, NJ, 529-538.
  133. SLOCK, P., WUYTACK, S., CATTHOOR, F., AND DE JONG, G. 1997. Fast and extensive system-level memory exploration for ATM applications. In Proceedings of the Tenth International Symposium on System Synthesis (ISSS '97, Antwerp, Belgium, Sept. 17-19), F. Vahid and F. Catthoor, Chairs. IEEE Computer Society Press, Los Alamitos, CA, 74-81.
  134. STAN, M. R. AND BURLESON, W. P. 1995. Bus-invert coding for low-power I/O. IEEE Trans. Very Large Scale Integr. Syst. 3, 1 (Mar.), 49-58.
  135. STOK, L. AND JESS, J. A. G. 1992. Foreground memory management in data path synthesis. Int. J. Circuits Theor. Appl. 20, 3, 235-255.
  136. SU, C.-L. AND DESPAIN, A. M. 1995. Cache design trade-offs for power and performance optimization: a case study. In Proceedings of the 1995 International Symposium on Low Power Design (ISLPD-95, Dana Point, CA, Apr. 23-26), M. Pedram, R. Brodersen, and K. Keutzer, Eds. ACM Press, New York, NY, 63-68.
  137. SUDARSANAM, A. AND MALIK, S. 2000. Simultaneous reference allocation in code generation for dual data memory bank ASIPs. ACM Trans. Des. Autom. Electron. Syst. 5, 2 (Apr.), 242-264.
  138. SYNOPSYS INC. 1997. Behavioral Compiler User Guide. Synopsys Inc, Mountain View, CA.
  139. THIELE, L. 1989. On the design of piecewise regular processor arrays. In Proceedings of the IEEE International Symposium on Circuits and Systems (Portland, OR, May). IEEE Press, Piscataway, NJ, 2239-2242.
  140. TOMIYAMA, H., HALAMBI, A., GRUN, P., DUTT, N., AND NICOLAU, A. 1999. Architecture description languages for systems-on-chip design. In Proceedings of the 6th Asia Pacific Conference on Chip Design Languages (Fukuoka, Japan, Oct.). 109-116.
  141. TOMIYAMA, H., ISHIHARA, T., INOUE, A., AND YASUURA, H. 1998. Instruction scheduling for power reduction in processor-based system design. In Proceedings of the Conference on Design, Automation and Test in Europe 98. 855-860.
  142. TOMIYAMA, H. AND YASUURA, H. 1996. Size-constrained code placement for cache miss rate reduction. In Proceedings of the ACM/IEEE International Symposium on System Synthesis (La Jolla, CA, Nov.). ACM Press, New York, NY, 96-101.
  143. TOMIYAMA, H. AND YASUURA, H. 1997. Code placement techniques for cache miss rate reduction. ACM Trans. Des. Autom. Electron. Syst. 2, 4, 410-429.
  144. TSENG, C. AND SIEWIOREK, D. P. 1986. Automated synthesis of data paths in digital systems. IEEE Trans. Comput.-Aided Des. 5, 3 (July), 379-395.
  145. VANDECAPPELLE, A., MIRANDA, M., CATTHOOR, E. B. F., AND VERKEST, D. 1999. Global multimedia system design exploration using accurate memory organization feedback. In Proceedings of the 36th ACM/IEEE Conference on Design Automation (New Orleans, LA, June). ACM Press, New York, NY, 327-332.
  146. VERBAUWHEDE, I., CATTHOOR, F., VANDEWALLE, J., AND MAN, H. D. 1989. Background memory management for the synthesis of algebraic algorithms on multi-processor DSP chips. In Proceedings of the IFIP 1989 International Conference on VLSI (IFIP VLSI '89, Munich, Aug.). IFIP, 209-218.
  147. VERBAUWHEDE, I. M., SCHEERS, C. J., AND RABAEY, J. M. 1994. Memory estimation for high level synthesis. In Proceedings of the 31st Annual Conference on Design Automation (DAC '94, San Diego, CA, June 6-10), M. Lorenzetti, Chair. ACM Press, New York, NY, 143-148.
  148. VERHAEGH, W., LIPPENS, P., AARTS, E., KORST, J., VAN MEERBERGEN, J., AND VAN DER WERF, A. 1995. Improved force-directed scheduling in high-throughput digital signal processing. IEEE Trans. Comput.-Aided Des. 14, 8 (Aug.), 945-960.
  149. VERHAEGH, W., LIPPENS, P., AARTS, E., MEERBERGEN, J., AND VAN DER WERF, A. 1996. Multi-dimensional periodic scheduling: model and complexity. In Proceedings of the Conference on EuroPar'96 Parallel Processing (Lyon, France, Aug.). Springer-Verlag, New York, NY, 226-235.
  150. WILSON, P. R., JOHNSTONE, M., NEELY, M., AND BOLES, D. 1995. Dynamic storage allocation: A survey and critical review. In Proceedings of the International Workshop on Memory Management (Kinross, Scotland, Sept.).
  151. WOLF, M. E. AND LAM, M. S. 1991. A loop transformation theory and an algorithm to maximize parallelism. IEEE Trans. Parallel Distrib. Syst. 2, 4 (Oct.), 452-471.
  152. WOLFE, M. 1991. The tiny loop restructuring tool. In Proceedings of the 1991 International Conference on Parallel Processing (Aug.).
  153. WOLFE, M. 1996. High-Performance Compilers for Parallel Computing. Addison-Wesley, Reading, MA.
  154. WUYTACK, S., CATTHOOR, F., JONG, G. D., AND MAN, H. D. 1999a. Minimizing the required memory bandwidth in VLSI system realizations. IEEE Trans. Very Large Scale Integr. Syst. 7, 4 (Dec.), 433-441.
  155. WUYTACK, S., DA SILVA, J. L., CATTHOOR, F., JONG, G. D., AND YKMAN-COUVREUR, C. 1999b. Memory management for embedded network applications. IEEE Trans. Comput.-Aided Des. 18, 5 (May), 533-544.
  156. WUYTACK, S., DIGUET, J.-P., CATTHOOR, F. V. M., AND DE MAN, H. J. 1998. Formalized methodology for data reuse exploration for low-power hierarchical memory mappings. IEEE Trans. Very Large Scale Integr. Syst. 6, 4, 529-537.
  157. YKMAN-COUVREUR, C., LAMBRECHT, J., VERKEST, D., CATTHOOR, F., AND MAN, H. D. 1999. Exploration and synthesis of dynamic data sets in telecom network applications. In Proceedings of the 12th ACM/IEEE International Symposium on System-Level Synthesis (San Jose, CA, Dec.). ACM Press, New York, NY, 125-130.
  158. ZHAO, Y. AND MALIK, S. 1999. Exact memory size estimation for array computation without loop unrolling. In Proceedings of the 36th ACM/IEEE Conference on Design Automation (New Orleans, LA, June). ACM Press, New York, NY, 811-816.
