ABSTRACT
One of the most important issues in designing a chip multiprocessor is deciding its on-chip memory organization. A poor on-chip memory design can have serious power and performance implications when running data-intensive embedded applications. While it is possible to design an application-specific memory architecture, this may not be the best option, in particular when the storage demands of individual processors and/or their data sharing patterns can change from one point in execution to another for the same application. In this paper, we consider dynamic configuration of software-managed on-chip memory space to adapt to runtime variations in data storage demand and interprocessor sharing patterns. The proposed framework is fully implemented using an optimizing compiler, a polyhedral tool, and a memory partitioner (based on integer linear programming), and is tested using a suite of eight data-intensive embedded applications. Our experimental evaluation indicates that the proposed technique is very effective in practice and leads to much lower energy consumption than all the alternative memory management schemes tested, including one that comes up with an application-specific memory.
Index Terms
- Dynamic on-chip memory management for chip multiprocessors