ABSTRACT
High-performance general-purpose processors are increasingly used across a variety of application domains: scientific, engineering, databases, and, more recently, media processing. It is therefore important to ensure that architectural features that consume a significant fraction of the on-chip transistors are applicable across these domains. For example, current processor designs often devote the largest fraction of on-chip transistors (up to 80%) to caches. Many workloads, however, do not make effective use of large caches; media processing workloads, for instance, often have streaming data access patterns and large working sets.
This paper proposes a new reconfigurable cache design. The design enables the cache SRAM arrays to be dynamically divided into multiple partitions that can be used for different processor activities. These activities can benefit applications that would otherwise not use the storage allocated to large conventional caches. Our design requires relatively few modifications to a conventional cache design, and analysis using a modified version of the CACTI analytical model shows a small impact on cache access time. We evaluate one representative use of reconfigurable caches: instruction reuse for media processing. In simulation, this use yields IPC improvements ranging from 1.04X to 1.20X across eight media processing benchmarks.
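The partitioning idea above can be illustrated with a toy software model. This is only a sketch under stated assumptions, not the hardware design evaluated in the paper: the names `Partition`, `ReconfigurableCache`, `configure`, and `execute_with_reuse` are hypothetical, capacity is counted in entries rather than SRAM arrays, LRU replacement is assumed, and instruction reuse is modeled as a memo table keyed on the program counter and operand values.

```python
from collections import OrderedDict


class Partition:
    """A small LRU-managed store standing in for one cache partition (assumed policy)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def lookup(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        return None

    def insert(self, key, value):
        if self.capacity == 0:
            return
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry


class ReconfigurableCache:
    """Toy model: a fixed pool of entries divided among named partitions."""

    def __init__(self, total_entries):
        self.total_entries = total_entries
        self.partitions = {}

    def configure(self, sizes):
        """Split the pool; in this sketch, reconfiguring discards old contents."""
        if sum(sizes.values()) > self.total_entries:
            raise ValueError("partitions exceed cache capacity")
        self.partitions = {name: Partition(n) for name, n in sizes.items()}


def execute_with_reuse(reuse, pc, op, a, b):
    """Return (result, reused), memoizing results on (pc, operand values)."""
    key = (pc, a, b)
    hit = reuse.lookup(key)
    if hit is not None:
        return hit, True  # result reused; the operation is skipped
    result = op(a, b)
    reuse.insert(key, result)
    return result, False


# Hypothetical split: most entries stay a data cache, a few hold reuse results.
cache = ReconfigurableCache(total_entries=8)
cache.configure({"data": 6, "reuse": 2})
reuse = cache.partitions["reuse"]
print(execute_with_reuse(reuse, 0x40, lambda x, y: x * y, 3, 5))  # (15, False)
print(execute_with_reuse(reuse, 0x40, lambda x, y: x * y, 3, 5))  # (15, True)
```

The second call hits in the reuse partition and skips the multiply, which is the kind of saving that storage otherwise left idle by a streaming workload could provide.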
Reconfigurable caches and their application to media processing
Special Issue: Proceedings of the 27th annual international symposium on Computer architecture (ISCA '00)