Abstract
This paper evaluates managing the processor's datapath width at the compiler level by exploiting dynamic narrow-width operands. We capitalize on the frequent occurrence of these operands in multimedia programs to build static narrow-width regions that can be exposed directly to the compiler. We propose augmenting the ISA with instructions that expose the datapath and register widths to the compiler. Simple exception management allows this exposure to be purely speculative. In this way, software can speculatively adapt the execution of a program to a narrower datapath width in order to save energy. To this end, we introduce a novel register file organization, the byte-slice register file, which allows the width of the register file to be dynamically reconfigured, providing both static and dynamic energy savings. We show that combining the byte-slice register file with clock-gating of the datapath on a per-region basis saves up to 17% of the datapath dynamic energy and reduces the register file static energy by 22%.
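The speculative scheme summarized above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's implementation: the names `min_width_bytes`, `WidthException`, and `run_region` are hypothetical, the 4-byte machine word is assumed, and the recovery policy (re-executing the whole region at full width after a violation) is an assumption made for illustration.

```python
def min_width_bytes(value: int, word_bytes: int = 4) -> int:
    """Smallest number of bytes holding `value` as a signed two's-complement integer."""
    for n in range(1, word_bytes + 1):
        lo = -(1 << (8 * n - 1))
        hi = (1 << (8 * n - 1)) - 1
        if lo <= value <= hi:
            return n
    raise ValueError("value does not fit in the machine word")


class WidthException(Exception):
    """Raised when a value exceeds the width speculated for a region (hypothetical)."""


def run_region(operands, speculated_bytes, word_bytes=4):
    """Sum `operands` on a datapath speculatively narrowed to `speculated_bytes`.

    Returns (result, active_bytes), where active_bytes is the width that was
    finally used. On a width violation the region is re-executed at full
    width; this recovery policy is an assumption of the sketch.
    """
    try:
        total = 0
        for x in operands:
            # An input operand wider than the speculated width is a violation.
            if min_width_bytes(x, word_bytes) > speculated_bytes:
                raise WidthException(x)
            total += x
            # So is an intermediate result that overflows the narrow width.
            if min_width_bytes(total, word_bytes) > speculated_bytes:
                raise WidthException(total)
        return total, speculated_bytes
    except WidthException:
        # Fall back to the full-width datapath for this region.
        return sum(operands), word_bytes
```

For example, `run_region([10, 20, 30], 1)` completes on a one-byte datapath, whereas `run_region([100, 100], 1)` overflows the signed byte range and falls back to the full four-byte width. In this model the energy benefit would come from the regions where the speculation holds and the upper byte slices stay inactive.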
Speculative software management of datapath-width for energy optimization
LCTES '04: Proceedings of the 2004 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems