Abstract
In application-specific processor design, a common approach to improving performance and efficiency is to use special instructions that execute complex operation patterns. However, in a generic embedded processor with a compact Instruction Set Architecture (ISA), these special instructions may incur large overhead: (i) more bits are needed to encode the extra opcodes and operands, resulting in wider instructions; and (ii) more Register File (RF) ports are required to provide the extra operands to the function units. Such overhead may increase energy consumption considerably.
In this article, we propose to support flexible operation pair patterns in a processor with a compact 24-bit RISC-like ISA using: (i) a partially reconfigurable decoder that exploits pattern locality to reduce the opcode space requirement; and (ii) a software-controlled bypass network to reduce the operand encoding bit and RF port requirements. An energy-aware compiler backend is designed for the proposed architecture; it performs pattern selection and bypass-aware scheduling to generate energy-efficient code. Though the proposed design imposes extra constraints on the operation patterns, the experimental results show that, for benchmark applications from different domains, the average dynamic instruction count is reduced by over 25%, which is only about 2% less than the reduction achieved by the architecture without such constraints. The proposed architecture reduces total energy by an average of 15.8% compared to the RISC baseline, while the architecture without constraints achieves almost no improvement due to its high overhead. When high performance is required, the proposed architecture is able to achieve a speedup of 13.8% with a 13.1% energy reduction compared to the baseline by introducing multicycle SFU operations.
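To make the idea of operation pair patterns concrete, the following is a minimal sketch, not the article's algorithm. It assumes a straight-line opcode sequence, a hypothetical flag list `feeds_next` marking where an operation's result is consumed by the very next operation, and a hypothetical number of decoder `slots` for pair patterns. It picks the most frequent producer-consumer pairs and counts instructions after greedy, non-overlapping fusion. The function names, the frequency-based selection, and the greedy fusion are illustrative assumptions; the energy model and bypass-aware scheduling described above are not modeled.

```python
from collections import Counter


def select_patterns(ops, feeds_next, slots):
    """Pick the `slots` most frequent adjacent producer-consumer opcode pairs.

    ops        -- opcodes in program order (hypothetical example input)
    feeds_next -- feeds_next[i] is True if ops[i+1] consumes the result of ops[i]
    slots      -- assumed number of pair patterns the decoder can hold at once
    """
    counts = Counter(
        (ops[i], ops[i + 1])
        for i in range(len(ops) - 1)
        if feeds_next[i]
    )
    return {pattern for pattern, _ in counts.most_common(slots)}


def fused_count(ops, feeds_next, patterns):
    """Count instructions after greedily fusing non-overlapping matching pairs."""
    count = 0
    i = 0
    while i < len(ops):
        can_fuse = (
            i + 1 < len(ops)
            and feeds_next[i]
            and (ops[i], ops[i + 1]) in patterns
        )
        i += 2 if can_fuse else 1  # a fused pair issues as one instruction
        count += 1
    return count


# Hypothetical eight-operation sequence and its adjacent dependences.
ops = ["mul", "add", "ld", "mul", "add", "shl", "and", "st"]
feeds_next = [True, False, True, True, False, True, True]

one_slot = select_patterns(ops, feeds_next, slots=1)
print(len(ops), fused_count(ops, feeds_next, one_slot))
```

In this toy input, a single pattern slot holding the pair `("mul", "add")` already removes two of the eight instructions, which is the kind of pattern locality a small reconfigurable opcode space would rely on.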
Index Terms
- An energy-efficient method of supporting flexible special instructions in an embedded processor with compact ISA