An energy-efficient method of supporting flexible special instructions in an embedded processor with compact ISA

Published: 16 September 2013

Abstract

In application-specific processor design, a common approach to improving performance and efficiency is to use special instructions that execute complex operation patterns. However, in a generic embedded processor with a compact Instruction Set Architecture (ISA), these special instructions may incur large overheads: (i) more bits are needed to encode the extra opcodes and operands, resulting in wider instructions; and (ii) more Register File (RF) ports are required to supply the extra operands to the function units. Such overheads may increase energy consumption considerably.
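The encoding overhead described above can be made concrete with a little bit-budget arithmetic. The following is a minimal sketch under assumed parameters: the opcode counts, register count, and operand counts are hypothetical illustration values, not figures taken from the article.

```python
def field_bits(choices: int) -> int:
    """Minimum number of bits needed to distinguish `choices` distinct values."""
    return max(choices - 1, 0).bit_length()


def instruction_bits(num_opcodes: int, num_operands: int, num_registers: int) -> int:
    """Width of an instruction with one opcode field and `num_operands` register fields."""
    return field_bits(num_opcodes) + num_operands * field_bits(num_registers)


# Hypothetical baseline: 64 opcodes, 32 registers, 2 sources + 1 destination.
BASE_OPCODES = 64
REGISTERS = 32
base_width = instruction_bits(BASE_OPCODES, 3, REGISTERS)  # 6 + 3*5 = 21 bits

# Hypothetical extension: 256 extra operation-pair opcodes, each pair reading
# 3 sources and writing 1 destination (so 3 RF read ports instead of 2).
PAIR_OPCODES = 256
pair_width = instruction_bits(BASE_OPCODES + PAIR_OPCODES, 4, REGISTERS)  # 9 + 4*5 = 29 bits

print(base_width, pair_width)
```

Under these assumed numbers the baseline fits in a 24-bit instruction word, while the naively extended encoding does not, which is the kind of widening the abstract identifies as overhead.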

In this article, we propose supporting flexible operation-pair patterns in a processor with a compact 24-bit RISC-like ISA using (i) a partially reconfigurable decoder that exploits pattern locality to reduce the opcode space requirement and (ii) a software-controlled bypass network that reduces the operand encoding bits and RF ports required. An energy-aware compiler backend designed for the proposed architecture performs pattern selection and bypass-aware scheduling to generate energy-efficient code. Although the proposed design imposes extra constraints on the operation patterns, experimental results show that, for benchmark applications from different domains, the average dynamic instruction count is reduced by over 25%, only about 2% less than the reduction achieved by an architecture without such constraints. The proposed architecture reduces total energy by an average of 15.8% compared to the RISC baseline, whereas the unconstrained architecture achieves almost no improvement because of its high overhead. When high performance is required, introducing multicycle SFU operations lets the proposed architecture achieve a speedup of 13.8% with a 13.1% energy reduction compared to the baseline.
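To illustrate how a small set of loaded patterns can still remove many instructions when patterns recur locally, here is a toy sketch of operation-pair selection. It is not the article's compiler algorithm: the trace, the `decoder_slots` parameter, and the greedy left-to-right fusion rule are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical straight-line trace of (destination, opcode, sources).
TRACE = [
    ("t1", "mul", ("a", "b")),
    ("t2", "add", ("t1", "c")),
    ("t3", "shl", ("d", "e")),
    ("t4", "add", ("t3", "f")),
    ("t5", "mul", ("g", "h")),
    ("t6", "add", ("t5", "t4")),
    ("t7", "sub", ("t6", "t2")),
]


def candidate_pairs(trace):
    """Map index i -> (op_i, op_i+1) when op i+1 is the only consumer of op i's result."""
    uses = Counter(src for _, _, srcs in trace for src in srcs)
    pairs = {}
    for i in range(len(trace) - 1):
        dest, op1, _ = trace[i]
        _, op2, srcs2 = trace[i + 1]
        if dest in srcs2 and uses[dest] == 1:
            pairs[i] = (op1, op2)
    return pairs


def fused_count(trace, decoder_slots):
    """Instruction count after fusing pairs whose pattern is among the loaded ones."""
    pairs = candidate_pairs(trace)
    # Assumption: the decoder holds only the `decoder_slots` most frequent patterns.
    frequency = Counter(pairs.values())
    loaded = {pattern for pattern, _ in frequency.most_common(decoder_slots)}
    count, i = 0, 0
    while i < len(trace):
        # Fuse ops i and i+1 into one instruction if their pattern is loaded.
        i += 2 if pairs.get(i) in loaded else 1
        count += 1
    return count


for slots in (0, 1, 3):
    print(slots, fused_count(TRACE, slots))
```

In this toy trace a single loaded pattern (`mul` followed by `add`) already brings the count from 7 to 5, and loading all three candidate patterns yields 4. A real backend would also have to respect operand-count and bypass constraints, which this sketch ignores.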


Published in

ACM Transactions on Architecture and Code Optimization, Volume 10, Issue 3
September 2013
310 pages
ISSN: 1544-3566
EISSN: 1544-3973
DOI: 10.1145/2509420

Copyright © 2013 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 16 September 2013
• Accepted: 1 April 2013
• Revised: 1 February 2013
• Received: 1 December 2012