Published in: The Journal of Supercomputing, Issue 2/2014

1 May 2014

Customized pipeline and instruction set architecture for embedded processing engines

Authors: Amir Yazdanbakhsh, Mostafa E. Salehi, Sied Mehdi Fakhraie


Abstract

Custom instructions can improve the execution speed and code compression of embedded applications. However, more efficient custom instructions need a higher number of simultaneous register-file accesses, and larger register files are more power-hungry and require more complex forwarding interconnects. Consequently, the limited number of ports on the base processor's register file generally constrains the size and efficiency of custom instructions. Recent research has focused on overcoming this limitation through innovative architectural techniques supplemented by customized compilation. However, to the best of our knowledge, few studies take complete pipeline design and implementation considerations into account. This paper proposes a customized instruction set and pipeline architecture for an optimized embedded engine. The proposed architecture increases performance by raising the available register-file data bandwidth through register-access pipelining. These improvements are achieved by introducing double-word custom instructions whose register-file accesses are overlapped in the pipeline. Potential hazards in such instructions are resolved by the newly introduced concept of pipeline backwarding, yielding higher performance and better code compression. Although we study the effectiveness of the proposed architecture on domain-specific workloads from packet-processing benchmarks, the developed framework and architecture are applicable to other embedded application domains.
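The port constraint described in the abstract can be made concrete with a little arithmetic. The sketch below is a simplified, hypothetical model, not the authors' pipeline: it assumes a register file with 2 read ports and 1 write port (2R1W, common in simple RISC cores) and assumes that each instruction word contributes one slot of register-file access, so a double-word instruction can spread its reads and writes over two overlapped pipeline slots. All names (`RegisterFile`, `CustomInstruction`, `access_slots`, `fits`) are illustrative, and hazard handling (the paper's pipeline backwarding) is deliberately not modelled.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class RegisterFile:
    read_ports: int   # operands that can be read per cycle
    write_ports: int  # results that can be written per cycle


@dataclass(frozen=True)
class CustomInstruction:
    name: str
    inputs: int   # source registers the custom datapath consumes
    outputs: int  # destination registers it produces


def access_slots(ci: CustomInstruction, rf: RegisterFile) -> int:
    """Consecutive pipeline slots needed to stream all operands through the ports.

    Hypothetical model: reads and writes are spread evenly over slots,
    and every instruction occupies at least one slot.
    """
    read_slots = ceil(ci.inputs / rf.read_ports)
    write_slots = ceil(ci.outputs / rf.write_ports)
    return max(1, read_slots, write_slots)


def fits(ci: CustomInstruction, rf: RegisterFile, words: int) -> bool:
    """True if an instruction of `words` words supplies enough access slots,
    assuming each word contributes exactly one slot of register-file access."""
    return access_slots(ci, rf) <= words


if __name__ == "__main__":
    rf = RegisterFile(read_ports=2, write_ports=1)  # assumed 2R1W register file
    ci = CustomInstruction("hypothetical_ci", inputs=4, outputs=2)
    print(access_slots(ci, rf))   # 2
    print(fits(ci, rf, words=1))  # False: 4 reads cannot fit on 2 ports in one slot
    print(fits(ci, rf, words=2))  # True: accesses are split across two slots
```

Under these assumptions, a 4-input, 2-output custom instruction cannot be encoded as a single-word instruction on a 2R1W register file, but it fits once its accesses are split across two words. This is the kind of bandwidth gain that overlapping register accesses aims for without adding register-file ports.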


Metadata
Publisher
Springer US
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-013-1075-8
