Published in: Design Automation for Embedded Systems | Issue 1/2022

12.01.2022

New paradigm of FPGA-based computational intelligence from surveying the implementation of DNN accelerators

Authors: Yang You, Yinghui Chang, Weikang Wu, Bingrui Guo, Hongyin Luo, Xiaojie Liu, Bijing Liu, Kairong Zhao, Shan He, Lin Li, Donghui Guo


Abstract

With the rapid development of Artificial Intelligence, the Internet of Things, 5G, and other technologies, a number of emerging intelligent applications, such as image recognition, voice recognition, autonomous driving, and intelligent manufacturing, have appeared. These applications require efficient, intelligent processing systems for massive data calculations, so there is an urgent need to deploy better DNNs in a faster way. The FPGA offers a higher energy-efficiency ratio than the GPU, as well as a shorter development cycle and better flexibility than the ASIC; nevertheless, it is not a perfect hardware platform for computational intelligence either. This paper surveys the latest acceleration work on the most familiar DNNs and proposes three new directions to break the bottleneck of DNN implementation. To improve the computing speed and energy efficiency of edge devices, intelligent embedded approaches, including model compression and optimized data movement across the entire system, are most commonly used. With the gradual slowdown of Moore's Law, the traditional Von Neumann architecture creates a "Memory Wall" problem, resulting in higher power consumption; in-memory computation will be the right medicine in the post-Moore era. A more complete software/hardware co-design environment will direct researchers' attention toward exploring deep learning algorithms and running them at the hardware level more quickly. These new directions open a relatively new paradigm in computational intelligence, which has attracted substantial attention from the research community and demonstrated greater potential than traditional techniques.
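
As a minimal illustration of the model-compression idea mentioned above, and not the method of any particular accelerator covered by the survey, the sketch below combines magnitude-based weight pruning with symmetric 8-bit quantization in NumPy. The function names, the sparsity target, and the bit width are illustrative assumptions; real FPGA flows would additionally encode the sparse, quantized weights for the target memory hierarchy.

```python
# Illustrative sketch only: magnitude pruning plus symmetric 8-bit quantization,
# two common model-compression steps for reducing compute and memory traffic.
import numpy as np

def prune_by_magnitude(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights so that `sparsity` fraction is zero."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) < threshold, 0.0, weights)

def quantize_uniform(weights: np.ndarray, bits: int = 8):
    """Symmetric uniform quantization to signed `bits`-bit integers (assumes nonzero weights)."""
    scale = np.max(np.abs(weights)) / (2 ** (bits - 1) - 1)
    q = np.round(weights / scale).astype(np.int8)
    return q, scale  # dequantize with q * scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(256, 256)).astype(np.float32)  # hypothetical layer weights
    w_sparse = prune_by_magnitude(w, sparsity=0.9)       # keep roughly 10% of weights
    q, scale = quantize_uniform(w_sparse, bits=8)
    print("nonzero fraction:", np.count_nonzero(w_sparse) / w.size)
    print("max dequantization error:", np.max(np.abs(q * scale - w_sparse)))
```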


Metadata
Title
New paradigm of FPGA-based computational intelligence from surveying the implementation of DNN accelerators
Authors
Yang You
Yinghui Chang
Weikang Wu
Bingrui Guo
Hongyin Luo
Xiaojie Liu
Bijing Liu
Kairong Zhao
Shan He
Lin Li
Donghui Guo
Publication date
12.01.2022
Publisher
Springer US
Published in
Design Automation for Embedded Systems / Issue 1/2022
Print ISSN: 0929-5585
Electronic ISSN: 1572-8080
DOI
https://doi.org/10.1007/s10617-021-09256-8
