Published in: Neural Computing and Applications, Issue 4/2020

06.10.2018 | Review

A survey of FPGA-based accelerators for convolutional neural networks

By: Sparsh Mittal


Abstract

Deep convolutional neural networks (CNNs) have recently shown very high accuracy on a wide range of cognitive tasks and, as a result, have received significant interest from researchers. Given the high computational demands of CNNs, custom hardware accelerators are vital for boosting their performance. The high energy efficiency, computing capability and reconfigurability of FPGAs make them a promising platform for hardware acceleration of CNNs. In this paper, we present a survey of techniques for implementing and optimizing CNN algorithms on FPGAs. We organize the works into several categories to bring out their similarities and differences. This paper is expected to be useful for researchers in the areas of artificial intelligence, hardware architecture and system design.
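To make the "high computational demands" concrete, the dominant cost of a CNN is the multiply–accumulate (MAC) work in its convolution layers. A minimal sketch, with a hypothetical mid-network layer whose dimensions (3×3 kernel, 256 input and output channels, 14×14 output feature map) are illustrative assumptions rather than figures from the survey:

```python
def conv_macs(k, c_in, c_out, h_out, w_out):
    """MAC count for one CONV layer: each of the h_out*w_out*c_out
    output values needs a k*k*c_in dot product."""
    return k * k * c_in * c_out * h_out * w_out

# Hypothetical layer: 3x3 kernel, 256 -> 256 channels, 14x14 output fmap
macs = conv_macs(3, 256, 256, 14, 14)
print(macs)  # 115,605,504 MACs for a single layer
```

Summing such counts over tens of layers (and multiplying by the frame rate of the target application) shows why custom accelerators that keep many parallel MAC units busy are essential.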


Footnotes
1
The following acronyms are used frequently in this paper: bandwidth (BW), batch normalization (B-NORM), binarized CNN (BNN), block RAM (BRAM), convolution (CONV), digital signal processing units (DSPs), directed acyclic graph (DAG), design space exploration (DSE), fast Fourier transform (FFT), feature map (fmap), fixed point (FxP), floating point (FP), frequency-domain CONV (FDC), fully connected (FC), hardware (HW), high-level synthesis (HLS), inverse FFT (IFFT), local response normalization (LRN), lookup tables (LUTs), matrix multiplication (MM), matrix–vector multiplication (MVM), multiply–add–accumulate (MAC), processing engine/unit (PE/PU), register transfer level (RTL), single instruction multiple data (SIMD).
 
75.
Zurück zum Zitat Lee M, Hwang K, Park J, Choi S, Shin S, Sung W (2016) “FPGA-based low-power speech recognition with recurrent neural networks. In: International workshop on signal processing systems (SiPS), pp 230–235 Lee M, Hwang K, Park J, Choi S, Shin S, Sung W (2016) “FPGA-based low-power speech recognition with recurrent neural networks. In: International workshop on signal processing systems (SiPS), pp 230–235
76.
Zurück zum Zitat Mahajan D, Park J, Amaro E, Sharma H, Yazdanbakhsh A, Kim JK, Esmaeilzadeh H (2016) Tabla: a unified template-based framework for accelerating statistical machine learning. In: International symposium on high performance computer architecture (HPCA). IEEE, pp 14–26 Mahajan D, Park J, Amaro E, Sharma H, Yazdanbakhsh A, Kim JK, Esmaeilzadeh H (2016) Tabla: a unified template-based framework for accelerating statistical machine learning. In: International symposium on high performance computer architecture (HPCA). IEEE, pp 14–26
77.
Zurück zum Zitat Prost-Boucle A, Bourge A, Pétrot F, Alemdar H, Caldwell N, Leroy V (2017) Scalable high-performance architecture for convolutional ternary neural networks on FPGA. In: International conference on field programmable logic and applications (FPL), pp 1–7 Prost-Boucle A, Bourge A, Pétrot F, Alemdar H, Caldwell N, Leroy V (2017) Scalable high-performance architecture for convolutional ternary neural networks on FPGA. In: International conference on field programmable logic and applications (FPL), pp 1–7
78.
Zurück zum Zitat Alwani M, Chen H, Ferdman M, Milder P (2016) Fused-layer CNN accelerators. In: International symposium on microarchitecture (MICRO), pp 1–12 Alwani M, Chen H, Ferdman M, Milder P (2016) Fused-layer CNN accelerators. In: International symposium on microarchitecture (MICRO), pp 1–12
79.
Zurück zum Zitat Mittal S (2016) A survey of techniques for approximate computing. ACM Comput Surv 48(4):62:1–62:33 Mittal S (2016) A survey of techniques for approximate computing. ACM Comput Surv 48(4):62:1–62:33
80.
Zurück zum Zitat Mittal S, Vetter J (2016) A survey of architectural approaches for data compression in cache and main memory systems. IEEE Trans Parallel Distrib Syst (TPDS) 27:1524–1536CrossRef Mittal S, Vetter J (2016) A survey of architectural approaches for data compression in cache and main memory systems. IEEE Trans Parallel Distrib Syst (TPDS) 27:1524–1536CrossRef
81.
Zurück zum Zitat Winograd S (1980) Arithmetic complexity of computations, vol 33. SIAM, PhiladelphiaCrossRef Winograd S (1980) Arithmetic complexity of computations, vol 33. SIAM, PhiladelphiaCrossRef
82.
Zurück zum Zitat Maguire LP, McGinnity TM, Glackin B, Ghani A, Belatreche A, Harkin J (2007) Challenges for large-scale implementations of spiking neural networks on FPGAs. Neurocomputing 71(1):13–29CrossRef Maguire LP, McGinnity TM, Glackin B, Ghani A, Belatreche A, Harkin J (2007) Challenges for large-scale implementations of spiking neural networks on FPGAs. Neurocomputing 71(1):13–29CrossRef
Metadata
Title
A survey of FPGA-based accelerators for convolutional neural networks
Author
Sparsh Mittal
Publication date
06.10.2018
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 4/2020
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-018-3761-1