Published in: Neural Computing and Applications, Issue 4/2020

06.10.2018 | Review

A survey of FPGA-based accelerators for convolutional neural networks

By: Sparsh Mittal


Abstract

Deep convolutional neural networks (CNNs) have recently shown very high accuracy on a wide range of cognitive tasks and, as a result, have received significant interest from researchers. Given the high computational demands of CNNs, custom hardware accelerators are vital for boosting their performance. The high energy efficiency, computing capability and reconfigurability of FPGAs make them a promising platform for hardware acceleration of CNNs. In this paper, we present a survey of techniques for implementing and optimizing CNN algorithms on FPGAs. We organize the works into several categories to bring out their similarities and differences. This paper is expected to be useful for researchers in the areas of artificial intelligence, hardware architecture and system design.
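To make the "high computational demands" concrete, the dominant cost of a CNN is the multiply–accumulate (MAC) work in its convolution layers. A minimal sketch, with a hypothetical mid-network layer whose dimensions (3×3 kernel, 256 input and output channels, 14×14 output feature map) are illustrative assumptions rather than figures from the survey:

```python
def conv_macs(k, c_in, c_out, h_out, w_out):
    """MAC count for one CONV layer: each of the h_out*w_out*c_out
    output values needs a k*k*c_in dot product."""
    return k * k * c_in * c_out * h_out * w_out

# Hypothetical layer: 3x3 kernel, 256 -> 256 channels, 14x14 output fmap
macs = conv_macs(3, 256, 256, 14, 14)
print(macs)  # 115,605,504 MACs for a single layer
```

Summing such counts over tens of layers (and multiplying by the frame rate of the target application) shows why custom accelerators that keep many parallel MAC units busy are essential.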


Footnotes
1
The following acronyms are used frequently in this paper: bandwidth (BW), batch normalization (B-NORM), binarized CNN (BNN), block RAM (BRAM), convolution (CONV), digital signal processing units (DSPs), directed acyclic graph (DAG), design space exploration (DSE), fast Fourier transform (FFT), feature map (fmap), fixed point (FxP), floating point (FP), frequency-domain CONV (FDC), fully connected (FC), hardware (HW), high-level synthesis (HLS), inverse FFT (IFFT), local response normalization (LRN), lookup tables (LUTs), matrix multiplication (MM), matrix–vector multiplication (MVM), multiply–add–accumulate (MAC), processing engine/unit (PE/PU), register transfer level (RTL), single instruction multiple data (SIMD).
 
75.
Zurück zum Zitat Lee M, Hwang K, Park J, Choi S, Shin S, Sung W (2016) “FPGA-based low-power speech recognition with recurrent neural networks. In: International workshop on signal processing systems (SiPS), pp 230–235 Lee M, Hwang K, Park J, Choi S, Shin S, Sung W (2016) “FPGA-based low-power speech recognition with recurrent neural networks. In: International workshop on signal processing systems (SiPS), pp 230–235
76.
Zurück zum Zitat Mahajan D, Park J, Amaro E, Sharma H, Yazdanbakhsh A, Kim JK, Esmaeilzadeh H (2016) Tabla: a unified template-based framework for accelerating statistical machine learning. In: International symposium on high performance computer architecture (HPCA). IEEE, pp 14–26 Mahajan D, Park J, Amaro E, Sharma H, Yazdanbakhsh A, Kim JK, Esmaeilzadeh H (2016) Tabla: a unified template-based framework for accelerating statistical machine learning. In: International symposium on high performance computer architecture (HPCA). IEEE, pp 14–26
77.
Zurück zum Zitat Prost-Boucle A, Bourge A, Pétrot F, Alemdar H, Caldwell N, Leroy V (2017) Scalable high-performance architecture for convolutional ternary neural networks on FPGA. In: International conference on field programmable logic and applications (FPL), pp 1–7 Prost-Boucle A, Bourge A, Pétrot F, Alemdar H, Caldwell N, Leroy V (2017) Scalable high-performance architecture for convolutional ternary neural networks on FPGA. In: International conference on field programmable logic and applications (FPL), pp 1–7
78.
Zurück zum Zitat Alwani M, Chen H, Ferdman M, Milder P (2016) Fused-layer CNN accelerators. In: International symposium on microarchitecture (MICRO), pp 1–12 Alwani M, Chen H, Ferdman M, Milder P (2016) Fused-layer CNN accelerators. In: International symposium on microarchitecture (MICRO), pp 1–12
79.
Zurück zum Zitat Mittal S (2016) A survey of techniques for approximate computing. ACM Comput Surv 48(4):62:1–62:33 Mittal S (2016) A survey of techniques for approximate computing. ACM Comput Surv 48(4):62:1–62:33
80.
Zurück zum Zitat Mittal S, Vetter J (2016) A survey of architectural approaches for data compression in cache and main memory systems. IEEE Trans Parallel Distrib Syst (TPDS) 27:1524–1536CrossRef Mittal S, Vetter J (2016) A survey of architectural approaches for data compression in cache and main memory systems. IEEE Trans Parallel Distrib Syst (TPDS) 27:1524–1536CrossRef
81.
Zurück zum Zitat Winograd S (1980) Arithmetic complexity of computations, vol 33. SIAM, PhiladelphiaCrossRef Winograd S (1980) Arithmetic complexity of computations, vol 33. SIAM, PhiladelphiaCrossRef
82.
Zurück zum Zitat Maguire LP, McGinnity TM, Glackin B, Ghani A, Belatreche A, Harkin J (2007) Challenges for large-scale implementations of spiking neural networks on FPGAs. Neurocomputing 71(1):13–29CrossRef Maguire LP, McGinnity TM, Glackin B, Ghani A, Belatreche A, Harkin J (2007) Challenges for large-scale implementations of spiking neural networks on FPGAs. Neurocomputing 71(1):13–29CrossRef
Metadata
Title
A survey of FPGA-based accelerators for convolutional neural networks
Author
Sparsh Mittal
Publication date
06.10.2018
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 4/2020
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-018-3761-1