Optimization of FPGA-based CNN accelerators using metaheuristics

Authors: Sadiq M. Sait, Aiman El-Maleh, Mohammad Altakrouri, Ahmad Shawahna

Published in: The Journal of Supercomputing | Issue 4/2023

Publication date: 27 September 2022

Abstract

In recent years, convolutional neural networks (CNNs) have demonstrated their ability to solve problems in many fields with an accuracy that was not previously possible. However, this accuracy comes with extensive computational requirements, which leave general-purpose central processing units (CPUs) unable to deliver the desired real-time performance. At the same time, field-programmable gate arrays (FPGAs) have seen a surge of interest for accelerating CNN inference, because they allow custom designs with different levels of parallelism. Furthermore, FPGAs provide better performance per watt than other computing technologies such as graphics processing units (GPUs). The current trend in FPGA-based CNN accelerators is to implement multiple convolutional layer processors (CLPs), each tailored to a subset of layers. However, the growing complexity of CNN architectures makes it harder to allocate the resources available on the target FPGA device for optimal performance, because the number of design variables that must be considered when implementing a Multi-CLP accelerator grows exponentially with CNN complexity. In this paper, we present a CNN accelerator and an accompanying automated design methodology that employs metaheuristics to partition the available FPGA resources when designing a Multi-CLP accelerator. Specifically, the proposed design tool adopts simulated annealing (SA) and tabu search (TS) algorithms to find the number of CLPs required and their respective configurations to achieve optimal performance on a given target FPGA device. Here, the focus is on the key specifications and hardware resources, including digital signal processors (DSPs), block random access memories (BRAMs), and off-chip memory bandwidth.
Experimental results and comparisons on four well-known benchmark CNNs show that the proposed acceleration framework is promising. The SA-/TS-based Multi-CLP achieves \(1.31\times\) to \(2.37\times\) higher throughput than the state-of-the-art Single-/Multi-CLP approaches in accelerating the AlexNet, SqueezeNet 1.1, VGGNet, and GoogLeNet architectures on the Xilinx VC707 and VC709 FPGA boards.
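
To make the partitioning idea in the abstract concrete, the following is a minimal sketch of simulated annealing applied to a toy Multi-CLP problem. It is not the authors' design tool. Everything specific in it is an assumption made for illustration: the layer shapes in `LAYERS`, the values of `DSP_BUDGET`, `DSP_PER_MAC`, and `MAX_CLPS`, the cooling schedule, and the cost model, which uses the common loop-tiling cycle estimate (a CLP with unroll factors `tn` and `tm` needs `R * C * K * K * ceil(N / tn) * ceil(M / tm)` cycles per layer) and considers DSPs only, ignoring BRAM and off-chip bandwidth.

```python
import math
import random

# Hypothetical layers: (N input channels, M output channels, R rows, C cols, K kernel size).
LAYERS = [
    (3, 48, 55, 55, 11),
    (48, 128, 27, 27, 5),
    (256, 192, 13, 13, 3),
    (192, 192, 13, 13, 3),
    (192, 128, 13, 13, 3),
]
DSP_BUDGET = 2240  # assumed number of usable DSP slices
DSP_PER_MAC = 5    # assumed DSP cost of one multiply-accumulate unit
MAX_CLPS = 3       # assumed upper bound on the number of CLPs


def layer_cycles(layer, tn, tm):
    """Cycle estimate for one layer on a CLP with unroll factors (tn, tm)."""
    n, m, r, c, k = layer
    return r * c * k * k * math.ceil(n / tn) * math.ceil(m / tm)


def dsp_used(state):
    """DSPs consumed by the CLPs that have at least one layer assigned."""
    shapes, assign = state
    return sum(DSP_PER_MAC * shapes[i][0] * shapes[i][1] for i in set(assign))


def cost(state):
    """Cycles of the slowest CLP; infinite if the DSP budget is exceeded."""
    shapes, assign = state
    if dsp_used(state) > DSP_BUDGET:
        return math.inf
    totals = [0] * len(shapes)
    for layer, clp in zip(LAYERS, assign):
        tn, tm = shapes[clp]
        totals[clp] += layer_cycles(layer, tn, tm)
    return max(totals)


def neighbor(state, rng):
    """Random move: reassign one layer, or nudge one unroll factor of one CLP."""
    shapes = [list(s) for s in state[0]]
    assign = list(state[1])
    if rng.random() < 0.5:
        assign[rng.randrange(len(assign))] = rng.randrange(len(shapes))
    else:
        clp, dim = rng.randrange(len(shapes)), rng.randrange(2)
        shapes[clp][dim] = max(1, shapes[clp][dim] + rng.choice((-2, -1, 1, 2)))
    return [tuple(s) for s in shapes], assign


def simulated_annealing(seed=0, t0=1e6, alpha=0.95, moves_per_temp=200, t_min=1.0):
    rng = random.Random(seed)
    # Feasible start: every layer on CLP 0, all unroll factors equal to 1.
    state = ([(1, 1)] * MAX_CLPS, [0] * len(LAYERS))
    cur = cost(state)
    best, best_cost = state, cur
    t = t0
    while t > t_min:
        for _ in range(moves_per_temp):
            cand = neighbor(state, rng)
            c = cost(cand)
            delta = c - cur
            # Metropolis criterion: always accept improvements, sometimes accept
            # worse moves; infeasible moves (infinite cost) are never accepted.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                state, cur = cand, c
                if cur < best_cost:
                    best, best_cost = state, cur
        t *= alpha  # geometric cooling
    return best, best_cost


if __name__ == "__main__":
    (shapes, assign), cycles = simulated_annealing()
    print("CLP shapes:", shapes)
    print("layer -> CLP:", assign)
    print("cycles of slowest CLP:", cycles)
```

In this sketch the number of CLPs is not fixed in advance: a CLP with no layers assigned costs no DSPs, so the search is free to use fewer than `MAX_CLPS`. A tabu search variant would keep the same `cost` and `neighbor` functions and replace the probabilistic acceptance with a best-of-neighborhood move plus a short-term memory of recent moves.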

Title: Optimization of FPGA-based CNN accelerators using metaheuristics
Authors: Sadiq M. Sait, Aiman El-Maleh, Mohammad Altakrouri, Ahmad Shawahna
Publication date: 27 September 2022
Publisher: Springer US
Published in: The Journal of Supercomputing, Issue 4/2023
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI: https://doi.org/10.1007/s11227-022-04787-8
