Published in: The Journal of Supercomputing 4/2023

27-09-2022

Optimization of FPGA-based CNN accelerators using metaheuristics

Authors: Sadiq M. Sait, Aiman El-Maleh, Mohammad Altakrouri, Ahmad Shawahna

Abstract

In recent years, convolutional neural networks (CNNs) have demonstrated their ability to solve problems in many fields with accuracy that was not possible before. However, this comes with extensive computational requirements, which general-purpose central processing units (CPUs) cannot meet at the desired real-time performance. At the same time, field-programmable gate arrays (FPGAs) have seen a surge in interest for accelerating CNN inference, owing to their ability to realize custom designs with different levels of parallelism. Furthermore, FPGAs provide better performance per watt than other computing technologies such as graphics processing units (GPUs). The current trend in FPGA-based CNN accelerators is to implement multiple convolutional layer processors (CLPs), each tailored to a subset of layers. However, the growing complexity of CNN architectures makes it increasingly challenging to partition the resources available on the target FPGA device for optimal performance, because the number of design variables that must be considered when implementing a Multi-CLP accelerator grows exponentially with the CNN's complexity. In this paper, we present a CNN accelerator and an accompanying automated design methodology that employs metaheuristics to partition the available FPGA resources into a Multi-CLP accelerator. Specifically, the proposed design tool adopts simulated annealing (SA) and tabu search (TS) algorithms to find the number of CLPs required and their respective configurations that achieve optimal performance on a given target FPGA device. The focus here is on the key hardware resources, including digital signal processors (DSPs), block random access memories (BRAMs), and off-chip memory bandwidth.
Experimental results and comparisons using four well-known benchmark CNNs demonstrate that the proposed acceleration framework is both encouraging and promising. The SA-/TS-based Multi-CLP achieves 1.31x to 2.37x higher throughput than state-of-the-art Single-/Multi-CLP approaches in accelerating the AlexNet, SqueezeNet 1.1, VGGNet, and GoogLeNet architectures on the Xilinx VC707 and VC709 FPGA boards.
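The abstract's design-space exploration can be sketched with a minimal simulated annealing loop. Everything below is a hypothetical toy, not the paper's actual cost model: the layer shapes, candidate tiling factors, DSP budget, and cycle-count formula are placeholders chosen only to illustrate how SA perturbs a Multi-CLP configuration (a layer-to-CLP assignment plus per-CLP unroll factors) and occasionally accepts worse moves to escape local minima.

```python
import math
import random

# Hypothetical layer shapes as (output channels, input channels) and an
# assumed DSP budget; the real framework models DSPs, BRAMs, and bandwidth.
LAYERS = [(96, 3), (256, 96), (384, 256)]
DSP_BUDGET = 900

def random_design(k=2):
    """A design: assign each layer to one of k CLPs, pick unroll factors per CLP."""
    assign = [random.randrange(k) for _ in LAYERS]
    tiles = [(random.choice([8, 16, 32]), random.choice([2, 4, 8])) for _ in range(k)]
    return assign, tiles

def cost(design):
    """Total cycles (lower is better), penalized if the DSP budget is exceeded."""
    assign, tiles = design
    dsps = sum(tm * tn for tm, tn in tiles)  # toy model: one DSP per MAC lane
    cycles = 0
    for (m, n), clp in zip(LAYERS, assign):
        tm, tn = tiles[clp]
        cycles += math.ceil(m / tm) * math.ceil(n / tn)
    return cycles + (10_000 if dsps > DSP_BUDGET else 0)

def neighbor(design):
    """Perturb either one layer's CLP assignment or one CLP's tiling factors."""
    assign, tiles = design
    assign, tiles = assign[:], tiles[:]
    if random.random() < 0.5:
        assign[random.randrange(len(assign))] = random.randrange(len(tiles))
    else:
        i = random.randrange(len(tiles))
        tiles[i] = (random.choice([8, 16, 32]), random.choice([2, 4, 8]))
    return assign, tiles

def anneal(steps=2000, t0=100.0, alpha=0.995, seed=0):
    """Standard SA loop: accept improving moves always, worse moves with
    probability exp(-delta/temperature), while the temperature cools."""
    random.seed(seed)
    cur = random_design()
    cur_cost, t = cost(cur), t0
    best, best_cost = cur, cur_cost
    for _ in range(steps):
        cand = neighbor(cur)
        delta = cost(cand) - cur_cost
        if delta < 0 or random.random() < math.exp(-delta / t):
            cur, cur_cost = cand, cur_cost + delta
            if cur_cost < best_cost:
                best, best_cost = cur, cur_cost
        t *= alpha
    return best, best_cost
```

Tabu search would replace the probabilistic acceptance with a short-term memory of recently visited moves that are forbidden ("tabu") for a number of iterations; both explore the same neighborhood structure.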


Metadata
Title
Optimization of FPGA-based CNN accelerators using metaheuristics
Authors
Sadiq M. Sait
Aiman El-Maleh
Mohammad Altakrouri
Ahmad Shawahna
Publication date
27-09-2022
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 4/2023
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-022-04787-8
