Published in: Evolutionary Intelligence 4/2022

26-06-2021 | Special Issue

YOLOv2 acceleration using embedded GPU and FPGAs: pros, cons, and a hybrid method

Author: Congjun Liu

Abstract

YOLOv2 is an object detection algorithm based on the Darknet neural network and is widely used in advanced driver assistance systems. Before it can be deployed in practice, however, YOLOv2 must be accelerated on a high-performance computing platform. Each computing platform has its own characteristics, and developers often find it hard to weigh the merits and drawbacks of an acceleration platform and pick the right one for their actual requirements. This paper analyzes the pros and cons of embedded GPUs and FPGAs for accelerating YOLOv2 with respect to development speed, power efficiency, and computing performance. The analysis gives developers insight into choosing hardware for optimizing YOLOv2. The experimental data show that a deeply optimized FPGA implementation exceeds the embedded GPU in both power efficiency and speed. However, FPGA development is difficult and takes developers much more time than GPU development. Finally, we propose a balanced method that takes advantage of the GPU's development speed and the FPGA's high performance.

Metadata
Title
YOLOv2 acceleration using embedded GPU and FPGAs: pros, cons, and a hybrid method
Author
Congjun Liu
Publication date
26-06-2021
Publisher
Springer Berlin Heidelberg
Published in
Evolutionary Intelligence / Issue 4/2022
Print ISSN: 1864-5909
Electronic ISSN: 1864-5917
DOI
https://doi.org/10.1007/s12065-021-00612-y
