Published in: World Wide Web 4/2021

07-02-2020

OpenCL-Darknet: implementation and optimization of OpenCL-based deep learning object detection framework

Authors: Yongbon Koo, Sunghoon Kim, Young-guk Ha

Abstract

Object detection is the task of recognizing the classes of objects and their locations. It is used in many areas, such as face detection systems [16, 34, 37], surveillance tools [9], human-machine interfaces [17], and self-driving cars [18, 23, 25, 26, 30]. Deep learning approaches to object detection now achieve significantly better performance than classical feature-based algorithms. Darknet [31] is a deep learning object detection framework that is well known for its speed and simple structure. However, Darknet supports only Nvidia CUDA [6] for accelerating its deep learning computations, which limits users' choice of graphics cards. Open Computing Language (OpenCL) [35], an open standard for cross-platform parallel programming of heterogeneous systems, is available on general hardware accelerators, yet many deep learning frameworks, including Darknet, do not support it.
In our previous paper, we presented OpenCL-Darknet [19], which ported the CUDA-based Darknet to an open-standard OpenCL backend. The original OpenCL-Darknet showed that the framework could run on general graphics processing unit (GPU) hardware. However, its performance was not competitive with the CUDA version, and it supported only a limited set of platforms. In this study, we improved the performance of OpenCL-Darknet with several optimization techniques and added support for various architectures. We evaluated OpenCL-Darknet not only on an AMD R7 accelerated processing unit (APU) with OpenCL 2.0, but also on an Nvidia GPU and an ARM Mali embedded GPU with the OpenCL 1.2 profile. The evaluation on standard object detection datasets showed that the improved OpenCL-Darknet reduced processing time by up to 50% on average across various deep learning object detection networks, compared with our original implementation. We also showed that our OpenCL deep learning framework is competitive with the CUDA-based one.
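To make the reported figure concrete, the short sketch below relates a reduction in per-frame processing time to the equivalent speedup and frame rate. The function names and the timings (200 ms and 100 ms per frame) are hypothetical values chosen for illustration; they are not measurements from the paper.

```python
def time_reduction(baseline_ms: float, optimized_ms: float) -> float:
    """Fraction of per-frame processing time removed by an optimization."""
    if baseline_ms <= 0:
        raise ValueError("baseline_ms must be positive")
    return (baseline_ms - optimized_ms) / baseline_ms


def speedup(baseline_ms: float, optimized_ms: float) -> float:
    """How many times faster the optimized version processes one frame."""
    if optimized_ms <= 0:
        raise ValueError("optimized_ms must be positive")
    return baseline_ms / optimized_ms


def frames_per_second(ms_per_frame: float) -> float:
    """Throughput implied by a per-frame processing time in milliseconds."""
    if ms_per_frame <= 0:
        raise ValueError("ms_per_frame must be positive")
    return 1000.0 / ms_per_frame


# Hypothetical timings, for illustration only (not taken from the paper).
baseline_ms = 200.0   # assumed per-frame time of an unoptimized build
optimized_ms = 100.0  # assumed per-frame time after optimization

reduction = time_reduction(baseline_ms, optimized_ms)
factor = speedup(baseline_ms, optimized_ms)
print(f"time reduction: {reduction:.0%}")
print(f"speedup: {factor:.1f}x")
print(f"frame rate: {frames_per_second(baseline_ms):.1f} -> "
      f"{frames_per_second(optimized_ms):.1f} frames/s")
```

Under these assumed numbers, a 50% reduction in processing time corresponds to a 2x speedup, since the speedup equals 1 / (1 - reduction).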

Literature
1. Badía, J., Belloch, J., Cobos, M., Igual, F., Quintana-Ortí, E.: Accelerating the SRP-PHAT algorithm on multi and many-core platforms using OpenCL. J. Supercomput. 75(3), 1284–1297 (2019)
2. D. Barry, M. Shah, M. Keijsers, H. Khan, and B. Hopman, “xYOLO: A Model For Real-Time Object Detection In Humanoid Soccer On Low-End Hardware,” arXiv preprint, 2019
3. Beck, K.: Test Driven Development: by Example. Addison-Wesley Longman Publishing Co., Inc., Boston, MA (2002)
6. Cook, S.: CUDA Programming: a Developer's Guide to Parallel Computing with GPUs. Morgan Kaufmann Publishers Inc., San Francisco (2013)
8. N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in Proc. of 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, 2005, pp. 886–893
9. N. Dalal, B. Triggs, and C. Schmid, “Human Detection Using Oriented Histograms of Flow and Appearance,” in Computer Vision (ECCV 2006), Springer Berlin Heidelberg, 2006, pp. 428–441
11. A. Geiger, P. Lenz, and R. Urtasun, “Are we ready for autonomous driving? The KITTI vision benchmark suite,” in Proc. of Conference on Computer Vision and Pattern Recognition (CVPR), 2012
12. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. The MIT Press (2016)
13. J. Gu, Y. Liu, Y. Gao, and M. Zhu, “OpenCL Caffe: Accelerating and Enabling a Cross Platform Machine Learning Framework,” in Proc. The 4th International Workshop on OpenCL, New York, 2016, pp. 8:1–8:5
14. H. Haseljic, E. Cogo, I. Prazina, R. Turcinhodzic, E. Buza, and A. Akagic, “OpenCL Superpixel Implementation on a General Purpose Multi-core CPU,” in Proc. of 2018 IEEE International Conference on Imaging Systems and Techniques (IST), Krakow, Poland, 2018
15. Hendry, Chern, R.: Automatic License Plate Recognition via sliding-window darknet-YOLO deep learning. Image Vis. Comput. 87, 47–56 (2019)
16. Ji, Y., Kim, S., Kim, Y., Lee, K.: Human-like sign-language learning method using deep learning. ETRI J. 40, 435–445 (2018)
17. Kim, J., Ryu, J.H., Han, T.M.: Multimodal Interface based on novel HMI UI/UX for in-vehicle infotainment system. ETRI J. 37(4), 793–803 (2015)
18. Y. Koo, J. Kim, and W. Han, “A method for driving control authority transition for cooperative autonomous vehicle,” in Proc. 2015 IEEE Intelligent Vehicles Symposium, Seoul, 2015, pp. 394–399
19. Y. Koo, C. You, and S. Kim, “OpenCL-Darknet: An OpenCL Implementation for Object Detection,” in Proc. The 1st International Workshop on Driving Computing Platform for Autonomous Vehicles, Shanghai, 2018
20. W. Lee, and W. Loh, “G-OPTICS: fast ordering density-based cluster objects using graphics processing units,” in Int. J. Web Grid Serv., vol. 14(3), 2018
21. L. Liao, K. Li, K. Li, C. Yang, and Q. Tian, “UHCL-Darknet: An OpenCL-based Deep Neural Network Framework for Heterogeneous Multi-/Many-core Clusters,” in Proc. of the 47th International Conference on Parallel Processing, Eugene, OR, USA, 2018
22. T. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft COCO: Common objects in context,” in European conference on computer vision, pp. 740–755, 2014
23. Montemerlo, M., Becker, J., Bhat, S., Dahlkamp, H., Dolgov, D., Ettinger, S., Haehnel, D., Hilden, T., Hoffmann, G., Huhnke, B., Johnston, D., Klumpp, S., Langer, D., Levandowski, A., Levinson, J., Marcil, J., Orenstein, D., Paefgen, J., Penny, I., Petrovskaya, A., Pflueger, M., Stanek, G., Stavens, D., Vogt, A., Thrun, S.: Junior: the Stanford entry in the urban challenge. J. Field Rob. 25(9), 569–597 (2008)
24. A. Neubeck and L. Van Gool, “Efficient Non-Maximum Suppression,” in Proc. The 18th International Conference on Pattern Recognition, Washington, 2006, pp. 850–855
25. Noh, S., An, K.: Decision-making framework for automated driving in highway environments. IEEE Trans. Intell. Transp. Syst. 19(1), 58–71 (2018)
26. Noh, S., Park, B., An, K., Koo, Y., Han, W.: Co-pilot agent for vehicle/driver cooperative and autonomous driving. ETRI J. 37(5), 1032–1043 (2015)
27. C. Nugteren, “CLBlast: A Tuned OpenCL BLAS Library,” arXiv preprint, 2017
28. C. Nugteren, “CLTune: A Generic Auto-Tuner for OpenCL Kernels,” arXiv preprint, 2017
29. C. P. Papageorgiou, M. Oren, and T. Poggio, “A general framework for object detection,” in Proc. 6th International Conference on Computer Vision, Bombay, 1998, pp. 555–562
30. Park, M., Lee, S., Han, W.: Development of steering control system for autonomous vehicle using geometry-based path tracking algorithm. ETRI J. 37(3), 617–625 (2015)
32. J. Redmon and A. Farhadi, “YOLO9000: Better, Faster, Stronger,” arXiv preprint, 2016
33. S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks,” in Proc. Advances in Neural Information Processing Systems, Montréal, 2015, pp. 91–99
34. Rowley, H.A., Baluja, S., Kanade, T.: Neural network-based face detection. IEEE Trans. Pattern Anal. Mach. Intell. 20, 23–38 (1998)
35. Stone, J.E., Gohara, D., Shi, G.: OpenCL: a parallel programming standard for heterogeneous computing systems. Comput. Sci. Eng. 12(3), 66–73 (2010)
36. B. Su and K. Keutzer, “clSpMV: A Cross-Platform OpenCL SpMV Framework on GPUs,” in Proc. The 26th ACM International Conference on Supercomputing, New York, 2012, pp. 353–364
37. Viola, P., Jones, M.J.: Robust real-time face detection. Int. J. Comput. Vis. 57, 137–154 (2004)
Metadata
Title
OpenCL-Darknet: implementation and optimization of OpenCL-based deep learning object detection framework
Authors
Yongbon Koo
Sunghoon Kim
Young-guk Ha
Publication date
07-02-2020
Publisher
Springer US
Published in
World Wide Web / Issue 4/2021
Print ISSN: 1386-145X
Electronic ISSN: 1573-1413
DOI
https://doi.org/10.1007/s11280-020-00778-y
