LODNU: lightweight object detection network in UAV vision

Authors: Naiyuan Chen, Yan Li, Zhuomin Yang, Zhensong Lu, Sai Wang, Junang Wang

Published in: The Journal of Supercomputing | Issue 9/2023
Publication date: 01-02-2023

Abstract

With the development of unmanned aerial vehicle (UAV) technology, object detection from UAVs has become a research focus. However, most current deep-learning-based object detection algorithms are resource intensive, making them difficult to deploy on embedded devices with limited memory and computing power, such as UAVs. To meet these challenges, this paper proposes a lightweight object detection network for UAV vision (LODNU) based on YOLOv4, which satisfies the requirements of resource-constrained devices while maintaining detection accuracy. LODNU rebuilds the YOLOv4 backbone with depth-wise separable convolutions to reduce the number of model parameters and embeds an improved coordinate attention module in the backbone to strengthen the extraction of key object features. An adaptive scale-weighted feature fusion module is added to the path aggregation network to improve multi-scale detection accuracy, and a patching data augmentation method is proposed to balance the sample-size proportions. LODNU achieves performance close to YOLOv4 with far fewer parameters and offers a better trade-off between model size and accuracy on the VisDrone2019 dataset. The experimental results show that LODNU uses only 13.1% of YOLOv4's parameters and 13.6% of its floating-point operations, while its mean average precision on VisDrone2019 is 31.4%, which is 77% of YOLOv4's.
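The parameter reduction described above comes largely from replacing standard convolutions with depth-wise separable ones. The sketch below illustrates that building block in PyTorch as a generic example, not the authors' released implementation; the class name DepthwiseSeparableConv and the 256-channel layer size are placeholders chosen for illustration.

# Minimal sketch of a depth-wise separable convolution block (PyTorch).
# Illustrates the general technique only; layer sizes are assumptions.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        # Depth-wise convolution: one 3x3 filter per input channel (groups=in_channels)
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=3,
                                   stride=stride, padding=1,
                                   groups=in_channels, bias=False)
        # Point-wise convolution: 1x1 filters mix channels and set the output width
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.depthwise(x)
        x = self.pointwise(x)
        return self.act(self.bn(x))

# A standard 3x3 convolution from 256 to 256 channels needs 3*3*256*256 ~ 590k weights;
# the separable version needs 3*3*256 + 256*256 ~ 68k, roughly an 8-9x reduction.
x = torch.randn(1, 256, 52, 52)
print(DepthwiseSeparableConv(256, 256)(x).shape)  # torch.Size([1, 256, 52, 52])

The weight-count comparison in the final comment shows why swapping this block into a backbone shrinks the model substantially while keeping the same receptive field and output shape.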


Metadata
Title
LODNU: lightweight object detection network in UAV vision
Authors
Naiyuan Chen
Yan Li
Zhuomin Yang
Zhensong Lu
Sai Wang
Junang Wang
Publication date
01-02-2023
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 9/2023
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-023-05065-x
