LODNU: lightweight object detection network in UAV vision

Authors: Naiyuan Chen, Yan Li, Zhuomin Yang, Zhensong Lu, Sai Wang, Junang Wang

Published in: The Journal of Supercomputing | Issue 9/2023
Publication date: 01-02-2023

Abstract

With the development of unmanned aerial vehicle (UAV) technology, object detection from UAVs has become a research focus. However, most current deep-learning-based object detection algorithms are resource intensive, making them difficult to deploy on embedded devices with limited memory and computing power, such as UAVs. To meet these challenges, this paper proposes a lightweight object detection network for UAV vision (LODNU) based on YOLOv4, which satisfies the requirements of resource-constrained devices while maintaining detection accuracy. LODNU rebuilds the YOLOv4 backbone with depth-wise separable convolutions to reduce the number of model parameters and embeds an improved coordinate attention module in the backbone to strengthen the extraction of key object features. An adaptive scale-weighted feature fusion module is added to the path aggregation network to improve multi-scale detection accuracy, and a patching data augmentation method is proposed to balance the sample-size proportions. LODNU achieves performance close to YOLOv4 with far fewer parameters and offers a better trade-off between model size and accuracy on the VisDrone2019 dataset. The experimental results show that LODNU uses only 13.1% of YOLOv4's parameters and 13.6% of its floating-point operations, while its mean average precision on VisDrone2019 is 31.4%, which is 77% of YOLOv4's.
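The parameter reduction described above comes largely from replacing standard convolutions with depth-wise separable ones. The sketch below illustrates that building block in PyTorch as a generic example, not the authors' released implementation; the class name DepthwiseSeparableConv and the 256-channel layer size are placeholders chosen for illustration.

# Minimal sketch of a depth-wise separable convolution block (PyTorch).
# Illustrates the general technique only; layer sizes are assumptions.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        # Depth-wise convolution: one 3x3 filter per input channel (groups=in_channels)
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=3,
                                   stride=stride, padding=1,
                                   groups=in_channels, bias=False)
        # Point-wise convolution: 1x1 filters mix channels and set the output width
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.depthwise(x)
        x = self.pointwise(x)
        return self.act(self.bn(x))

# A standard 3x3 convolution from 256 to 256 channels needs 3*3*256*256 ~ 590k weights;
# the separable version needs 3*3*256 + 256*256 ~ 68k, roughly an 8-9x reduction.
x = torch.randn(1, 256, 52, 52)
print(DepthwiseSeparableConv(256, 256)(x).shape)  # torch.Size([1, 256, 52, 52])

The weight-count comparison in the final comment shows why swapping this block into a backbone shrinks the model substantially while keeping the same receptive field and output shape.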


Metadata
Title
LODNU: lightweight object detection network in UAV vision
Authors
Naiyuan Chen
Yan Li
Zhuomin Yang
Zhensong Lu
Sai Wang
Junang Wang
Publication date
01-02-2023
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 9/2023
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-023-05065-x
