Published in: International Journal of Multimedia Information Retrieval 2/2024

01-06-2024 | Regular Paper

DAF-Net: dense attention feature pyramid network for multiscale object detection

Authors: Divine Njengwie Achinek, Ibrahim Shehi Shehu, Athuman Mohamed Athuman, Xianping Fu


Abstract

In recent years, object detection has become one of the most prominent tasks in computer vision. State-of-the-art detectors employ convolutional neural networks (CNNs) alongside other deep neural network techniques to improve detection performance and accuracy. Most recent detectors build on the feature pyramid network (FPN) and its variants, while others combine attention mechanisms to achieve better performance. An open question is the inconsistency between the lower-layer features, with their high resolution but limited receptive field and weak semantics, and the semantically rich upper-layer features when detecting objects. Although some researchers have attempted to address this issue, we build on ideas from the field and propose a more prominent architecture, the dense attention feature pyramid network (DAF-Net), for multiscale object detection. DAF-Net consists of two attention models: a spatial attention model and a channel attention model. Unlike other attention models, ours are lightweight and fully data-driven, and we implement a densely connected attention FPN to reduce the model's complexity and avoid learning redundant feature maps. We first developed the two attention models, then used only the spatial attention model in the backbone of our network, and finally used both attention models to filter and maintain a steady flow of semantic information from the lower layers, improving the model's accuracy and efficiency. Experimental results on underwater images from the National Natural Science Foundation of China (NSFC) dataset (Underwater Image Dataset, National Natural Science Foundation of China (NSFC), online, retrieved from http://www.cnurpc.org/index.html), the MS COCO dataset, and the PASCAL VOC dataset indicate higher accuracy and better detection results for the proposed model compared to the benchmark YOLOX-Darknet53 (Ge, "YOLOX: Exceeding YOLO series in 2021", arXiv preprint arXiv:2107.08430). Our model achieved 70.2 mAP, 48.9 mAP, and 83.9 mAP on the NSFC, MS COCO, and PASCAL VOC datasets, respectively, compared with the benchmark's 68.9 mAP, 47.7 mAP, and 82.4 mAP.
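The abstract describes the two attention modules only at a high level. As an illustration of the general idea, here is a minimal NumPy sketch of channel and spatial attention gates in the common CBAM style; all shapes, weights, and the additive spatial pooling are hypothetical assumptions for demonstration, not the paper's actual DAF-Net modules:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    # feat: (C, H, W). Squeeze spatial dims by global average pooling,
    # run a tiny two-layer MLP, and gate each channel with a sigmoid.
    squeezed = feat.mean(axis=(1, 2))                     # (C,)
    gate = sigmoid(w2 @ np.maximum(w1 @ squeezed, 0.0))   # (C,) in (0, 1)
    return feat * gate[:, None, None]

def spatial_attention(feat):
    # feat: (C, H, W). Pool across channels (mean and max), combine,
    # and gate every spatial position. A learned conv would normally
    # replace this simple additive combination.
    avg = feat.mean(axis=0)                               # (H, W)
    mx = feat.max(axis=0)                                 # (H, W)
    gate = sigmoid(avg + mx)                              # (H, W) in (0, 1)
    return feat * gate[None, :, :]

rng = np.random.default_rng(0)
c, h, w, r = 8, 4, 4, 2                                   # r: channel reduction
feat = rng.standard_normal((c, h, w))
w1 = rng.standard_normal((r, c)) * 0.1                    # reduction layer
w2 = rng.standard_normal((c, r)) * 0.1                    # expansion layer

out = spatial_attention(channel_attention(feat, w1, w2))
print(out.shape)  # (8, 4, 4)
```

Because both gates lie in (0, 1), the composed attention can only attenuate feature responses, never amplify them; a lightweight design like this adds only the small MLP weights on top of the backbone.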


Metadata
Title
DAF-Net: dense attention feature pyramid network for multiscale object detection
Authors
Divine Njengwie Achinek
Ibrahim Shehi Shehu
Athuman Mohamed Athuman
Xianping Fu
Publication date
01-06-2024
Publisher
Springer London
Published in
International Journal of Multimedia Information Retrieval / Issue 2/2024
Print ISSN: 2192-6611
Electronic ISSN: 2192-662X
DOI
https://doi.org/10.1007/s13735-024-00323-x
