Published in: International Journal of Multimedia Information Retrieval 2/2024

01-06-2024 | Regular Paper

DAF-Net: dense attention feature pyramid network for multiscale object detection

Authors: Divine Njengwie Achinek, Ibrahim Shehi Shehu, Athuman Mohamed Athuman, Xianping Fu


Abstract

In recent years, object detection has become one of the most prominent tasks in computer vision. State-of-the-art detectors employ convolutional neural networks (CNNs) alongside other deep neural network techniques to improve detection performance and accuracy. Most recent detectors build on the feature pyramid network (FPN) and its variants, while others combine attention mechanisms to achieve better performance. An open question is the inconsistency between the lower-layer features, with their high resolution but limited receptive field and weak semantics, and the semantically rich upper-layer features when detecting objects. Although some researchers have attempted to address this issue, we build on ideas from the field and propose a more prominent architecture, the dense attention feature pyramid network (DAF-Net), for multiscale object detection. DAF-Net consists of two attention models: a spatial attention model and a channel attention model. Unlike other attention models, ours are lightweight and fully data-driven, and we implement a densely connected attention FPN to reduce the model's complexity and avoid learning redundant feature maps. We first developed the two attention models, then used only the spatial attention model in the backbone of our network, and finally used both attention models to filter and maintain a steady flow of semantic information from the lower layers, improving the model's accuracy and efficiency. Experimental results on underwater images from the National Natural Science Foundation of China (NSFC) dataset (Underwater Image Dataset, National Natural Science Foundation of China (NSFC), online, retrieved from http://www.cnurpc.org/index.html), the MS COCO dataset, and the PASCAL VOC dataset indicate higher accuracy and better detection results for the proposed model compared to the benchmark YOLOX-Darknet53 (Ge, "YOLOX: Exceeding YOLO series in 2021", arXiv preprint arXiv:2107.08430). Our model achieved 70.2 mAP, 48.9 mAP, and 83.9 mAP on the NSFC, MS COCO, and PASCAL VOC datasets, respectively, compared with the benchmark's 68.9 mAP, 47.7 mAP, and 82.4 mAP.
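The abstract describes the two attention modules only at a high level. As an illustration of the general idea, here is a minimal NumPy sketch of channel and spatial attention gates in the common CBAM style; all shapes, weights, and the additive spatial pooling are hypothetical assumptions for demonstration, not the paper's actual DAF-Net modules:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    # feat: (C, H, W). Squeeze spatial dims by global average pooling,
    # run a tiny two-layer MLP, and gate each channel with a sigmoid.
    squeezed = feat.mean(axis=(1, 2))                     # (C,)
    gate = sigmoid(w2 @ np.maximum(w1 @ squeezed, 0.0))   # (C,) in (0, 1)
    return feat * gate[:, None, None]

def spatial_attention(feat):
    # feat: (C, H, W). Pool across channels (mean and max), combine,
    # and gate every spatial position. A learned conv would normally
    # replace this simple additive combination.
    avg = feat.mean(axis=0)                               # (H, W)
    mx = feat.max(axis=0)                                 # (H, W)
    gate = sigmoid(avg + mx)                              # (H, W) in (0, 1)
    return feat * gate[None, :, :]

rng = np.random.default_rng(0)
c, h, w, r = 8, 4, 4, 2                                   # r: channel reduction
feat = rng.standard_normal((c, h, w))
w1 = rng.standard_normal((r, c)) * 0.1                    # reduction layer
w2 = rng.standard_normal((c, r)) * 0.1                    # expansion layer

out = spatial_attention(channel_attention(feat, w1, w2))
print(out.shape)  # (8, 4, 4)
```

Because both gates lie in (0, 1), the composed attention can only attenuate feature responses, never amplify them; a lightweight design like this adds only the small MLP weights on top of the backbone.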


Metadata
Title
DAF-Net: dense attention feature pyramid network for multiscale object detection
Authors
Divine Njengwie Achinek
Ibrahim Shehi Shehu
Athuman Mohamed Athuman
Xianping Fu
Publication date
01-06-2024
Publisher
Springer London
Published in
International Journal of Multimedia Information Retrieval / Issue 2/2024
Print ISSN: 2192-6611
Electronic ISSN: 2192-662X
DOI
https://doi.org/10.1007/s13735-024-00323-x
