
MFANet: Multi-scale feature fusion network with attention mechanism

  • Original article
  • The Visual Computer

Abstract

To improve detection accuracy, this paper proposes a multi-scale feature fusion and attention mechanism network (MFANet) based on deep learning, which effectively integrates a pyramid module with a channel attention mechanism. The pyramid module is designed for feature fusion in the channel and spatial dimensions. The channel attention mechanism obtains feature maps under different receptive fields, divides each feature map into two groups, and uses different convolutions to compute the attention weights. Experimental results show that the proposed strategy improves state-of-the-art detectors by 1–2% box AP on object detection benchmarks; in particular, MFANet reaches 34.2% box AP on the COCO dataset. Compared with typical existing algorithms, the proposed method achieves a significant improvement in detection accuracy.
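Purely as an illustration of the mechanism described above, the sketch below shows how such a channel attention block could look in PyTorch: features are extracted under two receptive fields (approximated here with dilation rates 1 and 2), split into two channel groups, and passed through different convolutions to produce per-channel weights. All module names, kernel sizes, and dilation rates are assumptions made for this sketch, not the authors' implementation.

    # Hypothetical sketch, not the authors' code: channel attention with
    # multi-receptive-field features and a two-group weight computation.
    import torch
    import torch.nn as nn

    class ChannelAttentionSketch(nn.Module):
        def __init__(self, channels: int):
            super().__init__()
            assert channels % 2 == 0, "channels must split into two groups"
            half = channels // 2
            # Two branches stand in for "different receptive fields".
            self.rf_small = nn.Conv2d(channels, channels, 3, padding=1, dilation=1)
            self.rf_large = nn.Conv2d(channels, channels, 3, padding=2, dilation=2)
            # Different convolutions for the two channel groups.
            self.group_a = nn.Conv2d(half, half, kernel_size=1)
            self.group_b = nn.Conv2d(half, half, kernel_size=3, padding=1)
            self.pool = nn.AdaptiveAvgPool2d(1)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            feats = self.rf_small(x) + self.rf_large(x)  # fuse the two receptive fields
            a, b = torch.chunk(feats, 2, dim=1)          # split into two channel groups
            w = torch.cat([self.group_a(a), self.group_b(b)], dim=1)
            w = torch.sigmoid(self.pool(w))              # per-channel weights in (0, 1)
            return x * w                                 # reweight the input features

    x = torch.randn(1, 64, 32, 32)
    print(ChannelAttentionSketch(64)(x).shape)           # torch.Size([1, 64, 32, 32])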



Funding

This work was supported in part by the National Key R&D Program of China under Grant 2017YFB1302400.

Author information


Contributions

Gaihua Wang, Xin Gan, Qingcheng Cao, and Qianyu Zhai conceived the experiments. Xin Gan and Qingcheng Cao conducted the experiments. All authors reviewed the manuscript.

Corresponding author

Correspondence to Xin Gan.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest with any individual or organization.

Code or data availability

Code and data are available.

Ethics approval

All experiments in this article were carried out entirely in software; they involve no human or animal subjects and raise no ethical concerns.

Consent to participate

Readers are welcome to contact the authors.

Consent for publication

This work was completed at Hubei University of Technology on December 14, 2021.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Wang, G., Gan, X., Cao, Q. et al. MFANet: Multi-scale feature fusion network with attention mechanism. Vis Comput 39, 2969–2980 (2023). https://doi.org/10.1007/s00371-022-02503-4

