Published: 26-08-2022

Learning Discriminated Features Based on Feature Pyramid Networks and Attention for Multi-scale Object Detection

Authors: Yunhua Lu, Minghui Su, Yong Wang, Zhi Liu, Tao Peng

Published in: Cognitive Computation | Issue 2/2023


Abstract

As object detection is applied to increasingly complex scenes, the extracted feature information needs further improvement. Many multi-scale feature pyramid network methods have been proposed to improve detection accuracy. However, most of them follow a simple chain aggregation structure and therefore do not account for the distinction between objects at different scales. Modern cognitive research suggests that human cognition is not a simple image-matching process but involves an inherent process of information decomposition and reconstruction. Inspired by this theory, we propose SuFPN, a new feature pyramid network model based on discriminative learning, to address multi-scale object detection. SuFPN fully considers the correlation between low-level location information and deep feature information. First, object features are extracted through a top-down pathway and lateral connections. Then, deformable convolution is used to extract discriminative spatial information about objects. Finally, an attention mechanism is introduced to generate a discriminative feature map with enhanced spatial and channel interdependence, which supplies the feature pyramid with high-quality location information while preserving semantic information. SuFPN is validated on the PASCAL VOC and COCO datasets. Its Average Precision (AP) reaches 80.0 on PASCAL VOC, 1.7 points higher than the feature pyramid network (FPN), and 39.2 on COCO, 1.8 points higher than FPN. These results show that SuFPN outperforms other state-of-the-art methods in multi-scale detection precision.
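The top-down pathway with lateral connections that SuFPN starts from is the standard FPN fusion step. The NumPy sketch below illustrates only that generic step, under assumptions the abstract does not state: feature maps are (channels, height, width) arrays, each backbone level halves the spatial size of the previous one, lateral projections are plain 1x1 convolutions with given weights, and upsampling is nearest-neighbour. The 3x3 smoothing convolution FPN applies after each merge is omitted, and all function names are hypothetical.

```python
import numpy as np

def conv1x1(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """1x1 convolution: x is (C_in, H, W), w is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)

def upsample_nearest_2x(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling of a (C, H, W) map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fpn_top_down(features, lateral_weights):
    """Merge backbone maps (finest first) into pyramid maps of equal width.

    features:        list of (C_i, H_i, W_i) arrays, each half the size of the previous
    lateral_weights: list of (d, C_i) arrays for the hypothetical 1x1 lateral projections
    """
    laterals = [conv1x1(f, w) for f, w in zip(features, lateral_weights)]
    outputs = [laterals[-1]]  # the coarsest level seeds the top-down path
    for lat in reversed(laterals[:-1]):
        # lateral connection + upsampled coarser map, merged by element-wise addition
        outputs.append(lat + upsample_nearest_2x(outputs[-1]))
    return outputs[::-1]  # finest first, matching the input order

# Usage with random stand-in backbone features
rng = np.random.default_rng(0)
d = 8
shapes = [(16, 32, 32), (32, 16, 16), (64, 8, 8)]
feats = [rng.standard_normal(s) for s in shapes]
weights = [rng.standard_normal((d, s[0])) for s in shapes]
pyramid = fpn_top_down(feats, weights)
print([p.shape for p in pyramid])  # [(8, 32, 32), (8, 16, 16), (8, 8, 8)]
```

Because the coarsest level seeds the path, each finer output adds its own lateral projection to upsampled information from all coarser levels in sequence, which is the chain-style aggregation the abstract describes.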
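Deformable convolution (Dai et al., 2017) adds a learned 2-D offset to each kernel tap, so the sampling grid can follow an object's shape instead of a fixed square. The sketch below is a naive, loop-based NumPy version of a 3x3 deformable convolution with stride 1 and zero padding. It assumes the offsets are supplied as an input array (in practice a parallel convolution predicts them) and does not reproduce where or how SuFPN configures the layer, which the abstract does not specify. Fractional sampling positions are resolved by bilinear interpolation; the function names are hypothetical.

```python
import numpy as np

def bilinear(img: np.ndarray, py: float, px: float) -> np.ndarray:
    """Bilinearly sample a (C, H, W) map at fractional (py, px); zero outside the map."""
    _, h, w = img.shape
    y0, x0 = int(np.floor(py)), int(np.floor(px))
    out = np.zeros(img.shape[0])
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1 - abs(py - yy)) * (1 - abs(px - xx))
                out += weight * img[:, yy, xx]
    return out

def deform_conv2d(x: np.ndarray, weight: np.ndarray, offsets: np.ndarray) -> np.ndarray:
    """3x3 deformable convolution, stride 1, padding 1.

    x:       (C_in, H, W) input feature map
    weight:  (C_out, C_in, 3, 3) kernel
    offsets: (2 * 9, H, W) array of (dy, dx) per kernel tap and output position
    """
    c_out = weight.shape[0]
    _, h, w = x.shape
    out = np.zeros((c_out, h, w))
    taps = [(ky, kx) for ky in range(3) for kx in range(3)]
    for i in range(h):
        for j in range(w):
            for k, (ky, kx) in enumerate(taps):
                dy, dx = offsets[2 * k, i, j], offsets[2 * k + 1, i, j]
                # regular grid position (p0 + pn) shifted by the learned offset
                sample = bilinear(x, i + ky - 1 + dy, j + kx - 1 + dx)
                out[:, i, j] += weight[:, :, ky, kx] @ sample
    return out
```

With all offsets set to zero, the layer reduces to an ordinary zero-padded 3x3 convolution, which is a convenient sanity check for the sketch.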
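The abstract does not detail SuFPN's attention module. As one common way to model both channel and spatial interdependence, the sketch below implements a CBAM-style block (Woo et al., 2018): a shared two-layer MLP over average- and max-pooled channel descriptors produces per-channel weights, then a KxK convolution over the channel-wise average and max maps produces per-position weights. This is a swapped-in illustration, not the authors' module; weights are passed in as plain arrays and all names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(f, w1, w2):
    """f: (C, H, W); w1: (C // r, C); w2: (C, C // r) for reduction ratio r."""
    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # shared two-layer MLP with ReLU
    avg = f.mean(axis=(1, 2))
    mx = f.max(axis=(1, 2))
    scale = sigmoid(mlp(avg) + mlp(mx))  # one weight per channel
    return f * scale[:, None, None]

def spatial_attention(f, k):
    """f: (C, H, W); k: (2, K, K) kernel with K odd."""
    desc = np.stack([f.mean(axis=0), f.max(axis=0)])  # (2, H, W) descriptors
    kk = k.shape[-1]
    pad = kk // 2
    padded = np.pad(desc, ((0, 0), (pad, pad), (pad, pad)))
    _, h, w = f.shape
    logits = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            logits[i, j] = np.sum(k * padded[:, i:i + kk, j:j + kk])
    return f * sigmoid(logits)[None, :, :]  # one weight per spatial position

def cbam(f, w1, w2, k):
    # channel attention first, then spatial attention, as in CBAM
    return spatial_attention(channel_attention(f, w1, w2), k)

rng = np.random.default_rng(0)
feat = rng.standard_normal((16, 8, 8))
out = cbam(feat, rng.standard_normal((4, 16)), rng.standard_normal((16, 4)),
           rng.standard_normal((2, 7, 7)))
print(out.shape)  # (16, 8, 8)
```

Both gates pass through a sigmoid, so every scale lies in (0, 1) and the block reweights the input activations without amplifying them.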
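The reported results are Average Precision values. As a reference for how AP is derived from ranked detections, the sketch below computes all-point interpolated AP for a single class at a single IoU threshold, the interpolation adopted by later PASCAL VOC evaluations. It assumes detections have already been matched to ground-truth boxes (the `is_tp` flags) and that `num_gt` is positive. COCO's headline AP additionally averages over IoU thresholds from 0.50 to 0.95 and samples recall at 101 points, which this sketch does not do; the abstract does not restate which evaluation settings produced 80.0 and 39.2.

```python
import numpy as np

def average_precision(scores, is_tp, num_gt):
    """All-point interpolated AP for one class at one IoU threshold.

    scores: confidence of each detection
    is_tp:  whether each detection matched a not-yet-claimed ground-truth box
    num_gt: number of ground-truth boxes of this class (must be > 0)
    """
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    tp = np.asarray(is_tp, dtype=float)[order]
    tp_cum = np.cumsum(tp)
    fp_cum = np.cumsum(1.0 - tp)
    recall = tp_cum / num_gt
    precision = tp_cum / (tp_cum + fp_cum)
    # pad the curve, then make precision monotonically non-increasing in recall
    mrec = np.concatenate([[0.0], recall, [1.0]])
    mpre = np.concatenate([[0.0], precision, [0.0]])
    mpre = np.maximum.accumulate(mpre[::-1])[::-1]
    # sum rectangle areas wherever recall changes
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))
```

For three detections scored 0.9, 0.8 and 0.7, where the first and third are true positives against two ground-truth boxes, the function returns 5/6 (about 0.833).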

Metadata
Title: Learning Discriminated Features Based on Feature Pyramid Networks and Attention for Multi-scale Object Detection
Authors: Yunhua Lu, Minghui Su, Yong Wang, Zhi Liu, Tao Peng
Publication date: 26-08-2022
Publisher: Springer US
Published in: Cognitive Computation / Issue 2/2023
Print ISSN: 1866-9956
Electronic ISSN: 1866-9964
DOI: https://doi.org/10.1007/s12559-022-10052-0
