Skip to main content
Erschienen in:

26.08.2022

Learning Discriminated Features Based on Feature Pyramid Networks and Attention for Multi-scale Object Detection

verfasst von: Yunhua Lu, Minghui Su, Yong Wang, Zhi Liu, Tao Peng

Erschienen in: Cognitive Computation | Ausgabe 2/2023

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

As the research scene in object detection becomes increasingly complex, the extracted feature information needs to be further improved. Many multi-scale feature pyramid network methods have been proposed to improve detection accuracy. However, most of them just follow a simple chain aggregation structure, resulting in not considering the distinction between multi-scale objects. Modern cognitive research presents that human cognitive ability is not a simple image-based matching process. It has an inherent process of information decomposition and reconstruction. Inspired by this theory, a new feature pyramid network model denoted as SuFPN based on discriminative learning is proposed to solve the problem of multi-scale object detection. In SuFPN, the correlation between the underlying location information and the deep feature information is fully considered. Firstly, object features are extracted through top-down path and lateral connection. Then deformable convolution is used to extract object discriminant spatial information. Finally, the attention mechanism is introduced to generate a discriminative feature map with enhanced spatial and channel interdependence, which provides excellent location information for the feature pyramid while considering semantic information. The proposed SuFPN is validated on the PASCAL VOC and COCO datasets. The Average Precision (AP) value reaches 80.0 on the PASCAL VOC dataset, which is 1.7 points higher than the feature pyramid networks (FPN), and 39.2 on the COCO dataset, which is 1.8 points higher than the FPN. The result demonstrates that our SuFPN outperforms other advanced methods in the multi-scale detection precision.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat He K, Gkioxari G, Dollár P, Girshick R. Mask r-cnn. In: Proceedings of the IEEE International Conference on Computer Vision; 2017. p. 2961-9. He K, Gkioxari G, Dollár P, Girshick R. Mask r-cnn. In: Proceedings of the IEEE International Conference on Computer Vision; 2017. p. 2961-9.
2.
Zurück zum Zitat Lin TY, Dollár P, Girshick R, He K, Hariharan B, Belongie S. Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2017. p. 2117-25. Lin TY, Dollár P, Girshick R, He K, Hariharan B, Belongie S. Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2017. p. 2117-25.
3.
Zurück zum Zitat Cai Z, Vasconcelos N. Cascade r-cnn: Delving into high quality object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 6154-62. Cai Z, Vasconcelos N. Cascade r-cnn: Delving into high quality object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 6154-62.
4.
Zurück zum Zitat Lin TY, Goyal P, Girshick R, He K, Dollár P. Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision; 2017. p. 2980-8. Lin TY, Goyal P, Girshick R, He K, Dollár P. Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision; 2017. p. 2980-8.
5.
Zurück zum Zitat Tian Z, Shen C, Chen H, He T. Fcos: Fully convolutional one-stage object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision; 2019. p. 9627-36. Tian Z, Shen C, Chen H, He T. Fcos: Fully convolutional one-stage object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision; 2019. p. 9627-36.
6.
Zurück zum Zitat Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, et al. Ssd: Single shot multibox detector. In: European Conference on Computer Vision. Springer; 2016. p. 21-37. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, et al. Ssd: Single shot multibox detector. In: European Conference on Computer Vision. Springer; 2016. p. 21-37.
7.
8.
Zurück zum Zitat Kong T, Sun F, Tan C, Liu H, Huang W. Deep feature pyramid reconfiguration for object detection. In: Proceedings of the European Conference on Computer Vision (ECCV); 2018. p. 169-85. Kong T, Sun F, Tan C, Liu H, Huang W. Deep feature pyramid reconfiguration for object detection. In: Proceedings of the European Conference on Computer Vision (ECCV); 2018. p. 169-85.
9.
Zurück zum Zitat Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2015. p. 3431-40. Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2015. p. 3431-40.
10.
Zurück zum Zitat Noh H, Hong S, Han B. Learning deconvolution network for semantic segmentation. In: Proceedings of the IEEE International Conference on Computer Vision; 2015. p. 1520-8. Noh H, Hong S, Han B. Learning deconvolution network for semantic segmentation. In: Proceedings of the IEEE International Conference on Computer Vision; 2015. p. 1520-8.
11.
Zurück zum Zitat Cai Z, Fan Q, Feris RS, Vasconcelos N. A unified multi-scale deep convolutional neural network for fast object detection. In: European Conference on Computer Vision. Springer; 2016. p. 354-70. Cai Z, Fan Q, Feris RS, Vasconcelos N. A unified multi-scale deep convolutional neural network for fast object detection. In: European Conference on Computer Vision. Springer; 2016. p. 354-70.
12.
Zurück zum Zitat Kong T, Yao A, Chen Y, Sun F. Hypernet: Towards accurate region proposal generation and joint object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2016. p. 845-53. Kong T, Yao A, Chen Y, Sun F. Hypernet: Towards accurate region proposal generation and joint object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2016. p. 845-53.
13.
Zurück zum Zitat Kong T, Sun F, Yao A, Liu H, Lu M, Chen Y. Ron: Reverse connection with objectness prior networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2017. p. 5936-44. Kong T, Sun F, Yao A, Liu H, Lu M, Chen Y. Ron: Reverse connection with objectness prior networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2017. p. 5936-44.
14.
Zurück zum Zitat Kim SW, Kook HK, Sun JY, Kang MC, Ko SJ. Parallel feature pyramid network for object detection. In: Proceedings of the European Conference on Computer Vision (ECCV); 2018. p. 234-50. Kim SW, Kook HK, Sun JY, Kang MC, Ko SJ. Parallel feature pyramid network for object detection. In: Proceedings of the European Conference on Computer Vision (ECCV); 2018. p. 234-50.
15.
Zurück zum Zitat Zhou P, Ni B, Geng C, Hu J, Xu Y. Scale-transferrable object detection. In: proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 528-37. Zhou P, Ni B, Geng C, Hu J, Xu Y. Scale-transferrable object detection. In: proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 528-37.
16.
Zurück zum Zitat Liu S, Qi L, Qin H, Shi J, Jia J. Path aggregation network for instance segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8759-68. Liu S, Qi L, Qin H, Shi J, Jia J. Path aggregation network for instance segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8759-68.
17.
Zurück zum Zitat Pang J, Chen K, Shi J, Feng H, Ouyang W, Lin D. Libra r-cnn: Towards balanced learning for object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2019. p. 821-30. Pang J, Chen K, Shi J, Feng H, Ouyang W, Lin D. Libra r-cnn: Towards balanced learning for object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2019. p. 821-30.
18.
Zurück zum Zitat Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, et al. Microsoft coco: Common objects in context. In: European Conference on Computer Vision. Springer; 2014. p. 740-55. Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, et al. Microsoft coco: Common objects in context. In: European Conference on Computer Vision. Springer; 2014. p. 740-55.
19.
Zurück zum Zitat Redmon J, Farhadi A. YOLO9000: better, faster, stronger. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2017. p. 7263-71. Redmon J, Farhadi A. YOLO9000: better, faster, stronger. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2017. p. 7263-71.
20.
Zurück zum Zitat Sermanet P, Eigen D, Zhang X, Mathieu M, Fergus R, LeCun Y. Overfeat: Integrated recognition, localization and detection using convolutional networks. arXiv preprint arXiv:1312.6229. 2013. Sermanet P, Eigen D, Zhang X, Mathieu M, Fergus R, LeCun Y. Overfeat: Integrated recognition, localization and detection using convolutional networks. arXiv preprint arXiv:​1312.​6229. 2013.
21.
Zurück zum Zitat Wang N, Gao Y, Chen H, Wang P, Tian Z, Shen C, et al. NAS-FCOS: Fast neural architecture search for object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2020. p. 11943-51. Wang N, Gao Y, Chen H, Wang P, Tian Z, Shen C, et al. NAS-FCOS: Fast neural architecture search for object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2020. p. 11943-51.
22.
Zurück zum Zitat Zhang Z, Qiao S, Xie C, Shen W, Wang B, Yuille AL. Single-shot object detection with enriched semantics. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 5813-21. Zhang Z, Qiao S, Xie C, Shen W, Wang B, Yuille AL. Single-shot object detection with enriched semantics. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 5813-21.
23.
Zurück zum Zitat Zhao Q, Sheng T, Wang Y, Tang Z, Chen Y, Cai L, et al. M2det: A single-shot object detector based on multi-level feature pyramid network. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 33; 2019. p. 9259-66. Zhao Q, Sheng T, Wang Y, Tang Z, Chen Y, Cai L, et al. M2det: A single-shot object detector based on multi-level feature pyramid network. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 33; 2019. p. 9259-66.
24.
Zurück zum Zitat Ren S, He K, Girshick R, Sun J. Faster R-CNN: Towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems. 2015;201. Ren S, He K, Girshick R, Sun J. Faster R-CNN: Towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems. 2015;201.
25.
Zurück zum Zitat Guo C, Fan B, Zhang Q, Xiang S, Pan C. Augfpn: Improving multi-scale feature learning for object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2020. p. 12595-604. Guo C, Fan B, Zhang Q, Xiang S, Pan C. Augfpn: Improving multi-scale feature learning for object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2020. p. 12595-604.
26.
Zurück zum Zitat Wu Y, Chen Y, Yuan L, Liu Z, Wang L, Li H, et al. Rethinking classification and localization for object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2020. p. 10186-95. Wu Y, Chen Y, Yuan L, Liu Z, Wang L, Li H, et al. Rethinking classification and localization for object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2020. p. 10186-95.
27.
Zurück zum Zitat Girshick R, Donahue J, Darrell T, Malik J. Rich Feature Hierarchies for accurate object detection and semantic segmentation. IEEE Computer Society. 2013. Girshick R, Donahue J, Darrell T, Malik J. Rich Feature Hierarchies for accurate object detection and semantic segmentation. IEEE Computer Society. 2013.
28.
Zurück zum Zitat Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural networks. Adv Neural Inf Proces Syst. 2012;25:1097–105. Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural networks. Adv Neural Inf Proces Syst. 2012;25:1097–105.
29.
Zurück zum Zitat He K, Zhang X, Ren S, Sun J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Pattern Anal Mach Intell. 2015;37(9):1904–16.CrossRef He K, Zhang X, Ren S, Sun J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Pattern Anal Mach Intell. 2015;37(9):1904–16.CrossRef
30.
Zurück zum Zitat Girshick R. Fast r-cnn. In: Proceedings of the IEEE International Conference on Computer Vision; 2015. p. 1440-8. Girshick R. Fast r-cnn. In: Proceedings of the IEEE International Conference on Computer Vision; 2015. p. 1440-8.
31.
Zurück zum Zitat Sun K, Xiao B, Liu D, Wang J. Deep high-resolution representation learning for human pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2019. p. 5693-703. Sun K, Xiao B, Liu D, Wang J. Deep high-resolution representation learning for human pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2019. p. 5693-703.
32.
Zurück zum Zitat Ghiasi G, Lin TY, Le QV. Nas-fpn: Learning scalable feature pyramid architecture for object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2019. p. 7036-45. Ghiasi G, Lin TY, Le QV. Nas-fpn: Learning scalable feature pyramid architecture for object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2019. p. 7036-45.
33.
Zurück zum Zitat Xu H, Yao L, Zhang W, Liang X, Li Z. Auto-fpn: Automatic network architecture adaptation for object detection beyond classification. In: Proceedings of the IEEE/CVF International Conference on Computer Vision; 2019. p. 6649-58. Xu H, Yao L, Zhang W, Liang X, Li Z. Auto-fpn: Automatic network architecture adaptation for object detection beyond classification. In: Proceedings of the IEEE/CVF International Conference on Computer Vision; 2019. p. 6649-58.
34.
Zurück zum Zitat Tan M, Pang R, Le QV. Efficientdet: Scalable and efficient object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2020. p. 10781-90. Tan M, Pang R, Le QV. Efficientdet: Scalable and efficient object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2020. p. 10781-90.
35.
Zurück zum Zitat Wang X, Zhang S, Yu Z, Feng L, Zhang W. Scale-equalizing pyramid convolution for object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2020. p. 13359-68. Wang X, Zhang S, Yu Z, Feng L, Zhang W. Scale-equalizing pyramid convolution for object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2020. p. 13359-68.
36.
Zurück zum Zitat Liang T, Wang Y, Tang Z, Hu G, Ling H. OPANAS: One-shot path aggregation network architecture search for object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2021. p. 10195-203. Liang T, Wang Y, Tang Z, Hu G, Ling H. OPANAS: One-shot path aggregation network architecture search for object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2021. p. 10195-203.
37.
Zurück zum Zitat Wang F, Jiang M, Qian C, Yang S, Li C, Zhang H, et al. Residual attention network for image classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2017. p. 3156-64. Wang F, Jiang M, Qian C, Yang S, Li C, Zhang H, et al. Residual attention network for image classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2017. p. 3156-64.
38.
Zurück zum Zitat Hu J, Shen L, Sun G. Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 7132-41. Hu J, Shen L, Sun G. Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 7132-41.
39.
Zurück zum Zitat Woo S, Park J, Lee JY, Kweon IS. Cbam: Convolutional block attention module. In: Proceedings of the European Conference on Computer Vision (ECCV); 2018. p. 3-19. Woo S, Park J, Lee JY, Kweon IS. Cbam: Convolutional block attention module. In: Proceedings of the European Conference on Computer Vision (ECCV); 2018. p. 3-19.
40.
Zurück zum Zitat Wang SH, Fernandes S, Zhu Z, Zhang YD. AVNC: attention-based VGG-style network for COVID-19 diagnosis by CBAM. IEEE Sensors J. 2021. Wang SH, Fernandes S, Zhu Z, Zhang YD. AVNC: attention-based VGG-style network for COVID-19 diagnosis by CBAM. IEEE Sensors J. 2021.
41.
Zurück zum Zitat Zhang YD, Zhang Z, Zhang X, Wang SH. MIDCAN: A multiple input deep convolutional attention network for Covid-19 diagnosis based on chest CT and chest X-ray. Pattern Recogn Lett. 2021;150:8–16.CrossRef Zhang YD, Zhang Z, Zhang X, Wang SH. MIDCAN: A multiple input deep convolutional attention network for Covid-19 diagnosis based on chest CT and chest X-ray. Pattern Recogn Lett. 2021;150:8–16.CrossRef
42.
Zurück zum Zitat Li X, Lai T, Wang S, Chen Q, Yang C, Chen R, et al.; IEEE. Weighted feature pyramid networks for object detection. IEEE Computer Society. 2013:1500-4. Li X, Lai T, Wang S, Chen Q, Yang C, Chen R, et al.; IEEE. Weighted feature pyramid networks for object detection. IEEE Computer Society. 2013:1500-4.
43.
Zurück zum Zitat He K, Zhang X, Ren S, Sun J. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In: Proceedings of the IEEE International Conference on Computer Vision; 2015. p. 1026-34. He K, Zhang X, Ren S, Sun J. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In: Proceedings of the IEEE International Conference on Computer Vision; 2015. p. 1026-34.
Metadaten
Titel
Learning Discriminated Features Based on Feature Pyramid Networks and Attention for Multi-scale Object Detection
verfasst von
Yunhua Lu
Minghui Su
Yong Wang
Zhi Liu
Tao Peng
Publikationsdatum
26.08.2022
Verlag
Springer US
Erschienen in
Cognitive Computation / Ausgabe 2/2023
Print ISSN: 1866-9956
Elektronische ISSN: 1866-9964
DOI
https://doi.org/10.1007/s12559-022-10052-0