Published in: Neural Processing Letters 4/2022

7 April 2021

Stacked Pyramid Attention Network for Object Detection

Authors: Shijie Hao, Zhonghao Wang, Fuming Sun


Abstract

Scale variation is one of the primary challenges in object detection. Different strategies have recently been introduced to address it, with promising results, but the resulting detectors still have limitations. On the one hand, the features of the large-scale deep layers have relatively low localizing power. On the other hand, the features of the small-scale shallow layers have relatively weak categorizing ability. These two limitations are complementary, since each kind of feature can supply what the other lacks. We therefore propose the Stacked Pyramid Attention Network (SPANet) to bridge the gap between different scales. SPANet introduces two lightweight modules: the top-down feature map attention module (TDFAM) and the bottom-up feature map attention module (BUFAM). By learning channel attention and spatial attention, each module builds connections between features from adjacent scales. Progressively integrating BUFAM and TDFAM into two encoder–decoder structures yields two novel feature-aggregating branches. These branches combine the localizing power of small-scale features with the categorizing power of large-scale features, improving detection accuracy while keeping the model lightweight. Extensive experiments on two challenging benchmarks (PASCAL VOC and MS COCO) demonstrate the effectiveness of SPANet and show that the model reaches a competitive trade-off between accuracy and speed.
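The abstract does not give the internals of TDFAM or BUFAM, so the following is only a minimal sketch of the general idea of gating a shallow feature map with channel and spatial attention derived from the adjacent deeper map. Everything in it is an assumption made for illustration: the function names (`top_down_attention`, `upsample2x`), the pooling choices, the residual form, the 2x resolution ratio between adjacent scales, and the random matrix `w_channel` that stands in for learned weights. It should not be read as the published module.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def upsample2x(x):
    """Nearest-neighbour 2x upsampling over the last two (spatial) axes."""
    return x.repeat(2, axis=-2).repeat(2, axis=-1)


def top_down_attention(deep, shallow, w_channel):
    """Gate a shallow map (C, 2H, 2W) with attention from a deep map (C, H, W).

    Hypothetical illustration only, not the published TDFAM.
    """
    # Channel attention: global average pool, linear mix, sigmoid -> shape (C,)
    pooled = deep.mean(axis=(1, 2))
    channel_gate = sigmoid(w_channel @ pooled)
    # Spatial attention: mean over channels, sigmoid, upsample -> shape (2H, 2W)
    spatial_gate = upsample2x(sigmoid(deep.mean(axis=0)))
    # Residual gating: keep the shallow signal and add its attended copy.
    gated = shallow * channel_gate[:, None, None] * spatial_gate[None, :, :]
    return shallow + gated


rng = np.random.default_rng(0)
deep = rng.standard_normal((4, 8, 8))        # coarse, semantically strong map
shallow = rng.standard_normal((4, 16, 16))   # fine, spatially precise map
w_channel = rng.standard_normal((4, 4))      # stands in for learned weights
fused = top_down_attention(deep, shallow, w_channel)
print(fused.shape)  # (4, 16, 16)
```

A bottom-up counterpart would presumably run the other way, deriving the gates from the shallow map and downsampling them to the deep map's resolution; that direction is likewise an assumption here.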
Metadata
Title
Stacked Pyramid Attention Network for Object Detection
Authors
Shijie Hao
Zhonghao Wang
Fuming Sun
Publication date
7 April 2021
Publisher
Springer US
Published in
Neural Processing Letters / Issue 4/2022
Print ISSN: 1370-4621
Electronic ISSN: 1573-773X
DOI
https://doi.org/10.1007/s11063-021-10505-x
