2020 | OriginalPaper | Chapter

Pixel-Semantic Revising of Position: One-Stage Object Detector with Shared Encoder-Decoder

Authors: Qian Li, Nan Guo, Xiaochun Ye, Dongrui Fan, Zhimin Tang

Published in: Neural Information Processing

Publisher: Springer International Publishing

Abstract

Many methods have recently been proposed for object detection, but they cannot adaptively detect objects using semantic features. Drawing on channel and spatial attention mechanisms, we analyze how different methods detect objects adaptively. Some state-of-the-art detectors combine different feature pyramids with many mechanisms, at a higher cost. This work addresses that cost with an anchor-free detector built on a shared encoder-decoder with an attention mechanism, which extracts shared features. We take features from different levels of the backbone (e.g., ResNet-50) as the base features, feed them into a simple module, and follow it with a detection head to detect objects. Meanwhile, we use the semantic features to revise geometric locations, so the detector performs a pixel-semantic revising of position. More importantly, this work analyzes the impact of different pooling strategies (e.g., mean, maximum, or minimum) on multi-scale objects and finds that minimum pooling improves detection performance on small objects more than the others. Compared with the state-of-the-art MNC based on ResNet-101 on the standard MSCOCO 2014 baseline, our method improves detection AP by 3.8%.
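
To make the three pooling strategies named in the abstract concrete, here is a minimal pure-Python sketch of mean, maximum, and minimum pooling over a small 2D feature map. The abstract does not say where pooling sits in the network or which window size is used, so the helper `pool2d`, the 2x2 non-overlapping window, and the toy `feature_map` are all illustrative assumptions and not the authors' implementation.

```python
from statistics import fmean
from typing import Callable, List, Sequence

Grid = List[List[float]]


def pool2d(feature: Sequence[Sequence[float]], size: int,
           reduce: Callable[[List[float]], float]) -> Grid:
    """Non-overlapping size x size pooling (stride equals size) over a 2D map.

    `reduce` collapses each window to one value (e.g. fmean, max, or min).
    Hypothetical helper for illustration only.
    """
    rows, cols = len(feature), len(feature[0])
    pooled: Grid = []
    for r in range(0, rows - size + 1, size):
        pooled_row = []
        for c in range(0, cols - size + 1, size):
            # Gather every value inside the current window.
            window = [feature[r + i][c + j]
                      for i in range(size) for j in range(size)]
            pooled_row.append(reduce(window))
        pooled.append(pooled_row)
    return pooled


# Toy 4x4 feature map (made-up values, not data from the paper).
feature_map = [
    [1.0, 2.0, 0.0, 4.0],
    [3.0, 6.0, 8.0, 4.0],
    [5.0, 5.0, 1.0, 3.0],
    [7.0, 3.0, 9.0, 7.0],
]

mean_pooled = pool2d(feature_map, 2, fmean)  # average response per window
max_pooled = pool2d(feature_map, 2, max)     # strongest response per window
min_pooled = pool2d(feature_map, 2, min)     # weakest response per window

print("mean:", mean_pooled)
print("max: ", max_pooled)
print("min: ", min_pooled)
```

The three variants differ only in the reduction applied to each window, so swapping strategies in an experiment like the one the abstract describes amounts to changing a single argument.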

Copyright Year: 2020
DOI: https://doi.org/10.1007/978-3-030-63820-7_59