Published in: Pattern Analysis and Applications 3/2021

15.05.2021 | Short Paper

An approach to improve SSD through mask prediction of multi-scale feature maps

Authors: Peng Sun, Yaqin Zhao, Songhao Zhu

Abstract

We propose a novel single-shot object detection network with a mask prediction branch. Our motivation is to enhance object detection features with semantic information extracted from deeper layers. The proposed mask prediction branch enriches important features in shallower layers with a pixel-wise probability distribution of semantic information. In addition, an improved receptive field block is adopted to enlarge the receptive field of the backbone network without much extra computational cost. Our network significantly improves on SSD and FSSD (Feature Fusion Single Shot Multibox Detector) with only a small drop in speed. We also discuss the relationship between effective and theoretical receptive fields on the VGG16 backbone network. Comprehensive experiments on PASCAL VOC 2007 demonstrate the effectiveness of the proposed method: we achieve 79.8 mAP with 300 × 300 input images (81.2 mAP with 512 × 512 inputs) at 58.4 FPS on a single Nvidia 1080Ti GPU. These results show that the proposed network achieves performance comparable to the state of the art.
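To make the term "theoretical receptive field" concrete, the sketch below computes it for the standard VGG16 convolution and pooling stack using the usual recurrence: each layer with kernel size k adds (k - 1) times the accumulated stride to the receptive field. This is an illustration only. The layer list is the plain VGG16 configuration (not the modified variant SSD uses), and the function names are hypothetical, not taken from the paper. The effective receptive field discussed in the abstract has to be measured empirically and is not computed here.

```python
# Theoretical receptive field (RF) of each layer in a plain VGG16 feature stack.
# Assumption: standard VGG16 layout (3x3 convs with stride 1, 2x2 max-pools with stride 2).

def vgg16_layers():
    """Return (name, kernel, stride) for each conv/pool layer of plain VGG16."""
    convs_per_block = [2, 2, 3, 3, 3]
    layers = []
    for block, n_convs in enumerate(convs_per_block, start=1):
        for idx in range(1, n_convs + 1):
            layers.append((f"conv{block}_{idx}", 3, 1))
        layers.append((f"pool{block}", 2, 2))
    return layers


def theoretical_rf(layers):
    """Map each layer name to its theoretical RF size in input pixels."""
    rf, jump = 1, 1  # jump = product of strides of all earlier layers
    sizes = {}
    for name, kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
        sizes[name] = rf
    return sizes


rf = theoretical_rf(vgg16_layers())
for name in ("conv4_3", "conv5_3", "pool5"):
    print(name, rf[name])
# conv4_3 92
# conv5_3 196
# pool5 212
```

Under these assumptions, a unit in conv4_3 theoretically sees a 92 × 92 patch of the input, which is small relative to a 300 × 300 image. That gap gives one plausible reading of why a module that enlarges the receptive field could help the shallower detection layers.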

Metadata
Title
An approach to improve SSD through mask prediction of multi-scale feature maps
Authors
Peng Sun
Yaqin Zhao
Songhao Zhu
Publication date
15.05.2021
Publisher
Springer London
Published in
Pattern Analysis and Applications / Issue 3/2021
Print ISSN: 1433-7541
Electronic ISSN: 1433-755X
DOI
https://doi.org/10.1007/s10044-021-00993-x
