Published in: Neural Computing and Applications 10/2019

19 April 2018 | Original Article

An enhanced SSD with feature fusion and visual reasoning for object detection

Authors: Jiaxu Leng, Ying Liu

Abstract

Single Shot MultiBox Detector (SSD) is one of the top-performing object detection algorithms in terms of both accuracy and speed. SSD achieves impressive performance on various datasets by using different output layers for object detection. However, each layer in the feature pyramid is used independently, and SSD considers only the fine-grained details of objects while ignoring the context surrounding them. In this paper, we propose an enhanced SSD, called ESSD, that improves on the conventional SSD by fusing feature maps of different output layers, instead of growing layers close to the input data. Our method uses two-way transfer of feature information and feature fusion to enhance the network. To further assist object detection, we propose a visual reasoning method that fully exploits the relationships between objects instead of using only the features of the objects themselves. This addition of visual reasoning proved very effective for detecting objects that are very small or have small features. To evaluate the proposed ESSD, we trained the model on the VOC2007 and VOC2012 training sets and evaluated its performance on the Pascal VOC2007 test set. For \(300 \times 300\) input, ESSD achieved 79.2% mean average precision (mAP) at 52.0 frames per second (FPS), and for \(512 \times 512\) input, it achieved 82.4% mAP at 18.6 FPS. These results demonstrate that the proposed method achieves state-of-the-art mAP, outperforming the conventional SSD and other advanced detectors.
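The abstract does not spell out the fusion operator, so the following is only a minimal sketch of what a two-way fusion of neighbouring pyramid levels could look like, not the authors' implementation. It assumes single-channel feature maps stored as nested lists, a factor-of-two size difference between levels, nearest-neighbour upsampling, 2x2 max pooling, and element-wise summation; the function names `upsample2x`, `downsample2x`, and `fuse_two_way` are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2-D feature map (list of rows)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out


def downsample2x(fmap):
    """2x2 max pooling of a 2-D feature map with even height and width."""
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, len(fmap[0]), 2)]
        for i in range(0, len(fmap), 2)
    ]


def fuse_two_way(fine, mid, coarse):
    """Assumed fusion rule: sum the middle map with both resized neighbours."""
    down = downsample2x(fine)    # detail flowing from the finer level
    up = upsample2x(coarse)      # context flowing from the coarser level
    return [[m + d + u for m, d, u in zip(mr, dr, ur)]
            for mr, dr, ur in zip(mid, down, up)]


fine = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4x4 map
mid = [[1, 1], [1, 1]]                                    # 2x2 map
coarse = [[10]]                                           # 1x1 map
fused = fuse_two_way(fine, mid, coarse)
print(fused)  # [[16, 18], [24, 26]]
```

In a real detector the levels would carry many channels and would not always differ by an exact factor of two, so learned layers (for example convolution or deconvolution) would typically replace the fixed resizing used here.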


Metadata
Title
An enhanced SSD with feature fusion and visual reasoning for object detection
Authors
Jiaxu Leng
Ying Liu
Publication date
19 April 2018
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 10/2019
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-018-3486-1
