Published in: International Journal of Machine Learning and Cybernetics 5/2022

10.10.2021 | Original Article

Towards unified on-road object detection and depth estimation from a single image

Authors: Guofei Lian, Yan Wang, Huabiao Qin, Guancheng Chen

Abstract

On-road object detection based on convolutional neural networks (CNNs) is an important problem in autonomous driving. However, traditional 2D object detection only classifies and localizes objects in image space and cannot recover depth information. Moreover, cascading an object detection network with a monocular depth estimation network to obtain 2.5D object detection is inefficient. To address this problem, we propose a unified multi-task learning mechanism for object detection and depth estimation. First, we propose a new loss function, the projective consistency loss, which uses the perspective projection principle to model the relationship between the target size and the depth value, so that the object detection task and the depth estimation task constrain each other. Second, we propose a global multi-scale feature extraction scheme that combines the Global Context (GC) block with the Atrous Spatial Pyramid Pooling (ASPP) block, which promotes effective feature learning and collaborative learning between object detection and depth estimation. Comprehensive experiments on the KITTI and Cityscapes datasets show that our approach achieves high mAP and low distance estimation error, outperforming other state-of-the-art methods.
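The abstract does not give the exact form of the projective consistency loss. As a rough illustration of the perspective projection principle it refers to, the sketch below uses the standard pinhole relation h = f · H / Z between an object's image height h (pixels), its physical height H (metres), the focal length f (pixels), and its depth Z (metres). The class `Detection`, the functions `depth_from_box` and `projective_consistency_penalty`, the per-object height prior, and the absolute-difference penalty are all assumptions made for this example and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Detection:
    """Hypothetical container for one predicted object (illustration only)."""
    box_height_px: float   # predicted bounding-box height in pixels
    depth_m: float         # predicted depth of the object in metres
    real_height_m: float   # assumed physical height of the object (a prior)


def depth_from_box(focal_px: float, real_height_m: float, box_height_px: float) -> float:
    """Pinhole relation h = f * H / Z, rearranged to Z = f * H / h."""
    if box_height_px <= 0:
        raise ValueError("box height must be positive")
    return focal_px * real_height_m / box_height_px


def projective_consistency_penalty(dets: Sequence[Detection], focal_px: float) -> float:
    """Mean absolute gap between each predicted depth and the depth implied
    by the predicted box size. This is an assumed penalty form, not the
    loss defined in the paper."""
    if not dets:
        return 0.0
    gaps = [
        abs(d.depth_m - depth_from_box(focal_px, d.real_height_m, d.box_height_px))
        for d in dets
    ]
    return sum(gaps) / len(gaps)


if __name__ == "__main__":
    focal_px = 720.0  # illustrative focal length, not a dataset constant
    dets = [
        Detection(box_height_px=54.0, depth_m=20.0, real_height_m=1.5),  # consistent
        Detection(box_height_px=54.0, depth_m=25.0, real_height_m=1.5),  # 5 m off
    ]
    print(projective_consistency_penalty(dets, focal_px))
```

A penalty of this kind is zero only when the box size and the depth agree under the projection model, which is one way the two tasks could constrain each other during joint training.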

Metadata
Title
Towards unified on-road object detection and depth estimation from a single image
Authors
Guofei Lian
Yan Wang
Huabiao Qin
Guancheng Chen
Publication date
10.10.2021
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Machine Learning and Cybernetics / Issue 5/2022
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-021-01444-z
