Published in: International Journal of Machine Learning and Cybernetics 5/2022

10-10-2021 | Original Article

Towards unified on-road object detection and depth estimation from a single image

Authors: Guofei Lian, Yan Wang, Huabiao Qin, Guancheng Chen

Abstract

On-road object detection based on convolutional neural networks (CNNs) is an important problem in autonomous driving. However, traditional 2D object detection only classifies and localizes objects in image space and cannot recover depth information. Moreover, cascading an object detection network with a monocular depth estimation network to achieve 2.5D object detection is inefficient. To address this problem, we propose a unified multi-task learning mechanism for object detection and depth estimation. First, we propose a new loss function, the projective consistency loss, which uses the perspective projection principle to model the relationship between the target size and the depth value, so that the object detection task and the depth estimation task constrain each other. Second, we propose a global multi-scale feature extraction scheme that combines the Global Context (GC) block and the Atrous Spatial Pyramid Pooling (ASPP) block, which promotes effective feature learning and collaborative learning between object detection and depth estimation. Comprehensive experiments on the KITTI and Cityscapes datasets show that our approach achieves high mAP and low distance estimation error, outperforming other state-of-the-art methods.
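
The abstract does not give the exact form of the projective consistency loss, so the following is only a minimal sketch of the underlying idea, assuming a pinhole camera and a known real-world object height (for example, a per-class prior). Under that assumption, an object of height H at depth Z appears with image height h = f * H / Z, so a predicted box height implies a depth that can be compared with the predicted depth. The function names, the absolute-difference penalty, and the numeric values are hypothetical and are not taken from the paper.

```python
def projected_height_px(real_height_m, depth_m, focal_px):
    """Pinhole model: image height in pixels of an object with the given
    real height (meters) at the given depth (meters)."""
    return focal_px * real_height_m / depth_m


def projective_consistency_penalty(box_height_px, depth_m, real_height_m, focal_px):
    """Hypothetical penalty: absolute gap between the predicted depth and the
    depth implied by the predicted box height under the pinhole model."""
    implied_depth_m = focal_px * real_height_m / box_height_px
    return abs(depth_m - implied_depth_m)


# Assumed values for illustration only: focal length 720 px, object height 1.5 m.
consistent = projective_consistency_penalty(54.0, 20.0, 1.5, 720.0)
inconsistent = projective_consistency_penalty(54.0, 25.0, 1.5, 720.0)
print(consistent, inconsistent)
```

In this sketch the penalty is zero when the box height and the depth agree with the projection relation and grows as they diverge, which is one way the two task outputs could constrain each other during training.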

Metadata
Title
Towards unified on-road object detection and depth estimation from a single image
Authors
Guofei Lian
Yan Wang
Huabiao Qin
Guancheng Chen
Publication date
10-10-2021
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Machine Learning and Cybernetics / Issue 5/2022
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-021-01444-z
