Published in: Neural Computing and Applications 8/2021

22-07-2020 | Original Article

Geometry understanding from autonomous driving scenarios based on feature refinement

Authors: Mingliang Zhai, Xuezhi Xiang



Abstract

Nowadays, many deep learning applications benefit from multi-task learning with several related objectives. In autonomous driving scenarios, the ability to infer motion and spatial information accurately is essential for scene understanding. In this paper, we propose a unified framework for unsupervised joint learning of optical flow, depth and camera pose. Specifically, we use a feature refinement module that adaptively discriminates and recalibrates features, integrating local features with their global dependencies to capture rich contextual relationships. Given a monocular video, our network first computes rigid optical flow by estimating depth and camera pose. We then design an auxiliary flow network to infer the non-rigid flow field. In addition, a forward–backward consistency check is adopted for occlusion reasoning. Extensive analyses on the KITTI dataset verify the effectiveness of the proposed approach. The experimental results show that our network produces sharper, clearer and more detailed depth and flow maps, and achieves competitive performance compared with recent state-of-the-art approaches.
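The rigid-flow step described in the abstract (optical flow induced purely by camera motion, derived from estimated depth and pose) can be sketched as follows. This is an illustrative NumPy reimplementation, not the authors' code: it assumes a pinhole camera with intrinsics `K` and a 4×4 relative pose `T`, and the function name `rigid_flow` is ours.

```python
import numpy as np

def rigid_flow(depth, K, T):
    """Rigid optical flow induced by camera motion (illustrative sketch).

    depth : (H, W) per-pixel depth of the source frame
    K     : (3, 3) pinhole camera intrinsics
    T     : (4, 4) relative camera pose (source -> target)
    Returns flow of shape (2, H, W).
    """
    H, W = depth.shape
    # Pixel grid in homogeneous coordinates, shape (3, H*W)
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u.ravel(), v.ravel(), np.ones(H * W)])

    # Back-project pixels to 3-D points in the source camera frame
    cam = (np.linalg.inv(K) @ pix) * depth.ravel()

    # Transform the points into the target camera frame
    cam_h = np.vstack([cam, np.ones((1, H * W))])
    cam_t = (T @ cam_h)[:3]

    # Project into the target image plane
    proj = K @ cam_t
    proj = proj[:2] / np.clip(proj[2], 1e-6, None)

    # Rigid flow = projected position minus original pixel position
    flow = proj - pix[:2]
    return flow.reshape(2, H, W)
```

For a pure x-translation of t metres, this yields a horizontal flow of fx · t / depth, which matches the usual projective-geometry derivation.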

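The forward–backward consistency check used above for occlusion reasoning can be sketched with NumPy. This is a common formulation, not necessarily the paper's exact one: a pixel is flagged as occluded when the forward flow and the backward flow sampled at the forward-warped location fail to cancel. The thresholds `alpha1`/`alpha2` and the nearest-neighbour sampling are illustrative assumptions.

```python
import numpy as np

def occlusion_mask(flow_fw, flow_bw, alpha1=0.01, alpha2=0.5):
    """Forward-backward consistency check for occlusion reasoning (sketch).

    flow_fw, flow_bw : (2, H, W) forward and backward optical flow
    Returns a boolean (H, W) mask, True where the pixel is considered
    occluded (the consistency constraint is violated).
    """
    _, H, W = flow_fw.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))

    # Where each pixel lands in the second frame (nearest-neighbour)
    x = np.clip(np.round(u + flow_fw[0]).astype(int), 0, W - 1)
    y = np.clip(np.round(v + flow_fw[1]).astype(int), 0, H - 1)

    # Backward flow sampled at the forward-warped locations
    bw_warped = flow_bw[:, y, x]

    # For visible pixels, forward flow + warped backward flow should be ~0
    diff = np.sum((flow_fw + bw_warped) ** 2, axis=0)
    mag = np.sum(flow_fw ** 2, axis=0) + np.sum(bw_warped ** 2, axis=0)
    return diff > alpha1 * mag + alpha2
```

In training, pixels flagged by such a mask are typically excluded from the photometric loss, since their appearance in the other frame is not observable.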

Metadata
Title
Geometry understanding from autonomous driving scenarios based on feature refinement
Authors
Mingliang Zhai
Xuezhi Xiang
Publication date
22-07-2020
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 8/2021
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-020-05192-z
