Published in: Neural Computing and Applications 8/2021

22.07.2020 | Original Article

Geometry understanding from autonomous driving scenarios based on feature refinement

Authors: Mingliang Zhai, Xuezhi Xiang

Abstract

Many deep learning applications now benefit from multi-task learning with several related objectives. In autonomous driving scenarios, accurately inferring motion and spatial information is essential for scene understanding. In this paper, we propose a unified framework for unsupervised joint learning of optical flow, depth and camera pose. Specifically, we use a feature refinement module to adaptively discriminate and recalibrate features, integrating local features with their global dependencies to capture rich contextual relationships. Given a monocular video, our network first computes rigid optical flow by estimating depth and camera pose. We then design an auxiliary flow network for inferring the non-rigid flow field. In addition, a forward–backward consistency check is adopted for occlusion reasoning. Extensive analyses on the KITTI dataset verify the effectiveness of our proposed approach. The experimental results show that our network produces sharper, clearer and more detailed depth and flow maps, and achieves competitive performance compared with recent state-of-the-art approaches.
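The forward–backward consistency check mentioned in the abstract can be sketched as follows: a pixel is treated as occluded when its forward flow and the backward flow sampled at the forward-warped location do not cancel out. This is a minimal NumPy sketch of the general technique, not the paper's exact formulation; the thresholds `alpha` and `beta` and the nearest-neighbour sampling are illustrative assumptions.

```python
import numpy as np

def occlusion_mask(flow_fw, flow_bw, alpha=0.01, beta=0.5):
    """Forward-backward consistency check for occlusion reasoning.

    flow_fw, flow_bw: (H, W, 2) forward and backward optical flow fields.
    Returns a boolean (H, W) mask: True = consistent (non-occluded).
    """
    h, w = flow_fw.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]

    # Where each pixel lands in the second frame under the forward flow.
    x2 = np.clip(xs + flow_fw[..., 0], 0, w - 1)
    y2 = np.clip(ys + flow_fw[..., 1], 0, h - 1)

    # Sample the backward flow at the forward-warped location
    # (nearest-neighbour sampling keeps the sketch simple).
    bw_warped = flow_bw[np.round(y2).astype(int), np.round(x2).astype(int)]

    # For non-occluded pixels, flow_fw + bw_warped should be close to zero,
    # within a tolerance that scales with the flow magnitude.
    diff = np.sum((flow_fw + bw_warped) ** 2, axis=-1)
    mag = np.sum(flow_fw ** 2, axis=-1) + np.sum(bw_warped ** 2, axis=-1)
    return diff < alpha * mag + beta
```

In an unsupervised training pipeline such a mask is typically used to exclude occluded pixels from the photometric reconstruction loss, since their appearance in the target frame cannot be explained by warping.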


Metadata
Title
Geometry understanding from autonomous driving scenarios based on feature refinement
Authors
Mingliang Zhai
Xuezhi Xiang
Publication date
22.07.2020
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 8/2021
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-020-05192-z
