Published in: Neural Computing and Applications 13/2021

02.01.2021 | Original Article

Transformer guided geometry model for flow-based unsupervised visual odometry

Authors: Xiangyu Li, Yonghong Hou, Pichao Wang, Zhimin Gao, Mingliang Xu, Wanqing Li


Abstract

Existing unsupervised visual odometry (VO) methods either match pairwise images or integrate temporal information with recurrent neural networks over a long sequence of images. These approaches are either inaccurate, time-consuming to train, or prone to error accumulation. In this paper, we propose a method consisting of two camera pose estimators that handle the information from pairwise images and from a short sequence of images, respectively. For image sequences, a transformer-like structure is adopted to build a geometry model over a local temporal window, referred to as the transformer-based auxiliary pose estimator (TAPE). Meanwhile, a flow-to-flow pose estimator (F2FPE) is proposed to exploit the relationship between pairwise images. The two estimators are constrained by a simple yet effective consistency loss during training. Empirical evaluation shows that the proposed method outperforms state-of-the-art unsupervised learning-based methods by a large margin and performs comparably to supervised and traditional methods on the KITTI and Malaga datasets.
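To make the two-estimator design concrete, the following is a minimal PyTorch sketch of the idea stated in the abstract: a pairwise flow-to-flow pose regressor, a transformer over per-pair features of a short window, and a consistency term tying their predictions together. The module sizes, the use of nn.TransformerEncoder, and the L1 form of the consistency loss are illustrative assumptions, not the paper's exact architecture.

```python
# Illustrative sketch only; not the authors' implementation.
import torch
import torch.nn as nn

class F2FPE(nn.Module):
    """Flow-to-flow pose estimator: regresses a 6-DoF relative pose
    (3 translation + 3 rotation parameters) from the optical flow
    between a pair of images."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(           # toy flow encoder
            nn.Conv2d(2, 64, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(feat_dim, 6)

    def forward(self, flow):                    # flow: (B, 2, H, W)
        f = self.encoder(flow).flatten(1)       # (B, feat_dim)
        return self.head(f)                     # (B, 6) relative pose

class TAPE(nn.Module):
    """Transformer-based auxiliary pose estimator: self-attention over
    the per-pair features of a short temporal window, predicting one
    pose per frame pair in the window."""
    def __init__(self, feat_dim=256, n_heads=4, n_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=n_heads, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(feat_dim, 6)

    def forward(self, window_feats):            # (B, T, feat_dim)
        ctx = self.temporal(window_feats)       # attend across the window
        return self.head(ctx)                   # (B, T, 6)

def consistency_loss(pose_f2f, pose_tape):
    """L1 consistency between the two estimators' predictions for the
    same frame pairs (the specific penalty form is an assumption)."""
    return (pose_f2f - pose_tape).abs().mean()
```

In training, the F2FPE poses for the pairs inside the window would be stacked to a (B, T, 6) tensor so the consistency term can be applied elementwise against TAPE's per-pair output.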


Metadata
Title
Transformer guided geometry model for flow-based unsupervised visual odometry
Authors
Xiangyu Li
Yonghong Hou
Pichao Wang
Zhimin Gao
Mingliang Xu
Wanqing Li
Publication date
02.01.2021
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 13/2021
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-020-05545-8
