Published in: Neural Computing and Applications 13/2021

02.01.2021 | Original Article

Transformer guided geometry model for flow-based unsupervised visual odometry

Authors: Xiangyu Li, Yonghong Hou, Pichao Wang, Zhimin Gao, Mingliang Xu, Wanqing Li


Abstract

Existing unsupervised visual odometry (VO) methods either match pairwise images or integrate temporal information with recurrent neural networks over a long sequence of images. These approaches are either inaccurate, time-consuming to train, or prone to error accumulation. In this paper, we propose a method consisting of two camera pose estimators that handle the information from pairwise images and from a short sequence of images, respectively. For image sequences, a transformer-like structure is adopted to build a geometry model over a local temporal window, referred to as the transformer-based auxiliary pose estimator (TAPE). Meanwhile, a flow-to-flow pose estimator (F2FPE) is proposed to exploit the relationship between pairwise images. The two estimators are constrained by a simple yet effective consistency loss during training. Empirical evaluation shows that the proposed method outperforms state-of-the-art unsupervised learning-based methods by a large margin and performs comparably to supervised and traditional methods on the KITTI and Malaga datasets.
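To make the two-estimator design concrete, the following is a minimal PyTorch sketch of the idea stated in the abstract: a pairwise flow-to-flow pose regressor, a transformer over per-pair features of a short window, and a consistency term tying their predictions together. The module sizes, the use of nn.TransformerEncoder, and the L1 form of the consistency loss are illustrative assumptions, not the paper's exact architecture.

```python
# Illustrative sketch only; not the authors' implementation.
import torch
import torch.nn as nn

class F2FPE(nn.Module):
    """Flow-to-flow pose estimator: regresses a 6-DoF relative pose
    (3 translation + 3 rotation parameters) from the optical flow
    between a pair of images."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(           # toy flow encoder
            nn.Conv2d(2, 64, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(feat_dim, 6)

    def forward(self, flow):                    # flow: (B, 2, H, W)
        f = self.encoder(flow).flatten(1)       # (B, feat_dim)
        return self.head(f)                     # (B, 6) relative pose

class TAPE(nn.Module):
    """Transformer-based auxiliary pose estimator: self-attention over
    the per-pair features of a short temporal window, predicting one
    pose per frame pair in the window."""
    def __init__(self, feat_dim=256, n_heads=4, n_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=n_heads, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(feat_dim, 6)

    def forward(self, window_feats):            # (B, T, feat_dim)
        ctx = self.temporal(window_feats)       # attend across the window
        return self.head(ctx)                   # (B, T, 6)

def consistency_loss(pose_f2f, pose_tape):
    """L1 consistency between the two estimators' predictions for the
    same frame pairs (the specific penalty form is an assumption)."""
    return (pose_f2f - pose_tape).abs().mean()
```

In training, the F2FPE poses for the pairs inside the window would be stacked to a (B, T, 6) tensor so the consistency term can be applied elementwise against TAPE's per-pair output.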


Metadata
Title
Transformer guided geometry model for flow-based unsupervised visual odometry
Authors
Xiangyu Li
Yonghong Hou
Pichao Wang
Zhimin Gao
Mingliang Xu
Wanqing Li
Publication date
02.01.2021
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 13/2021
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-020-05545-8
