Published in: Cognitive Computation 6/2018

08.09.2018

Ongoing Evolution of Visual SLAM from Geometry to Deep Learning: Challenges and Opportunities

Authors: Ruihao Li, Sen Wang, Dongbing Gu


Abstract

Visual simultaneous localization and mapping (SLAM) has been investigated in the robotics community for decades. Significant progress and achievements on visual SLAM have been made, with geometric model-based techniques becoming increasingly mature and accurate. However, they tend to be fragile in challenging environments. Recently, there has been a trend toward developing data-driven approaches, e.g., deep learning, for visual SLAM problems with more robust performance. This paper traces the ongoing evolution of visual SLAM techniques from geometric model-based to data-driven approaches by providing a comprehensive technical review. Our contribution is not just a compilation of state-of-the-art end-to-end deep learning SLAM work, but also an insight into the underlying mechanisms of deep learning SLAM. To this end, we first provide a concise overview of geometric model-based approaches. Next, we identify visual depth estimation using deep learning as the starting point of the evolution: it is from depth estimation that ego-motion and pose estimation techniques using deep learning have flourished rapidly. In addition, we strive to link semantic segmentation using deep learning with emergent semantic SLAM techniques to shed light on the simultaneous estimation of ego-motion and high-level scene understanding. Finally, we highlight further opportunities in this research direction.
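The abstract identifies learned depth estimation as the starting point of the data-driven evolution. As an illustration of the general idea behind that line of work (not the authors' code), the minimal sketch below shows a view-synthesis, or photometric reconstruction, loss of the kind commonly used to train monocular depth and ego-motion networks without ground-truth labels: the predicted depth and relative pose warp a neighbouring frame into the current view, and the photometric difference provides the training signal. All function names, tensor shapes, and the PyTorch framing are assumptions made for illustration.

# Minimal sketch of an unsupervised photometric reconstruction loss.
# Shapes, names, and framework choice are assumptions for illustration only.
import torch
import torch.nn.functional as F

def photometric_loss(target, source, depth, pose, K):
    """Warp `source` into the `target` view using predicted depth and pose,
    then penalise the photometric difference.

    target, source: (B, 3, H, W) images
    depth:          (B, 1, H, W) predicted depth of the target view
    pose:           (B, 4, 4) relative camera pose, target -> source
    K:              (B, 3, 3) camera intrinsics
    """
    B, _, H, W = target.shape
    device = target.device

    # Pixel grid of the target view in homogeneous coordinates, shape (3, H*W)
    ys, xs = torch.meshgrid(torch.arange(H, device=device),
                            torch.arange(W, device=device), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).float().reshape(3, -1)

    # Back-project pixels to 3D points in the target camera frame
    cam = torch.inverse(K) @ pix.unsqueeze(0)            # (B, 3, H*W) unit-depth rays
    cam = cam * depth.reshape(B, 1, -1)                   # scale rays by predicted depth
    cam_h = torch.cat([cam, torch.ones(B, 1, H * W, device=device)], dim=1)

    # Transform into the source camera frame and project back to pixels
    src_cam = (pose @ cam_h)[:, :3]                        # (B, 3, H*W)
    src_pix = K @ src_cam
    src_pix = src_pix[:, :2] / src_pix[:, 2:].clamp(min=1e-6)

    # Normalise pixel coordinates to [-1, 1] as expected by grid_sample
    u = 2.0 * src_pix[:, 0] / (W - 1) - 1.0
    v = 2.0 * src_pix[:, 1] / (H - 1) - 1.0
    grid = torch.stack([u, v], dim=-1).reshape(B, H, W, 2)

    # Synthesise the target view from the source image and compare photometrically
    warped = F.grid_sample(source, grid, align_corners=True)
    return (target - warped).abs().mean()

Real systems of this kind typically add multi-scale supervision, structural-similarity terms, and masks for occluded or moving pixels, which the sketch omits.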

Metadata
Title
Ongoing Evolution of Visual SLAM from Geometry to Deep Learning: Challenges and Opportunities
Authors
Ruihao Li
Sen Wang
Dongbing Gu
Publication date
08.09.2018
Publisher
Springer US
Published in
Cognitive Computation / Issue 6/2018
Print ISSN: 1866-9956
Electronic ISSN: 1866-9964
DOI
https://doi.org/10.1007/s12559-018-9591-8
