Visual SLAM and Structure from Motion in Dynamic Environments: A Survey

Published: 20 February 2018

Abstract

In the last few decades, Structure from Motion (SfM) and visual Simultaneous Localization and Mapping (visual SLAM) techniques have gained significant interest from both the computer vision and robotics communities. Many variants of these techniques have started to make an impact in a wide range of applications, including robot navigation and augmented reality. Despite some remarkable results in these areas, most SfM and visual SLAM techniques assume that the observed environment is static, and their overall accuracy can be jeopardized when moving objects are present. In this article, we present for the first time a survey of visual SLAM and SfM techniques that are targeted toward operation in dynamic environments. We identify three main problems: how to perform reconstruction (robust visual SLAM), how to segment and track dynamic objects, and how to achieve joint motion segmentation and reconstruction. Based on this categorization, we provide a comprehensive taxonomy of existing approaches. Finally, the advantages and disadvantages of each solution class are critically discussed from the perspective of practicality and robustness.

  146. Karen Simonyan and Andrew Zisserman. 2014. Two-stream convolutional networks for action recognition in videos. In Adv. Neural Inf. Process. Syst. 1--9. Google ScholarGoogle ScholarDigital LibraryDigital Library
  147. Noah Snavely, Steven Seitz, and Richard Szeliski. 2006. PhotoTourism: Exploring photo collections in 3D. In SIGGRAPH Conf. Proc. 835--846. Google ScholarGoogle ScholarDigital LibraryDigital Library
  148. Noah Snavely, Steven M. Seitz, and Richard Szeliski. 2008. Modeling the world from internet photo collections. Int. J. Comput. Vis. 80, 2 (2008), 189--210. Google ScholarGoogle ScholarDigital LibraryDigital Library
  149. Joan Solà. 2007. Towards Visual Localization, Mapping and Moving Objects Tracking by a Mobile Robot: A Geometric and Probabilistic Approach. Ph.D. Dissertation. Institut National Politechnique de Toulouse.Google ScholarGoogle Scholar
  150. Hauke Strasdat, J. M. M. Montiel, and Andrew J. Davison. 2012. Visual SLAM: Why filter? Image Vis. Comput. 30, 2 (2012), 65--77. Google ScholarGoogle ScholarDigital LibraryDigital Library
  151. Peter Sturm and Bill Triggs. 1996. A factorization based algorithm for multi-image projective structure and motion. In Eur. Conf. Comput. Vis., Vol. 1065. 710--720. Google ScholarGoogle ScholarDigital LibraryDigital Library
  152. Wei Tan, Haomin Liu, Zilong Dong, Guofeng Zhang, and Hujun Bao. 2013. Robust monocular SLAM in dynamic environments. In IEEE Int. Symp. Mix. Augment. Real.Google ScholarGoogle Scholar
  153. Ninad Thakoor, Jean Gao, and Venkat Devarajan. 2010. Multibody structure-and-motion segmentation by branch-and-bound model selection. IEEE Trans. Image Process. 19, 6 (2010), 1393--1402. Google ScholarGoogle ScholarDigital LibraryDigital Library
  154. Carlo Tomasi and Takeo Kanade. 1992. Shape and motion from image streams under orthography: A factorization method. In Int. J. Comput. Vis., Vol. 9. 137--154. Google ScholarGoogle ScholarDigital LibraryDigital Library
  155. Philip H. S. Torr. 1998. Geometric motion segmentation and model selection. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 356, 1740 (1998), 1321--1340.Google ScholarGoogle ScholarCross RefCross Ref
  156. Philip H. S. Torr and Andrew Zisserman. 1997. Robust parameterization and computation of the trifocal tensor. Image Vis. Comput. 15, 8 (1997), 591--605.Google ScholarGoogle ScholarCross RefCross Ref
  157. Philip H. S. Torr and Andrew Zisserman. 1999. Feature based methods for structure and motion estimation. In Int. Work. Vis. Algorithms. Google ScholarGoogle ScholarDigital LibraryDigital Library
  158. Philip H. S. Torr and Andrew Zisserman. 2000. MLESAC: A new robust estimator with application to estimating image geometry. Comput. Vis. Image Underst. 78, 1 (2000), 138--156. Google ScholarGoogle ScholarDigital LibraryDigital Library
  159. Roberto Tron and Rene Vidal. 2007. A benchmark for the comparison of 3-D motion segmentation algorithms. In IEEE Conf. Comput. Vis. Pattern Recognit. 1--8.Google ScholarGoogle ScholarCross RefCross Ref
  160. Sepehr Valipour, Mennatullah Siam, Martin Jagersand, and Nilanjan Ray. 2017. Recurrent fully convolutional networks for video segmentation. In IEEE Winter Conf. Appl. Comput. Vis. 1--12.Google ScholarGoogle ScholarCross RefCross Ref
  161. René Vidal. 2006. Online clustering of moving hyperplanes. In Adv. Neural Inf. Process. Syst. 1433--1440. Google ScholarGoogle ScholarDigital LibraryDigital Library
  162. Rene Vidal. 2011. Subspace clustering. IEEE Signal Process. Mag. 28, 2 (2011), 52--68.Google ScholarGoogle ScholarCross RefCross Ref
  163. René Vidal and Richard Hartley. 2008. Three-view multibody structure from motion. IEEE Trans. Pattern Anal. Mach. Intell. 30, 2 (2008), 214--227. Google ScholarGoogle ScholarDigital LibraryDigital Library
  164. René Vidal, Yi Ma, and Shankar Sastry. 2005. Generalized principal component analysis (GPCA). In IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. 40, 12 (2005), 1945--1959. Google ScholarGoogle ScholarDigital LibraryDigital Library
  165. René Vidal, Yi Ma, and Shankar Sastry. 2005. Generalized principal component analysis (GPCA). IEEE Trans. Pattern Anal. Mach. Intell. 27, 12 (2005), 1945--1959. Google ScholarGoogle ScholarDigital LibraryDigital Library
  166. René Vidal, Yi Ma, Stefano Soatto, and Shankar Sastry. 2006. Two-view multibody structure from motion. Int. J. Comput. Vis. 68, 1 (2006), 7--25. Google ScholarGoogle ScholarDigital LibraryDigital Library
  167. René Vidal, Stefano Soatto, Yi Ma, and Shankar Sastry. 2002. Segmentation of dynamic scenes from the multibody fundamental matrix. In ECCV Work. Vis. Model. Dyn. Scenes.Google ScholarGoogle Scholar
  168. Sudheendra Vijayanarasimhan, Susanna Ricco, Cordelia Schmid, Rahul Sukthankar, and Katerina Fragkiadaki. 2017. SfM-Net: Learning of structure and motion from video. In arXiv:1704.07804.Google ScholarGoogle Scholar
  169. Chieh-Chih Wang and Chuck Thorpe. 2002. Simultaneous localization and mapping with detection and tracking of moving objects. In IEEE Int. Conf. Robot. Autom., Vol. 3. 2918--2924.Google ScholarGoogle Scholar
  170. Chieh-Chih Wang, Charles Thorpe, Sebastian Thrun, M. Hebert, and H. Durrant-Whyte. 2007. Simultaneous localization, mapping and moving object tracking. Int. J. Rob. Res. 26, 9 (2007), 889--916. Google ScholarGoogle ScholarDigital LibraryDigital Library
  171. Sen Wang, Ronald Clark, Hongkai Wen, and Niki Trigoni. 2017. DeepVO: Towards end-to-end visual odometry with deep recurrent convolutional neural networks. In IEEE Int. Conf. Robot. Autom.Google ScholarGoogle ScholarCross RefCross Ref
  172. Yin Tien Wang, Ming Chun Lin, and Rung Chi Ju. 2010. Visual SLAM and moving-object detection for a small-size humanoid robot. Int. J. Adv. Robot. Syst. 7, 2 (2010), 133--138.Google ScholarGoogle ScholarCross RefCross Ref
  173. Somkiat Wangsiripitak and David W. Murray. 2009. Avoiding moving outliers in visual SLAM by tracking moving objects. In IEEE Int. Conf. Robot. Autom. Google ScholarGoogle ScholarDigital LibraryDigital Library
  174. Changchang Wu. 2013. Towards linear-time incremental structure from motion. In Int. Conf. 3D Vis. 127--134. Google ScholarGoogle ScholarDigital LibraryDigital Library
  175. Changchang Wu, Sameer Agarwal, Brian Curless, and Steven M. Seitz. 2011. Multicore bundle adjustment. In IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. 3057--3064. Google ScholarGoogle ScholarDigital LibraryDigital Library
  176. Jing Xiao, Jin-xiang Chai, and Takeo Kanade. 2004. A closed-form solution to non-rigid shape and motion recovery. In Eur. Conf. Comput. Vis. 573--587.Google ScholarGoogle ScholarCross RefCross Ref
  177. Jingyu Yan and Marc Pollefeys. 2006. A general framework for motion segmentation: Independent, articulated, rigid, non-rigid, degenerate and non-degenerate. In Eur. Conf. Comput. Vis. Google ScholarGoogle ScholarDigital LibraryDigital Library
  178. Jingyu Yan and Marc Pollefeys. 2008. A factorization-based approach for articulated nonrigid shape, motion, and kinematic chain recovery from video. IEEE Trans. Pattern Anal. Mach. Intell. 30, 5 (2008), 865--877. Google ScholarGoogle ScholarDigital LibraryDigital Library
  179. Congyuan Yang, Daniel Robinson, and Rene Vidal. 2015. Sparse subspace clustering with missing entries. In Int. Conf. Mach. Learn. 2463--2472. Google ScholarGoogle ScholarDigital LibraryDigital Library
  180. Georges Younes, Daniel Asmar, and Elie Shammas. 2016. A survey on non-filter-based monocular visual SLAM systems. In arXiv:1607.00470.Google ScholarGoogle Scholar
  181. Khalid Yousif, Alireza Bab-Hadiashar, and Reza Hoseinnezhad. 2015. An overview to visual odometry and visual SLAM: Applications to mobile robotics. Intell. Ind. Syst. 1, 4 (2015), 289--311.Google ScholarGoogle ScholarCross RefCross Ref
  182. Luca Zappella, Alessio Del Bue, Xavier Lladó, and Joaquim Salvi. 2013. Joint estimation of segmentation and structure from motion. Comput. Vis. Image Underst. 117, 2 (2013), 113--129. Google ScholarGoogle ScholarDigital LibraryDigital Library
  183. Hendrik Zender, Patric Jensfelt, and Geert Jan M. Kruijff. 2007. Human- and situation-aware people following. In IEEE Int. Work. Robot Hum. Interact. Commun. 1131--1136.Google ScholarGoogle Scholar
  184. Dong Zhang and Ping Li. 2012. Visual odometry in dynamical scenes. Sensors Transducers J. 147, 12 (2012), 78--86.Google ScholarGoogle Scholar
  185. Teng Zhang, Arthur Szlam, and Gilad Lerman. 2009. Median K-flats for hybrid linear modeling with many outliers. In Int. Conf. Comput. Vis. Work. 234--241.Google ScholarGoogle ScholarCross RefCross Ref
  186. Enliang Zheng, Ke Wang, Enrique Dunn, and Jan Michael Frahm. 2014. Joint object class sequencing and trajectory triangulation (JOST). In Eur. Conf. Comput. Vis. 599--614.Google ScholarGoogle ScholarCross RefCross Ref
  187. Tinghui Zhou, Matthew Brown, Noah Snavely, and David G. Lowe. 2017. Unsupervised learning of depth and ego-motion from video. In IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit.Google ScholarGoogle Scholar

          Reviews

          Giuseppina Carla Gini

          Reconstructing 3D models of an environment is traditionally a computer vision problem, crucial for virtual reality (VR) applications and for mobile robots that have to estimate the pose of the camera that moves with them. Well-known vision methods, such as structure from motion (SfM), and robotics methods, such as visual simultaneous localization and mapping (SLAM), are effective in static environments but still struggle in dynamic ones. This survey illustrates the state of the art of vision and robotics methods for real-time rendering in real-world environments containing dynamic objects. It proposes a taxonomy of the available approaches divided into three main themes: building static maps by rejecting dynamic features (robust visual SLAM), extracting moving objects while ignoring the static background (dynamic object segmentation and 3D tracking), and simultaneously handling the static and dynamic components of the world (joint motion segmentation and reconstruction). It also critically discusses the advantages and disadvantages of the approaches it covers, which rely on methods ranging from geometry to statistics to machine learning. The authors organize about 200 references well, using flow diagrams and summary tables to present the existing approaches. The paper can serve as an introduction for researchers new to the field, as well as a practical guide to specific approaches for application-oriented developers.

          • Published in

            ACM Computing Surveys  Volume 51, Issue 2
            March 2019
            748 pages
            ISSN:0360-0300
            EISSN:1557-7341
            DOI:10.1145/3186333
            Editor: Sartaj Sahni

            Copyright © 2018 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 20 February 2018
            • Revised: 1 December 2017
            • Accepted: 1 December 2017
            • Received: 1 August 2017

            Qualifiers

            • survey
            • Research
            • Refereed
