Visual SLAM and Structure from Motion in Dynamic Environments: A Survey

Authors:
Muhamad Risqi U. Saputra (ORCID: 0000-0002-8056-4919), Department of Computer Science, University of Oxford, Oxford, United Kingdom
Andrew Markham, Department of Computer Science, University of Oxford, Oxford, United Kingdom
Niki Trigoni, Department of Computer Science, University of Oxford, Oxford, United Kingdom

ACM Computing Surveys, Volume 51, Issue 2, Article No. 37, pp. 1–36. https://doi.org/10.1145/3177853

Published: 20 February 2018

Abstract

In the last few decades, Structure from Motion (SfM) and visual Simultaneous Localization and Mapping (visual SLAM) techniques have gained significant interest from both the computer vision and robotics communities. Many variants of these techniques have started to make an impact in a wide range of applications, including robot navigation and augmented reality. Despite some remarkable results in these areas, however, most SfM and visual SLAM techniques assume that the observed environment is static, so their overall accuracy can be jeopardized when they are faced with moving objects. In this article, we present for the first time a survey of visual SLAM and SfM techniques that are targeted toward operation in dynamic environments. We identify three main problems: how to perform reconstruction (robust visual SLAM), how to segment and track dynamic objects, and how to achieve joint motion segmentation and reconstruction. Based on this categorization, we provide a comprehensive taxonomy of existing approaches. Finally, the advantages and disadvantages of each solution class are critically discussed from the perspective of practicality and robustness.

References

Vincent J. Aidala and Sherry E. Hammel. 1983. Utilization of modified polar coordinates for bearings-only tracking. IEEE Trans. Automat. Contr. 28, 3 (1983), 283--294.Google ScholarCross Ref
Hirotogu Akaike. 1973. Information theory and an extension of the maximum likelihood principle. In Int. Symp. Inf. Theory. 267--281.Google Scholar
Ijaz Akhter, Sohaib Khan, Yaser Sheikh, and Takeo Kanade. 2008. Nonrigid structure from motion in trajectory space. In Adv. Neural Inf. Process. Syst., Vol. 1. 1--8. Google ScholarDigital Library
Pablo F. Alcantarilla, José J. Yebes, Javier Almazán, and Luis M. Bergasa. 2012. On combining visual slam and dense scene flow to increase the robustness of localization and mapping in dynamic environments. In IEEE Int. Conf. Robot. Autom. 1290--1297.Google Scholar
Shai Avidan and Amnon Shashua. 1999. Trajectory triangulation of lines: Reconstruction of a 3D point moving along a line from a monocular image sequence. In IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., Vol. 2. 66.Google ScholarCross Ref
Shai Avidan and Amnon Shashua. 2000. Trajectory triangulation: 3D reconstruction of moving points from a monocular image sequence. IEEE Trans. Pattern Anal. Mach. Intell. 22, 4 (2000), 348--357. Google ScholarDigital Library
Mohammadreza Babaee, Duc Tung Dinh, and Gerhard Rigoll. 2017. A deep convolutional neural network for background subtraction. In arXiv:1702.01731.Google Scholar
Herbert Bay, Andreas Ess, Tinne Tuytelaars, and Luc Van Gool. 2008. Speeded-up robust features (SURF). Comput. Vis. Image Underst. 110, 3 (2008), 346--359. Google ScholarDigital Library
Paul A. Beardsley, Andrew Zisserman, and David W. Murray. 1994. Navigation using affine structure from motion. In Eur. Conf. Comput. Vis. 85--96. Google ScholarDigital Library
Francisco Bonin-Font, Alberto Ortiz, Gabriel Oliver, Francisco Bonin-font Alberto, and Ortiz Gabriel. 2008. Visual navigation for mobile robots: A survey. J. Intell. Robot. Syst. 53 (2008), 263--296. Google ScholarDigital Library
Jean-Yves Bouguet. 2000. Pyramidal implementation of the affine Lucas Kanade feature tracker - Description of the algorithm. Intel Corp. Microprocess. Res. Labs.Google Scholar
Terrance E. Boult and Lisa Gottesfeld Brown. 1991. Factorization-based segmentation of motions. In IEEE Work. Vis. Motion.Google Scholar
Christoph Bregler, Aaron Herzmann, and Henning Biermann. 2000. Recovering non-rigid 3D shape from image streams. In IEEE Conf. Comput. Vis. Pattern Recognit.Google ScholarCross Ref
Michael D. Breitenstein, Fabian Reichlin, Bastian Leibe, Esther Koller-Meier, and Luc Van Gool. 2011. Online multi-person tracking-by-detection from a single, uncalibrated camera. IEEE Trans. Pattern Anal. Mach. Intell. 33, 9 (2011), 1820--1833. Google ScholarDigital Library
Arunkumar Byravan and Dieter Fox. 2017. SE3-Nets: Learning rigid body motion using deep neural networks. In IEEE Int. Conf. Robot. Autom.Google ScholarCross Ref
Jean-pierre L. E. Cadre and Olivier Tremois. 1998. Bearings-only tracking for maneuvering sources. IEEE Trans. Aerosp. Electron. Syst. 34, 1 (1998), 179--193.Google ScholarCross Ref
Michael Calonder, Vincent Lepetit, Christoph Strecha, and Pascal Fua. 2010. BRIEF: Binary robust independent elementary features. In Eur. Conf. Comput. Vis. 778--792. Google ScholarDigital Library
Robert O. Castle, Georg Klein, and David W. Murray. 2011. Wide-area augmented reality using camera tracking and mapping in multiple regions. Comput. Vis. Image Underst. 115, 6 (2011), 854--867. Google ScholarDigital Library
Stephen M. Chaves, Ayoung Kim, and Ryan M. Eustice. 2014. Opportunistic sampling-based planning for active visual SLAM. In IEEE/RSJ Int. Conf. Intell. Robot. Syst.Google Scholar
Jinhui Chen and Jian Yang. 2014. Robust subspace segmentation by low-rank representation. IEEE Trans. Cybern. 44, 8 (2014), 1432--1445.Google ScholarCross Ref
Falak Chhaya, Dinesh Reddy, Sarthak Upadhyay, Visesh Chari, M. Zeeshan Zia, and K. Madhava Krishna. 2016. Monocular reconstruction of vehicles: Combining SLAM with shape priors. In IEEE Int. Conf. Robot. Autom. 5758--5765.Google Scholar
Ondrej Chum and Jiri Matas. 2005. Matching with PROSAC-Progressive Sample Consensus. In IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. 220--226. Google ScholarDigital Library
Burcu Cinaz and Holger Kenn. 2008. HeadSLAM - Simultaneous localization and mapping with head-mounted inertial and laser range sensors. In IEEE Int. Symp. Wearable Comput. Google ScholarDigital Library
Joao Costeira and Takeo Kanade. 1995. A multi-body factorization method for motion analysis. In Int. Conf. Comput. Vis. 1071--1076. Google ScholarDigital Library
João Paulo Costeira and Takeo Kanade. 1998. A multibody factorization method for independently moving objects. Int. J. Comput. Vis. 29, 3 (1998), 159--179. Google ScholarDigital Library
Mark Cummins and Paul Newman. 2008. FAB-MAP: Probabilistic localization and mapping in the space of appearance. Int. J. Rob. Res. 27, 6 (2008), 647--665. Google ScholarDigital Library
Yuchao Dai, Hongdong Li, and Mingyi He. 2014. A simple prior-free method for non-rigid structure-from-motion factorization. Int. J. Comput. Vis. 107, 2 (2014), 101--122. Google ScholarDigital Library
Danping Zhou and Ping Tan. 2012. CoSLAM: Collaborative visual SLAM in dynamic environments. IEEE Trans. Pattern Anal. Mach. Intell. 35, 2 (2012), 354--366. Google ScholarDigital Library
Andrew J. Davison. 2003. Real-time simultaneous localisation and mapping with a single camera. In IEEE Int. Conf. Comput. Vis. Google ScholarDigital Library
Maxime Derome, Aurelien Plyer, Martial Sanfourche, and Guy Le Besnerais. 2015. Moving object detection in real-time using stereo from a mobile platform. Unmanned Syst. 3, 4 (2015), 253--266.Google ScholarCross Ref
Maxime Derome, Aurelien Plyer, Martial Sanfourche, and Guy Le Besnerais. 2014. Real-time mobile object detection using stereo. In 13th Int. Conf. Control Autom. Robot. Vis. (ICARCV’14). 1021--1026.Google ScholarCross Ref
Daniel DeTone, Tomasz Malisiewicz, and Andrew Rabinovich. 2016. Deep image homography estimation. In arXiv:1606.03798.Google Scholar
Alexey Dosovitskiy, Philipp Fischery, Eddy Ilg, Philip Hausser, Caner Hazirbas, Vladimir Golkov, Patrick Van Der Smagt, Daniel Cremers, and Thomas Brox. 2016. FlowNet: Learning optical flow with convolutional networks. In IEEE Int. Conf. Comput. Vis., Vol. 11-18-Dece. 2758--2766. Google ScholarDigital Library
Ehsan Elhamifar and Rene Vidal. 2009. Sparse subspace clustering. In IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. Work. 2790--2797.Google ScholarCross Ref
Ehsan Elhamifar and Rene Vidal. 2013. Sparse subspace clustering: Algorithm, theory, and applications. IEEE Trans. Pattern Anal. Mach. Intell. 35, 11 (2013), 2765--2781. Google ScholarDigital Library
Jakob Engel, Thomas Sch, and Daniel Cremers. 2014. LSD-SLAM: Direct monocular SLAM. In Eur. Conf. Comput. Vis. 834--849.Google Scholar
Martin A. Fischler and Robert C. Bolles. 1981. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24 (1981), 381--395. Google ScholarDigital Library
Katerina Fragkiadaki, Pablo Arbelaez, Panna Felsen, and Jitendra Malik. 2015. Learning to segment moving objects in videos. In IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. 4083--4090.Google ScholarCross Ref
Friedrich Fraundorfer and Davide Scaramuzza. 2012. Visual odometry: Part II - matching, robustness, optimization, and applications. IEEE Robot. Autom. Mag. 19, 2 (2012), 78--90.Google ScholarCross Ref
Jorge Fuentes-Pacheco, Jose Ruiz-Ascencio, and Juan Manuel Rendon-Mancha. 2012. Visual simultaneous localization and mapping: A survey. Artif. Intell. Rev. 43, 1 (2012), 55--81. Google ScholarDigital Library
Dorian Galvez-Lopez and Juan D. Tardos. 2012. Bags of binary words for fast place recognition in image sequences. IEEE Trans. Robot. 28, 5 (2012), 1188--1197. Google ScholarDigital Library
Xiao Shan Gao, Xiao Rong Hou, Jianliang Tang, and Hang Fei Cheng. 2003. Complete solution classification for the perspective-three-point problem. IEEE Trans. Pattern Anal. Mach. Intell. 25, 8 (2003), 930--943. Google ScholarDigital Library
Emilio Garcia-Fidalgo and Alberto Ortiz. 2015. Vision-based topological mapping and localization methods: A survey. Rob. Auton. Syst. 64 (2015), 1--20. Google ScholarDigital Library
C. W. Gear. 1998. Multibody grouping from motion images. Int. J. Comput. Vis. 29, 2 (1998), 133--150. Google ScholarDigital Library
Andreas Geiger, Julius Ziegler, and Christoph Stiller. 2011. StereoScan: Dense 3D reconstruction in real-time. In IEEE Intell. Veh. Symp. 1--9.Google ScholarCross Ref
Arturo Gil, Oscar Reinoso, Monica Ballesta, and Miguel Julia. 2010. Multi-robot visual SLAM using a Rao-Blackwellized particle filter. Rob. Auton. Syst. 58, 1 (2010), 68--80. Google ScholarDigital Library
Georgia Gkioxari and Jitendra Malik. 2015. Finding action tubes. In IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit.Google ScholarCross Ref
Susanna Gladh, Martin Danelljan, Fahad Shahbaz Khan, and Michael Felsberg. 2016. Deep motion features for visual tracking. In Int. Conf. Pattern Recognit.Google ScholarCross Ref
Alvina Goh and Rene Vidal. 2007. Segmenting motions of different types by unsupervised manifold clustering. In IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit.Google ScholarCross Ref
Venu Madhav Govindu. 2001. Combining two-view constraints for motion estimation. In IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit.Google ScholarCross Ref
H. M. Gross, H. J. Boehme, C. Schroeter, S. Mueller, A. Koenig, Ch. Martin, M. Merten, and A. Bley. 2008. Shopbot: Progress in developing an interactive mobile shopping assistant for everyday use. In IEEE Int. Conf. Syst. Man Cybern. 3471--3478.Google Scholar
Yanming Guo, Yu Liu, Ard Oerlemans, Songyang Lao, Song Wu, and Michael S. Lew. 2015. Deep learning for visual understanding: A review. Neurocomputing 187 (2015), 27--48. Google ScholarDigital Library
Hugh C. Longuet-Higgins. 1981. A computer algorithm for reconstructing a scene from two projections. Nature 293 (1981), 133--135.Google ScholarCross Ref
Mei Han and Takeo Kanade. 2004. Reconstruction of a scene with multiple linearly moving objects. Int. J. Comput. Vis. 59, 3 (2004), 285--300. Google ScholarDigital Library
Ankur Handa, Michael Bloesch, Viorica Patraucean, Simon Stent, John McCormac, and Andrew Davison. 2016. gvnn: Neural network library for geometric computer vision. In arXiv:1607.07405.Google Scholar
Chris Harris and Carl Stennett. 1990. RAPID - A video rate object tracker. In Br. Mach. Vis. Conf.Google ScholarCross Ref
Chris Harris and Mike Stephens. 1988. A combined corner and edge detector. In Alvey Vis. Conf. 147--151.Google ScholarCross Ref
Richard Hartley and Frederik Schaffalitzky. 2003. PowerFactorization: 3D reconstruction with missing or uncertain data. In Aust. Adv. Work. Comput. Vis., Vol. 74. 1--9.Google Scholar
Richard Hartley and Andrew Zisserman. 2004. Multiple View Geometry in Computer Vision (2nd ed.). Cambridge University Press. Google ScholarDigital Library
Richard I. Hartley and Peter Sturm. 1997. Triangulation. Comput. Vis. Image Underst. 68, 2 (1997), 146--157. Google ScholarDigital Library
Stephan Heuel and Wolfgang Förstner. 2001. Matching, reconstructing and grouping 3D lines from multiple views using uncertain projective geometry. In IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit.Google ScholarCross Ref
Berthold K. P. Horn and Brian G. Schunck. 1981. Determining optical flow. Artif. Intell. 17, 1--3 (1981), 185--203. Google ScholarDigital Library
Stefan Hrabar, Gaurav S. Sukhatme, Peter Corke, Kane Usher, and Jonathan Roberts. 2005. Combined optic-flow and stereo-based navigation of urban canyons for a UAV. In IEEE/RSJ Int. Conf. Intell. Robot. Syst. 302--309.Google ScholarCross Ref
Thomas S. Huang and Arun N. Netravali. 1994. Motion and structure from feature correspondences: A review. Proc. IEEE 82, 2 (1994), 252--268.Google ScholarCross Ref
Naoyuki Ichimura. 1999. Motion segmentation based on factorization method and discriminant critea. In IEEE Int. Conf. Comput. Vis.Google Scholar
Eddy Ilg, Nikolaus Mayer, Tonmoy Saikia, Margret Keuper, Alexey Dosovitskiy, and Thomas Brox. 2017. FlowNet 2.0: Evolution of optical flow estimation with deep networks. In IEEE Conf. Comput. Vis. Pattern Recognit.Google ScholarCross Ref
Eagle S. Jones and Stefano Soatto. 2011. Visual-inertial navigation, mapping and localization: A scalable real-time causal approach. Int. J. Rob. Res. 30, 4 (2011), 1--38. Google ScholarDigital Library
Zdenek Kalal, Krystian Mikolajczyk, and Jiri Matas. 2012. Tracking-learning-detection. IEEE Trans. Pattern Anal. Mach. Intell. 34, 7 (2012), 1409--1422. Google ScholarDigital Library
Jeremy Yirmeyahu Kaminski and Mina Teicher. 2002. General trajectory triangulation. In Eur. Conf. Comput. Vis. 823--836. Google ScholarDigital Library
Jeremy Yirmeyahu Kaminski and Mina Teicher. 2004. A general framework for trajectory optimization. J. Math. Imaging Vis. 21 (2004), 27--41.Google ScholarDigital Library
Kenichi Kanatani. 1996. Statistical Optimization for Geometric Computation: Theory and Practice. Elsevier. Google ScholarDigital Library
Kenichi Kanatani. 2001. Motion segmentation by subspace separation and model selection. In IEEE Int. Conf. Comput. Vis. 586--591.Google ScholarCross Ref
Kenichi Kanatani and Chikara Matsunaga. 2002. Estimating the number of independent motions for multibody motion segmentation. In Asian Conf. Comput. Vis.Google Scholar
Jens Klappstein, Tobi Vaudrey, Clemens Rabe, Andreas Wedel, and Reinhard Klette. 2009. Moving object segmentation using optical flow and depth information. In Pacific-Rim Symp. Image Video Technol. 611--623. Google ScholarDigital Library
Georg Klein and David Murray. 2007. Parallel tracking and mapping for small AR workspaces. In IEEE ACM Int. Symp. Mix. Augment. Real. Google ScholarDigital Library
Georg Klein and David Murray. 2009. Parallel tracking and mapping on a camera phone. In 8th IEEE Int. Symp. Mix. Augment. Real. 83--86. Google ScholarDigital Library
Kishore Konda and Roland Memisevic. 2013. Unsupervised learning of depth and motion. In arXiv:1312.3429.Google Scholar
Kishore Konda and Roland Memisevic. 2015. Learning visual odometry with a convolutional network. In Int. Conf. Comput. Vis. Theory Appl. 486--490.Google ScholarCross Ref
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Adv. Neural Inf. Process. Syst. 1--9. Google ScholarDigital Library
Suryansh Kumar, Yuchao Dai, and Hongdong Li. 2016. Multi-body non-rigid structure-from-motion. In Int. Conf. 3D Vis. 148--156.Google ScholarCross Ref
Rainer Kummerle, Giorgio Grisetti, Hauke Strasdat, Kurt Konolige, and Wolfram Burgard. 2011. G2o: A general framework for graph optimization. In IEEE Int. Conf. Robot. Autom. 3607--3613.Google Scholar
Abhijit Kundu, K. Madhava Krishna, and C. V. Jawahar. 2010. Realtime motion segmentation based multibody visual SLAM. In 7th Indian Conf. Comput. Vision, Graph. Image Process. 251--258. Google ScholarDigital Library
Abhijit Kundu, K. Madhava Krishna, and C. V. Jawahar. 2011. Realtime multibody visual SLAM and tracking with a smoothly moving monocular camera. In IEEE Int. Conf. Comput. Vis. Google ScholarDigital Library
Abhijit Kundu, K. Madhava Krishna, and Jayanthi Sivaswamy. 2009. Moving object detection by multi-view geometric techniques from a single camera mounted robot. In IEEE/RSJ Int. Conf. Intell. Robot. Syst. 4306--4312. Google ScholarDigital Library
Iro Laina, Christian Rupprecht, Vasileios Belagiannis, Federico Tombari, and Nassir Navab. 2016. Deeper depth prediction with fully convolutional residual networks. In Int. Conf. 3D Vis. 239--248.Google ScholarCross Ref
Quoc V. Le, Alexandre Karpenko, Jiquan Ngiam, and Andrew Y. Ng. 2011. ICA with reconstruction cost for efficient overcomplete feature learning. In Adv. Neural Inf. Process. Syst. 1--9. Google ScholarDigital Library
Quoc V. Le, Marc’Aurelio Ranzato, Rajat Monga, Matthieu Devin, Kai Chen, Greg S. Corrado, Jeff Dean, and Andrew Y. Ng. 2011. Building high-level features using large scale unsupervised learning. In Int. Conf. Mach. Learn. 38115. Google ScholarDigital Library
Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2016. Deep learning. Nature 521 (2016), 436--444.Google ScholarCross Ref
Kuan Hui Lee, Jenq Neng Hwang, Greg Okapal, and James Pitton. 2014. Driving recorder based on-road pedestrian tracking using visual SLAM and constrained multiple-kernel. In 17th IEEE Int. Conf. Intell. Transp. Syst. 2629--2635.Google Scholar
Kuan-hui Lee, Jenq-neng Hwang, Greg Okopal, and James Pitton. 2016. Ground-moving-platform-based human tracking using visual SLAM and constrained multiple kernels. IEEE Trans. Intell. Transp. Syst. 17, 12 (2016), 3602--3612. Google ScholarDigital Library
Stefan Leutenegger, Margarita Chli, and Roland Y. Siegwart. 2011. BRISK: Binary robust invariant scalable keypoints. In IEEE Int. Conf. Comput. Vis. 2548--2555. Google ScholarDigital Library
Stefan Leutenegger, Paul Furgale, Vincent Rabaud, Margarita Chli, Kurt Konolige, and Roland Siegwart. 2013. Keyframe-based visual-inertial SLAM using nonlinear optimization. Int. J. Rob. Res. 34, 3 (2013), 314--334. Google ScholarDigital Library
Ting Li, Vinutha Kallem, Dheeraj Singaraju, and Rene Vidal. 2007. Projective factorization of multiple rigid-body motions. In IEEE Conf. Comput. Vis. Pattern Recognit.Google ScholarCross Ref
Hyon Lim, Jongwoo Lim, and H. Jin Kim. 2014. Real-time 6-DOF monocular visual SLAM in a large-scale environment. In IEEE Int. Conf. Robot. Autom.Google Scholar
Kuen-Han Lin and Chieh-Chih Wang. 2010. Stereo-based simultaneous localization, mapping and moving object tracking. In IEEE/RSJ Int. Conf. Intell. Robot. Syst.Google Scholar
Tsung Han Lin and Chieh-Chih Wang. 2014. Deep learning of spatio-temporal features with geometric-based moving point detection for motion segmentation. In IEEE Int. Conf. Robot. Autom. 3058--3065.Google ScholarCross Ref
Guangcan Liu, Zhouchen Lin, Shuicheng Yan, Ju Sun, Yong Yu, and Yi Ma. 2013. Robust recovery of subspace structures by low-rank representation. IEEE Trans. Pattern Anal. Mach. Intell. 35, 1 (2013), 171--184. Google ScholarDigital Library
Jonathan Long, Evan Shelhamer, and Trevor Darrell. 2015. Fully convolutional networks for semantic segmentation. In IEEE Conf. Comput. Vis. Pattern Recognit. 3431--3440.Google ScholarCross Ref
David G. Lowe. 2004. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60, 2 (2004), 91--110. Google ScholarDigital Library
Bruce D. Lucas and Takeo Kanade. 1981. An Iterative Image Registration Technique with an Application to Stereo Vision. In DARPA Image Underst. Work. 121--130.Google ScholarDigital Library
Nikolaus Mayer, Eddy Ilg, Philip Häusser, Philipp Fischer, Daniel Cremers, Alexey Dosovitskiy, and Thomas Brox. 2016. A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In IEEE Conf. Comput. Vis. Pattern Recognit.Google ScholarCross Ref
Christopher Mei, Gabe Sibley, Mark Cummins, Paul Newman, and Ian Reid. 2011. RSLAM: A system for large-scale mapping in constant-time using stereo. Int. J. Comput. Vis. 94, 2 (2011), 198--214. Google ScholarDigital Library
Iaroslav Melekhov, Juha Ylioinas, Juho Kannala, and Esa Rahtu. 2017. Relative camera pose estimation using convolutional neural networks. In arXiv:1702.01381.Google Scholar
Davide Migliore, Roberto Rigamonti, Daniele Marzorati, Matteo Matteucci, and Domenico G. Sorrenti. 2009. Use a single camera for simultaneous localization and mapping with mobile object tracking in dynamic environments. In ICRA Work. Safe Navig. Open Dyn. Environ. Appl. to Auton. Veh.Google Scholar
Vikram Mohanty, Shubh Agrawal, Shaswat Datta, Arna Ghosh, Vishnu Dutt Sharma, and Debashish Chakravarty. 2016. DeepVO: A deep learning approach for monocular visual odometry. In arXiv:1611.06069.Google Scholar
Toshihiko Morita and Takeo Kanade. 1993. A sequential factorization method for recovering shape and motion from image streams. Proc. Natl. Acad. Sci. 90, 21 (1993), 9795--9802.Google ScholarCross Ref
Pierre Moulon, Pascal Monasse, and Renaud Marlet. 2013. Global fusion of relative motions for robust, accurate and scalable structure from motion. In IEEE Int. Conf. Comput. Vis. 3248--3255. Google ScholarDigital Library
Etienne Mouragnon, Maxime Lhuillier, Michel Dhome, Fabien Dekeyser, and Patrick Sayd. 2006. Monocular vision based SLAM for mobile robots. In 18th Int. Conf. Pattern Recognit. Google ScholarDigital Library
Etienne Mouragnon, Maxime Lhuillier, Michel Dhome, Fabien Dekeyser, and Patrick Sayd. 2006. Real time localization and 3D reconstruction. In IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. 1--8. Google ScholarDigital Library
Etienne Mouragnon, Maxime Lhuillier, Michel Dhome, Fabien Dekeyser, and Patrick Sayd. 2007. Generic and real-time structure from motion. In Br. Mach. Vis. Conf. 64.1--64.10.Google ScholarCross Ref
Etienne Mouragnon, Maxime Lhuillier, Michel Dhome, Fabien Dekeyser, and Patrick Sayd. 2009. Generic and real-time structure from motion using local bundle adjustment. Image Vis. Comput. 27, 8 (2009), 1178--1193. Google ScholarDigital Library
Peter Muller and Andreas Savakis. 2017. Flowdometry: An optical flow and deep learning based approach to visual odometry. In IEEE Winter Conf. Appl. Comput. Vis.Google ScholarCross Ref
Raul Mur-Artal, J. M. M. Montiel, and Juan D. Tardos. 2015. ORB-SLAM: A versatile and accurate monocular SLAM system. IEEE Trans. Robot. 31, 5 (2015), 1147--1163.Google ScholarDigital Library
Yohei Murakami, Takeshi Endo, Yoshimichi Ito, and Noboru Babaguchi. 2012. Depth-estimation-free projective factorization and its application to 3D reconstruction. In Asian Conf. Comput. Vis. 150--162. Google ScholarDigital Library
Richard A. Newcombe, David Molyneaux, David Kim, Andrew J. Davison, Jamie Shotton, Steve Hodges, Andrew Fitzgibbon, Shahram Izadi, Otmar Hilliges, David Molyneaux, David Kim, Andrew J. Davison, Pushmeet Kohli, Jamie Shotton, Steve Hodges, and Andrew Fitzgibbon. 2011. KinectFusion: Real-time dense surface mapping and tracking. In IEEE Int. Symp. Mix. Augment. Real. 127--136. Google ScholarDigital Library
David Nister. 2004. An efficient solution to the five-point relative pose problem. IEEE Trans. Pattern Anal. Mach. Intell. 26, 6 (2004), 756--770. Google ScholarDigital Library
David Nistér, Oleg Naroditsky, and James Bergen. 2004. Visual odometry. In IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. 652--659.Google ScholarCross Ref
John Oliensis. 2000. A critique of structure-from-motion algorithms. Comput. Vis. Image Underst. 80, 2 (2000), 172--214. Google ScholarDigital Library
D. Ortín and J. Montiel. 2001. Indoor robot motion based on monocular images. Robotica 19, 3 (2001), 331--342. Google ScholarDigital Library
Nobuyuki Otsu. 1979. A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man. Cybern. SMC-9, 1 (1979), 62--66.Google ScholarCross Ref
Kemal Egemen Ozden, Kurt Cornelis, Luc Van Eycken, and Luc Van Gool. 2004. Reconstructing 3D trajectories of independently moving objects using generic constraints. Comput. Vis. Image Underst. 96, 3 (2004), 453--471. Google ScholarDigital Library
Kemal E. Ozden, Konrad Schindler, and Luc Van Gool. 2010. Multibody structure-from-motion in practice. IEEE Trans. Pattern Anal. Mach. Intell. 32, 6 (2010), 1134--1141. Google ScholarDigital Library
Marco Paladini, Alessio Del Bue, Marko Stošić, Marija Dodig, João Xavier, and Lourdes Agapito. 2009. Factorization for non-rigid and articulated structure using metric projections. In IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. 2898--2905.Google ScholarCross Ref
Hyun Soo Park, Takaaki Shiratori, Iain Matthews, and Yaser Sheikh. 2010. 3D reconstruction of a moving point from a series of 2D projections. In Eur. Conf. Comput. Vis. 158--171. Google ScholarDigital Library
Hyun Soo Park, Takaaki Shiratori, Iain Matthews, and Yaser Sheikh. 2015. 3D trajectory reconstruction under perspective projection. Int. J. Comput. Vis. 115, 2 (2015), 115--135. Google ScholarDigital Library
Massimo Piccardi. 2004. Background subtraction techniques: A review. In EEE Int. Conf. Syst. Man Cybern., Vol. 4. 3099--3104.Google ScholarCross Ref
Jouni Rantakokko, Joakim Rydell, Peter Strömbäck, Peter Händel, Jonas Callmer, David Törnqvist, Fredrik Gustafsson, Magnus Jobs, and Mathias Grudén. 2011. Accurate and reliable soldier and first responder indoor positioning: Multisensor systems and cooperative localization. IEEE Wirel. Commun. 18, 2 (2011), 10--18.Google ScholarCross Ref
Shankar Rao, Roberto Tron, Rene Vidal, and Yi Ma. 2010. Motion segmentation in the presence of outlying, incomplete, or corrupted trajectories. IEEE Trans. Pattern Anal. Mach. Intell. 32, 10 (2010), 1832--1845. Google ScholarDigital Library
Jorma Rissanen. 1984. Universal coding, information, prediction, and eestimation. IEEE Trans. Inf. Theory 30, 4 (1984), 629--636. Google ScholarDigital Library
Edward Rosten and Tom Drummond. 2006. Machine learning for high-speed corner detection. In Eur. Conf. Comput. Vis., Vol. 1. 430--443. Google ScholarDigital Library
Ethan Rublee, Vincent Rabaud, Kurt Konolige, and Gary Bradski. 2011. ORB: An efficient alternative to SIFT or SURF. In IEEE Int. Conf. Comput. Vis. 2564--2571. Google ScholarDigital Library
Reza Sabzevari and Davide Scaramuzza. 2014. Monocular simultaneous multi-body motion segmentation and reconstruction from perspective views. In IEEE Int. Conf. Robot. Autom. 23--30.Google ScholarCross Ref
Reza Sabzevari and Davide Scaramuzza. 2016. Multi-body motion estimation from monocular vehicle-mounted cameras. IEEE Trans. Robot. 32, 3 (2016), 638--651.Google ScholarCross Ref
Muhamad Risqi Utama Saputra, Widyawan, and Paulus Insap Santosa. 2014. Obstacle avoidance for visually impaired using auto-adaptive thresholding on Kinect’s depth image. In 11th IEEE Int. Conf. Ubiquitous Intell. Comput. 337--342. Google ScholarDigital Library
Lawrence K. Saul and Sam T. Roweis. 2003. Think globally, fit locally: Unsupervised learning of low dimensional manifolds. J. Mach. Learn. Res. 4, 1999 (2003), 119--155. Google ScholarDigital Library
Davide Scaramuzza. 2011. 1-point-RANSAC structure from motion for vehicle-mounted cameras by exploiting non-holonomic constraints. Int. J. Comput. Vis. 95, 1 (2011), 74--85. Google ScholarDigital Library
Davide Scaramuzza, Friedrich Fraundorfer, and Roland Siegwart. 2009. Real-time monocular visual odometry for on-road vehicles with 1-point RANSAC. In IEEE Int. Conf. Robot. Autom. 4293--4299. Google ScholarDigital Library
Konrad Schindler and David Suter. 2005. Two-view multibody structure-and-motion with outliers. In IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. Google ScholarDigital Library
Konrad Schindler and David Suter. 2006. Two-view multibody structure-and-motion with outliers through model selection. IEEE Trans. Pattern Anal. Mach. Intell. 28, 6 (2006), 983--995. Google ScholarDigital Library
Konrad Schindler, David Suter, and Hanzi Wang. 2008. A model-selection framework for multibody structure-and-motion of image sequences. Int. J. Comput. Vis. 79, 2 (2008), 159--177. Google ScholarDigital Library
Konrad Schindler, James U., and Hanzi Wang. 2006. Perspective n-view multibody structure-and-motion through model selection. In Eur. Conf. Comput. Vis., Vol. 1. 606--619. Google ScholarDigital Library
Johannes Lutz Schönberger and Jan-Michael Frahm. 2016. Structure-from-motion revisited. In IEEE Conf. Comput. Vis. Pattern Recognit. 4104--4113.Google ScholarCross Ref
Gideon Schwarz. 1978. Estimating the dimension of a model. Ann. Stat. 6, 2 (1978), 461--464.Google ScholarCross Ref
Amnon Shashua, Shai Avidan, and Michael Werman. 1999. Trajectory triangulation over conic sections. In IEEE Int. Conf. Comput. Vis.Google ScholarCross Ref
Gabe Sibley, Christopher Mei, Ian Reid, and Paul Newman. 2010. Vast-scale outdoor navigation using adaptive relative bundle adjustment. Int. J. Rob. Res. 29, 8 (2010), 958--980. Google ScholarDigital Library
Karen Simonyan and Andrew Zisserman. 2014. Two-stream convolutional networks for action recognition in videos. In Adv. Neural Inf. Process. Syst. 1--9. Google ScholarDigital Library
Noah Snavely, Steven Seitz, and Richard Szeliski. 2006. PhotoTourism: Exploring photo collections in 3D. In SIGGRAPH Conf. Proc. 835--846. Google ScholarDigital Library
Noah Snavely, Steven M. Seitz, and Richard Szeliski. 2008. Modeling the world from internet photo collections. Int. J. Comput. Vis. 80, 2 (2008), 189--210. Google ScholarDigital Library
Joan Solà. 2007. Towards Visual Localization, Mapping and Moving Objects Tracking by a Mobile Robot: A Geometric and Probabilistic Approach. Ph.D. Dissertation. Institut National Politechnique de Toulouse.Google Scholar
Hauke Strasdat, J. M. M. Montiel, and Andrew J. Davison. 2012. Visual SLAM: Why filter? Image Vis. Comput. 30, 2 (2012), 65--77.
Peter Sturm and Bill Triggs. 1996. A factorization based algorithm for multi-image projective structure and motion. In Eur. Conf. Comput. Vis., Vol. 1065. 710--720.
Wei Tan, Haomin Liu, Zilong Dong, Guofeng Zhang, and Hujun Bao. 2013. Robust monocular SLAM in dynamic environments. In IEEE Int. Symp. Mix. Augment. Real.
Ninad Thakoor, Jean Gao, and Venkat Devarajan. 2010. Multibody structure-and-motion segmentation by branch-and-bound model selection. IEEE Trans. Image Process. 19, 6 (2010), 1393--1402.
Carlo Tomasi and Takeo Kanade. 1992. Shape and motion from image streams under orthography: A factorization method. In Int. J. Comput. Vis., Vol. 9. 137--154.
Philip H. S. Torr. 1998. Geometric motion segmentation and model selection. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 356, 1740 (1998), 1321--1340.
Philip H. S. Torr and Andrew Zisserman. 1997. Robust parameterization and computation of the trifocal tensor. Image Vis. Comput. 15, 8 (1997), 591--605.
Philip H. S. Torr and Andrew Zisserman. 1999. Feature based methods for structure and motion estimation. In Int. Work. Vis. Algorithms.
Philip H. S. Torr and Andrew Zisserman. 2000. MLESAC: A new robust estimator with application to estimating image geometry. Comput. Vis. Image Underst. 78, 1 (2000), 138--156.
Roberto Tron and René Vidal. 2007. A benchmark for the comparison of 3-D motion segmentation algorithms. In IEEE Conf. Comput. Vis. Pattern Recognit. 1--8.
Sepehr Valipour, Mennatullah Siam, Martin Jagersand, and Nilanjan Ray. 2017. Recurrent fully convolutional networks for video segmentation. In IEEE Winter Conf. Appl. Comput. Vis. 1--12.
René Vidal. 2006. Online clustering of moving hyperplanes. In Adv. Neural Inf. Process. Syst. 1433--1440.
René Vidal. 2011. Subspace clustering. IEEE Signal Process. Mag. 28, 2 (2011), 52--68.
René Vidal and Richard Hartley. 2008. Three-view multibody structure from motion. IEEE Trans. Pattern Anal. Mach. Intell. 30, 2 (2008), 214--227.
René Vidal, Yi Ma, and Shankar Sastry. 2005. Generalized principal component analysis (GPCA). In IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. 40, 12 (2005), 1945--1959.
René Vidal, Yi Ma, and Shankar Sastry. 2005. Generalized principal component analysis (GPCA). IEEE Trans. Pattern Anal. Mach. Intell. 27, 12 (2005), 1945--1959.
René Vidal, Yi Ma, Stefano Soatto, and Shankar Sastry. 2006. Two-view multibody structure from motion. Int. J. Comput. Vis. 68, 1 (2006), 7--25.
René Vidal, Stefano Soatto, Yi Ma, and Shankar Sastry. 2002. Segmentation of dynamic scenes from the multibody fundamental matrix. In ECCV Work. Vis. Model. Dyn. Scenes.
Sudheendra Vijayanarasimhan, Susanna Ricco, Cordelia Schmid, Rahul Sukthankar, and Katerina Fragkiadaki. 2017. SfM-Net: Learning of structure and motion from video. In arXiv:1704.07804.
Chieh-Chih Wang and Chuck Thorpe. 2002. Simultaneous localization and mapping with detection and tracking of moving objects. In IEEE Int. Conf. Robot. Autom., Vol. 3. 2918--2924.
Chieh-Chih Wang, Charles Thorpe, Sebastian Thrun, M. Hebert, and H. Durrant-Whyte. 2007. Simultaneous localization, mapping and moving object tracking. Int. J. Rob. Res. 26, 9 (2007), 889--916.
Sen Wang, Ronald Clark, Hongkai Wen, and Niki Trigoni. 2017. DeepVO: Towards end-to-end visual odometry with deep recurrent convolutional neural networks. In IEEE Int. Conf. Robot. Autom.
Yin Tien Wang, Ming Chun Lin, and Rung Chi Ju. 2010. Visual SLAM and moving-object detection for a small-size humanoid robot. Int. J. Adv. Robot. Syst. 7, 2 (2010), 133--138.
Somkiat Wangsiripitak and David W. Murray. 2009. Avoiding moving outliers in visual SLAM by tracking moving objects. In IEEE Int. Conf. Robot. Autom.
Changchang Wu. 2013. Towards linear-time incremental structure from motion. In Int. Conf. 3D Vis. 127--134.
Changchang Wu, Sameer Agarwal, Brian Curless, and Steven M. Seitz. 2011. Multicore bundle adjustment. In IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. 3057--3064.
Jing Xiao, Jin-xiang Chai, and Takeo Kanade. 2004. A closed-form solution to non-rigid shape and motion recovery. In Eur. Conf. Comput. Vis. 573--587.
Jingyu Yan and Marc Pollefeys. 2006. A general framework for motion segmentation: Independent, articulated, rigid, non-rigid, degenerate and non-degenerate. In Eur. Conf. Comput. Vis.
Jingyu Yan and Marc Pollefeys. 2008. A factorization-based approach for articulated nonrigid shape, motion, and kinematic chain recovery from video. IEEE Trans. Pattern Anal. Mach. Intell. 30, 5 (2008), 865--877.
Congyuan Yang, Daniel Robinson, and René Vidal. 2015. Sparse subspace clustering with missing entries. In Int. Conf. Mach. Learn. 2463--2472.
Georges Younes, Daniel Asmar, and Elie Shammas. 2016. A survey on non-filter-based monocular visual SLAM systems. In arXiv:1607.00470.
Khalid Yousif, Alireza Bab-Hadiashar, and Reza Hoseinnezhad. 2015. An overview to visual odometry and visual SLAM: Applications to mobile robotics. Intell. Ind. Syst. 1, 4 (2015), 289--311.
Luca Zappella, Alessio Del Bue, Xavier Lladó, and Joaquim Salvi. 2013. Joint estimation of segmentation and structure from motion. Comput. Vis. Image Underst. 117, 2 (2013), 113--129.
Hendrik Zender, Patric Jensfelt, and Geert Jan M. Kruijff. 2007. Human- and situation-aware people following. In IEEE Int. Work. Robot Hum. Interact. Commun. 1131--1136.
Dong Zhang and Ping Li. 2012. Visual odometry in dynamical scenes. Sensors Transducers J. 147, 12 (2012), 78--86.
Teng Zhang, Arthur Szlam, and Gilad Lerman. 2009. Median K-flats for hybrid linear modeling with many outliers. In Int. Conf. Comput. Vis. Work. 234--241.
Enliang Zheng, Ke Wang, Enrique Dunn, and Jan Michael Frahm. 2014. Joint object class sequencing and trajectory triangulation (JOST). In Eur. Conf. Comput. Vis. 599--614.
Tinghui Zhou, Matthew Brown, Noah Snavely, and David G. Lowe. 2017. Unsupervised learning of depth and ego-motion from video. In IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit.

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Image segmentation
        Reconstruction
        Tracking
2. General and reference
  1. Document types
    1. Surveys and overviews

Reviews

Reviewer: Giuseppina Carla Gini

Reconstructing an environment's 3D models is traditionally a computer vision problem, crucial for virtual reality (VR) applications and mobile robots that have to estimate the pose of the camera that moves with them. Well-known vision methods, such as structure from motion (SfM), and robotics methods, such as visual simultaneous localization and mapping (SLAM), are effective in static environments but still struggle in dynamic ones. This survey illustrates the state of the art of vision and robotics methods for real-time rendering in real-world environments containing dynamic objects. It proposes a taxonomy of the available approaches divided into three main themes: building static maps by rejecting dynamic features (robust visual SLAM), extracting moving objects while ignoring the static background (dynamic object segmentation and 3D tracking), and simultaneously handling the static and dynamic components of the world (joint motion segmentation and reconstruction). It also critically discusses the advantages and disadvantages of the many illustrated approaches, which rely on methods spanning from geometry to statistics to machine learning. The authors nicely organize about 200 references, using figures with flow diagrams and summarizing the existing approaches in tables. The paper can serve as an introduction for researchers new to the field, as well as a practical guide to specific approaches for application-oriented developers.

Published in
ACM Computing Surveys Volume 51, Issue 2
March 2019
748 pages
ISSN:0360-0300
EISSN:1557-7341
DOI:10.1145/3186333
Editor:
Sartaj Sahni
Department of Computer and Information Science and Engineering / University of Florida / Gainesville, FL 32611
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 20 February 2018
- Revised: 1 December 2017
- Accepted: 1 December 2017
- Received: 1 August 2017

Author Tags
3D reconstruction
3D tracking
Structure from motion
deep learning
dynamic environments
dynamic object segmentation
motion segmentation
visual SLAM
visual odometry
Qualifiers
- survey
- Research
- Refereed

Article Metrics
- Total Citations: 258
- Total Downloads: 5,460
- Downloads (Last 12 months): 725
- Downloads (Last 6 weeks): 103