Abstract
In the last few decades, Structure from Motion (SfM) and visual Simultaneous Localization and Mapping (visual SLAM) techniques have gained significant interest from both the computer vision and robotics communities. Many variants of these techniques have started to make an impact in a wide range of applications, including robot navigation and augmented reality. However, despite some remarkable results in these areas, most SfM and visual SLAM techniques operate on the assumption that the observed environment is static. When faced with moving objects, the accuracy of the overall system can therefore be jeopardized. In this article, we present for the first time a survey of visual SLAM and SfM techniques that are targeted toward operation in dynamic environments. We identify three main problems: how to perform robust reconstruction (robust visual SLAM), how to segment and track dynamic objects, and how to achieve joint motion segmentation and reconstruction. Based on this categorization, we provide a comprehensive taxonomy of existing approaches. Finally, the advantages and disadvantages of each solution class are critically discussed from the perspective of practicality and robustness.
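The claim that moving objects can jeopardize accuracy can be made concrete with a toy example that is not taken from the survey. The sketch below assumes a deliberately simplified motion model in which the camera's effect on the image is a pure 2D translation; the helper names `least_squares_translation` and `ransac_translation` are hypothetical. It contrasts a least-squares fit over all correspondences, which implicitly assumes a static scene, with a RANSAC-style consensus estimate (Fischler and Bolles, 1981), one common ingredient of robust visual SLAM. Real systems estimate a full camera pose or epipolar geometry rather than a translation.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def least_squares_translation(src: Sequence[Point], dst: Sequence[Point]) -> Point:
    """Mean displacement over all matches: the least-squares 2D translation.

    Toy model (assumption): only appropriate if every point is static."""
    n = len(src)
    tx = sum(d[0] - s[0] for s, d in zip(src, dst)) / n
    ty = sum(d[1] - s[1] for s, d in zip(src, dst)) / n
    return tx, ty


def ransac_translation(src: Sequence[Point], dst: Sequence[Point],
                       iters: int = 50, thresh: float = 0.5,
                       seed: int = 0) -> Tuple[Point, List[int]]:
    """RANSAC-style estimate of the dominant 2D translation (toy model).

    Matches that disagree with the largest consensus set, such as points on
    an independently moving object, are left out of the returned inliers."""
    rng = random.Random(seed)
    best_inliers: List[int] = []
    for _ in range(iters):
        # A 2D translation has two unknowns, so one match is a minimal sample.
        i = rng.randrange(len(src))
        tx, ty = dst[i][0] - src[i][0], dst[i][1] - src[i][1]
        inliers = [k for k, (s, d) in enumerate(zip(src, dst))
                   if abs(d[0] - s[0] - tx) <= thresh
                   and abs(d[1] - s[1] - ty) <= thresh]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refit the model using only the consensus set.
    model = least_squares_translation([src[k] for k in best_inliers],
                                      [dst[k] for k in best_inliers])
    return model, best_inliers


if __name__ == "__main__":
    # 14 static points shift with the camera by (+2, -1); 6 points on a
    # moving object undergo an extra (+5, +3) displacement.
    src = [(float(k), float(k % 5)) for k in range(20)]
    dst = [(x + 2.0, y - 1.0) if k < 14 else (x + 7.0, y + 2.0)
           for k, (x, y) in enumerate(src)]
    print("least squares:", least_squares_translation(src, dst))
    print("RANSAC:", ransac_translation(src, dst))
```

With 14 static and 6 moving points, the all-points mean is pulled to (3.5, -0.1) instead of the camera's (2.0, -1.0), whereas the consensus estimate should recover (2.0, -1.0) and leave the six moving points out of the inlier set. This relies on static points forming the largest consistent group; if a moving object dominates the view, the consensus follows the object instead.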
- Vincent J. Aidala and Sherry E. Hammel. 1983. Utilization of modified polar coordinates for bearings-only tracking. IEEE Trans. Automat. Contr. 28, 3 (1983), 283--294.Google ScholarCross Ref
- Hirotogu Akaike. 1973. Information theory and an extension of the maximum likelihood principle. In Int. Symp. Inf. Theory. 267--281.Google Scholar
- Ijaz Akhter, Sohaib Khan, Yaser Sheikh, and Takeo Kanade. 2008. Nonrigid structure from motion in trajectory space. In Adv. Neural Inf. Process. Syst., Vol. 1. 1--8. Google ScholarDigital Library
- Pablo F. Alcantarilla, José J. Yebes, Javier Almazán, and Luis M. Bergasa. 2012. On combining visual slam and dense scene flow to increase the robustness of localization and mapping in dynamic environments. In IEEE Int. Conf. Robot. Autom. 1290--1297.Google Scholar
- Shai Avidan and Amnon Shashua. 1999. Trajectory triangulation of lines: Reconstruction of a 3D point moving along a line from a monocular image sequence. In IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., Vol. 2. 66.Google ScholarCross Ref
- Shai Avidan and Amnon Shashua. 2000. Trajectory triangulation: 3D reconstruction of moving points from a monocular image sequence. IEEE Trans. Pattern Anal. Mach. Intell. 22, 4 (2000), 348--357. Google ScholarDigital Library
- Mohammadreza Babaee, Duc Tung Dinh, and Gerhard Rigoll. 2017. A deep convolutional neural network for background subtraction. In arXiv:1702.01731.Google Scholar
- Herbert Bay, Andreas Ess, Tinne Tuytelaars, and Luc Van Gool. 2008. Speeded-up robust features (SURF). Comput. Vis. Image Underst. 110, 3 (2008), 346--359. Google ScholarDigital Library
- Paul A. Beardsley, Andrew Zisserman, and David W. Murray. 1994. Navigation using affine structure from motion. In Eur. Conf. Comput. Vis. 85--96. Google ScholarDigital Library
- Francisco Bonin-Font, Alberto Ortiz, Gabriel Oliver, Francisco Bonin-font Alberto, and Ortiz Gabriel. 2008. Visual navigation for mobile robots: A survey. J. Intell. Robot. Syst. 53 (2008), 263--296. Google ScholarDigital Library
- Jean-Yves Bouguet. 2000. Pyramidal implementation of the affine Lucas Kanade feature tracker - Description of the algorithm. Intel Corp. Microprocess. Res. Labs.Google Scholar
- Terrance E. Boult and Lisa Gottesfeld Brown. 1991. Factorization-based segmentation of motions. In IEEE Work. Vis. Motion.Google Scholar
- Christoph Bregler, Aaron Herzmann, and Henning Biermann. 2000. Recovering non-rigid 3D shape from image streams. In IEEE Conf. Comput. Vis. Pattern Recognit.Google ScholarCross Ref
- Michael D. Breitenstein, Fabian Reichlin, Bastian Leibe, Esther Koller-Meier, and Luc Van Gool. 2011. Online multi-person tracking-by-detection from a single, uncalibrated camera. IEEE Trans. Pattern Anal. Mach. Intell. 33, 9 (2011), 1820--1833. Google ScholarDigital Library
- Arunkumar Byravan and Dieter Fox. 2017. SE3-Nets: Learning rigid body motion using deep neural networks. In IEEE Int. Conf. Robot. Autom.Google ScholarCross Ref
- Jean-pierre L. E. Cadre and Olivier Tremois. 1998. Bearings-only tracking for maneuvering sources. IEEE Trans. Aerosp. Electron. Syst. 34, 1 (1998), 179--193.Google ScholarCross Ref
- Michael Calonder, Vincent Lepetit, Christoph Strecha, and Pascal Fua. 2010. BRIEF: Binary robust independent elementary features. In Eur. Conf. Comput. Vis. 778--792. Google ScholarDigital Library
- Robert O. Castle, Georg Klein, and David W. Murray. 2011. Wide-area augmented reality using camera tracking and mapping in multiple regions. Comput. Vis. Image Underst. 115, 6 (2011), 854--867. Google ScholarDigital Library
- Stephen M. Chaves, Ayoung Kim, and Ryan M. Eustice. 2014. Opportunistic sampling-based planning for active visual SLAM. In IEEE/RSJ Int. Conf. Intell. Robot. Syst.Google Scholar
- Jinhui Chen and Jian Yang. 2014. Robust subspace segmentation by low-rank representation. IEEE Trans. Cybern. 44, 8 (2014), 1432--1445.Google ScholarCross Ref
- Falak Chhaya, Dinesh Reddy, Sarthak Upadhyay, Visesh Chari, M. Zeeshan Zia, and K. Madhava Krishna. 2016. Monocular reconstruction of vehicles: Combining SLAM with shape priors. In IEEE Int. Conf. Robot. Autom. 5758--5765.Google Scholar
- Ondrej Chum and Jiri Matas. 2005. Matching with PROSAC-Progressive Sample Consensus. In IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. 220--226. Google ScholarDigital Library
- Burcu Cinaz and Holger Kenn. 2008. HeadSLAM - Simultaneous localization and mapping with head-mounted inertial and laser range sensors. In IEEE Int. Symp. Wearable Comput. Google ScholarDigital Library
- Joao Costeira and Takeo Kanade. 1995. A multi-body factorization method for motion analysis. In Int. Conf. Comput. Vis. 1071--1076. Google ScholarDigital Library
- João Paulo Costeira and Takeo Kanade. 1998. A multibody factorization method for independently moving objects. Int. J. Comput. Vis. 29, 3 (1998), 159--179. Google ScholarDigital Library
- Mark Cummins and Paul Newman. 2008. FAB-MAP: Probabilistic localization and mapping in the space of appearance. Int. J. Rob. Res. 27, 6 (2008), 647--665. Google ScholarDigital Library
- Yuchao Dai, Hongdong Li, and Mingyi He. 2014. A simple prior-free method for non-rigid structure-from-motion factorization. Int. J. Comput. Vis. 107, 2 (2014), 101--122. Google ScholarDigital Library
- Danping Zhou and Ping Tan. 2012. CoSLAM: Collaborative visual SLAM in dynamic environments. IEEE Trans. Pattern Anal. Mach. Intell. 35, 2 (2012), 354--366. Google ScholarDigital Library
- Andrew J. Davison. 2003. Real-time simultaneous localisation and mapping with a single camera. In IEEE Int. Conf. Comput. Vis. Google ScholarDigital Library
- Maxime Derome, Aurelien Plyer, Martial Sanfourche, and Guy Le Besnerais. 2015. Moving object detection in real-time using stereo from a mobile platform. Unmanned Syst. 3, 4 (2015), 253--266.Google ScholarCross Ref
- Maxime Derome, Aurelien Plyer, Martial Sanfourche, and Guy Le Besnerais. 2014. Real-time mobile object detection using stereo. In 13th Int. Conf. Control Autom. Robot. Vis. (ICARCV’14). 1021--1026.Google ScholarCross Ref
- Daniel DeTone, Tomasz Malisiewicz, and Andrew Rabinovich. 2016. Deep image homography estimation. In arXiv:1606.03798.Google Scholar
- Alexey Dosovitskiy, Philipp Fischery, Eddy Ilg, Philip Hausser, Caner Hazirbas, Vladimir Golkov, Patrick Van Der Smagt, Daniel Cremers, and Thomas Brox. 2016. FlowNet: Learning optical flow with convolutional networks. In IEEE Int. Conf. Comput. Vis., Vol. 11-18-Dece. 2758--2766. Google ScholarDigital Library
- Ehsan Elhamifar and Rene Vidal. 2009. Sparse subspace clustering. In IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. Work. 2790--2797.Google ScholarCross Ref
- Ehsan Elhamifar and Rene Vidal. 2013. Sparse subspace clustering: Algorithm, theory, and applications. IEEE Trans. Pattern Anal. Mach. Intell. 35, 11 (2013), 2765--2781. Google ScholarDigital Library
- Jakob Engel, Thomas Sch, and Daniel Cremers. 2014. LSD-SLAM: Direct monocular SLAM. In Eur. Conf. Comput. Vis. 834--849.Google Scholar
- Martin A. Fischler and Robert C. Bolles. 1981. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24 (1981), 381--395. Google ScholarDigital Library
- Katerina Fragkiadaki, Pablo Arbelaez, Panna Felsen, and Jitendra Malik. 2015. Learning to segment moving objects in videos. In IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. 4083--4090.Google ScholarCross Ref
- Friedrich Fraundorfer and Davide Scaramuzza. 2012. Visual odometry: Part II - matching, robustness, optimization, and applications. IEEE Robot. Autom. Mag. 19, 2 (2012), 78--90.Google ScholarCross Ref
- Jorge Fuentes-Pacheco, Jose Ruiz-Ascencio, and Juan Manuel Rendon-Mancha. 2012. Visual simultaneous localization and mapping: A survey. Artif. Intell. Rev. 43, 1 (2012), 55--81. Google ScholarDigital Library
- Dorian Galvez-Lopez and Juan D. Tardos. 2012. Bags of binary words for fast place recognition in image sequences. IEEE Trans. Robot. 28, 5 (2012), 1188--1197. Google ScholarDigital Library
- Xiao Shan Gao, Xiao Rong Hou, Jianliang Tang, and Hang Fei Cheng. 2003. Complete solution classification for the perspective-three-point problem. IEEE Trans. Pattern Anal. Mach. Intell. 25, 8 (2003), 930--943. Google ScholarDigital Library
- Emilio Garcia-Fidalgo and Alberto Ortiz. 2015. Vision-based topological mapping and localization methods: A survey. Rob. Auton. Syst. 64 (2015), 1--20. Google ScholarDigital Library
- C. W. Gear. 1998. Multibody grouping from motion images. Int. J. Comput. Vis. 29, 2 (1998), 133--150. Google ScholarDigital Library
- Andreas Geiger, Julius Ziegler, and Christoph Stiller. 2011. StereoScan: Dense 3D reconstruction in real-time. In IEEE Intell. Veh. Symp. 1--9.Google ScholarCross Ref
- Arturo Gil, Oscar Reinoso, Monica Ballesta, and Miguel Julia. 2010. Multi-robot visual SLAM using a Rao-Blackwellized particle filter. Rob. Auton. Syst. 58, 1 (2010), 68--80. Google ScholarDigital Library
- Georgia Gkioxari and Jitendra Malik. 2015. Finding action tubes. In IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit.Google ScholarCross Ref
- Susanna Gladh, Martin Danelljan, Fahad Shahbaz Khan, and Michael Felsberg. 2016. Deep motion features for visual tracking. In Int. Conf. Pattern Recognit.Google ScholarCross Ref
- Alvina Goh and Rene Vidal. 2007. Segmenting motions of different types by unsupervised manifold clustering. In IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit.Google ScholarCross Ref
- Venu Madhav Govindu. 2001. Combining two-view constraints for motion estimation. In IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit.Google ScholarCross Ref
- H. M. Gross, H. J. Boehme, C. Schroeter, S. Mueller, A. Koenig, Ch. Martin, M. Merten, and A. Bley. 2008. Shopbot: Progress in developing an interactive mobile shopping assistant for everyday use. In IEEE Int. Conf. Syst. Man Cybern. 3471--3478.Google Scholar
- Yanming Guo, Yu Liu, Ard Oerlemans, Songyang Lao, Song Wu, and Michael S. Lew. 2015. Deep learning for visual understanding: A review. Neurocomputing 187 (2015), 27--48. Google ScholarDigital Library
- Hugh C. Longuet-Higgins. 1981. A computer algorithm for reconstructing a scene from two projections. Nature 293 (1981), 133--135.Google ScholarCross Ref
- Mei Han and Takeo Kanade. 2004. Reconstruction of a scene with multiple linearly moving objects. Int. J. Comput. Vis. 59, 3 (2004), 285--300. Google ScholarDigital Library
- Ankur Handa, Michael Bloesch, Viorica Patraucean, Simon Stent, John McCormac, and Andrew Davison. 2016. gvnn: Neural network library for geometric computer vision. In arXiv:1607.07405.Google Scholar
- Chris Harris and Carl Stennett. 1990. RAPID - A video rate object tracker. In Br. Mach. Vis. Conf.Google ScholarCross Ref
- Chris Harris and Mike Stephens. 1988. A combined corner and edge detector. In Alvey Vis. Conf. 147--151.Google ScholarCross Ref
- Richard Hartley and Frederik Schaffalitzky. 2003. PowerFactorization: 3D reconstruction with missing or uncertain data. In Aust. Adv. Work. Comput. Vis., Vol. 74. 1--9.Google Scholar
- Richard Hartley and Andrew Zisserman. 2004. Multiple View Geometry in Computer Vision (2nd ed.). Cambridge University Press. Google ScholarDigital Library
- Richard I. Hartley and Peter Sturm. 1997. Triangulation. Comput. Vis. Image Underst. 68, 2 (1997), 146--157. Google ScholarDigital Library
- Stephan Heuel and Wolfgang Förstner. 2001. Matching, reconstructing and grouping 3D lines from multiple views using uncertain projective geometry. In IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit.Google ScholarCross Ref
- Berthold K. P. Horn and Brian G. Schunck. 1981. Determining optical flow. Artif. Intell. 17, 1--3 (1981), 185--203. Google ScholarDigital Library
- Stefan Hrabar, Gaurav S. Sukhatme, Peter Corke, Kane Usher, and Jonathan Roberts. 2005. Combined optic-flow and stereo-based navigation of urban canyons for a UAV. In IEEE/RSJ Int. Conf. Intell. Robot. Syst. 302--309.Google ScholarCross Ref
- Thomas S. Huang and Arun N. Netravali. 1994. Motion and structure from feature correspondences: A review. Proc. IEEE 82, 2 (1994), 252--268.Google ScholarCross Ref
- Naoyuki Ichimura. 1999. Motion segmentation based on factorization method and discriminant critea. In IEEE Int. Conf. Comput. Vis.Google Scholar
- Eddy Ilg, Nikolaus Mayer, Tonmoy Saikia, Margret Keuper, Alexey Dosovitskiy, and Thomas Brox. 2017. FlowNet 2.0: Evolution of optical flow estimation with deep networks. In IEEE Conf. Comput. Vis. Pattern Recognit.Google ScholarCross Ref
- Eagle S. Jones and Stefano Soatto. 2011. Visual-inertial navigation, mapping and localization: A scalable real-time causal approach. Int. J. Rob. Res. 30, 4 (2011), 1--38. Google ScholarDigital Library
- Zdenek Kalal, Krystian Mikolajczyk, and Jiri Matas. 2012. Tracking-learning-detection. IEEE Trans. Pattern Anal. Mach. Intell. 34, 7 (2012), 1409--1422. Google ScholarDigital Library
- Jeremy Yirmeyahu Kaminski and Mina Teicher. 2002. General trajectory triangulation. In Eur. Conf. Comput. Vis. 823--836. Google ScholarDigital Library
- Jeremy Yirmeyahu Kaminski and Mina Teicher. 2004. A general framework for trajectory optimization. J. Math. Imaging Vis. 21 (2004), 27--41.Google ScholarDigital Library
- Kenichi Kanatani. 1996. Statistical Optimization for Geometric Computation: Theory and Practice. Elsevier. Google ScholarDigital Library
- Kenichi Kanatani. 2001. Motion segmentation by subspace separation and model selection. In IEEE Int. Conf. Comput. Vis. 586--591.Google ScholarCross Ref
- Kenichi Kanatani and Chikara Matsunaga. 2002. Estimating the number of independent motions for multibody motion segmentation. In Asian Conf. Comput. Vis.Google Scholar
- Jens Klappstein, Tobi Vaudrey, Clemens Rabe, Andreas Wedel, and Reinhard Klette. 2009. Moving object segmentation using optical flow and depth information. In Pacific-Rim Symp. Image Video Technol. 611--623. Google ScholarDigital Library
- Georg Klein and David Murray. 2007. Parallel tracking and mapping for small AR workspaces. In IEEE ACM Int. Symp. Mix. Augment. Real. Google ScholarDigital Library
- Georg Klein and David Murray. 2009. Parallel tracking and mapping on a camera phone. In 8th IEEE Int. Symp. Mix. Augment. Real. 83--86. Google ScholarDigital Library
- Kishore Konda and Roland Memisevic. 2013. Unsupervised learning of depth and motion. In arXiv:1312.3429.Google Scholar
- Kishore Konda and Roland Memisevic. 2015. Learning visual odometry with a convolutional network. In Int. Conf. Comput. Vis. Theory Appl. 486--490.Google ScholarCross Ref
- Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Adv. Neural Inf. Process. Syst. 1--9. Google ScholarDigital Library
- Suryansh Kumar, Yuchao Dai, and Hongdong Li. 2016. Multi-body non-rigid structure-from-motion. In Int. Conf. 3D Vis. 148--156.Google ScholarCross Ref
- Rainer Kummerle, Giorgio Grisetti, Hauke Strasdat, Kurt Konolige, and Wolfram Burgard. 2011. G2o: A general framework for graph optimization. In IEEE Int. Conf. Robot. Autom. 3607--3613.Google Scholar
- Abhijit Kundu, K. Madhava Krishna, and C. V. Jawahar. 2010. Realtime motion segmentation based multibody visual SLAM. In 7th Indian Conf. Comput. Vision, Graph. Image Process. 251--258. Google ScholarDigital Library
- Abhijit Kundu, K. Madhava Krishna, and C. V. Jawahar. 2011. Realtime multibody visual SLAM and tracking with a smoothly moving monocular camera. In IEEE Int. Conf. Comput. Vis. Google ScholarDigital Library
- Abhijit Kundu, K. Madhava Krishna, and Jayanthi Sivaswamy. 2009. Moving object detection by multi-view geometric techniques from a single camera mounted robot. In IEEE/RSJ Int. Conf. Intell. Robot. Syst. 4306--4312. Google ScholarDigital Library
- Iro Laina, Christian Rupprecht, Vasileios Belagiannis, Federico Tombari, and Nassir Navab. 2016. Deeper depth prediction with fully convolutional residual networks. In Int. Conf. 3D Vis. 239--248.Google ScholarCross Ref
- Quoc V. Le, Alexandre Karpenko, Jiquan Ngiam, and Andrew Y. Ng. 2011. ICA with reconstruction cost for efficient overcomplete feature learning. In Adv. Neural Inf. Process. Syst. 1--9. Google ScholarDigital Library
- Quoc V. Le, Marc’Aurelio Ranzato, Rajat Monga, Matthieu Devin, Kai Chen, Greg S. Corrado, Jeff Dean, and Andrew Y. Ng. 2011. Building high-level features using large scale unsupervised learning. In Int. Conf. Mach. Learn. 38115. Google ScholarDigital Library
- Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2016. Deep learning. Nature 521 (2016), 436--444.Google ScholarCross Ref
- Kuan Hui Lee, Jenq Neng Hwang, Greg Okapal, and James Pitton. 2014. Driving recorder based on-road pedestrian tracking using visual SLAM and constrained multiple-kernel. In 17th IEEE Int. Conf. Intell. Transp. Syst. 2629--2635.Google Scholar
- Kuan-hui Lee, Jenq-neng Hwang, Greg Okopal, and James Pitton. 2016. Ground-moving-platform-based human tracking using visual SLAM and constrained multiple kernels. IEEE Trans. Intell. Transp. Syst. 17, 12 (2016), 3602--3612. Google ScholarDigital Library
- Stefan Leutenegger, Margarita Chli, and Roland Y. Siegwart. 2011. BRISK: Binary robust invariant scalable keypoints. In IEEE Int. Conf. Comput. Vis. 2548--2555. Google ScholarDigital Library
- Stefan Leutenegger, Paul Furgale, Vincent Rabaud, Margarita Chli, Kurt Konolige, and Roland Siegwart. 2013. Keyframe-based visual-inertial SLAM using nonlinear optimization. Int. J. Rob. Res. 34, 3 (2013), 314--334. Google ScholarDigital Library
- Ting Li, Vinutha Kallem, Dheeraj Singaraju, and Rene Vidal. 2007. Projective factorization of multiple rigid-body motions. In IEEE Conf. Comput. Vis. Pattern Recognit.Google ScholarCross Ref
- Hyon Lim, Jongwoo Lim, and H. Jin Kim. 2014. Real-time 6-DOF monocular visual SLAM in a large-scale environment. In IEEE Int. Conf. Robot. Autom.Google Scholar
- Kuen-Han Lin and Chieh-Chih Wang. 2010. Stereo-based simultaneous localization, mapping and moving object tracking. In IEEE/RSJ Int. Conf. Intell. Robot. Syst.Google Scholar
- Tsung Han Lin and Chieh-Chih Wang. 2014. Deep learning of spatio-temporal features with geometric-based moving point detection for motion segmentation. In IEEE Int. Conf. Robot. Autom. 3058--3065.Google ScholarCross Ref
- Guangcan Liu, Zhouchen Lin, Shuicheng Yan, Ju Sun, Yong Yu, and Yi Ma. 2013. Robust recovery of subspace structures by low-rank representation. IEEE Trans. Pattern Anal. Mach. Intell. 35, 1 (2013), 171--184. Google ScholarDigital Library
- Jonathan Long, Evan Shelhamer, and Trevor Darrell. 2015. Fully convolutional networks for semantic segmentation. In IEEE Conf. Comput. Vis. Pattern Recognit. 3431--3440.Google ScholarCross Ref
- David G. Lowe. 2004. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60, 2 (2004), 91--110. Google ScholarDigital Library
- Bruce D. Lucas and Takeo Kanade. 1981. An Iterative Image Registration Technique with an Application to Stereo Vision. In DARPA Image Underst. Work. 121--130.Google ScholarDigital Library
- Nikolaus Mayer, Eddy Ilg, Philip Häusser, Philipp Fischer, Daniel Cremers, Alexey Dosovitskiy, and Thomas Brox. 2016. A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In IEEE Conf. Comput. Vis. Pattern Recognit.Google ScholarCross Ref
- Christopher Mei, Gabe Sibley, Mark Cummins, Paul Newman, and Ian Reid. 2011. RSLAM: A system for large-scale mapping in constant-time using stereo. Int. J. Comput. Vis. 94, 2 (2011), 198--214. Google ScholarDigital Library
- Iaroslav Melekhov, Juha Ylioinas, Juho Kannala, and Esa Rahtu. 2017. Relative camera pose estimation using convolutional neural networks. In arXiv:1702.01381.Google Scholar
- Davide Migliore, Roberto Rigamonti, Daniele Marzorati, Matteo Matteucci, and Domenico G. Sorrenti. 2009. Use a single camera for simultaneous localization and mapping with mobile object tracking in dynamic environments. In ICRA Work. Safe Navig. Open Dyn. Environ. Appl. to Auton. Veh.Google Scholar
- Vikram Mohanty, Shubh Agrawal, Shaswat Datta, Arna Ghosh, Vishnu Dutt Sharma, and Debashish Chakravarty. 2016. DeepVO: A deep learning approach for monocular visual odometry. In arXiv:1611.06069.Google Scholar
- Toshihiko Morita and Takeo Kanade. 1993. A sequential factorization method for recovering shape and motion from image streams. Proc. Natl. Acad. Sci. 90, 21 (1993), 9795--9802.Google ScholarCross Ref
- Pierre Moulon, Pascal Monasse, and Renaud Marlet. 2013. Global fusion of relative motions for robust, accurate and scalable structure from motion. In IEEE Int. Conf. Comput. Vis. 3248--3255. Google ScholarDigital Library
- Etienne Mouragnon, Maxime Lhuillier, Michel Dhome, Fabien Dekeyser, and Patrick Sayd. 2006. Monocular vision based SLAM for mobile robots. In 18th Int. Conf. Pattern Recognit. Google ScholarDigital Library
- Etienne Mouragnon, Maxime Lhuillier, Michel Dhome, Fabien Dekeyser, and Patrick Sayd. 2006. Real time localization and 3D reconstruction. In IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. 1--8. Google ScholarDigital Library
- Etienne Mouragnon, Maxime Lhuillier, Michel Dhome, Fabien Dekeyser, and Patrick Sayd. 2007. Generic and real-time structure from motion. In Br. Mach. Vis. Conf. 64.1--64.10.Google ScholarCross Ref
- Etienne Mouragnon, Maxime Lhuillier, Michel Dhome, Fabien Dekeyser, and Patrick Sayd. 2009. Generic and real-time structure from motion using local bundle adjustment. Image Vis. Comput. 27, 8 (2009), 1178--1193. Google ScholarDigital Library
- Peter Muller and Andreas Savakis. 2017. Flowdometry: An optical flow and deep learning based approach to visual odometry. In IEEE Winter Conf. Appl. Comput. Vis.Google ScholarCross Ref
- Raul Mur-Artal, J. M. M. Montiel, and Juan D. Tardos. 2015. ORB-SLAM: A versatile and accurate monocular SLAM system. IEEE Trans. Robot. 31, 5 (2015), 1147--1163.Google ScholarDigital Library
- Yohei Murakami, Takeshi Endo, Yoshimichi Ito, and Noboru Babaguchi. 2012. Depth-estimation-free projective factorization and its application to 3D reconstruction. In Asian Conf. Comput. Vis. 150--162. Google ScholarDigital Library
- Richard A. Newcombe, David Molyneaux, David Kim, Andrew J. Davison, Jamie Shotton, Steve Hodges, Andrew Fitzgibbon, Shahram Izadi, Otmar Hilliges, David Molyneaux, David Kim, Andrew J. Davison, Pushmeet Kohli, Jamie Shotton, Steve Hodges, and Andrew Fitzgibbon. 2011. KinectFusion: Real-time dense surface mapping and tracking. In IEEE Int. Symp. Mix. Augment. Real. 127--136. Google ScholarDigital Library
- David Nister. 2004. An efficient solution to the five-point relative pose problem. IEEE Trans. Pattern Anal. Mach. Intell. 26, 6 (2004), 756--770. Google ScholarDigital Library
- David Nistér, Oleg Naroditsky, and James Bergen. 2004. Visual odometry. In IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. 652--659.Google ScholarCross Ref
- John Oliensis. 2000. A critique of structure-from-motion algorithms. Comput. Vis. Image Underst. 80, 2 (2000), 172--214. Google ScholarDigital Library
- D. Ortín and J. Montiel. 2001. Indoor robot motion based on monocular images. Robotica 19, 3 (2001), 331--342. Google ScholarDigital Library
- Nobuyuki Otsu. 1979. A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man. Cybern. SMC-9, 1 (1979), 62--66.Google ScholarCross Ref
- Kemal Egemen Ozden, Kurt Cornelis, Luc Van Eycken, and Luc Van Gool. 2004. Reconstructing 3D trajectories of independently moving objects using generic constraints. Comput. Vis. Image Underst. 96, 3 (2004), 453--471. Google ScholarDigital Library
- Kemal E. Ozden, Konrad Schindler, and Luc Van Gool. 2010. Multibody structure-from-motion in practice. IEEE Trans. Pattern Anal. Mach. Intell. 32, 6 (2010), 1134--1141. Google ScholarDigital Library
- Marco Paladini, Alessio Del Bue, Marko Stošić, Marija Dodig, João Xavier, and Lourdes Agapito. 2009. Factorization for non-rigid and articulated structure using metric projections. In IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. 2898--2905.Google ScholarCross Ref
- Hyun Soo Park, Takaaki Shiratori, Iain Matthews, and Yaser Sheikh. 2010. 3D reconstruction of a moving point from a series of 2D projections. In Eur. Conf. Comput. Vis. 158--171. Google ScholarDigital Library
- Hyun Soo Park, Takaaki Shiratori, Iain Matthews, and Yaser Sheikh. 2015. 3D trajectory reconstruction under perspective projection. Int. J. Comput. Vis. 115, 2 (2015), 115--135. Google ScholarDigital Library
- Massimo Piccardi. 2004. Background subtraction techniques: A review. In EEE Int. Conf. Syst. Man Cybern., Vol. 4. 3099--3104.Google ScholarCross Ref
- Jouni Rantakokko, Joakim Rydell, Peter Strömbäck, Peter Händel, Jonas Callmer, David Törnqvist, Fredrik Gustafsson, Magnus Jobs, and Mathias Grudén. 2011. Accurate and reliable soldier and first responder indoor positioning: Multisensor systems and cooperative localization. IEEE Wirel. Commun. 18, 2 (2011), 10--18.Google ScholarCross Ref
- Shankar Rao, Roberto Tron, Rene Vidal, and Yi Ma. 2010. Motion segmentation in the presence of outlying, incomplete, or corrupted trajectories. IEEE Trans. Pattern Anal. Mach. Intell. 32, 10 (2010), 1832--1845. Google ScholarDigital Library
- Jorma Rissanen. 1984. Universal coding, information, prediction, and eestimation. IEEE Trans. Inf. Theory 30, 4 (1984), 629--636. Google ScholarDigital Library
- Edward Rosten and Tom Drummond. 2006. Machine learning for high-speed corner detection. In Eur. Conf. Comput. Vis., Vol. 1. 430--443. Google ScholarDigital Library
- Ethan Rublee, Vincent Rabaud, Kurt Konolige, and Gary Bradski. 2011. ORB: An efficient alternative to SIFT or SURF. In IEEE Int. Conf. Comput. Vis. 2564--2571. Google ScholarDigital Library
- Reza Sabzevari and Davide Scaramuzza. 2014. Monocular simultaneous multi-body motion segmentation and reconstruction from perspective views. In IEEE Int. Conf. Robot. Autom. 23--30.Google ScholarCross Ref
- Reza Sabzevari and Davide Scaramuzza. 2016. Multi-body motion estimation from monocular vehicle-mounted cameras. IEEE Trans. Robot. 32, 3 (2016), 638--651.Google ScholarCross Ref
- Muhamad Risqi Utama Saputra, Widyawan, and Paulus Insap Santosa. 2014. Obstacle avoidance for visually impaired using auto-adaptive thresholding on Kinect’s depth image. In 11th IEEE Int. Conf. Ubiquitous Intell. Comput. 337--342. Google ScholarDigital Library
- Lawrence K. Saul and Sam T. Roweis. 2003. Think globally, fit locally: Unsupervised learning of low dimensional manifolds. J. Mach. Learn. Res. 4, 1999 (2003), 119--155. Google ScholarDigital Library
- Davide Scaramuzza. 2011. 1-point-RANSAC structure from motion for vehicle-mounted cameras by exploiting non-holonomic constraints. Int. J. Comput. Vis. 95, 1 (2011), 74--85. Google ScholarDigital Library
- Davide Scaramuzza, Friedrich Fraundorfer, and Roland Siegwart. 2009. Real-time monocular visual odometry for on-road vehicles with 1-point RANSAC. In IEEE Int. Conf. Robot. Autom. 4293--4299. Google ScholarDigital Library
- Konrad Schindler and David Suter. 2005. Two-view multibody structure-and-motion with outliers. In IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. Google ScholarDigital Library
- Konrad Schindler and David Suter. 2006. Two-view multibody structure-and-motion with outliers through model selection. IEEE Trans. Pattern Anal. Mach. Intell. 28, 6 (2006), 983--995. Google ScholarDigital Library
- Konrad Schindler, David Suter, and Hanzi Wang. 2008. A model-selection framework for multibody structure-and-motion of image sequences. Int. J. Comput. Vis. 79, 2 (2008), 159--177. Google ScholarDigital Library
- Konrad Schindler, James U., and Hanzi Wang. 2006. Perspective n-view multibody structure-and-motion through model selection. In Eur. Conf. Comput. Vis., Vol. 1. 606--619. Google ScholarDigital Library
- Johannes Lutz Schönberger and Jan-Michael Frahm. 2016. Structure-from-motion revisited. In IEEE Conf. Comput. Vis. Pattern Recognit. 4104--4113.Google ScholarCross Ref
- Gideon Schwarz. 1978. Estimating the dimension of a model. Ann. Stat. 6, 2 (1978), 461--464.Google ScholarCross Ref
- Amnon Shashua, Shai Avidan, and Michael Werman. 1999. Trajectory triangulation over conic sections. In IEEE Int. Conf. Comput. Vis.Google ScholarCross Ref
- Gabe Sibley, Christopher Mei, Ian Reid, and Paul Newman. 2010. Vast-scale outdoor navigation using adaptive relative bundle adjustment. Int. J. Rob. Res. 29, 8 (2010), 958--980. Google ScholarDigital Library
- Karen Simonyan and Andrew Zisserman. 2014. Two-stream convolutional networks for action recognition in videos. In Adv. Neural Inf. Process. Syst. 1--9. Google ScholarDigital Library
- Noah Snavely, Steven Seitz, and Richard Szeliski. 2006. PhotoTourism: Exploring photo collections in 3D. In SIGGRAPH Conf. Proc. 835--846. Google ScholarDigital Library
- Noah Snavely, Steven M. Seitz, and Richard Szeliski. 2008. Modeling the world from internet photo collections. Int. J. Comput. Vis. 80, 2 (2008), 189--210. Google ScholarDigital Library
- Joan Solà. 2007. Towards Visual Localization, Mapping and Moving Objects Tracking by a Mobile Robot: A Geometric and Probabilistic Approach. Ph.D. Dissertation. Institut National Politechnique de Toulouse.Google Scholar
- Hauke Strasdat, J. M. M. Montiel, and Andrew J. Davison. 2012. Visual SLAM: Why filter? Image Vis. Comput. 30, 2 (2012), 65--77. Google ScholarDigital Library
- Peter Sturm and Bill Triggs. 1996. A factorization based algorithm for multi-image projective structure and motion. In Eur. Conf. Comput. Vis., Vol. 1065. 710--720. Google ScholarDigital Library
- Wei Tan, Haomin Liu, Zilong Dong, Guofeng Zhang, and Hujun Bao. 2013. Robust monocular SLAM in dynamic environments. In IEEE Int. Symp. Mix. Augment. Real.Google Scholar
- Ninad Thakoor, Jean Gao, and Venkat Devarajan. 2010. Multibody structure-and-motion segmentation by branch-and-bound model selection. IEEE Trans. Image Process. 19, 6 (2010), 1393--1402. Google ScholarDigital Library
- Carlo Tomasi and Takeo Kanade. 1992. Shape and motion from image streams under orthography: A factorization method. In Int. J. Comput. Vis., Vol. 9. 137--154. Google ScholarDigital Library
- Philip H. S. Torr. 1998. Geometric motion segmentation and model selection. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 356, 1740 (1998), 1321--1340.
- Philip H. S. Torr and Andrew Zisserman. 1997. Robust parameterization and computation of the trifocal tensor. Image Vis. Comput. 15, 8 (1997), 591--605.
- Philip H. S. Torr and Andrew Zisserman. 1999. Feature based methods for structure and motion estimation. In Int. Work. Vis. Algorithms.
- Philip H. S. Torr and Andrew Zisserman. 2000. MLESAC: A new robust estimator with application to estimating image geometry. Comput. Vis. Image Underst. 78, 1 (2000), 138--156.
- Roberto Tron and Rene Vidal. 2007. A benchmark for the comparison of 3-D motion segmentation algorithms. In IEEE Conf. Comput. Vis. Pattern Recognit. 1--8.
- Sepehr Valipour, Mennatullah Siam, Martin Jagersand, and Nilanjan Ray. 2017. Recurrent fully convolutional networks for video segmentation. In IEEE Winter Conf. Appl. Comput. Vis. 1--12.
- René Vidal. 2006. Online clustering of moving hyperplanes. In Adv. Neural Inf. Process. Syst. 1433--1440.
- Rene Vidal. 2011. Subspace clustering. IEEE Signal Process. Mag. 28, 2 (2011), 52--68.
- René Vidal and Richard Hartley. 2008. Three-view multibody structure from motion. IEEE Trans. Pattern Anal. Mach. Intell. 30, 2 (2008), 214--227.
- René Vidal, Yi Ma, and Shankar Sastry. 2005. Generalized principal component analysis (GPCA). In IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. 40, 12 (2005), 1945--1959.
- René Vidal, Yi Ma, and Shankar Sastry. 2005. Generalized principal component analysis (GPCA). IEEE Trans. Pattern Anal. Mach. Intell. 27, 12 (2005), 1945--1959.
- René Vidal, Yi Ma, Stefano Soatto, and Shankar Sastry. 2006. Two-view multibody structure from motion. Int. J. Comput. Vis. 68, 1 (2006), 7--25.
- René Vidal, Stefano Soatto, Yi Ma, and Shankar Sastry. 2002. Segmentation of dynamic scenes from the multibody fundamental matrix. In ECCV Work. Vis. Model. Dyn. Scenes.
- Sudheendra Vijayanarasimhan, Susanna Ricco, Cordelia Schmid, Rahul Sukthankar, and Katerina Fragkiadaki. 2017. SfM-Net: Learning of structure and motion from video. In arXiv:1704.07804.
- Chieh-Chih Wang and Chuck Thorpe. 2002. Simultaneous localization and mapping with detection and tracking of moving objects. In IEEE Int. Conf. Robot. Autom., Vol. 3. 2918--2924.
- Chieh-Chih Wang, Charles Thorpe, Sebastian Thrun, M. Hebert, and H. Durrant-Whyte. 2007. Simultaneous localization, mapping and moving object tracking. Int. J. Rob. Res. 26, 9 (2007), 889--916.
- Sen Wang, Ronald Clark, Hongkai Wen, and Niki Trigoni. 2017. DeepVO: Towards end-to-end visual odometry with deep recurrent convolutional neural networks. In IEEE Int. Conf. Robot. Autom.
- Yin Tien Wang, Ming Chun Lin, and Rung Chi Ju. 2010. Visual SLAM and moving-object detection for a small-size humanoid robot. Int. J. Adv. Robot. Syst. 7, 2 (2010), 133--138.
- Somkiat Wangsiripitak and David W. Murray. 2009. Avoiding moving outliers in visual SLAM by tracking moving objects. In IEEE Int. Conf. Robot. Autom.
- Changchang Wu. 2013. Towards linear-time incremental structure from motion. In Int. Conf. 3D Vis. 127--134.
- Changchang Wu, Sameer Agarwal, Brian Curless, and Steven M. Seitz. 2011. Multicore bundle adjustment. In IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. 3057--3064.
- Jing Xiao, Jin-xiang Chai, and Takeo Kanade. 2004. A closed-form solution to non-rigid shape and motion recovery. In Eur. Conf. Comput. Vis. 573--587.
- Jingyu Yan and Marc Pollefeys. 2006. A general framework for motion segmentation: Independent, articulated, rigid, non-rigid, degenerate and non-degenerate. In Eur. Conf. Comput. Vis.
- Jingyu Yan and Marc Pollefeys. 2008. A factorization-based approach for articulated nonrigid shape, motion, and kinematic chain recovery from video. IEEE Trans. Pattern Anal. Mach. Intell. 30, 5 (2008), 865--877.
- Congyuan Yang, Daniel Robinson, and Rene Vidal. 2015. Sparse subspace clustering with missing entries. In Int. Conf. Mach. Learn. 2463--2472.
- Georges Younes, Daniel Asmar, and Elie Shammas. 2016. A survey on non-filter-based monocular visual SLAM systems. In arXiv:1607.00470.
- Khalid Yousif, Alireza Bab-Hadiashar, and Reza Hoseinnezhad. 2015. An overview to visual odometry and visual SLAM: Applications to mobile robotics. Intell. Ind. Syst. 1, 4 (2015), 289--311.
- Luca Zappella, Alessio Del Bue, Xavier Lladó, and Joaquim Salvi. 2013. Joint estimation of segmentation and structure from motion. Comput. Vis. Image Underst. 117, 2 (2013), 113--129.
- Hendrik Zender, Patric Jensfelt, and Geert Jan M. Kruijff. 2007. Human- and situation-aware people following. In IEEE Int. Work. Robot Hum. Interact. Commun. 1131--1136.
- Dong Zhang and Ping Li. 2012. Visual odometry in dynamical scenes. Sensors Transducers J. 147, 12 (2012), 78--86.
- Teng Zhang, Arthur Szlam, and Gilad Lerman. 2009. Median K-flats for hybrid linear modeling with many outliers. In Int. Conf. Comput. Vis. Work. 234--241.
- Enliang Zheng, Ke Wang, Enrique Dunn, and Jan-Michael Frahm. 2014. Joint object class sequencing and trajectory triangulation (JOST). In Eur. Conf. Comput. Vis. 599--614.
- Tinghui Zhou, Matthew Brown, Noah Snavely, and David G. Lowe. 2017. Unsupervised learning of depth and ego-motion from video. In IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit.