Robotics and Autonomous Systems

Volume 108, October 2018, Pages 115-128

Motion removal for reliable RGB-D SLAM in dynamic environments

https://doi.org/10.1016/j.robot.2018.07.002

Highlights

  • An on-line RGB-D data-based motion removal approach is proposed.

  • The approach does not require prior knowledge of moving objects.

  • The approach improves RGB-D SLAM in various dynamic scenarios.

Abstract

RGB-D data-based Simultaneous Localization and Mapping (RGB-D SLAM) aims to concurrently estimate robot poses and reconstruct traversed environments using RGB-D sensors. Many effective and impressive RGB-D SLAM algorithms have been proposed over the past years. However, virtually all the RGB-D SLAM systems developed so far rely on the static-world assumption, because SLAM performance is prone to degradation by moving objects in dynamic environments. In this paper, we propose a novel RGB-D data-based motion removal approach to address this problem. The approach is on-line and requires no prior knowledge of moving objects, such as their semantics or visual appearance. We integrate the approach into the front end of an RGB-D SLAM system, where it acts as a pre-processing stage that filters out data associated with moving objects. Experimental results demonstrate that our approach improves RGB-D SLAM in various challenging scenarios.

Introduction

Simultaneous Localization and Mapping (SLAM) is a fundamental capability for many robotic applications. It concurrently estimates robot poses and reconstructs models of the traversed environment. Many effective SLAM algorithms using visual sensors, such as monocular cameras [1], stereo cameras [2] and RGB-D cameras [3], have been proposed over the past years. Related technologies, such as augmented reality [4] and autonomous driving [5], have benefited from the development of SLAM. It is worth noting that the advent of RGB-D cameras has changed the computer vision world [6]. They provide colored point clouds with real-scale distance information, which greatly benefits dense 3-D environment reconstruction. Many impressive RGB-D SLAM systems have been developed in recent years [7–12], and most of them adopt the graph optimization framework. We refer readers to the survey [13] for an overview of progress on graph SLAM.

However, virtually all current RGB-D SLAM algorithms are built on the static-world assumption, which requires that no moving objects appear in the environment during the robot's traversal. Moving objects hinder the data associations in the SLAM front end. When incorrect data associations are fed into the SLAM back end, the graph optimization process can be severely corrupted, ultimately leading to catastrophic failures in localization and mapping. RGB-D SLAM therefore remains vulnerable in dynamic environments.

The data associations in the SLAM front end consist of two components: the short-term data association, which determines adjacent pose estimations, and the long-term data association, which affects loop detection [13]. Take sparse feature-based RGB-D SLAM as an example. Standard robust estimators, such as the RANdom SAmple Consensus (RANSAC) algorithm [14], are usually employed in the SLAM front end to reject outlier feature associations. However, outliers are hard to reject reliably when moving objects occupy a non-trivial portion of the camera's field of view. In that case, outliers are unavoidably used for computing the robot poses, making the pose estimation erroneous. Moreover, when a robot returns to a previously visited place from which the moving objects have departed, loop detection is confused by matching the same scene with different visual appearances. If we eliminate moving objects at the first visit to a place, we can compare image frames using only the static feature points, which leads to much more reliable loop detection, and we can likewise use only the static feature points to obtain accurate pose estimations. Eliminating moving objects therefore reduces incorrect data associations, which is critical to improving SLAM performance.
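To make this concrete, the sketch below (an illustration, not the authors' implementation) shows the kind of RANSAC-based relative-pose estimation a sparse feature-based front end performs between two RGB-D frames, using standard OpenCV calls. The intrinsics matrix `K`, the 1000-feature budget and the 3-pixel reprojection threshold are illustrative assumptions.

```python
# Illustrative sketch of a sparse feature-based RGB-D front end with
# RANSAC outlier rejection (not the authors' code). Requires OpenCV
# and NumPy.
import cv2
import numpy as np

def estimate_relative_pose(rgb_last, rgb_curr, depth_last, K):
    """Match ORB features between consecutive frames and estimate the
    relative pose with solvePnPRansac. The returned inlier indices are
    the matches RANSAC considers consistent with a single rigid motion;
    matches on moving objects should ideally fall outside this set."""
    orb = cv2.ORB_create(1000)
    kp1, des1 = orb.detectAndCompute(rgb_last, None)
    kp2, des2 = orb.detectAndCompute(rgb_curr, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)

    # Back-project the last frame's matched keypoints to 3-D using the
    # depth image and the pinhole intrinsics K.
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    pts3d, pts2d = [], []
    for m in matches:
        u, v = kp1[m.queryIdx].pt
        z = float(depth_last[int(v), int(u)])
        if z <= 0.0:          # skip pixels with invalid depth
            continue
        pts3d.append([(u - cx) * z / fx, (v - cy) * z / fy, z])
        pts2d.append(kp2[m.trainIdx].pt)

    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.float32(pts3d), np.float32(pts2d), K, None,
        reprojectionError=3.0)
    return rvec, tvec, inliers
```

When moving objects cover a large share of the image, the largest consensus set found by RANSAC may correspond to a moving surface rather than the static background, and the estimated `rvec` and `tvec` then track the object instead of the camera.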

In this paper, we develop a novel RGB-D data-based motion removal approach to address the problem of RGB-D SLAM in dynamic environments. We refer to dense, pixel-wise moving-object segmentation as motion removal. Our approach serves as a pre-processing stage that filters out data associated with moving objects. With our approach, incorrect data associations in the SLAM front end can be greatly reduced.

Fig. 1 qualitatively compares the point-cloud maps produced in a dynamic environment by the ORB-SLAM system [11] and by the same system integrated with our motion removal approach. In this scenario, two persons are playing with a basketball in an office room; the moving objects are the two persons and the basketball. We run the SLAM algorithm using a hand-held Asus Xtion RGB-D camera moved along a roughly circular trajectory in the scene. Comparing Fig. 1(a) with (b), we can see that the quality of the resulting map is substantially degraded: almost no object or scene structure is correctly aligned. This is mainly caused by camera pose estimations corrupted by incorrect data associations. Moreover, the moving objects are recorded as spurious objects in the map, which makes the map virtually useless for downstream applications such as map-based navigation. The point-cloud map built with our motion removal approach is displayed in Fig. 1(c) and (d). The desks, monitors, wall, floor and overall scene structure are correctly reconstructed, and virtually no point from the moving objects is recorded. The holes that motion removal leaves in the point-cloud map are filled in by fusing redundant scans across frames. Fig. 1 clearly illustrates the negative effect of moving objects in dynamic environments. Note that the tested environment in Fig. 1 is small-scale. We believe such an example suffices to illustrate the negative impact of moving objects: large-scale environments are generally more challenging for SLAM, and since ORB-SLAM already performs unsatisfactorily in this small-scale environment, its performance in large-scale dynamic environments would not be better. The main contributions of this paper are summarized as follows:

  1. We propose a novel RGB-D data-based on-line motion removal approach. A foreground model is built and updated incrementally. No prior information about moving objects, such as semantics or visual appearances, is needed.

  2. We explain why we use motion removal to address the problem of RGB-D SLAM in dynamic environments. The experimental results confirm that motion removal increases the robustness of RGB-D SLAM.

  3. We integrate our motion removal approach into an RGB-D SLAM system. Evaluations and method comparisons are performed on the widely used TUM RGB-D benchmark dataset [15].

The remainder of this paper is organized as follows. Section 2 reviews the related work. Section 3 formulates the problem of RGB-D SLAM in dynamic environments and explains why we use motion removal to address it. Sections 4 and 5 give the overview and the details of our approach, respectively. The experimental results are presented in Section 6. We conclude and discuss future work in the last section.


Related work

For visual SLAM in dynamic environments, the mainstream solution is to identify the features or pixels that are associated with moving objects. This process is referred to as motion segmentation, and many approaches have been proposed. We generally divide motion segmentation approaches into two categories: sparse methods, which identify feature points on moving objects, and dense methods, which segment moving objects pixel-wise.

Problem statement

This section describes the problem of RGB-D SLAM in dynamic environments and explains why we use motion removal to address it. The SLAM problem can be naturally described as a graph [32], where the vertexes encode robot poses or landmark positions, and the edges represent constraints between vertexes. The constraints are pose transformations estimated from odometry, landmark reprojections, etc. The SLAM problem is to find the vertex configuration that best satisfies the measured constraints. The graph …
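In the graph-based formulation cited above [32], finding the vertexes that best satisfy the measured constraints is a nonlinear least-squares problem. A minimal statement of the objective, in our own notation rather than the paper's equations, is:

```latex
% x collects the vertex states (robot poses and landmark positions);
% e_ij(x) is the error of the edge between vertexes i and j, and
% Omega_ij is the information matrix of that constraint.
\mathbf{x}^{*} \;=\; \operatorname*{arg\,min}_{\mathbf{x}}
  \sum_{(i,j)\in\mathcal{E}}
  \mathbf{e}_{ij}(\mathbf{x})^{\top}\,
  \boldsymbol{\Omega}_{ij}\,
  \mathbf{e}_{ij}(\mathbf{x})
```

Edges created from incorrect data associations on moving objects inject erroneous error terms into this sum, which is why the back-end optimization can be corrupted rather than merely perturbed.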

The approach overview

The idea of our motion removal approach is straightforward. Fig. 3 shows the flowchart of our approach. It consists of two parallel on-line processes: the Learning process, which builds and updates the foreground model, and the Inference process, which uses the built model to segment the foreground pixel-wise.

In the Learning process, we first employ a dense optical flow algorithm to find the 2-D pixel matchings between two consecutive RGB images, which we refer to as the last image and the current image …
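The text names only "a dense optical flow algorithm" for this step, so the sketch below uses OpenCV's Farneback method purely as a stand-in; the parameter values are likewise assumptions. It shows how a dense flow field yields a per-pixel matching between the last and the current RGB image.

```python
# Minimal sketch of the first Learning-process step: dense 2-D pixel
# matchings between consecutive RGB images. Farneback flow is a
# stand-in; the paper does not specify the algorithm here.
import cv2
import numpy as np

def dense_pixel_matchings(last_rgb, curr_rgb):
    """Return arrays (u2, v2) such that pixel (u, v) of the last image
    is matched to (u2[v, u], v2[v, u]) in the current image."""
    g1 = cv2.cvtColor(last_rgb, cv2.COLOR_BGR2GRAY)
    g2 = cv2.cvtColor(curr_rgb, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(
        g1, g2, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    h, w = g1.shape
    u, v = np.meshgrid(np.arange(w, dtype=np.float32),
                       np.arange(h, dtype=np.float32))
    return u + flow[..., 0], v + flow[..., 1]
```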

The proposed approach

This section presents the details of the Learning and Inference processes of our approach.

Experiment setup

We performed the experiments using the public TUM RGB-D dataset. In the TUM dataset, the Dynamic Objects sequences are designed to evaluate SLAM algorithms in dynamic environments. Among these, the desk_with_person and the sitting sequences depict low-dynamic scenarios, while the walking sequences depict high-dynamic scenarios [31]. Note that there is no moving object in the opening part of the desk_with_person sequences, so this part can be considered as recorded in a static environment …
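Accuracy on the TUM benchmark is conventionally reported as the root-mean-square absolute trajectory error (ATE) [15]. As a reference for how such numbers are produced, the following sketch computes the ATE RMSE after rigidly aligning the estimated trajectory to ground truth with Horn's closed-form method; it assumes the two position sequences are already associated by timestamp, and it is a simplified stand-in for the benchmark's own evaluation script.

```python
# Sketch of TUM-style ATE RMSE: least-squares rigid alignment of the
# estimated positions to ground truth (Horn's method via SVD), then
# the RMSE of the translational residuals. est and gt are Nx3 arrays
# of positions already associated by timestamp.
import numpy as np

def ate_rmse(est, gt):
    est, gt = np.asarray(est, float), np.asarray(gt, float)
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    # Cross-covariance of the centered point sets.
    H = (est - mu_e).T @ (gt - mu_g)
    U, _, Vt = np.linalg.svd(H)
    # Optimal rotation, with a guard against reflections.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = mu_g - R @ mu_e
    residuals = gt - (est @ R.T + t)
    return float(np.sqrt((residuals ** 2).sum(axis=1).mean()))
```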

Conclusions

We proposed here a novel RGB-D data-based motion removal approach to address the problem of RGB-D SLAM in dynamic environments. Our approach requires no prior-known moving-object information, such as semantics or visual appearance. It is an on-line method that relies solely on the information obtained up to the current frame; no future information or batch data processing is required. In addition, the on-line learning capability allows our approach to accumulate foreground information …

Acknowledgments

Research presented in this paper was partially supported by the Hong Kong RGC GRF grants #14205914 and #14200618, ITC ITF grant #ITS/236/15, and Shenzhen Science and Technology Innovation project JCYJ20170413161616163 awarded to Max Q.-H. Meng, and partially supported by the Research Grants Council of the Hong Kong SAR Government, China, under Projects No. 11210017, No. 16212815 and No. 21202816, and the National Natural Science Foundation of China (Grant No. U1713211) awarded to …


References (53)

  • F. Endres et al., 3-D mapping with an RGB-D camera, IEEE Trans. Robot. (2014)
  • C. Kerl et al., Dense continuous-time tracking and mapping with rolling shutter RGB-D cameras
  • M. Labbé et al., Online global loop closure detection for large-scale multi-session graph-based SLAM
  • R. Mur-Artal et al., ORB-SLAM2: An open-source SLAM system for monocular, stereo, and RGB-D cameras, IEEE Trans. Robot. (2017)
  • ...
  • C. Cadena et al., Past, present, and future of simultaneous localization and mapping: Toward the robust-perception age, IEEE Trans. Robot. (2016)
  • M.A. Fischler et al., Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography, Commun. ACM (1981)
  • J. Sturm et al., A benchmark for the evaluation of RGB-D SLAM systems
  • K.-H. Lin et al., Stereo-based simultaneous localization, mapping and moving object tracking
  • C.-C. Wang et al., Simultaneous localization, mapping and moving object tracking, Int. J. Robot. Res. (2007)
  • K.E. Ozden et al., Multibody structure-from-motion in practice, IEEE Trans. Pattern Anal. Mach. Intell. (2010)
  • D. Zou et al., CoSLAM: Collaborative visual SLAM in dynamic environments, IEEE Trans. Pattern Anal. Mach. Intell. (2013)
  • W. Tan et al., Robust monocular SLAM in dynamic environments
  • Y. Wang et al., Motion segmentation based robust RGB-D SLAM
  • Y. Wang et al., Towards dense moving object segmentation based robust dense RGB-D SLAM in dynamic scenarios
  • P. Ochs et al., Segmentation of moving objects by long term video analysis, IEEE Trans. Pattern Anal. Mach. Intell. (2014)

    Yuxiang Sun received his Ph.D. degree from The Chinese University of Hong Kong (CUHK), Hong Kong, China, in 2017, the master’s degree from University of Science and Technology of China (USTC), Hefei, China, in 2012, and the bachelor’s degree from Hefei University of Technology (HFUT), Hefei, China, in 2009. He is now a research associate at the Robotics Institute, Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology (HKUST), Hong Kong, China. His current research interests include visual SLAM, autonomous vehicles, motion detection, deep learning, etc. He is a recipient of the Best Student Paper Finalist Award at the IEEE ROBIO 2015.

    Ming Liu received the B.A. degree in Automation from Tongji University in 2005. During his master's study at Tongji University, he spent one year at Erlangen-Nürnberg University and the Fraunhofer Institute IISB, Germany, as a visiting master's scholar. He received his Ph.D. from the Department of Mechanical and Process Engineering of ETH Zürich in 2013, supervised by Prof. Roland Siegwart. He is now affiliated with the ECE Department, CSE Department and Robotics Institute of the Hong Kong University of Science and Technology. He is a founding member of Shanghai SWing Automation Co. Ltd. He is also coordinating and involved in NSF projects and National 863-Hi-Tech-Plan projects in China. As a team member, he won second place at EMAV'09 (European Micro Aerial Vehicle Competition) and two awards at IARC'14 (International Aerial Robot Competition). He won the Best Student Paper Award as first author at MFI 2012 (IEEE International Conference on Multisensor Fusion and Information Integration), the Best Paper Award in Information as first author and a Best Paper Award Finalist as co-author at ICIA 2013 (IEEE International Conference on Information and Automation), the Best RoboCup Paper Award at IROS 2013 (IEEE/RSJ International Conference on Intelligent Robots and Systems), the Best Conference Paper Award at IEEE-CYBER 2015, Best Student Paper Finalist at RCAR 2015 (IEEE International Conference on Real-time Computing and Robotics), Best Student Paper Finalist at ROBIO 2015, the Best Student Paper Award at IEEE-ICAR 2017 and the Best Paper in Automation Award at IEEE-ICIA 2017. He twice won the innovation contest Chunhui Cup Winning Award, in 2012 and 2013. He won the Wu Weijun AI Award in 2016. He was the Program Chair of IEEE-RCAR 2016 and of the International Robotics Conference in Foshan 2017, and the Conference Chair of ICVS 2017. Ming Liu's research interests include dynamic environment modeling, deep learning for robotics, 3-D mapping, machine learning and visual control.

    Max Q.-H. Meng received his Ph.D. degree in Electrical and Computer Engineering from the University of Victoria, Canada, in 1992. He joined the Chinese University of Hong Kong in 2001 and is currently Professor and Chairman of the Department of Electronic Engineering. He was with the Department of Electrical and Computer Engineering at the University of Alberta in Canada, serving as the Director of the Advanced Robotics and Teleoperation Lab and holding the positions of Assistant Professor (1994), Associate Professor (1998), and Professor (2000). He is affiliated with the State Key Laboratory of Robotics and Systems at Harbin Institute of Technology and is the Honorary Dean of the School of Control Science and Engineering at Shandong University, China. His research interests include robotics, medical robotics and devices, perception, and scenario intelligence. He has published some 600 journal and conference papers and led more than 50 funded research projects to completion as PI. He has served as an editor of several journals and as General and Program Chair of many conferences, including General Chair of IROS 2005 and General Chair of ICRA 2021, to be held in Xi'an, China. He is an elected member of the Administrative Committee (AdCom) of the IEEE Robotics and Automation Society. He is a recipient of the IEEE Millennium Medal, a Fellow of the Canadian Academy of Engineering, a Fellow of HKIE, and a Fellow of IEEE.
