Self-localization in non-stationary environments using omni-directional vision

https://doi.org/10.1016/j.robot.2007.02.002

Abstract

This paper presents an image-based approach for localization in non-static environments using local feature descriptors, and its experimental evaluation in a large, dynamic, populated environment where the time interval between the collected data sets is up to two months. By combining local features with panoramic images, the method achieves robustness and invariance to large changes in the environment. Results are shown for global place recognition with no evidence accumulation and for a Monte Carlo localization method. To test the approach even further, experiments were conducted with up to 90% virtual occlusion in addition to the dynamic changes in the environment.

Introduction

One of the most essential abilities needed by robots is self-localization (“where am I?”), which can be divided into geometric and topological localization. Geometric localization tries to estimate the position of the robot as accurately as possible, e.g., by calculating a pose estimate (x,y,θ), while topological localization gives a more abstract position estimate, e.g., “I’m in the coffee room”. There has been much research on using accumulated sensory evidence to improve localization performance, including a highly successful class of algorithms that estimate posterior probability distributions over the space of possible locations [1], [2], [3], [4], [5], [6]. This approach enables both position tracking and relocalization from scratch, for example, when the robot is started (no prior knowledge of its position) or becomes lost or “kidnapped” (incorrect prior knowledge).

In recent years, panoramic or omni-directional cameras have become popular for self-localization because of their relatively low cost and large field of view. This makes it possible to create features that are invariant to the robot’s orientation, for example, using various colour histograms [7], [8], [9] or Eigenspace models [10], [11], [12].
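As a concrete illustration of this orientation invariance, the sketch below (not taken from any of the cited systems) builds a global colour histogram over a panoramic image: because the histogram ignores pixel positions, a circular shift of the image columns, i.e., a pure rotation of the robot, leaves it unchanged. The bin count and the intersection measure are illustrative choices.

```python
import numpy as np

def colour_histogram(panorama: np.ndarray, bins: int = 8) -> np.ndarray:
    """Rotation-invariant global feature for a panoramic image.

    `panorama` is an H x W x 3 RGB image whose columns span 360 degrees.
    A joint RGB histogram discards pixel positions, so circularly shifting
    the columns (a pure rotation of the robot) leaves it unchanged.
    """
    # Quantize each channel into `bins` levels and build a joint histogram.
    quantized = (panorama.astype(np.uint32) * bins) // 256            # H x W x 3
    idx = (quantized[..., 0] * bins + quantized[..., 1]) * bins + quantized[..., 2]
    hist = np.bincount(idx.ravel(), minlength=bins ** 3).astype(float)
    return hist / hist.sum()                                          # normalize

def histogram_similarity(h1: np.ndarray, h2: np.ndarray) -> float:
    """Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint."""
    return float(np.minimum(h1, h2).sum())
```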

Other innovations include increased robustness to lighting variations [11] and multi-view registration to deal with occlusions [13]. Recent work has combined panoramic vision with particle filters for global localization, including feature matching using Fourier transform [14], PCA [15] and colour histograms [16].

Most previous work on robot mapping and localization assumes a static world, i.e., that there are no changes to the environment between the time of mapping and the time of using the map. However, this assumption does not hold for typical populated indoor environments. Humans (and other robots) are not merely “dynamic obstacles” that may occlude the robot’s sensors — they also make changes to the world. For example, they may leave temporary objects such as packages, or move the furniture. In addition to these sudden changes, there may be gradual changes such as plants growing, coloured paint fading, etc.

Our approach to self-localization in non-static environments uses an image matching scheme that is robust to many of the changes that occur under natural conditions. Our hypothesis is that a map that is out of date can still contain much useful information. Thus the important question is how to extract features that can be used for matching new sensor data to a map that is only partially correct. We present an appearance-based approach to matching panoramic images that does not require calibration or geometric modelling of the scene or imaging system, so it should be applicable to any mobile robot using omni-directional vision for self-localization. The hypothesis is validated through experiments using sensor data collected by a mobile robot in a real dynamic environment over a period of two months.

The image matching algorithm uses local features extracted from many small subregions of the image rather than global features extracted from the whole image, which makes the method very robust to variations and occlusions. For example, Fig. 2 shows some of the local features that were matched between two different panoramic images of a laboratory environment, recorded 56 days apart, despite changes such as a television appearing, chairs moving and people working. The approach is similar to other approaches using local features for self-localization [17], [18], [19], with the differences that our method is adapted for panoramic images and was specifically designed and tested to work in long-term experiments conducted in a real dynamic environment.

We use a version of Lowe’s SIFT algorithm [20], which is modified so that stored panoramic images are only recognised from a local area around the corresponding location in the world (Section 2.3). A novel scheme is introduced for combining local feature matching with a particle filter for global localization (Section 2.4), which minimizes computational costs as the filter converges. See also Fig. 1 for a brief overview of the method. To evaluate the method, we also compare its performance with that of several other types of features, including both global and local features (Section 3). We also show how image matching performance can be further improved by incorporating information about the relative orientation of corresponding features between images (Section 3.3). Our experiments were designed to test the system under a wide variety of conditions, including results in a large populated indoor environment (up to 5 persons visible) on different days under different lighting conditions (Section 4).
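The sketch below indicates how an image-match score can drive a particle filter of this kind; the motion-noise values, the weighting rule and the `nearest_image` lookup are illustrative assumptions, not the exact scheme of Section 2.4.

```python
import numpy as np

def monte_carlo_step(particles, weights, odometry, current_features, database,
                     match_score, motion_noise=(0.05, 0.05, 0.02)):
    """One predict/update/resample cycle of a particle filter whose
    measurement model is an image-match score against the database image
    stored closest to each particle. All tuning values are illustrative."""
    n = len(particles)

    # Predict: apply odometry (dx, dy, dtheta) with additive Gaussian noise.
    particles = particles + odometry + np.random.randn(n, 3) * motion_noise

    # Update: weight each particle by how well the current image matches
    # the database image recorded nearest to the particle's position.
    for i, pose in enumerate(particles):
        ref = database.nearest_image(pose[:2])        # hypothetical lookup
        weights[i] *= 1.0 + match_score(current_features, ref)
    weights /= weights.sum()

    # Resample when the effective sample size drops below half the particles.
    if 1.0 / np.sum(weights ** 2) < n / 2:
        idx = np.random.choice(n, size=n, p=weights)
        particles, weights = particles[idx], np.full(n, 1.0 / n)

    return particles, weights
```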

The results demonstrate that the robot is able to localize itself from scratch, including experiments in “kidnapping”, and that performance degrades gracefully under occlusions of up to 90% of the robot’s field of view.

Section snippets

Basic methods

This section describes the methods used for extracting and matching the features, see also Fig. 1. To be able to match the current image with the images stored in the database, each image is converted into a set of features. The matching is done by comparing the features in the database with the features created from the current image.
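In outline, the global place-recognition step (no evidence accumulation) can be sketched as below, where `extract_features` and `match_score` are placeholders for the descriptor extraction and matching described in the following subsections.

```python
def recognise_place(current_image, database, extract_features, match_score):
    """Global place recognition with no evidence accumulation: return the
    pose of the stored image that best matches the current view.
    `database` is assumed to be a sequence of (features, pose) pairs."""
    query = extract_features(current_image)
    best_pose, best_score = None, -1.0
    for stored_features, pose in database:
        score = match_score(query, stored_features)
        if score > best_score:
            best_pose, best_score = pose, score
    return best_pose, best_score
```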

Our localization methods assume that the database (map) is already created. The database is constructed from another set of images collected by the same robot.

Other image matching methods compared

Together with MSIFT, three other image matching methods were evaluated: one based on a global feature and two based on local features. Matching of local descriptors was done as described in Section 2.3, where the match score is the total number of matched features between two images.
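As a sketch of such a score, the function below counts the descriptors of one image whose nearest neighbour in the other image passes Lowe's standard ratio test; the threshold is an illustrative value and stands in for the exact matching criterion of Section 2.3.

```python
import numpy as np

def count_matches(desc_a: np.ndarray, desc_b: np.ndarray, ratio: float = 0.8) -> int:
    """Match score between two images: the number of descriptors in `desc_a`
    whose nearest neighbour in `desc_b` is sufficiently closer than the
    second-nearest neighbour (Lowe-style ratio test). Each input is an
    (N, 128) array of SIFT-like descriptors."""
    if len(desc_a) == 0 or len(desc_b) < 2:
        return 0
    matches = 0
    for d in desc_a:
        dists = np.linalg.norm(desc_b - d, axis=1)
        nearest, second = np.partition(dists, 1)[:2]
        if nearest < ratio * second:
            matches += 1
    return matches
```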

Results

The results are divided into two parts: the first considers location and orientation recognition performance with no prior knowledge (Section 4.2), and the second evaluates the full Monte Carlo localization scheme (Section 4.3).

Conclusion

A self-localization algorithm for mobile robots using panoramic images was presented. The main contributions are the integration of existing state-of-the-art algorithms for creating local descriptors [20] and probabilistic state estimation [25] with an omni-directional imaging system on a mobile robot, and the experimental evaluation of the entire system in a real dynamic environment over an extended period of time. By using experiments with data collected on different days over a period of…


References (28)

  • J. Gonzalez-Barbosa et al., Rover localization in natural environments by indexing panoramic images
  • N. Winters et al., Omni-directional vision for robot navigation
  • M. Artac et al., Mobile robot localization using an incremental Eigenspace model
  • L. Paletta et al., Robust localization using context in omnidirectional imaging


    Henrik Andreasson is a Ph.D. student at the Centre for Applied Autonomous Sensor Systems, Örebro University, Sweden. He received his Master's degree in Mechatronics from the Royal Institute of Technology, Sweden, in 2001. His research interests include mobile robotics, computer vision, and machine learning.

    André Treptow received his diploma in computer engineering from the University of Siegen, Germany, in 2000. He was a Ph.D. student at the W.-Schickard-Institute of Computer Science at the Eberhard-Karls-University of Tuebingen, Germany, and received his Ph.D. in 2007. Since 2006 he has been working at Robert Bosch GmbH in the field of signal processing for automotive radar sensors. His research interests include real-time object detection and tracking, computer vision, biologically motivated vision algorithms and evolutionary algorithms.

    Tom Duckett is a Reader at the Department of Computing and Informatics, University of Lincoln. He was formerly a docent (Associate Professor) at Örebro University, where he founded the Learning Systems Laboratory, one of four research laboratories within the Centre for Applied Autonomous Sensor Systems. He obtained his Ph.D. from Manchester University, M.Sc. with distinction from Heriot-Watt University and B.Sc. (Hons.) from Warwick University, and has also studied at Karlsruhe and Bremen Universities. His research interests include mobile robotics, navigation, machine learning, AI, computer vision, and sensor fusion for perception-based control of autonomous systems.

    1. Present address: Department of Computing and Informatics, University of Lincoln, Lincoln LN6 7TS, UK.
