Superpixel-based appearance change prediction for long-term navigation across seasons

https://doi.org/10.1016/j.robot.2014.08.005

Highlights

  • This work is about place recognition across seasons.

  • We present the novel idea of an additional prediction step.

  • We propose a superpixel-based implementation: SP-ACP.

  • SP-ACP can learn to predict e.g. winter images from summer images.

  • Extensive evaluation is done on the Nordland dataset.

Abstract

Changing environments pose a serious problem to current robotic systems aiming at long-term operation under varying seasons or local weather conditions. This paper builds on our previous work where we proposed to learn to predict the changes in an environment. Our key insight is that the occurring scene changes are in part systematic, repeatable and therefore predictable. The goal of our work is to support existing approaches to place recognition by learning how the visual appearance of an environment changes over time and by using this learned knowledge to predict its appearance under different environmental conditions. We describe the general idea of appearance change prediction (ACP) and investigate properties of our novel implementation based on vocabularies of superpixels (SP-ACP). Our previous work showed that the proposed approach significantly improves the performance of SeqSLAM and BRIEF-Gist for place recognition on a subset of the Nordland dataset under extremely different environmental conditions in summer and winter. This paper deepens the understanding of the proposed SP-ACP system and evaluates the influence of its parameters. We present the results of a large-scale experiment on the complete 10 h Nordland dataset and appearance change predictions between different combinations of seasons.

Introduction

Long-term navigation in changing environments is one of the major challenges in robotics today. Robots operating autonomously over the course of days, weeks, and months have to cope with significant changes in the appearance of an environment. A single place can look extremely different depending on the current season, weather conditions or the time of day. Since state-of-the-art algorithms for autonomous navigation are often based on vision and rely on the system's capability to recognize known places, such changes in appearance pose a severe challenge for any robotic system aiming at autonomous long-term operation.

The problem has recently been addressed by a few authors, but so far no congruent solution has been proposed. Milford and Wyeth [1] proposed to increase place recognition robustness by matching sequences of images instead of single images and achieved impressive results on two across-seasons datasets. Exploring a different direction, Churchill and Newman [2] proposed to accept that a single place can have a variety of appearances. Their conclusion was that instead of attempting to match different appearances across seasons or severe weather changes, different experiences should be remembered for each place, where each experience covers exactly one appearance. Both approaches can be understood as the extreme ends of a spectrum that spans between interpreting changes as individual experiences of a single place on the one hand and increasing the robustness of the matching against appearance changes on the other. Our work presented in the following is orthogonal to this spectrum.

What current approaches to place recognition (and environmental perception in general) lack is the ability to reason about the occurring changes in the environment. Most approaches merely try to cope with them by developing change-invariant descriptors or matching methods. Potentially more promising is a system that can learn to predict certain systematic changes (e.g. day–night cycles, weather and seasonal effects, re-occurring patterns in environments where robots interact with humans) and to infer further information from these changes. Doing so without being forced to explicitly know about the semantics of objects in the environment is the focus of our research and the topic of this paper.

Fig. 1 illustrates the core idea of our work and how it compares to current state-of-the-art place recognition algorithms. Suppose a robot re-visits a place under extremely different environmental conditions. For example, an environment was first experienced in summer and is later re-visited in winter. Most certainly, the visual appearance has undergone extreme changes. Despite that, state-of-the-art approaches would attempt to match the currently seen winter image directly against the stored summer images.

Instead, we propose to predict how the current scene would appear under the same environmental conditions as the stored past representations, before attempting to match against the database. That is, when we attempt to match against a database of summer images but are in winter time now, we predict how the currently observed winter scene would appear in summer time or vice versa.

The result of this prediction process is a synthesized summer image that preserves the structure of the original scene and is close in appearance to the corresponding original summer scene. This prediction can be understood as translating the image from a winter vocabulary into a summer vocabulary, or from winter language into summer language. As with translations of speech or written text, some details will be lost in the process, but the overall idea, i.e. the gist of the scene, will be preserved. Sticking to the analogy, the error rate of a translator drops with experience. The same can be expected of our proposed system: it depends on training data, and the more and better training data it gets, the better it can learn to predict how a scene changes over time or even across seasons.
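
The vocabulary-based translation described above can be sketched as follows. This is a toy illustration, not the authors' implementation: the function names, the simple k-means clustering, and the word-to-word `translation` mapping are hypothetical stand-ins for the actual SP-ACP training and prediction steps.

```python
import numpy as np

def build_vocabulary(descriptors, k, rng):
    """Cluster superpixel descriptors into k visual 'words' (toy k-means)."""
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)]
    for _ in range(10):
        # Assign every descriptor to its nearest cluster center.
        labels = np.argmin(
            ((descriptors[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        # Move each center to the mean of its assigned descriptors.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = descriptors[labels == j].mean(axis=0)
    return centers

def predict_appearance(winter_words, translation):
    """Translate each winter word index into its learned summer word index."""
    return [translation[w] for w in winter_words]
```

In the full system, the superpixel descriptors would be extracted from aligned training images of both seasons, and the translation mapping would be learned from which winter word co-occurs with which summer word at the same place; the predicted words are then rendered back into a synthesized image.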

This paper builds upon our previous work [3] where we introduced the novel idea of predicting extreme scene changes across seasons to aid place recognition for the first time. We prove the feasibility of our idea and describe an implementation based on superpixel vocabularies. We demonstrate how we can predict the appearance of natural scenes across winter and summer, as illustrated in Fig. 1. By applying this approach, we are able to significantly improve the place recognition performance of SeqSLAM [1] and BRIEF-Gist [4] on the new, publicly available large-scale Nordland dataset [5] that traverses an environment in winter and summer under extremely different environmental conditions. While the first results we reported in [3] were based on a small subset of the Nordland dataset, this paper presents new results on the complete Nordland track. We furthermore evaluate predictions between different combinations of seasons. An extensive evaluation deepens our understanding of the proposed prediction system and the influence of its parameters.

In the following section, we put the proposed prediction system in the context of related work, before we describe its algorithmic steps in Section 3. Section 4 introduces the Nordland dataset we used for training, validation and testing. Section 5 presents comprehensive place recognition experiments on this dataset using FAB-MAP, BRIEF-Gist and SeqSLAM in combination with the proposed SP-ACP system. The paper is concluded by a discussion of limitations of the current system and directions for future work in Section 6. Additional information and videos can be obtained from our project website.1

Section snippets

Related work

The related work is threefold. We first give a short review of work on visual place recognition in changing environments, then discuss methods for dealing with changing environments on the mapping side, and finally relate our approach to the texture-transfer and image-analogy ideas published in computer graphics.

SP-ACP: learning to predict scene changes across seasons

In this section we explore how the changing appearance of a scene across different environmental conditions can be predicted. Throughout the remainder of this section these changing environmental conditions will be summer and winter. However, the concepts described in the following can of course be applied to other sets of contrasting conditions such as day/night or weather conditions like sunny/rainy.

How can the severe changes in appearance a landscape undergoes between winter

The Nordland dataset

To test our proposed approach of appearance change prediction, we required a dataset where a camera traverses the same places under very different environmental conditions but under a similar viewing perspective. Ideally, the dataset should contain ground truth information, e.g. the corresponding scenes should be known.

The TV documentary “Nordlandsbanen — Minutt for Minutt” by the Norwegian Broadcasting Corporation NRK provides video footage of the 728 km long train ride between the cities of

Experiments and results

After the previous sections explained our proposed SP-ACP system and introduced the Nordland dataset, we can now describe the conducted experiments and their results.

We evaluate the proposed SP-ACP prediction system by using it as a preprocessing step to the existing place recognition algorithms FAB-MAP [6], BRIEF-Gist [4], and SeqSLAM [1]. For each of these three established approaches, we will compare the respective performance of

  • 1.

    directly matching between images of different seasons,
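
The kind of performance comparison described above can be sketched numerically as follows. All functions and data here are illustrative assumptions, not the paper's evaluation code; the paper reports precision–recall results for FAB-MAP, BRIEF-Gist and SeqSLAM rather than this toy metric.

```python
import numpy as np

def match_matrix(queries, database):
    """Sum-of-absolute-differences between all query/database image pairs
    (lower means more similar), in the spirit of SeqSLAM's frame comparison."""
    return np.abs(queries[:, None, :] - database[None, :, :]).sum(axis=2)

def recall_at_1(diff, ground_truth):
    """Fraction of queries whose single best match is the correct frame."""
    return float(np.mean(diff.argmin(axis=1) == ground_truth))
```

In the predicted setting, a prediction step would be inserted before `match_matrix`: the current images are first translated into the database's season, which should make correct query/database pairs more similar than in the direct cross-season comparison.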

Current limitations of the approach and future work

The proposed SP-ACP system is a rather straightforward implementation of the idea of incorporating an additional prediction step for place recognition in systematically changing environments. However, there is plenty of room for improvement.

Obviously, the prediction step introduces smoothing and artifacts into the predicted images. This can cause a decrease in place recognition performance if the compared original sequences are very similar (e.g. summer and fall). However, the predicted images

Conclusions

Our paper described the novel concept of learning to predict systematic changes in the appearance of environments. We explained our SP-ACP implementation based on superpixel vocabularies and provided examples for scene change prediction between different seasons. We furthermore demonstrated how two approaches to place recognition, BRIEF-Gist and SeqSLAM, can benefit from the scene change prediction step.

We evaluated all important parameters of the proposed system and found that none of them

References (26)

  • C. Valgren et al.

    SIFT, SURF & seasons: appearance-based long-term localization in outdoor environments

    Robot. Auton. Syst.

    (2010)
  • M. Milford et al.

    SeqSLAM: visual route-based navigation for sunny summer days and stormy winter nights

  • W. Churchill, P.M. Newman, Practice makes perfect? managing and leveraging visual experiences for lifelong navigation,...
  • P. Neubert, N. Sünderhauf, P. Protzel, Appearance change prediction for long-term navigation across seasons, in:...
  • N. Sünderhauf, P. Protzel, BRIEF-gist — closing the loop by simple means, in: International Conference on Intelligent...
  • N. Sünderhauf, P. Neubert, P. Protzel, Are we there yet? challenging SeqSLAM on a 3000 km journey across all four...
  • M. Cummins et al.

    FAB-MAP: probabilistic localization and mapping in the space of appearance

    Int. J. Robot. Res.

    (2008)
  • A. Torralba, K.P. Murphy, W.T. Freeman, M.A. Rubin, Context-based vision system for place and object recognition, in:...
  • C. Liu, J. Yuen, A. Torralba, J. Sivic, W.T. Freeman, SIFT flow: dense correspondence across different scenes, in:...
  • A. Glover, W. Maddern, M. Milford, G. Wyeth, FAB-MAP + RatSLAM: appearance-based SLAM for multiple times of day, in:...
  • M. Milford et al.

    Persistent navigation and mapping using a biologically inspired SLAM system

    Int. J. Robot. Res.

    (2010)
  • H. Badino, D.F. Huber, T. Kanade, Real-time topometric localization, in: International Conference on Robotics and...
  • W. Maddern, S. Vidas, Towards robust night and day place recognition using visible and thermal imaging, in: Robotics...

    Peer Neubert received his Diploma (M.Sc.) degree in computer science in 2009 from the Technische Universität Chemnitz, Germany. Since 2010 he has been working as a researcher at the chair for automation technology at the same university. His research interests include image processing, computer vision and machine learning in particular for application in the area of autonomous mobile robots. A key aspect is the transformation of biologically inspired models, concepts and ideas to application on computers.

    Niko Sünderhauf received a Ph.D. in 2012 and a Diploma (M.Sc.) in computer science in 2006, both from Technische Universität Chemnitz, Germany, where he has been a research associate since May 2006. His research interests include SLAM, long-term autonomy, and probabilistic estimation with graphical models. He is also interested in computer vision for environmental perception and machine learning. Apart from mobile robotics, his research covers robust methods for sensor fusion, especially NLOS-mitigation for satellite-based localization systems.

    Peter Protzel received a Diploma in 1982 and a Ph.D. in 1987, both in electrical engineering from the Technische Universität Braunschweig, Germany. From 1987 to 1991 he worked as a scientist at the NASA Langley Research Center, Virginia, USA, and from 1991 to 1998 he headed the neural networks research group at the Bavarian Research Center for Knowledge-Based Systems (FORWISS) in Erlangen, Germany. Since 1998 he is a full professor for automation technology at the Technische Universität Chemnitz, Germany. His research interests include all aspects of autonomous systems, especially mobile robotics, computer vision, and machine learning.
