Long-range terrain perception using convolutional neural networks
Introduction
Terrain segmentation divides outdoor scenes into regions such as traversable roads, obstacles, and other, more ambiguous areas. It helps autonomous robots perceive the surrounding topographic conditions and plan paths toward a goal while avoiding obstacles. Although relevant algorithms exist [1], [2], accurately segmenting unknown environments remains a considerable challenge because of the difficulty of perceiving and representing variations in the environment.
Significant work based on image processing and machine learning has been devoted to the problem of terrain perception. Halatci et al. [3] presented a multi-sensor terrain classification system that trained two "low-level" classifiers offline on color, texture, and range features, using maximum likelihood estimation (MLE) and a support vector machine (SVM); their system achieved accurate terrain classification by fusing visual and tactile cues. Anguelov et al. [4] trained a model offline from a set of labeled scans using Markov random fields (MRFs) and performed graph-cut inference on the trained MRFs to segment new scenes efficiently. Bradley et al. [5] trained a random forest classifier offline on voxel features (scan-line, point cloud, and color features) for terrain classification and ground surface height estimation. All of these algorithms rely on offline training and perform well when the test scenes resemble the training scenes; however, they may not generalize to unconstrained outdoor environments. To improve generalization, Procopio et al. [6] instead trained a model online following a near-to-far learning strategy: stereo labels and color histogram features extracted in the near field train a logistic regression classifier, which then evaluates the remainder of the image to arrive at the final terrain predictions.
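As a toy illustration of the near-to-far idea described above (not the actual pipeline of [6]; the features, data, and training loop here are simplified placeholders), the sketch below trains a logistic-regression classifier on stereo-labeled near-field "pixels" and then applies it to unlabeled far-field ones:

```python
import numpy as np

def train_logistic(X, y, lr=0.5, epochs=200):
    """Fit a simple logistic-regression classifier by gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # sigmoid
        grad = p - y                             # dLoss/dlogit for cross-entropy
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def predict(X, w, b):
    return (1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5).astype(int)

# Toy setup: near-field pixels carry stereo-derived labels
# (0 = traversable ground, 1 = obstacle); far-field pixels do not.
rng = np.random.default_rng(0)
near_feats = np.vstack([rng.normal(0.2, 0.05, (50, 3)),   # ground-like features
                        rng.normal(0.8, 0.05, (50, 3))])  # obstacle-like features
near_labels = np.r_[np.zeros(50), np.ones(50)]

w, b = train_logistic(near_feats, near_labels)

# Classify far-field pixels with the model trained on the near field only.
far_feats = np.array([[0.2, 0.2, 0.2], [0.8, 0.8, 0.8]])
print(predict(far_feats, w, b))   # -> [0 1]
```

In the online setting this retraining happens for every incoming frame, which is exactly why noisy, sparse stereo labels can destabilize it, as discussed below.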
Early approaches relying on low-level vision cues [7] and manually designed classifiers are being replaced by deep learning algorithms. In particular, with the popularization of convolutional neural networks (CNNs), the representation power of CNNs has led to successful applications in handwritten digit recognition and in speech and image recognition [8], [9], [10], [11]. In DARPA's Learning Applied to Ground Robots (LAGR) program, pioneering work combined a CNN-based classifier with a histogram-based approach to divide scenes into several classes [12], [13], [14]. There is now active interest in semantic pixel-wise labeling [15], [16], [17], [18], [19], in which each pixel is assigned a predefined category. The SegNet model of Badrinarayanan et al. [20] showed that mapping downsampled feature maps back onto images at the original resolution for pixel-wise classification is feasible and performs well. However, traditional deep models such as SegNet do not work well for terrain perception in the wild because they are trained offline in a conventional supervised manner, and therefore generalize poorly to previously unseen data.
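The decoder trick behind SegNet's "map downsampled features back to the original resolution" can be sketched in a few lines: the encoder's max pooling records the argmax positions, and the decoder unpools each value back to its recorded position (zeros elsewhere), after which convolutions would densify the result. This is a minimal numpy illustration of that indexing, not SegNet's actual implementation:

```python
import numpy as np

def max_pool_with_indices(x, k=2):
    """k x k max pooling that also records argmax positions (encoder side)."""
    h, w = x.shape
    out = np.zeros((h // k, w // k))
    idx = np.zeros((h // k, w // k), dtype=int)   # flat index into x
    for i in range(h // k):
        for j in range(w // k):
            patch = x[i*k:(i+1)*k, j*k:(j+1)*k]
            r, c = np.unravel_index(patch.argmax(), patch.shape)
            out[i, j] = patch[r, c]
            idx[i, j] = (i*k + r) * w + (j*k + c)
    return out, idx

def max_unpool(y, idx, shape):
    """Place each pooled value back at its recorded position (decoder side)."""
    z = np.zeros(shape).ravel()
    z[idx.ravel()] = y.ravel()
    return z.reshape(shape)

x = np.array([[1., 3., 2., 0.],
              [4., 2., 1., 5.],
              [0., 1., 2., 2.],
              [3., 0., 1., 4.]])
pooled, idx = max_pool_with_indices(x)      # [[4, 5], [3, 4]]
restored = max_unpool(pooled, idx, x.shape)  # sparse map, maxima at original spots
```

Reusing the pooling indices is what lets the decoder restore boundary locations without learning upsampling weights, which is the property the paper leans on for pixel-wise terrain labels.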
To overcome these limitations, we combine a near-to-far strategy with CNNs and present an end-to-end training architecture. We demonstrate that training with near-field information strengthens the network's ability to learn features suitable for long-range perception of unknown terrain. This work thus provides an alternative way of exploiting near-field terrain information for long-range perception, in contrast to traditional methods that use the near-field information of the current image alone to train a classifier online [6], [21]. A potential issue with those methods is that stereo information is often noisy and sparse, which can make online training unreliable; moreover, their online training relies on handcrafted features that may require parameter tuning in practice. In contrast, the proposed network is trained end to end offline, so its computational cost at test time is low and its features are learned rather than handcrafted. The network is trained and tested on the LAGR dataset [6] and compared with existing online and offline terrain perception methods.
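One way such a network can learn from near-field supervision alone is to restrict the training loss to pixels where stereo provides a label. The sketch below is a hypothetical illustration of that masking idea, not the paper's actual loss; the array shapes and the near-field region are invented for the example:

```python
import numpy as np

def near_field_masked_loss(pred, stereo_labels, valid_mask):
    """Cross-entropy averaged only over pixels where a stereo label exists.

    pred          : (H, W) predicted traversability probabilities in (0, 1)
    stereo_labels : (H, W) stereo-derived labels (0/1), valid in near field only
    valid_mask    : (H, W) boolean, True where stereo produced a label
    """
    eps = 1e-9  # avoid log(0)
    ce = -(stereo_labels * np.log(pred + eps)
           + (1 - stereo_labels) * np.log(1 - pred + eps))
    return ce[valid_mask].mean()

# Near field (bottom image rows) is stereo-labeled; far field (top rows) is not.
pred = np.full((4, 4), 0.9)
labels = np.ones((4, 4))
mask = np.zeros((4, 4), dtype=bool)
mask[2:, :] = True   # only the bottom half contributes to the loss
loss = near_field_masked_loss(pred, labels, mask)  # equals -log(0.9)
```

Because the far-field pixels contribute no gradient, the network can only fit them through features shared with the labeled near field, which is the generalization mechanism a near-to-far scheme relies on.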
Section snippets
Proposed approach
In this section, we will present the structure of the proposed model, illustrated in Fig. 1, and discuss the effects of reference maps on terrain perception in the wild.
Experimental results
This section first introduces the dataset used to train and test the proposed model, followed by the implementation details and a comparison with existing terrain perception models.
Conclusion
We introduced a near-to-far strategy for CNNs for terrain perception in the wild. In contrast to traditional methods that use near-field information to train a classifier online, our method provides an alternative, implicit way of using near-field terrain information. The proposed network is trained end to end offline and has low computational complexity during testing. Experimental results on a benchmark dataset demonstrated that properly incorporating near-field information enhanced long-range perception performance.
References
- et al., "Video-based road detection via online structural learning," Neurocomputing, 2015.
- et al., "Adaptive road detection via context-aware label transfer," Neurocomputing, 2015.
- et al., "Integrating low-level and semantic features for object consistent segmentation," Neurocomputing, 2013.
- et al., "A survey of deep neural network architectures and their applications," Neurocomputing, 2017.
- Halatci et al., "Terrain classification and classifier fusion for planetary exploration rovers," 2007 IEEE Aerospace Conference, 2007.
- Anguelov et al., "Discriminative learning of Markov random fields for segmentation of 3D scan data," 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), 2005.
- et al., "Scene understanding for a high-mobility walking robot," IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2015.
- et al., "Coping with imbalanced training data for improved terrain prediction in autonomous outdoor robot navigation," IEEE International Conference on Robotics and Automation (ICRA), 2010.
- Szegedy et al., "Going deeper with convolutions," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
- Simonyan and Zisserman, "Very deep convolutional networks for large-scale image recognition," International Conference on Learning Representations (ICLR), 2015.
- et al., "Adaptive long range vision in unstructured terrain," 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2007.
- et al., "Mapping and planning under uncertainty in mobile robots with long-range perception," 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2008.
- et al., "Online learning for offroad robots: using spatial label propagation to learn long-range traversability," Proceedings of Robotics: Science and Systems (RSS).
Wei Zhang received the Ph.D. degree in electronic engineering from The Chinese University of Hong Kong in 2010. He is currently with the School of Control Science and Engineering, Shandong University, China. He has authored about 50 papers in international journals and refereed conferences. His research interests include computer vision, image processing, pattern recognition, and robotics. He served as a Program Committee Member and a Reviewer for various international conferences and journals in image processing, computer vision, and robotics.
Qi Chen received the B.S. degree in control science and engineering from the Huazhong University of Science and Technology in 2013. He is currently pursuing the M.S. degree with the School of Control Science and Engineering, Shandong University, China. His research interests include computer vision, image processing, and pattern recognition.
Weidong Zhang received the B.S. degree from Zhejiang University in 2012. He is currently pursuing the Ph.D. degree with the School of Control Science and Engineering, Shandong University, China. His research interests include computer vision and machine learning.
Xuanyu He received the B.S. degree from Zhejiang University in 2014. He is currently pursuing the M.S. degree with the School of Control Science and Engineering, Shandong University, China. His research interests include computer vision, image processing, and pattern recognition.