Neurocomputing

Volume 275, 31 January 2018, Pages 781-787

Long-range terrain perception using convolutional neural networks

https://doi.org/10.1016/j.neucom.2017.09.012

Abstract

Autonomous robot navigation in wild environments is still an open problem and relies heavily on accurate terrain perception. Traditional machine learning techniques have achieved good performance for terrain perception; however, most of them require manually designed classifiers, meaning they generalize poorly to new, unknown environments. In this work, we integrate a deep convolutional neural network (CNN) model with a near-to-far learning strategy to improve the accuracy of terrain segmentation and make it more robust in wild environments. The proposed deep CNN model consists of an encoder and a decoder, which perform downsampling and upsampling for terrain feature extraction, respectively. The near-field terrain information obtained directly from stereo disparity maps is fed into the CNN as a reference to aid in learning the far-field terrain information. Experimental results on a benchmark dataset demonstrate the effectiveness of the proposed terrain perception method.

Introduction

Terrain segmentation refers to the process of dividing scenes in the wild into various regions, such as traversable roads, obstacles, and other ambiguous regions. Terrain segmentation can help autonomous robots perceive the surrounding topographic conditions and plan a path toward a goal while avoiding obstacles. Although relevant algorithms such as [1], [2] exist, it remains a considerable challenge to segment unknown environments accurately due to the difficulty of perceiving and representing variations in the environment.

Significant work based on image processing and machine learning methods has been devoted to the problem of terrain perception. Halatci et al. [3] presented a multi-sensor terrain classification system that trained two "low-level" classifiers offline on color, texture, and range features using maximum likelihood estimation (MLE) and a support vector machine (SVM); their system achieved accurate terrain classification by fusing classifiers over visual and tactile features. Anguelov et al. [4] trained a model offline from a set of labeled scans using Markov random fields (MRFs) and performed graph-cut inference on the trained MRFs to segment new scenes efficiently. Bradley et al. [5] trained a random forest classifier offline using voxel features (scan-line features, point cloud features, and color features) for terrain classification and ground surface height estimation. All of these algorithms rely on offline training and achieve good performance when the test scenes resemble the training scenes; however, they may not perform well in wild environments. To improve generalization, Procopio et al. [6] therefore trained a model online using a near-to-far learning strategy: stereo labels and color histogram features extracted in the near field train a logistic regression classifier, which then evaluates the remainder of the image to arrive at the final terrain predictions.
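
As a concrete illustration of this near-to-far baseline, the following sketch (with illustrative function names and patch handling; it is not Procopio et al.'s exact pipeline) trains a logistic regression classifier on color histograms of stereo-labeled near-field patches and then scores far-field patches of the same image:

    # Illustrative near-to-far baseline in the spirit of [6]: color-histogram
    # features from stereo-labeled near-field patches train an online
    # classifier that then evaluates the far field of the same image.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def color_histogram(patch, bins=8):
        # Concatenated per-channel histogram of an RGB patch (H, W, 3).
        feats = [np.histogram(patch[..., c], bins=bins, range=(0, 255),
                              density=True)[0] for c in range(3)]
        return np.concatenate(feats)

    def train_near_field(patches, stereo_labels):
        # Fit the classifier on near-field patches labeled from stereo disparity.
        X = np.stack([color_histogram(p) for p in patches])
        return LogisticRegression(max_iter=1000).fit(X, stereo_labels)

    def predict_far_field(clf, patches):
        # Score far-field patches (beyond reliable stereo range).
        X = np.stack([color_histogram(p) for p in patches])
        return clf.predict(X)  # traversable vs. obstacle per patch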

Early approaches relying on low-level vision cues [7] and manually designed classifiers are being replaced by popular deep learning algorithms. In particular, with the popularization of convolutional neural networks (CNNs), their representational power has led to successful applications in handwritten digit recognition, speech recognition, and image recognition [8], [9], [10], [11]. In DARPA's Learning Applied to Ground Robots (LAGR) program, pioneering work combined a CNN-based classifier with a histogram-based approach to divide scenes into several classes [12], [13], [14]. There is now active interest in semantic pixel-wise labeling [15], [16], [17], [18], [19], in which each pixel is labeled with a predefined category. The SegNet model of Badrinarayanan et al. [20] showed that mapping downsampled feature maps back onto images at the original resolution for pixel-wise classification is feasible and achieves good performance. However, traditional deep models such as SegNet do not work well for terrain perception in the wild because they are trained offline in a traditional supervised manner, and therefore cannot generalize well to new, previously unseen data.
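
The encoder-decoder idea behind SegNet can be condensed into a few lines of PyTorch. The toy model below (layer widths and depth are our own choices, far smaller than the published architecture) illustrates only the key mechanism: max-pooling indices recorded during downsampling drive non-parametric upsampling in the decoder:

    import torch
    import torch.nn as nn

    class TinySegNet(nn.Module):
        # Toy SegNet-style encoder-decoder: pooling indices saved while
        # downsampling are reused by MaxUnpool2d to restore full resolution.
        def __init__(self, in_ch=3, num_classes=3):
            super().__init__()
            self.enc1 = nn.Sequential(nn.Conv2d(in_ch, 32, 3, padding=1),
                                      nn.BatchNorm2d(32), nn.ReLU())
            self.enc2 = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1),
                                      nn.BatchNorm2d(64), nn.ReLU())
            self.pool = nn.MaxPool2d(2, stride=2, return_indices=True)
            self.unpool = nn.MaxUnpool2d(2, stride=2)
            self.dec2 = nn.Sequential(nn.Conv2d(64, 32, 3, padding=1),
                                      nn.BatchNorm2d(32), nn.ReLU())
            self.dec1 = nn.Conv2d(32, num_classes, 3, padding=1)

        def forward(self, x):                    # x: (N, in_ch, H, W)
            x, idx1 = self.pool(self.enc1(x))    # downsample to H/2
            x, idx2 = self.pool(self.enc2(x))    # downsample to H/4
            x = self.dec2(self.unpool(x, idx2))  # upsample back to H/2
            x = self.unpool(x, idx1)             # upsample back to H
            return self.dec1(x)                  # per-pixel class logits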

We introduce a near-to-far strategy combined with CNNs and present an end-to-end training architecture to overcome such limitations. We demonstrate that training with near-field information strengthens the network's ability to learn features suitable for perceiving unknown far-field terrain. Thus, this work provides an alternative way of using near-field terrain information for long-range perception. It differs from traditional work that relied on the near-field information of the current image (a single image) to train a classifier online [6], [21]. The potential issue with those methods is that stereo information is often noisy and sparse, which may make online training unsatisfactory; moreover, their online training relies on handcrafted features, which may need parameter tuning in practice. In contrast, the proposed network is trained end to end offline, so it has low computational complexity at test time, and its features are learned rather than handcrafted. The network is trained and tested on the LAGR dataset [6] and compared with existing online and offline terrain perception methods.
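
This snippet does not spell out the exact wiring, so the training step below is only one plausible reading, reusing the toy model from the previous sketch: the stereo-derived near-field terrain map enters as an extra input channel, and a segmentation loss is computed offline against the available terrain labels (the channel layout, loss function, and ignore label are all our assumptions, not the paper's specification):

    import torch
    import torch.nn as nn

    # Hypothetical end-to-end training step with a near-field reference map.
    model = TinySegNet(in_ch=4, num_classes=3)         # 3 RGB + 1 reference channel
    criterion = nn.CrossEntropyLoss(ignore_index=255)  # 255 marks unlabeled pixels
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

    def train_step(rgb, near_ref, labels):
        # rgb: (N, 3, H, W); near_ref: (N, 1, H, W) terrain map from stereo
        # disparity, zero beyond reliable stereo range; labels: (N, H, W) int64.
        optimizer.zero_grad()
        logits = model(torch.cat([rgb, near_ref], dim=1))
        loss = criterion(logits, labels)
        loss.backward()
        optimizer.step()
        return loss.item()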

Section snippets

Proposed approach

In this section, we will present the structure of the proposed model, illustrated in Fig. 1, and discuss the effects of reference maps on terrain perception in the wild.

Experimental results

This section first introduces the dataset used to train and test the proposed model, followed by the implementation details and a comparison with existing terrain perception models.

Conclusion

We introduced a near-to-far strategy for CNNs for terrain perception in the wild. In contrast to the traditional methods that used near-field information to train a classifier online, our method provided an alternative and implicit way of using near-field terrain information. The proposed network was trained end to end offline, and had low computational complexity during testing. Experimental results on a benchmark dataset demonstrated that incorporating near-field information properly enhanced …

References (29)

  • Y. Yuan et al.

    Video-based road detection via online structural learning

    Neurocomputing

    (2015)
  • Q. Wang et al.

    Adaptive road detection via context-aware label transfer

    Neurocomputing

    (2015)
  • H. Fu et al.

    Integrating low-level and semantic features for object consistent segmentation

    Neurocomputing

    (2013)
  • W. Liu et al.

    A survey of deep neural network architectures and their applications

    Neurocomputing

    (2017)
  • I. Halatci et al.

    Terrain classification and classifier fusion for planetary exploration rovers

    2007 IEEE Aerospace Conference

    (2007)
  • D. Anguelov et al.

    Discriminative learning of Markov random fields for segmentation of 3D scan data

    2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05)

    (2005)
  • D.M. Bradley et al.

    Scene understanding for a high-mobility walking robot

    IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2015

    (2015)
  • M.J. Procopio et al.

    Coping with imbalanced training data for improved terrain prediction in autonomous outdoor robot navigation

    IEEE International Conference on Robotics and Automation (ICRA), 2010

    (2010)
  • C. Szegedy et al.

    Going deeper with convolutions

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2015)
  • K. Simonyan et al.

    Very deep convolutional networks for large-scale image recognition

    International Conference on Learning Representations (ICLR)

    (2015)
  • Y. LeCun, C. Cortes, C.J. Burges, The MNIST database of handwritten...
  • A.N. Erkan et al.

    Adaptive long range vision in unstructured terrain

    2007 IEEE/RSJ International Conference on Intelligent Robots and Systems

    (2007)
  • P. Sermanet et al.

    Mapping and planning under uncertainty in mobile robots with long-range perception

    2008 IEEE/RSJ International Conference on Intelligent Robots and Systems

    (2008)
  • R. Hadsell et al.

    Online learning for offroad robots: Using spatial label propagation to learn long-range traversability

    Proceedings of Robotics: Science and Systems (RSS)

    (2007)

Wei Zhang received the Ph.D. degree in electronic engineering from The Chinese University of Hong Kong in 2010. He is currently with the School of Control Science and Engineering, Shandong University, China. He has authored about 50 papers in international journals and refereed conferences. His research interests include computer vision, image processing, pattern recognition, and robotics. He has served as a Program Committee Member and a Reviewer for various international conferences and journals in image processing, computer vision, and robotics.

Qi Chen received the B.S. degree in control science and engineering from Huazhong University of Science and Technology in 2013. He is currently pursuing the M.S. degree with the School of Control Science and Engineering, Shandong University, China. His research interests include computer vision, image processing, and pattern recognition.

Weidong Zhang received the B.S. degree from Zhejiang University in 2012. He is currently pursuing the Ph.D. degree with the School of Control Science and Engineering, Shandong University, China. His research interests include computer vision and machine learning.

Xuanyu He received the B.S. degree from Zhejiang University in 2014. He is currently pursuing the M.S. degree with the School of Control Science and Engineering, Shandong University, China. His research interests include computer vision, image processing, and pattern recognition.
