Contributed article
Comparing Bayesian neural network algorithms for classifying segmented outdoor images

https://doi.org/10.1016/S0893-6080(01)00024-7

Abstract

In this paper we investigate the Bayesian training of neural networks for region labelling of segmented outdoor scenes; the data are drawn from the Sowerby Image Database of British Aerospace. Neural networks are trained with two Bayesian methods: (i) the evidence framework of MacKay, 1992a, MacKay, 1992b and (ii) a Markov Chain Monte Carlo method due to Neal (1996). The performance of the two methods is compared by evaluating the empirical learning curves of neural networks trained with the two methods. We also investigate the use of the Automatic Relevance Determination method for input feature selection.

Introduction

This paper is concerned with a comparison of two Bayesian methods for training neural networks; the evidence framework of MacKay, 1992a, MacKay, 1992b and a Markov Chain Monte Carlo method due to Neal (1996). The networks are trained on the task of labelling regions of segmented outdoor scenes. The performance of the two methods is compared by evaluating the empirical learning curves of neural networks trained with the two Bayesian algorithms. It is important to carry out such a comparison if a neural network has to be applied to real-world tasks, in order to understand which algorithm can provide better generalisation performance. We also discuss the use of the Automatic Relevance Determination method for input feature selection.

The structure of the paper is as follows. In Section 2 we discuss various approaches that can be taken to the segmentation of images and explain the method used in this paper. Section 3 introduces the image database used for the training of the neural networks. The two implementations of Bayesian learning of neural networks are introduced in Section 4 and applied to the training of a Multi Layer Perceptron (MLP). Section 5 describes a preliminary experiment using both multiple logistic regression and multi-layer perceptron classifiers on the full data set. In Section 7 we investigate the empirical learning curves of the two methods, comparing classification performances on training sets of various sizes. In Section 8 results are reported for the use of the Automatic Relevance Determination technique (ARD) of MacKay and Neal (Neal, 1996) for feature selection. This paper (which is a revised and expanded version of Vivarelli and Williams (1997)) concludes with a discussion of the study undertaken.

Section snippets

Segmentation and scene analysis

This paper deals with the specific problem of investigating the use of neural networks for the classification of regions of outdoor scenes.

Scene interpretation is usually split into a two-stage procedure: the segmentation and the interpretation of images. The segmentation process divides an image into a set of regions, where ideally each region corresponds to one object; the interpretation consists of labelling the regions. To carry out the task successfully it is important to classify each

The database

The database consists of 96 coloured images extracted from the Sowerby Image Database of British Aerospace. The main subjects of the images are rural and urban scenes; the database was constructed so that objects with diverse characteristics appear in it (e.g. cars with different colours, different shapes of buildings, etc.). All the scenes have been photographed using small-grain 35 mm transparency film. In order to ensure clear pictures, all the images have been shot with good atmospheric

Bayesian training of neural networks

A feed-forward neural network (Hertz, Krogh, & Palmer, 1991) consists of processing units arranged in layers (Fig. 1); each node of one layer is connected to all those of the previous layer and is characterised by a numerical value called its activation.

In this paper, neural networks with two layers of adaptable weights have mainly been used. The input layer is made up of 35 units whose activation values are the components of the feature vector.
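The architecture just described can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' code: the tanh hidden non-linearity and softmax output are common choices for such classifiers but are assumptions here, and the 30 hidden units and 11 output classes are figures implied elsewhere in the paper rather than stated in this passage.

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """Forward pass through a two-layer MLP: tanh hidden layer,
    softmax output giving class probabilities."""
    h = np.tanh(W1 @ x + b1)      # hidden activations
    z = W2 @ h + b2               # output pre-activations
    e = np.exp(z - z.max())       # numerically stable softmax
    return e / e.sum()

# Dimensions: 35 input features as stated in the text; the hidden-layer
# size (30) and class count (11) are taken from figures quoted elsewhere
# in the paper, not from this passage.
rng = np.random.default_rng(0)
n_in, n_hid, n_out = 35, 30, 11
W1 = rng.normal(scale=0.1, size=(n_hid, n_in)); b1 = np.zeros(n_hid)
W2 = rng.normal(scale=0.1, size=(n_out, n_hid)); b2 = np.zeros(n_out)
p = mlp_forward(rng.normal(size=n_in), W1, b1, W2, b2)
```

The output is a vector of class probabilities over the region labels, one per class.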

In order to discriminate patterns belonging to one

Preliminary investigations

Preliminary experiments were carried out using Multiple Logistic Regression (MLR) and MLP networks. The MLR network has only one layer of adaptable weights, and no hidden layer. A MLP network with 30 hidden units was used; this is a relatively large number of hidden units, chosen with a view to observing a difference between the EF and MCMC methods. This network has 1421 weights, and the maximum number of training examples available is 5832. Both the MLR and MLP networks were trained using both
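The quoted weight total can be checked by counting parameters layer by layer, including biases. With 35 inputs and 30 hidden units, a total of 1421 weights implies 11 output classes; this class count is inferred from the arithmetic, not stated in the excerpt.

```python
# Parameter count for the MLP described in the text:
# 35 inputs -> 30 hidden units -> n_out outputs, with bias units.
n_in, n_hid = 35, 30
n_out = 11  # value implied by the quoted total of 1421 weights

# Each hidden unit sees 35 inputs plus a bias; each output unit
# sees 30 hidden activations plus a bias.
n_weights = (n_in + 1) * n_hid + (n_hid + 1) * n_out
print(n_weights)  # -> 1421, matching the figure quoted in the text
```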

Overview of experiments

We compare and contrast the EF and MCMC training methods with respect to two issues.

  • 1.

Empirical learning curves. In order to assess how the training algorithms depend on the amount of training data, we trained the MLP on several independent training sets, varying the quantity of data available and assessing the sensitivity to the choice of the seed initialising the random number generator of the two algorithms. The study has been carried out by comparing empirical learning curves of the MLP
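The experimental protocol described above can be sketched as a small harness. The helper and the toy train/evaluate functions below are illustrative stand-ins under assumed interfaces, not the paper's actual training code or experimental settings.

```python
# Sketch of the learning-curve protocol: for each training-set size,
# train one model per random seed and record its test error, so that
# the effects of set size and seed can later be separated (e.g. by a
# two-way ANOVA, as done in the paper).
def learning_curve(train_fn, eval_fn, data, sizes, seeds):
    return {n: [eval_fn(train_fn(data[:n], seed=s)) for s in seeds]
            for n in sizes}

# Toy stand-ins: "training" just records the subset size, and the
# "error" shrinks as more data are used.
train = lambda subset, seed: len(subset)
evaluate = lambda model: 1.0 / (1 + model)
curves = learning_curve(train, evaluate, list(range(100)), [10, 50], [0, 1])
```

The resulting table of errors, indexed by set size and seed, is exactly the layout a two-way ANOVA needs to attribute variation to data quantity versus random initialisation.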

Empirical learning curves

A thorough investigation of the differences between the EF and MCMC methods for training neural networks can be carried out by a statistical analysis of empirical learning curves. We should note in advance that the task of comparing two learning techniques, determining which one actually out-performs the other and detecting all the relevant sources of variation, is a difficult one (Dietterich, 1998, Rasmussen et al., 1996).

Generalisation capabilities of neural networks depend upon many

Automatic relevance determination results

The results in this section were obtained by training the network on the full training set of 5832 examples. Fig. 3 reports the values of the hyperparameters (with their 95% confidence intervals) determined with the ARD prior for the MLP trained with both the EF and the MCMC algorithms (see Fig. 3(a) and (b), respectively). There is broad agreement between the two plots, although we note that the confidence intervals are usually larger for the MCMC case.
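Behind the hyperparameter plots is the ARD prior itself: each group of weights fanning out from one input feature shares its own Gaussian precision, so a large precision drives that feature's weights toward zero and marks the feature as irrelevant. The following is a minimal sketch of such a log-prior, written for illustration; the function name and interface are assumptions, not the authors' implementation.

```python
import numpy as np

def ard_log_prior(weight_groups, alphas):
    """Log-density of a zero-mean Gaussian ARD prior.

    weight_groups: one array of weights per input feature (the weights
                   fanning out from that input to the hidden layer).
    alphas:        one precision hyperparameter per group; a large
                   alpha concentrates the prior at zero, effectively
                   switching the corresponding input off.
    """
    lp = 0.0
    for w, a in zip(weight_groups, alphas):
        w = np.asarray(w, dtype=float)
        # Sum of independent N(0, 1/alpha) log-densities for the group.
        lp += 0.5 * w.size * np.log(a / (2 * np.pi)) - 0.5 * a * np.sum(w ** 2)
    return lp
```

Comparing the fitted alphas across inputs, as Fig. 3 does, is then a direct read-out of which features the trained network considers informative.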

The graphs show that the features

Discussion

In this paper we have compared and contrasted the evidence framework and a Markov Chain Monte Carlo method for training neural networks on the task of labelling regions of segmented outdoor scenes, using a network with 30 hidden units.

The generalisation performances obtained by the MLP trained using the EF and MCMC methods have been thoroughly investigated for different amounts of training data by analysing empirical learning curves using a two-way ANOVA design. Our results show that on the

Acknowledgements

This research forms part of the ‘Validation and Verification of Neural Network Systems’ project funded jointly by EPSRC (GR/K 51792) and British Aerospace. FV was supported by a studentship from British Aerospace. We thank Dr Andy Wright of BAe for helpful discussions, Professor Radford Neal for discussions concerning the two-way ANOVA analysis and Drs Neil Campbell and William P.J. Mackeown who provided information about the database. We also thank the anonymous referees and action editor for

References (29)

  • S. Duane et al.

    Hybrid Monte Carlo

    Physics Letters B

    (1987)
  • D. Husmeier et al.

    An empirical evaluation of Bayesian sampling with hybrid Monte Carlo for training neural network classifiers

    Neural Networks

    (1999)
  • D. Barber et al.

    Ensemble learning for Multi-Layer Networks

  • C.M. Bishop

    Neural networks for pattern recognition

    (1995)
  • R.A. Brooks

    Model-based 3D interpretation of 2D images

    IEEE Trans. Pattern Analysis and Machine Intelligence

    (1983)
  • Clark, A. (1995). Computer vision for outdoor scene analysis. Master's thesis, University of Bristol, Bristol,...
  • M.K. Cowles et al.

    Markov Chain Monte Carlo convergence diagnostics: a comparative review

    J. American Statistical Assn.

    (1996)
  • M.H. DeGroot

    Probability and statistics

    (1984)
  • Dietterich, T. G. (1998). Approximate statistical tests for comparing supervised classification learning algorithms. Neural...
  • Draper, B. A., Collins, R. T., Brolio, J., Hanson, A. R., & Riseman, E. M. (1989). The Schema System. International...
  • Gay, M. (1989). Segmentation using region merging with edges. In Proceedings 5th Alvey Vision Conference. (pp.115–119)....
  • A. Gelman et al.

    Bayesian data analysis

    (1995)
  • J. Hertz et al.

    Introduction to the theory of neural computation

    (1991)
  • Hinton, G. E., & van Camp, D. (1993). Keeping neural networks simple by minimising the description length of the...