Design of ensemble neural network using the Akaike information criterion

https://doi.org/10.1016/j.engappai.2008.02.007

Abstract

Ensemble neural networks are commonly used in many engineering applications because of their better generalization properties. In this paper, an ensemble neural network algorithm is proposed based on the Akaike information criterion (AIC). The AIC-based ensemble neural network first searches for the best weight configuration of each component network, and then uses the AIC as an automating tool to find the best combination weights of the ensemble neural network. Two analytical functions, the peak function and the Friedman function, are used first to assess the accuracy of the proposed ensemble approach. The verified approach is then applied to a material modeling problem: the stress–strain–time relationship of mudstones. These computational experiments verify that the AIC-based ensemble neural network outperforms both the simple averaging ensemble neural network and the single component neural network.

Introduction

The artificial neural network (NN) is a mathematical or computational model for information processing inspired by biological NNs (McCulloch and Pitts, 1943). It has been successfully applied to a wide range of engineering applications, such as fault detection (Jakubek and Strasser, 2004), face recognition (Aitkenhead and McDonald, 2003), concrete strength prediction (Jiang et al., 2003), color adjustment (Puerto and Ghalia, 2002), injection molding control (Kenig et al., 2001), bicycle derailleur control (Lin and Tseng, 2000) and steel modeling under elevated temperatures (Zhao, 2006).

An ensemble neural network (ENN) is a collection of a finite number of NNs that are trained for the same task. Usually, the networks in an ENN are trained independently and their predictions are combined (Sollich and Krogh, 1996). In other words, any one of the component networks in an ENN could provide a solution to the task by itself, but better results might be obtained by a combination of component NNs due to their better generalization. Different methods can be employed to combine the solutions achieved by the component networks. A typical architecture of the ENN is shown in Fig. 1.

The ENN originates from Hansen and Salamon's work (1990), which showed that the generalization ability of an NN system can be significantly improved through ensembling a number of NNs. Since this approach behaves remarkably well, the ENN has been applied to many areas, such as in pattern recognition (Giacinto and Roli, 2001), medical diagnosis (Hayashi and Setiono, 2002), climate prediction (Cannon and Whitfield, 2002), and marine propeller modeling (Reich and Barai, 2000).

In general, an ENN is constructed in two steps: creating the component networks and combining them into an ENN. Good regression or classification component networks must be both accurate and diverse. To obtain networks with different generalization abilities, a number of training parameters can be manipulated, including the initial conditions, the training data, the topology of the nets, and the training algorithm (Sharkey, 1999). The most widely used techniques for creating the training data for an ENN are Bagging and Boosting. Bagging (short for "bootstrap aggregating") was proposed by Breiman (1996) based on bootstrap sampling (Efron and Tibshirani, 1993), where bootstrapping uses one available sample to generate many other samples through re-sampling. During the re-sampling process, randomly picked data may appear repeatedly in a new training set; a component network is trained on this new sample, and the process is repeated until there are sufficient component networks in the ENN. Bagging is therefore suitable for models with insufficient data. Boosting was proposed by Schapire (1990) and improved by Freund and Schapire (1995). Boosting generates a set of component networks whose training sets are determined by the performance of the former component networks. Since the Boosting method needs a large amount of data, Freund and Schapire (1996) proposed AdaBoost (the adaptive boosting algorithm) to avoid this problem. Depending on how well the first weak learner performs on a training pattern, the probability of picking that pattern for the training set of the next weak learner is lowered or left unchanged. Thus, by increasing the number of rounds of boosting, more attention is paid to the hard patterns.
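As an illustration of the Bagging step described above, the following sketch draws one bootstrap sample per component network. It is not the paper's code (the authors' program is written in MATLAB); X and y are assumed to be NumPy arrays, and train_network is a hypothetical placeholder for whatever component-network training routine is used.

    import numpy as np

    def bagging_component_sets(X, y, n_components, seed=0):
        # Draw one bootstrap sample (with replacement) per component network,
        # so repeated patterns may appear in each new training set.
        rng = np.random.default_rng(seed)
        n = len(y)
        return [(X[idx], y[idx])
                for idx in (rng.integers(0, n, size=n) for _ in range(n_components))]

    # Example use (train_network is a placeholder):
    # components = [train_network(Xb, yb) for Xb, yb in bagging_component_sets(X, y, 10)]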

There are many other methods for creating the component networks. Opitz and Shavlik (1996) presented an algorithm that uses genetic algorithms (GA) to generate a population of NNs. Granitto et al. (2001) proposed the late-stopping method for stepwise construction of the ensemble, where networks are selected one at a time and only their parameters have to be saved. The NeuralBAG algorithm (Carney and Cunningham, 1999) and the method by Naftaly et al. (1997) require keeping the intermediate networks during training, since the selection of stopping points for the ensemble members is performed only at the end of all the training processes. Zhou et al. (2002) presented a GA-based selective ensemble method, where the GA is used to select a suitable subset of all the trained networks to build the ENN.

After a set of component networks has been created, the methods for combining these networks have to be considered. Since the beginning of the 1990s, several procedures have been proposed. Hashem (1993) provided a method to find optimal linear combinations of the members of an ensemble by using equal combination weights; the set of outputs combined with a uniform weighting factor is referred to as the simple ensemble (or simple averaging method). Perrone and Cooper (1993) proposed a generalized ensemble method to determine the optimal weights using the correlation matrix. They defined a symmetric correlation matrix from the error between the target function and the output of each component network. This ensemble method is sometimes called the weighted averaging method and can efficiently utilize local minima. Rosen (1996) described a method that trains an ensemble of networks by backpropagation, with a penalty term designed to force the networks to be decorrelated with each other. One major disadvantage of Rosen's algorithm is that training a component network does not affect the networks trained previously in the ensemble, so the errors of the individual networks are not necessarily negatively correlated. Liu et al. (2000) presented an evolutionary ENN with negative correlation learning (EENCL) for designing NN ensembles automatically. The EENCL extended Rosen's work to simultaneous training of negatively correlated NNs, which encourages different component networks in the ensemble to learn different parts or aspects of the training data. Islam et al. (2003) proposed a constructive ENN (CNNE) for training cooperative NN ensembles. It automatically determines not only the number of NNs in an ensemble but also the number of hidden nodes in the individual networks. The CNNE adopts negative correlation learning to promote and maintain diversity among the individual networks, and its criteria for growing NNs and the ensemble are based on a network's contribution to reducing the ensemble's overall error rather than to reducing its own error. However, this approach can make an ENN model more complex and may not find the optimal one. Lagaros et al. (2005) proposed an adaptive strategy for NN training. With an evolution-based optimization procedure, the adaptive strategy substantially improves the prediction reliability of the NN architecture. The proposed algorithm (Lagaros et al., 2005) has been applied to predict the response of a structure in terms of objective and constraint function values.
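To make the two combination schemes concrete, the sketch below contrasts simple averaging with a weighted average whose weights follow the spirit of Perrone and Cooper's generalized ensemble method (weights derived from the inverse of the error correlation matrix). This is a hedged illustration, not the cited authors' code; predictions and errors are assumed to be arrays of shape (number of components, number of samples).

    import numpy as np

    def simple_ensemble(predictions):
        # Simple averaging: every component network gets the same weight 1/M.
        return predictions.mean(axis=0)

    def weighted_ensemble(predictions, weights):
        # Weighted averaging: combination weights are normalized to sum to one.
        w = np.asarray(weights, float)
        return (w / w.sum()) @ predictions

    def gem_weights(errors):
        # One common form of the generalized ensemble method: build the symmetric
        # error correlation matrix C and take weights proportional to the row sums
        # of its (pseudo-)inverse.
        C = errors @ errors.T / errors.shape[1]
        w = np.linalg.pinv(C).sum(axis=1)
        return w / w.sum()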

It is worth mentioning that when a number of NNs are available, most ensemble approaches aim to reduce the mean squared error (MSE) of each component NN; they may therefore lead to an ensemble NN with unnecessary complexity and unstable performance. The complexity of the ENN model may increase the computational time and lead to over-fitting. This paper aims to reduce over-fitting through the use of the Akaike information criterion (AIC). The proposed method first reduces each component network's error, and then balances the components' contributions to the ENN by using AIC-based weights. Two theoretical examples and one practical example are used to demonstrate the accuracy of the proposed ENN approach.

Section snippets

Akaike information criterion in model selection

The Akaike information criterion (AIC), which was introduced more than 30 years ago by Akaike, is an information criterion for the identification of an optimal model from a class of competing models. The AIC belongs to the indirect approach since it penalizes the model complexity. For a conventional least squares regression with normally distributed errors, one can compute the AIC with the following formula (where arbitrary constants have been deleted) (Akaike, 1973):

AIC = n log(σ̂²) + 2K,  where  σ̂² = (1/n) Σ εᵢ²
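A minimal sketch of this computation is given below, assuming n residuals εᵢ and K estimated parameters. The akaike_weights function uses the standard Akaike-weight transformation (Burnham and Anderson, 2002) as one plausible way of turning component-network AIC values into combination weights; the paper's exact weighting scheme appears in the full text.

    import numpy as np

    def aic(residuals, k):
        # AIC for least-squares regression with normal errors, constants dropped:
        # AIC = n * log(sigma_hat^2) + 2K, with sigma_hat^2 = sum(eps_i^2) / n.
        eps = np.asarray(residuals, float)
        n = eps.size
        return n * np.log(np.sum(eps ** 2) / n) + 2 * k

    def akaike_weights(aic_values):
        # Rescale AIC differences so the weights of the candidate networks sum to one;
        # the network with the smallest AIC receives the largest weight.
        a = np.asarray(aic_values, float)
        w = np.exp(-0.5 * (a - a.min()))
        return w / w.sum()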

Creating the component networks

Creation of the component network can be divided into two steps. The first step is to create the training data, the cross validation data and the testing data, and the second step is to create the component networks.
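The first step can be illustrated with a simple split routine. The 70/15/15 ratios below are placeholders only, since the paper states that common, problem-dependent ratios are used; X and y are assumed to be NumPy arrays.

    import numpy as np

    def split_data(X, y, train_frac=0.70, cv_frac=0.15, seed=0):
        # Randomly partition the data into training, cross-validation and testing sets.
        rng = np.random.default_rng(seed)
        idx = rng.permutation(len(y))
        n_tr = int(train_frac * len(y))
        n_cv = int(cv_frac * len(y))
        tr, cv, te = idx[:n_tr], idx[n_tr:n_tr + n_cv], idx[n_tr + n_cv:]
        return (X[tr], y[tr]), (X[cv], y[cv]), (X[te], y[te])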

For creating the datasets, common ratios of training data to testing data and of cross-validation data to training data are used in the analyses. The data are selected uniformly or randomly according to the properties of the problem. Since the AIC is adopted as a

Computational experiments

To verify the performance of the ENN proposed in this paper, three computational experiments are carried out with an ENN program written in MATLAB. Two theoretical functions, the peak function and the Friedman function, are tested first, followed by a practical example: the modeling of the stress–strain–time relationship of mudstone. For comparison purposes, a simple averaging ENN with the same structure as the AIC-based ENN and a single NN using the best number of hidden nodes are also simulated

Conclusions

Determination of model complexity is crucial in NN design. This paper uses the AIC to balance model complexity against model accuracy. By using the AIC to combine the best component networks, it is possible to balance the ensemble network's accuracy, penalize model complexity, and create a simple and stable ENN.

The three computational experiments with various input dimensions are used to verify the performance of the proposed ENN. From these results, it can be

References (39)

  • Y. Reich et al., A methodology for building neural networks models from empirical engineering data, Engineering Applications of Artificial Intelligence (2000)
  • L.Q. Ren et al., An optimal neural network and concrete strength modeling, Advances in Engineering Software (2002)
  • Z.Y. Zhao, Steel column under fire—a neural network based strength model, Advances in Engineering Software (2006)
  • Z.H. Zhou et al., Ensembling neural networks: many could be better than all, Artificial Intelligence (2002)
  • H. Akaike, 1973. Information theory and an extension of the maximum likelihood principle. In: Proceedings of the 2nd...
  • L. Breiman, Bagging predictors, Machine Learning (1996)
  • K.P. Burnham et al., Model Selection and Multimodel Inference: A Practical Information–Theoretic Approach (2002)
  • J.G. Carney et al., 1999. The NeuralBAG algorithm: optimizing generalization performance in bagged neural...
  • B. Efron et al., An Introduction to the Bootstrap (1993)