Design of ensemble neural network using the Akaike information criterion

https://doi.org/10.1016/j.engappai.2008.02.007

Abstract

Ensemble neural networks are commonly used in many engineering applications because of their better generalization properties. In this paper, an ensemble neural network algorithm is proposed based on the Akaike information criterion (AIC). The AIC-based ensemble neural network first searches for the best weight configuration of each component network, and then uses the AIC as an automating tool to find the best combination weights of the ensemble neural network. Two analytical functions, the peak function and the Friedman function, are used first to assess the accuracy of the proposed ensemble approach. The verified approach is then applied to a material modeling problem: the stress–strain–time relationship of mudstones. These computational experiments verify that the AIC-based ensemble neural network outperforms both the simple averaging ensemble neural network and the single component neural network.

Introduction

The artificial neural network (NN) is a mathematical or computational model for information processing inspired by biological NNs (McCulloch and Pitts, 1943). It has been successfully applied to a wide range of engineering applications, such as fault detection (Jakubek and Strasser, 2004), face recognition (Aitkenhead and McDonald, 2003), concrete strength prediction (Jiang et al., 2003), color adjustment (Puerto and Ghalia, 2002), injection molding control (Kenig et al., 2001), bicycle derailleur control (Lin and Tseng, 2000) and steel modeling under elevated temperatures (Zhao, 2006).

An ensemble neural network (ENN) is a collection of a finite number of NNs that are trained for the same task. Usually, the networks in an ENN are trained independently and their predictions are combined (Sollich and Krogh, 1996). In other words, any one of the component networks in an ENN could provide a solution to the task by itself, but better results might be obtained by a combination of component NNs due to their better generalization. Different methods can be employed to combine the solutions achieved by the component networks. A typical architecture of the ENN is shown in Fig. 1.

The ENN originates from Hansen and Salamon's work (1990), which showed that the generalization ability of an NN system can be significantly improved through ensembling a number of NNs. Since this approach behaves remarkably well, the ENN has been applied to many areas, such as in pattern recognition (Giacinto and Roli, 2001), medical diagnosis (Hayashi and Setiono, 2002), climate prediction (Cannon and Whitfield, 2002), and marine propeller modeling (Reich and Barai, 2000).

In general, an ENN is constructed in two steps: creating the component networks and combining them into an ENN. Good regression or classification component networks must be both accurate and diverse. To obtain networks with different generalization abilities, a number of training parameters can be manipulated, including the initial conditions, the training data, the topology of the nets, and the training algorithm (Sharkey, 1999). The most widely used techniques for creating the training data for an ENN are Bagging and Boosting. Bagging (short for "bootstrap aggregating") was proposed by Breiman (1996) based on bootstrap sampling (Efron and Tibshirani, 1993), where bootstrapping uses one available sample to generate many other samples through re-sampling. During the re-sampling process, randomly picked data may appear repeatedly in a new training set; a component network is trained on this new sample, and the process is repeated until there are sufficient component networks in the ENN. Bagging is therefore suitable for models with insufficient data. Boosting was proposed by Schapire (1990) and improved by Freund and Schapire (1995). Boosting generates a set of component networks whose training sets are determined by the performance of the former component networks. Since the Boosting method needs a large amount of data, Freund and Schapire (1996) proposed AdaBoost (the adaptive boosting algorithm) to avoid this problem. Depending on how well the first weak learner performs on a training pattern, the probability of picking that pattern for the training set of the next weak learner is lowered or left unchanged. Thus, by increasing the number of rounds of boosting, more attention is paid to the hard patterns.
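As an illustration of the Bagging step described above, the following sketch draws one bootstrap sample per component network. It is not the paper's code (the authors' program is written in MATLAB); X and y are assumed to be NumPy arrays, and train_network is a hypothetical placeholder for whatever component-network training routine is used.

    import numpy as np

    def bagging_component_sets(X, y, n_components, seed=0):
        # Draw one bootstrap sample (with replacement) per component network,
        # so repeated patterns may appear in each new training set.
        rng = np.random.default_rng(seed)
        n = len(y)
        return [(X[idx], y[idx])
                for idx in (rng.integers(0, n, size=n) for _ in range(n_components))]

    # Example use (train_network is a placeholder):
    # components = [train_network(Xb, yb) for Xb, yb in bagging_component_sets(X, y, 10)]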

There are many other methods for creating the component networks. Opitz and Shavlik (1996) presented an algorithm that uses genetic algorithms (GA) to generate a population of NNs. Granitto et al. (2001) proposed the late-stopping method for stepwise construction of the ensemble, where networks are selected one at a time and only their parameters have to be saved. The NeuralBAG algorithm (Carney and Cunningham, 1999) and the method by Naftaly et al. (1997) require keeping the intermediate networks during training, since the selection of stopping points for the ensemble members is performed only at the end of all the training processes. Zhou et al. (2002) presented a GA-based selective ensemble method, where the GA is used to select a suitable subset of all the trained networks to build the ENN.

After a set of component networks has been created, the methods for combining these networks have to be considered. Since the beginning of the 1990s, several procedures have been proposed. Hashem (1993) provided a method to find optimal linear combinations of the members of an ensemble by using equal combination weights; the set of outputs combined with a uniform weighting factor is referred to as the simple ensemble (or simple averaging method). Perrone and Cooper (1993) proposed a generalized ensemble method to determine the optimal weights using the correlation matrix. They defined a symmetric correlation matrix from the error between the target function and the output of each component network. This ensemble method is sometimes called the weighted averaging method and can efficiently utilize local minima. Rosen (1996) described a method that trains an ensemble of networks by backpropagation, with a penalty term designed to force the networks to be decorrelated with each other. One major disadvantage of Rosen's algorithm is that training a component network does not affect the networks trained previously in the ensemble, so the errors of the individual networks are not necessarily negatively correlated. Liu et al. (2000) presented an evolutionary ENN with negative correlation learning (EENCL) for designing NN ensembles automatically. The EENCL extended Rosen's work to simultaneous training of negatively correlated NNs, which encourages different component networks in the ensemble to learn different parts or aspects of the training data. Islam et al. (2003) proposed a constructive ENN (CNNE) for training cooperative NN ensembles. It automatically determines not only the number of NNs in an ensemble but also the number of hidden nodes in the individual networks. The CNNE adopts negative correlation learning to promote and maintain diversity among the individual networks, and its criteria for growing NNs and the ensemble are based on a network's contribution to reducing the ensemble's overall error rather than to reducing its own error. However, this approach can make an ENN model more complex and may not find the optimal one. Lagaros et al. (2005) proposed an adaptive strategy for NN training. With an evolution-based optimization procedure, the adaptive strategy substantially improves the prediction reliability of the NN architecture. The proposed algorithm (Lagaros et al., 2005) has been applied to predict the response of a structure in terms of objective and constraint function values.
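To make the two combination schemes concrete, the sketch below contrasts simple averaging with a weighted average whose weights follow the spirit of Perrone and Cooper's generalized ensemble method (weights derived from the inverse of the error correlation matrix). This is a hedged illustration, not the cited authors' code; predictions and errors are assumed to be arrays of shape (number of components, number of samples).

    import numpy as np

    def simple_ensemble(predictions):
        # Simple averaging: every component network gets the same weight 1/M.
        return predictions.mean(axis=0)

    def weighted_ensemble(predictions, weights):
        # Weighted averaging: combination weights are normalized to sum to one.
        w = np.asarray(weights, float)
        return (w / w.sum()) @ predictions

    def gem_weights(errors):
        # One common form of the generalized ensemble method: build the symmetric
        # error correlation matrix C and take weights proportional to the row sums
        # of its (pseudo-)inverse.
        C = errors @ errors.T / errors.shape[1]
        w = np.linalg.pinv(C).sum(axis=1)
        return w / w.sum()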

It is worth mentioning that when a number of NNs are available, most ensemble approaches aim to reduce the mean squared error (MSE) of each component NN; they may therefore lead to an ensemble NN with unnecessary complexity and unstable performance. The complexity of the ENN model may increase the computational time and lead to over-fitting. This paper aims to reduce over-fitting through the use of the Akaike information criterion (AIC). The proposed method first reduces each component network's error, and then balances the components' contributions to the ENN by using AIC-based weights. Two theoretical examples and one practical example are used to demonstrate the accuracy of the proposed ENN approach.

Section snippets

Akaike information criterion in model selection

The Akaike information criterion (AIC), which was introduced more than 30 years ago by Akaike, is an information criterion for the identification of an optimal model from a class of competing models. The AIC belongs to the indirect approach since it penalizes the model complexity. For a conventional least squares regression with normally distributed errors, one can compute the AIC with the following formula (where arbitrary constants have been deleted) (Akaike, 1973):

AIC = n log(σ̂²) + 2K,  where  σ̂² = (1/n) Σ εᵢ²
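A minimal sketch of this computation is given below, assuming n residuals εᵢ and K estimated parameters. The akaike_weights function uses the standard Akaike-weight transformation (Burnham and Anderson, 2002) as one plausible way of turning component-network AIC values into combination weights; the paper's exact weighting scheme appears in the full text.

    import numpy as np

    def aic(residuals, k):
        # AIC for least-squares regression with normal errors, constants dropped:
        # AIC = n * log(sigma_hat^2) + 2K, with sigma_hat^2 = sum(eps_i^2) / n.
        eps = np.asarray(residuals, float)
        n = eps.size
        return n * np.log(np.sum(eps ** 2) / n) + 2 * k

    def akaike_weights(aic_values):
        # Rescale AIC differences so the weights of the candidate networks sum to one;
        # the network with the smallest AIC receives the largest weight.
        a = np.asarray(aic_values, float)
        w = np.exp(-0.5 * (a - a.min()))
        return w / w.sum()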

Creating the component networks

Creation of the component network can be divided into two steps. The first step is to create the training data, the cross validation data and the testing data, and the second step is to create the component networks.
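The first step can be illustrated with a simple split routine. The 70/15/15 ratios below are placeholders only, since the paper states that common, problem-dependent ratios are used; X and y are assumed to be NumPy arrays.

    import numpy as np

    def split_data(X, y, train_frac=0.70, cv_frac=0.15, seed=0):
        # Randomly partition the data into training, cross-validation and testing sets.
        rng = np.random.default_rng(seed)
        idx = rng.permutation(len(y))
        n_tr = int(train_frac * len(y))
        n_cv = int(cv_frac * len(y))
        tr, cv, te = idx[:n_tr], idx[n_tr:n_tr + n_cv], idx[n_tr + n_cv:]
        return (X[tr], y[tr]), (X[cv], y[cv]), (X[te], y[te])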

For creating the datasets, common ratios of training data to testing data and of cross-validation data to training data are used in the analyses. The data are selected uniformly or randomly according to the properties of the problem. Since the AIC is adopted as a

Computational experiments

To verify the performance of the ENN proposed in this paper, three computational experiments are carried out with an ENN program written in MATLAB. Two theoretical functions, the peak function and the Friedman function, are tested first, followed by a practical example: the modeling of the stress–strain–time relationship of mudstone. For comparison purposes, a simple averaging ENN with the same structure as the AIC-based ENN and a single NN using the best number of hidden nodes are also simulated

Conclusions

Determination of model complexity is crucial in NN design. This paper uses the AIC to balance model complexity against model accuracy. By using the AIC to combine the best component networks, it is possible to balance the ensemble network's accuracy, penalize model complexity, and create a simple and stable ENN.

The three computational experiments with various input dimensions are used to verify the performance of the proposed ENN. From these results, it can be

References (39)

  • Y. Reich et al., A methodology for building neural networks models from empirical engineering data, Engineering Applications of Artificial Intelligence (2000)
  • L.Q. Ren et al., An optimal neural network and concrete strength modeling, Advances in Engineering Software (2002)
  • Z.Y. Zhao, Steel column under fire—a neural network based strength model, Advances in Engineering Software (2006)
  • Z.H. Zhou et al., Ensembling neural networks: many could be better than all, Artificial Intelligence (2002)
  • H. Akaike, 1973. Information theory and an extension of the maximum likelihood principle. In: Proceedings of the 2nd...
  • L. Breiman, Bagging predictors, Machine Learning (1996)
  • K.P. Burnham et al., Model Selection and Multimodel Inference: A Practical Information–Theoretic Approach (2002)
  • J.G. Carney et al., 1999. The NeuralBAG algorithm: optimizing generalization performance in bagged neural...
  • B. Efron et al., An Introduction to the Bootstrap (1993)