Convergence of BP algorithm for product unit neural networks with exponential weights
Introduction
Traditional feed-forward neural networks are constructed by using multiple layers of summation units. These networks can effectively solve approximation and classification problems [3], [9]. However, a traditional feed-forward neural network usually requires a large number of summation units to approximate a complicated function. Various kinds of "high order" neural networks have been developed to overcome this drawback [6], [7], [4]. Among them, product unit neural networks with exponential weights (PUNNs) were proposed by Durbin and Rumelhart [4], and studied in [12], [11], [10], [19]. PUNNs have been used to solve regression problems [16], [17]. However, we have not found any theoretical analysis of the convergence of training methods for PUNNs, and this is the main concern of this paper.
The back-propagation (BP) algorithm is possibly the most popular optimization algorithm for training feed-forward neural networks [21], [22]. Unfortunately, the solution space for PUNNs can be extremely convoluted, with numerous local minima that trap BP training (cf. [4], [12], [5]). Hence, global optimization algorithms such as Genetic Algorithms [11], Random Search [12], Evolutionary Algorithms [14], [15], [8], Particle Swarm Optimization and Leap Frog Optimization [10] have been used to train PUNNs. But these global optimization algorithms usually converge very slowly. So it seems natural that the BP algorithm, which is good at local optimization, has been combined with global search algorithms such as the Random Search Algorithm (RSA), resulting in quite satisfactory training [12]. In this strategy, BP is used to descend to a local minimum, and if the training error is still above the desired level, the RSA generates a new set of random weights from which BP can start again. Thus, a convergence analysis of the BP algorithm for PUNNs provides theoretical support for this training strategy.
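The BP-plus-random-restart strategy described above can be sketched in a few lines. The sketch below is illustrative only: `bp_train`, the toy 1-D objective `f` (which stands in for the PUNN error surface, with a poor local minimum near w = -1 and the global minimum at w = 2), and all parameter values are assumptions, not the paper's implementation.

```python
import numpy as np

def train_with_restarts(bp_train, w0, target_error, max_restarts, random_start):
    """BP + random-search strategy: run local BP training; if the error
    stays above the desired level, restart BP from fresh random weights."""
    best_w, best_err = w0, np.inf
    w = w0
    for _ in range(max_restarts):
        w, err = bp_train(w)                # BP descends to a local minimum
        if err < best_err:
            best_w, best_err = w, err
        if best_err <= target_error:        # desired accuracy reached
            break
        w = random_start()                  # RSA: new random starting point
    return best_w, best_err

# Toy stand-in for a convoluted error surface: local minimum near w = -1
# (error about 4.5), global minimum at w = 2 (error 0).
def f(w):
    return (w - 2.0) ** 2 * ((w + 1.0) ** 2 + 0.5)

def bp_train(w, steps=300, eta=0.01):
    for _ in range(steps):
        g = (f(w + 1e-6) - f(w - 1e-6)) / 2e-6   # numerical gradient
        w -= eta * g
    return w, f(w)

rng = np.random.default_rng(1)
w_best, err_best = train_with_restarts(
    bp_train, w0=-1.5, target_error=1e-3,
    max_restarts=50, random_start=lambda: rng.uniform(-3.0, 3.0))
print(err_best)  # BP alone from w0 = -1.5 gets stuck; restarts escape
```

Starting from w0 = -1.5, plain gradient descent settles in the poor local minimum; the random restarts eventually land in the global basin, mirroring the BP + RSA behavior reported in [12].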
The organization of the rest of the paper is as follows. Section 2 introduces PUNNs. Section 3 describes the BP learning procedure for PUNNs. The main convergence results are presented in Section 4. Experimental results are given in Section 5. Some conclusions are drawn in Section 6. Details of the proofs are provided in Appendix A.
Product unit neural networks with exponential weights
In a PUNN, the traditional summation units of the hidden layer are replaced by product units with exponential weights, each computing $\prod_{i=1}^{N} x_i^{w_i}$, where the exponents $w_i$ are the trainable weights. Due to this special exponential form, PUNNs have some additional advantages. Durbin and Rumelhart [4] empirically determined that the information capacity of a single product unit (as measured by its capacity for learning random Boolean patterns) is approximately $3N$, compared to $2N$ for a single summation unit [2], where $N$ is the number of inputs to the unit.
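For positive inputs, a product unit can be evaluated in the log domain, which is also the form used when deriving gradients. A minimal illustrative sketch (`product_unit` is a hypothetical helper name, not code from the paper):

```python
import numpy as np

def product_unit(x, w):
    """Output of a single product unit with exponential weights:
    y = prod_i x_i ** w_i, computed as exp(sum_i w_i * log x_i)
    for positive inputs x_i."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    return np.exp(np.sum(w * np.log(x)))

# With integer weights the unit reduces to an ordinary monomial:
print(product_unit([2.0, 3.0], [1.0, 2.0]))  # 2 * 3**2, i.e. about 18
```

Unlike a summation unit, the trainable weights appear as exponents, so a single unit can represent monomial interactions between inputs directly.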
BP algorithm for PUNN training
Let the network be supplied with a given set of training examples $\{x^k, t^k\}_{k=1}^{K}$. For each input $x^k = (x_1^k, \ldots, x_N^k)$, the output of the PUNN above is $y^k = \sum_{j=1}^{M} v_j h_j^k$, where $h_j^k = \prod_{i=1}^{N} (x_i^k)^{w_{ij}}$ is the output of the $j$-th hidden product unit, $w_{ij}$ are the exponential weights, and $v_j$ are the weights of the summation output unit.
The square error function of the PUNN trained by the BP algorithm can be represented as $E = \frac{1}{2}\sum_{k=1}^{K}\big(y^k - t^k\big)^2$, where $y^k$ is the network output for the $k$-th training example and $t^k$ is the corresponding target. So we have the gradients $\frac{\partial E}{\partial v_j} = \sum_{k=1}^{K}\big(y^k - t^k\big)\, h_j^k$ and $\frac{\partial E}{\partial w_{ij}} = \sum_{k=1}^{K}\big(y^k - t^k\big)\, v_j\, h_j^k \ln x_i^k$, where $h_j^k$ is the output of the $j$-th hidden product unit for the $k$-th example.
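As a numerical sanity check, one batch BP step can be implemented for a PUNN of the assumed form $y = \sum_j v_j \prod_i x_i^{w_{ij}}$ with squared error. All function names, the network shape, and the learning rate below are illustrative assumptions, not the paper's code:

```python
import numpy as np

def forward(X, W, v):
    """Forward pass for the assumed PUNN: inputs X (K x N, positive),
    exponential weights W (N x M), output weights v (length M)."""
    H = np.exp(np.log(X) @ W)      # K x M outputs of the product units
    return H @ v, H

def bp_step(X, t, W, v, eta):
    """One batch gradient (BP) step on E = 0.5 * sum_k (y_k - t_k)**2."""
    y, H = forward(X, W, v)
    e = y - t                                   # residuals, length K
    grad_v = H.T @ e                            # dE/dv_j = sum_k e_k H_kj
    # dE/dW_ij = sum_k e_k * v_j * H_kj * log(x_ki)
    grad_W = np.log(X).T @ (H * e[:, None] * v[None, :])
    return W - eta * grad_W, v - eta * grad_v

# With a small enough learning rate a BP step decreases the error.
rng = np.random.default_rng(0)
X = rng.uniform(0.5, 2.0, size=(8, 3))          # positive inputs
t = rng.normal(size=8)
W, v = rng.normal(size=(3, 2)), rng.normal(size=2)
E0 = 0.5 * np.sum((forward(X, W, v)[0] - t) ** 2)
W1, v1 = bp_step(X, t, W, v, eta=1e-4)
E1 = 0.5 * np.sum((forward(X, W1, v1)[0] - t) ** 2)
print(E1 < E0)
```

The monotone decrease of the error for a sufficiently small learning rate is exactly the kind of behavior the convergence analysis in Section 4 makes rigorous.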
Convergence result
We need the following assumptions:
- (A1)
there exists a constant such that for all
- (A2)
there exists a constant such that for all ;
- (A3)
the learning rate satisfies (A.18);
- (A4)
the set contains finitely many points.
Theorem 1
Let the error function be defined in (9), and the sequence be generated by (15), (16), (17), starting from an arbitrary initial guess. If Assumptions (A1)–(A3) are
Experimental results
This section illustrates the performance of BP for training PUNNs. The 5-Parity problem is used as the test problem. This problem can be solved theoretically by a single product unit (cf. [19]), so we can easily compare the theoretical solution with the numerical results. The network has six inputs ( in Fig. 1) including a constant input of (the bias unit), two hidden product units () plus a bias unit, and one summation output unit. The learning rate is fixed to for all the
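The claim that a single product unit can solve 5-Parity can be checked directly under the usual ±1 input encoding. This is an illustrative sketch; the encoding choice and helper name are assumptions, not necessarily the exact setup of [19]:

```python
import itertools
import numpy as np

def product_unit(x, w):
    # Direct evaluation of prod_i x_i ** w_i; with +/-1 inputs and
    # integer-valued weights the result stays real.
    return np.prod(np.asarray(x, float) ** np.asarray(w, float))

# Encode bit b as (-1)**b; the product of the encoded inputs is then
# (-1)**(number of ones): +1 for even parity, -1 for odd parity.
ok = True
for bits in itertools.product([0, 1], repeat=5):
    x = [(-1.0) ** b for b in bits]
    y = product_unit(x, [1.0] * 5)       # all exponential weights set to 1
    ok = ok and (y == (-1.0) ** (sum(bits) % 2))
print(ok)
```

Because an exact solution is known, the numerically trained weights can be compared directly against it, which is what makes 5-Parity a convenient test problem here.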
Conclusion
Product unit neural networks with exponential weights (PUNNs) were proposed to improve the efficiency of traditional feed-forward neural networks [4]. Unfortunately, the solution space for PUNNs can be extremely convoluted, with numerous local minima that trap BP training. Hence, global optimization algorithms such as genetic algorithms have been used to train PUNNs, but these algorithms usually converge very slowly. So it seems natural that BP
References (24)
- Training nets of hardware realizable Sigma-Pi units, Neural Networks (1992).
- Multilayer feedforward networks are universal approximators, Neural Networks (1989).
- On global exponential stability of generalized stochastic neural networks with mixed time delays, Neurocomputing (2006).
- Extracting regression rules from neural networks, Neural Networks (2002).
- On global asymptotic stability of neural networks with discrete and distributed delays, Phys. Lett. A (2005).
- Introduction to Calculus and Analysis (1974).
- Geometrical and statistical properties of systems of linear inequalities with application in pattern recognition, IEEE Trans. Electron. Comput. (1965).
- G. Cybenko, Continuous-valued neural networks with two hidden layers are sufficient, Technical Report, Department of...
- Product units: a computationally powerful and biologically plausible extension to backpropagation networks, Neural Comput. (1989).
- Computational Intelligence: An Introduction (2003).
- Learning, invariance, and generalization in higher-order neural networks, Appl. Opt.
Chao Zhang was born in Dalian, PR China. He received the B.S. degree in applied mathematics in 2004 and the M.S. degree in computational mathematics in 2005, both from Dalian University of Technology, where he is now pursuing the Ph.D. degree in computational mathematics. His research interests lie in the areas of neural networks and unconstrained optimization.
Wei Wu received the M.S. degree from Jilin University, Changchun, China, in 1981 and the D.Phil. degree from Oxford University, Oxford, UK, in 1987. He is now a professor in the Applied Mathematics Department of Dalian University of Technology, Dalian, China. He serves as an associate editor for the Journal of Information & Computational Science, Numerical Mathematics—A Journal of Chinese Universities, and the Journal of Mathematical Research and Exposition. His research interests include numerical analysis and neural network computation.
Xianhua Chen, born in Heilongjiang province in 1984, received the B.S. degree in applied mathematics from Shandong Normal University in 2006. He is now pursuing the M.S. degree in computational mathematics at Dalian University of Technology. His current interests are artificial neural networks and finite elements.
Yan Xiong received the M.S. degree in applied mathematics, with an emphasis on genetic algorithms, from Northeastern University in 2003, and the Ph.D. degree in computational mathematics from Dalian University of Technology in 2007. Her research interests include the convergence analysis and structural optimization of neural networks, especially higher-order neural networks.
1. Partly supported by the National Natural Science Foundation of China (10471017).