
Neurocomputing

Volume 72, Issues 1–3, December 2008, Pages 513-520

Convergence of BP algorithm for product unit neural networks with exponential weights

https://doi.org/10.1016/j.neucom.2007.12.004

Abstract

Product unit neural networks with exponential weights (PUNNs) can provide a more powerful internal representation capability than traditional feed-forward neural networks. In this paper, a convergence result for the back-propagation (BP) algorithm used to train PUNNs is presented. The monotonicity of the error function during the training iterations is also guaranteed. A numerical example is given to support the theoretical findings.

Introduction

Traditional feed-forward neural networks are constructed from multiple layers of summation units. These networks can effectively solve approximation and classification problems [3], [9]. However, a traditional feed-forward neural network usually requires a large number of summation units to approximate a complicated function. Various kinds of “high order” neural networks have been developed to overcome this drawback [6], [7], [4]. Among them, product unit neural networks with exponential weights (PUNNs) were proposed by Durbin and Rumelhart [4] and studied in [12], [11], [10], [19]. PUNNs have been used to solve regression problems in [16], [17]. However, we have not found any theoretical analysis of the convergence of training methods for PUNNs, and this is the main concern of this paper.

The back-propagation (BP) algorithm is probably the most popular optimization algorithm for training feed-forward neural networks [21], [22]. Unfortunately, the solution space for PUNNs can be extremely convoluted, with numerous local minima that trap BP training (cf. [4], [12], [5]). Hence, global optimization algorithms such as the Genetic Algorithm [11], Random Search [12], Evolutionary Algorithms [14], [15], [8], Particle Swarm Optimization and Leap Frog Optimization [10] have been used to train PUNNs. However, these global optimization algorithms usually converge very slowly. It is therefore natural to combine the BP algorithm, which is good at local optimization, with a global search algorithm such as the Random Search Algorithm (RSA); this combination yields quite satisfactory training [12]. In this strategy, BP is used to move toward a local minimum, and if the training error is still above the desired level, the RSA generates a new set of random weights from which BP starts again. Thus a convergence analysis of the BP algorithm for PUNNs provides theoretical support for this training strategy.
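As an illustration, the following is a minimal Python sketch of this combined BP/RSA strategy; the function names, stopping rule, and weight-initialization range are illustrative assumptions, not the exact procedure of [12].

```python
# Hypothetical sketch of the BP + random-restart (RSA-style) strategy described
# above: run BP from random weights to a local minimum, and restart with new
# random weights while the training error stays above the desired level.
import numpy as np

def train_with_restarts(bp_train, error_fn, n_weights,
                        target_error=1e-3, max_restarts=20, seed=0):
    """bp_train(w0) -> w: runs BP from w0 until it stalls at a (local) minimum.
    error_fn(w) -> float: training error of the weight vector w."""
    rng = np.random.default_rng(seed)
    best_w, best_err = None, np.inf
    for _ in range(max_restarts):
        w0 = rng.uniform(-1.0, 1.0, size=n_weights)  # RSA step: new random weights
        w = bp_train(w0)                             # BP step: local optimization
        err = error_fn(w)
        if err < best_err:
            best_w, best_err = w, err
        if best_err <= target_error:                 # desired error level reached
            break
    return best_w, best_err
```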

The rest of the paper is organized as follows. Section 2 introduces PUNNs. Section 3 describes the BP learning procedure for PUNNs. The main convergence results are presented in Section 4. Experimental results are given in Section 5. Some conclusions are drawn in Section 6. The details of the proofs are provided in Appendix A.

Section snippets

Product unit neural networks with exponential weights

In a PUNN, the traditional summation units of the hidden layer are replaced by product units with exponential weights:
$$\sum_{n=1}^{N} w_n x_n \;\longrightarrow\; \prod_{n=1}^{N} x_n^{w_n}.$$
Due to this special exponential form, PUNNs have some additional advantages. Durbin and Rumelhart [4] empirically determined that the information capacity of a single product unit (as measured by its capacity for learning random Boolean patterns) is approximately 3N, compared to 2N for a single summation unit [2], where N is the number of inputs to the
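A minimal Python sketch of the two unit types may help fix the notation; it assumes strictly positive inputs so that $x_n^{w_n}$ is well defined for non-integer weights, and the variable names are illustrative.

```python
# Summation unit vs. product unit with exponential weights (sketch).
# Assumes strictly positive inputs; negative inputs need a separate convention.
import numpy as np

def summation_unit(x, w):
    """Traditional summation unit: sum_n w_n * x_n."""
    return float(np.dot(w, x))

def product_unit(x, w):
    """Product unit with exponential weights: prod_n x_n ** w_n,
    computed in log-space as exp(sum_n w_n * ln x_n)."""
    return float(np.exp(np.dot(w, np.log(x))))

x = np.array([2.0, 0.5, 3.0])
w = np.array([1.0, -2.0, 0.5])
print(summation_unit(x, w))  # 2 - 1 + 1.5 = 2.5
print(product_unit(x, w))    # 2 * 0.5**-2 * 3**0.5 = 8 * sqrt(3) ~= 13.86
```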

BP algorithm for PUNN training

Let the network be supplied with a given set of training examples $\{x^j, O^j\}_{j=1}^{J} \subset (\mathbb{R}\setminus\{0\})^{N} \times \mathbb{R}$. For each $x^j$ input into the PUNN above, the network output is
$$y^j = g(w_{M+1}\cdot F^j) = g\Bigl(\sum_{m=1}^{M} w_{M+1,m}\, f^j(w_m) - w_{M+1,0}\Bigr),$$
where $f^j(w) = f(w, x^j)$ and $F^j = \bigl(-1, f^j(w_1), \ldots, f^j(w_M)\bigr)^{T}$.

The square error function of the PUNN trained by the BP algorithm can be represented as follows:
$$E(W) = \frac{1}{2}\sum_{j=1}^{J}\bigl(y^j - O^j\bigr)^2 = \sum_{j=1}^{J} g_j\bigl(w_{M+1}\cdot F^j\bigr), \quad \text{where } g_j(t) = \frac{1}{2}\bigl(g(t) - O^j\bigr)^2,\ t\in\mathbb{R},\ 1\le j\le J.$$
So we have
$$\frac{\partial E}{\partial w_{M+1}}(W) = \sum_{j=1}^{J} g_j'\bigl(w_{M+1}\cdot F^j\bigr)\,F^j \qquad \text{and} \qquad \frac{\partial E}{\partial w_m}(W) = \sum_{j=1}^{J} g_j'\bigl(w_{M+1}\cdot F^j\bigr)\,\frac{\partial}{\partial w_m}\bigl(w_{M+1}\cdot F^j\bigr)$$
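The following Python sketch implements one gradient (BP) step for the error function above, under the additional assumptions that the hidden-unit function is $f(w,x)=\prod_{n=1}^{N} x_n^{w_n}$ with strictly positive inputs (so that $\partial f/\partial w_n = f\,\ln x_n$) and that $g$ is the logistic sigmoid; both choices are illustrative, since the analysis only requires the bounds in (A1).

```python
# One BP step for the PUNN error E(W) = (1/2) sum_j (y^j - O^j)^2 (sketch).
# Assumptions: f(w, x) = prod_n x_n ** w_n with positive inputs, g = sigmoid.
import numpy as np

def g(t):
    return 1.0 / (1.0 + np.exp(-t))          # assumed output activation

def g_prime(t):
    s = g(t)
    return s * (1.0 - s)

def bp_step(W_hidden, w_out, X, O, eta):
    """W_hidden: (M, N) exponential weights w_1, ..., w_M (one row per hidden unit);
    w_out: (M+1,) output weights (w_{M+1,0}, w_{M+1,1}, ..., w_{M+1,M});
    X: (J, N) positive training inputs; O: (J,) targets; eta: learning rate."""
    grad_hidden = np.zeros_like(W_hidden)
    grad_out = np.zeros_like(w_out)
    E = 0.0
    for j in range(X.shape[0]):
        f = np.exp(W_hidden @ np.log(X[j]))   # f^j(w_m) = prod_n (x^j_n) ** w_{m,n}
        F = np.concatenate(([-1.0], f))       # F^j = (-1, f^j(w_1), ..., f^j(w_M))^T
        t = np.dot(w_out, F)                  # w_{M+1} . F^j
        y = g(t)
        E += 0.5 * (y - O[j]) ** 2
        gjp = (y - O[j]) * g_prime(t)         # g_j'(w_{M+1} . F^j)
        grad_out += gjp * F                   # dE/dw_{M+1}
        # dE/dw_{m,n} = g_j' * w_{M+1,m} * f^j(w_m) * ln(x^j_n)
        grad_hidden += gjp * np.outer(w_out[1:] * f, np.log(X[j]))
    return W_hidden - eta * grad_hidden, w_out - eta * grad_out, E
```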

Convergence result

We need the following assumptions:

  • (A1)

    there exists a constant $C_1 > 0$ such that $\max\{|g(t)|, |g'(t)|, |g''(t)|\} \le C_1$ for all $t \in \mathbb{R}$;

  • (A2)

    there exists a constant $C_2 > 0$ such that $\|w_m^k\| \le C_2$ for all $m = 1, \ldots, M+1$ and $k = 0, 1, 2, \ldots$;

  • (A3)

    the learning rate η satisfies (A.18);

  • (A4)

    the set $D_0 = \{W \mid (\partial E/\partial W)(W) = 0\}$ contains only finitely many points.

In this paper, $\|\cdot\|$ denotes the Euclidean norm.

Theorem 1

Let the error function $E(W)$ be defined in (9), and let the sequence $\{W^k\}$ be generated by (15), (16), (17) with $W^0$ an arbitrary initial guess. If Assumptions (A1)–(A3) are
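For orientation, the following LaTeX block sketches the form the conclusions of Theorem 1 presumably take, inferred from the abstract (monotonicity of the error and convergence of the BP iteration); the precise statement is given in the full text.

```latex
% Assumed shape of the convergence result (not quoted from the paper):
\begin{align*}
  &\text{(monotonicity)}        && E(W^{k+1}) \le E(W^{k}), \quad k = 0, 1, 2, \ldots, \\
  &\text{(weak convergence)}    && \lim_{k\to\infty}
        \Bigl\lVert \tfrac{\partial E}{\partial W}(W^{k}) \Bigr\rVert = 0, \\
  &\text{(strong convergence, with (A4))} && \exists\, W^{\ast} \in D_0 :\;
        \lim_{k\to\infty} W^{k} = W^{\ast}.
\end{align*}
```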

Experimental results

This section illustrates the performance of BP training for PUNNs. The 5-parity problem is used as the test problem. This problem can be solved theoretically by a single product unit (cf. [19]), so we can easily compare the theoretical solution with the numerical results. The network has six inputs ($N=6$ in Fig. 1), including a constant input of $-1$ (the bias unit), two hidden product units ($M=2$) plus a bias unit, and one summation output unit. The learning rate is fixed at 0.05 for all the
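As a reading aid, here is a small Python sketch of the 5-parity test problem under a $\pm 1$ input encoding commonly used with product units, showing the theoretical single-product-unit solution referred to above; the encoding and target convention are assumptions, not taken verbatim from the paper.

```python
# 5-parity with +/-1 encoding (sketch): a single product unit with all
# exponential weights equal to 1 computes the parity exactly, which is the
# theoretical solution the trained PUNN can be compared against.
import itertools
import numpy as np

X = np.array(list(itertools.product([-1.0, 1.0], repeat=5)))  # 32 input patterns
targets = np.prod(X, axis=1)                                   # +1: even, -1: odd parity

w_theoretical = np.ones(5)                                     # unit exponential weights
outputs = np.prod(X ** w_theoretical, axis=1)                  # single product unit
print(np.allclose(outputs, targets))                           # True
```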

Conclusion

Product unit neural networks with exponential weights (PUNNs) were proposed in order to improve the efficiency of traditional feed-forward neural networks [4]. Unfortunately, the solution space for PUNNs can be extremely convoluted, with numerous local minima that trap BP training. Hence, global optimization algorithms such as the Genetic Algorithm have been used to train PUNNs. However, these global optimization algorithms usually converge very slowly. So it seems natural that BP


References (24)

  • C.L. Giles, Learning, invariance, and generalization in higher-order neural networks, Appl. Opt. (1987)
  • C. Hervás, F.J. Martínez-Estudillo, P.A. Gutiérrez, Classification by means of evolutionary product-unit neural networks, ...
Cited by (27)

    • Comparing various artificial neural network types for water temperature prediction in rivers

      2015, Journal of Hydrology
      Citation Excerpt:

      The interesting feature of PUNN is that, contrary to the majority of ANN types, it has fewer parameters to be optimized than MLP (when the same number of input, output and hidden nodes is considered). Although PUNNs are claimed to be difficult to train (Janson and Frenzel, 1993), the convergence of the gradient-based algorithm for PUNN has been proved (Zhang et al., 2008). To facilitate training, in Piotrowski and Napiorkowski (2012) an approach limiting the allowable size of PUNN weights was proposed for rainfall–runoff modelling, but this turned out not to be necessary for the application considered in the present study.

    • Monotonicity and convergence of asynchronous update gradient method for ridge polynomial neural network

      2014, Neurocomputing
      Citation Excerpt:

      The gradient method is one of the most popular training algorithms for higher-order neural networks. Several convergence theorems for the gradient method for higher-order neural networks have been proposed [11–18]. However, the convergence of the asynchronous gradient method for the ridge polynomial neural network has not yet been discussed sufficiently.

    • Extended Kalman filter-based Elman networks for industrial time series prediction with GPU acceleration

      2013, Neurocomputing
      Citation Excerpt:

      As one of the data-driven modeling approaches, neural networks have been widely applied to time series prediction, especially in some industrial applications [5,6]. In this field, back-propagation (BP)-based feed-forward networks and echo state networks (ESN) have been widely used [7,8]. However, the BP algorithm involves gradient-based training, whose slow convergence rate and tendency to fall into local optima are its main disadvantages [9], and it is not suited to dynamic problems [10].

    • Product-Units neural networks for catchment runoff forecasting

      2012, Advances in Water Resources
      Citation Excerpt:

      Secondly, the PUNNs are said to be more difficult to train, especially by means of gradient-based algorithms, due to the complicated nature of the objective function [46]. Zhang et al. [110] proved the convergence of the back-propagation algorithm for PUNNs, but in a number of earlier studies different global optimization methods, including genetic algorithms and particle swarm optimization, have been proposed to train them [46,49,44]. As a result, in the present paper both the gradient-based Levenberg–Marquardt algorithm (LM, [83]), which is very efficient for ANN training, and some recently developed EC-based algorithms, mostly from the Differential Evolution (DE) family, are compared for PUNN training.


Chao Zhang was born in Dalian, PR China. He received the B.S. degree in applied mathematics in 2004 and the M.S. degree in computational mathematics in 2005, both from Dalian University of Technology. He is now pursuing the Ph.D. degree in computational mathematics at Dalian University of Technology. His research interests lie in the areas of neural networks and unconstrained optimization.

Wei Wu received the M.S. degree from Jilin University, Changchun, China, in 1981 and the D.Phil. degree from Oxford University, Oxford, UK, in 1987. He is now a professor in the Applied Mathematics Department of Dalian University of Technology, Dalian, China. He serves as an associate editor for the Journal of Information & Computational Science, Numerical Mathematics—A Journal of Chinese Universities, and the Journal of Mathematical Research and Exposition. His research interests include numerical analysis and neural network computation.

Xianhua Chen was born in Heilongjiang Province in 1984. He received the B.S. degree in applied mathematics from Shandong Normal University in 2006. He is now pursuing the M.S. degree in computational mathematics at Dalian University of Technology. His current interests are artificial neural networks and finite elements.

Yan Xiong received the M.S. degree in applied mathematics, with an emphasis on genetic algorithms, from Northeastern University in 2003. She received the Ph.D. degree in computational mathematics from Dalian University of Technology in 2007. Her research interests include the convergence analysis and structural optimization of neural networks, especially higher-order neural networks.

1 Partly supported by the National Natural Science Foundation of China (10471017).
