
Neural Networks

Volume 23, Issue 5, June 2010, Pages 618-624

New study on neural networks: the essential order of approximation

https://doi.org/10.1016/j.neunet.2010.01.004

Abstract

For the nearly exponential type of feedforward neural networks (neFNNs), the essential order of their approximation is revealed. It is proven that for any continuous function defined on a compact set of R^d, there exists a three-layer neFNN with a fixed number of hidden neurons that attains the essential order. Under certain assumptions on the neFNNs, ideal upper and lower bound estimates of the approximation precision of the neFNNs are provided. The obtained results not only characterize the intrinsic approximation properties of the neFNNs, but also reveal the implicit relationship between the precision (speed) and the number of hidden neurons of the neFNNs.

Introduction

Artificial neural networks have been extensively applied in various fields of science and engineering, mainly because feedforward neural networks (FNNs) have a universal approximation capability (Attali and Pages, 1997, Cardaliaguet and Euvrard, 1992, Chen, 1994, Chen and Chen, 1995, Chui and Li, 1992, Chui and Li, 1993, Cybenko, 1989, Funahashi, 1989, Hornik et al., 1989, Hornik et al., 1990, Leshno et al., 1993). A typical example of such universal approximation assertions states that for any given continuous function defined on a compact set K of R^d, there exists a three-layer FNN that can approximate the function arbitrarily well. A three-layer FNN with one hidden layer, d inputs and one output can be mathematically expressed as N(x) = Σ_{i=1}^{m} c_i σ(Σ_{j=1}^{d} w_{ij} x_j + θ_i), x ∈ R^d, d ≥ 1, (1.1) where 1 ≤ i ≤ m, θ_i ∈ R are the thresholds, w_i = (w_{i1}, w_{i2}, ..., w_{id})^T ∈ R^d are the connection weights between neuron i in the hidden layer and the input neurons, c_i ∈ R is the connection strength of neuron i with the output neuron, and σ is the activation function used in the network. The activation function is normally taken to be of sigmoid type, that is, it satisfies σ(t) → 1 as t → +∞ and σ(t) → 0 as t → −∞. Eq. (1.1) can be further expressed in vector form as N(x) = Σ_{i=1}^{m} c_i σ(w_i · x + θ_i), x ∈ R^d.
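As a concrete illustration of the network form (1.1), the following is a minimal sketch in NumPy with a logistic sigmoid as the activation; the weight values are arbitrary placeholders and are not taken from the paper.

import numpy as np

def sigmoid(t):
    # Sigmoid-type activation: sigma(t) -> 1 as t -> +infinity, sigma(t) -> 0 as t -> -infinity.
    return 1.0 / (1.0 + np.exp(-t))

def fnn(x, W, theta, c):
    # Evaluate N(x) = sum_{i=1}^{m} c_i * sigma(w_i . x + theta_i),
    # where W has shape (m, d) with row i equal to w_i, theta has shape (m,),
    # and c has shape (m,).
    return c @ sigmoid(W @ x + theta)

# Example with d = 2 inputs and m = 3 hidden neurons (arbitrary values).
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))
theta = rng.normal(size=3)
c = rng.normal(size=3)
print(fnn(np.array([0.5, -1.0]), W, theta, c))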

The approximation of multivariate functions by the FNNs (1.1) has been widely studied in past years, with various significant results concerning density or complexity. For instance, it was proven in Cybenko (1989) that, under very mild conditions on the sigmoidal activation function σ, any continuous function defined on a compact set K of R^d can be approximated arbitrarily well by the FNNs (1.1). Later, various density and complexity results on the approximation of functions by the FNNs (1.1) were established by many authors, using different approaches and in more or less general settings (Chen, 1994, Funahashi, 1989, Hornik et al., 1990, Yoshifusa, 1991). All these results are qualitative in nature. From the perspective of applications, however, quantitative research on the approximation capability of neural networks is more helpful. In particular, one would like to know to what degree neural networks can approximate a certain class of functions, and how fast they do so. One would also like to know how the approximation capability of a neural network is related to the topology of the network (say, how many hidden neurons are needed for the network to reach a predetermined approximation precision). Many authors have studied these problems (Barron, 1993, Cao and Xu, 2001, Kůrkova et al., 1997, Maiorov and Meir, 1998, Mhaskar, 1996, Mhaskar and Khachikyan, 1995, Mhaskar and Micchelli, 1992, Mhaskar and Micchelli, 1994, Mhaskar and Micchelli, 1995, Ritter, 1994). The results obtained have basically dealt with neural networks whose activation functions are logistic (including sigmoid) rather than Heaviside step functions. They have offered, however, only certain kinds of upper bound estimates on the approximation of the neural networks. Among these studies, Suzuki (1998) obtained, by a constructive approach, an upper bound estimate on the approximation of the neural networks and explicitly calculated the number of hidden neurons needed to guarantee a predetermined approximation precision.

Upper bound estimates imply convergence of the neural networks to the function to be approximated and provide a quantitative measure of how accurately the networks approximate the function. Such estimates cannot, however, precisely characterize the essential order of approximation of the networks, that is, they cannot reveal the highest approximation accuracy the networks can achieve (Xu & Cao, 2004). In order to obtain the essential order of approximation of a neural network, besides an upper bound estimate, a lower bound estimate that characterizes the worst-case approximation precision of the network is also needed. The essential order of approximation is obtained if and only if the upper and lower bound estimates are of the same order. Clearly, obtaining the essential order of a neural network is not easy, but it is important and significant. In Xu and Cao (2004), this problem was tackled for the neural networks (1.1) when the activation function is sigmoidal and satisfies some additional conditions. In the present paper, our aim is to tackle the same problem for a broader class of neural networks, namely those whose activation function is nearly exponential. The class of nearly exponential functions was introduced by Ritter (1994); these are the functions that can approximate the exponential function arbitrarily well on the negative half-line through appropriate affine transformations at the origin and in the target space (for the precise definition, see the next section). It includes the usual sigmoid and exponential functions as special cases. A neural network of the form (1.1) will henceforth be called a nearly exponential FNN (neFNN for short) whenever its activation function is nearly exponential.
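For orientation, the notion of "same order" used above can be expressed schematically; the symbols E_n(f), F and φ(n) below are ours, introduced only for illustration, and do not reproduce the paper's exact statements. Writing E_n(f) for the distance from f to the set of networks (1.1) with n hidden neurons, an upper bound estimate gives E_n(f) ≤ C_2 φ(n) for every f in a smoothness class F, while a matching worst-case lower bound gives sup_{f ∈ F} E_n(f) ≥ C_1 φ(n), so that

\[
  C_1\,\varphi(n) \;\le\; \sup_{f\in F} E_n(f) \;\le\; C_2\,\varphi(n),
\]

and the common rate φ(n) is the essential order of approximation.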

In Ritter (1999), an upper bound estimate on the approximation of the neFNNs was developed, but no lower bound estimate was offered. Consequently, the essential order of approximation of the neFNNs remained open. In Xu and Wang (2006), we approached the problem by first developing a more precise upper bound estimate on the approximation of the FNNs and then providing a lower bound estimate; finally, we characterized the conditions under which the upper and lower bounds have the same order, from which the essential order of the neFNNs is revealed. In this paper, we accurately characterize the topology selection and the essential order of the neFNNs under certain conditions; the results obtained not only sharpen those developed in Ritter (1999), but also clarify the relationship between the approximation speed (precision) and the number of hidden neurons needed by the neural networks.

The remainder of this paper is organized as follows. In Section 2, we introduce some notation and basic definitions, present the main results and briefly discuss their significance. Section 3 summarizes some of our previous work and offers two fundamental results needed in the proofs. In Section 4, the main results are proved using techniques from approximation theory. Section 5 provides some concluding remarks.

Section snippets

Notation and main results

We begin with some comments concerning notation. The symbols N, R and R+ stand respectively for the sets of nonnegative integers, real numbers, and nonnegative real numbers. We use R^d to denote d-dimensional Euclidean space (d ≥ 1). All operations on vectors are taken componentwise. For instance, when x = (x_1, x_2, ..., x_d) ∈ R^d and y = (y_1, y_2, ..., y_d) ∈ R^d, we define e^x = (e^{x_1}, e^{x_2}, ..., e^{x_d}) and xy = (x_1 y_1, x_2 y_2, ..., x_d y_d), and whenever x_i ≥ 0, i = 1, 2, ..., d, we define x^y = (x_1^{y_1}, x_2^{y_2}, ..., x_d^{y_d}).
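In code, these componentwise conventions correspond to ordinary elementwise array operations; a small NumPy sketch with illustrative values:

import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([0.5, 1.0, 2.0])

exp_x = np.exp(x)   # e^x = (e^{x_1}, ..., e^{x_d})
prod = x * y        # xy  = (x_1 y_1, ..., x_d y_d)
power = x ** y      # x^y = (x_1^{y_1}, ..., x_d^{y_d}); well defined here since every x_i >= 0
print(exp_x, prod, power)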

We use P_n^{(d)} and T_n^{(d)}, which respectively

Uniform approximation of continuous functions by polynomials

In approximation theory (Jackson, 1912), the well-known Jackson-type theorems describe the approximation precision and speed of a continuous function defined on a real interval by polynomials in terms of the modulus of smoothness of the function. Multivariate extensions of this type of theorem are due to Feinerman and Newman (1974), Nikol'skii (1975), Soardi (1984) and Ditzian and Totik (1987), who introduced a new modulus of smoothness of a function, nowadays known as the Ditzian–Totik
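For readers unfamiliar with the quantity, the classical second-order modulus of smoothness of f on [0,1] is recalled below; this is a standard definition rather than a statement from the paper, and the Ditzian–Totik version replaces the step h by hφ(x) with φ(x) = √(x(1−x)):

\[
  \omega_2(f,t) \;=\; \sup_{0<h\le t}\;\sup_{x,\;x\pm h\in[0,1]}
  \bigl| f(x+h) - 2f(x) + f(x-h) \bigr|.
\]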

Proof of Theorem 1

First, we prove the estimate (2.1). Let τ denote the Euclidean projection from [0,1]^d onto V ⊂ R^d, and let f be a continuous function defined on the compact set V. Then f∘τ is a continuous extension of f to [0,1]^d with the same modulus of smoothness as f. Using Theorem 3, we know that there is an exponential polynomial p(x) = Σ_{λ ∈ {0,1,2,...,n}^d} a_λ e^{λ·x} such that ‖f − p‖_{C(V)} ≤ ‖f∘τ − p‖_{C([0,1]^d)} ≤ 12(dπ²/2 + 1)² ω_2(f, 1/(n+2)) + ε. Since σ is nearly exponential, we know that the exponential function e^x is uniformly
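To illustrate the object used in this step, the following sketch evaluates an exponential polynomial p(x) = Σ_{λ ∈ {0,1,...,n}^d} a_λ e^{λ·x} with placeholder coefficients a_λ; the index set follows the reconstruction above, and the values are illustrative rather than taken from the paper.

import itertools
import numpy as np

def exp_polynomial(x, coeffs, n, d):
    # p(x) = sum over lambda in {0, 1, ..., n}^d of a_lambda * exp(lambda . x)
    total = 0.0
    for lam in itertools.product(range(n + 1), repeat=d):
        total += coeffs[lam] * np.exp(np.dot(lam, x))
    return total

# Example with d = 2, n = 2 and random placeholder coefficients.
n, d = 2, 2
rng = np.random.default_rng(1)
coeffs = {lam: rng.normal() for lam in itertools.product(range(n + 1), repeat=d)}
print(exp_polynomial(np.array([-0.3, -0.7]), coeffs, n, d))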

Conclusions

In this work, the essential order of approximation of the nearly exponential type of neural networks has been studied. In terms of the second-order modulus of smoothness of a function, upper and lower bound estimates on the approximation precision and speed of the neural networks are developed simultaneously. Under certain assumptions on the neFNNs, we present ideal upper and lower bounds on the degree of approximation. Our research reveals that the approximation precision and speed

References (32)

  • A.R. Barron

    Universal approximation bounds for superpositions of a sigmoidal function

    IEEE Transactions on Information Theory

    (1993)
  • F.L. Cao et al.

    Neural network approximation for multivariate periodic functions: Estimates on approximation order

    Chinese Journal of Computers

    (2001)
  • T.P. Chen

    Approximation problems in system identification with neural networks

    Science in China, Series A

    (1994)
  • T.P. Chen et al.

    Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its application to a dynamic system

    IEEE Transactions on Neural Networks

    (1995)
  • C.K. Chui et al.

    Neural networks with one hidden layer

  • G. Cybenko

Approximation by superpositions of a sigmoidal function

    Mathematics of Control, Signals, and Systems

    (1989)

Supported by the National Basic Research Program of China (Grant No. 2007CB311000), the Natural Science Foundation of China (Grant Nos. 10726040 and 10701062), the Key Project of the Chinese Ministry of Education (No. 108176), the China Postdoctoral Science Foundation (No. 20080431237), the Natural Science Foundation Project of CQ CSTC (No. CSTC 2009BB2306), the Southwest University (China) Development Foundation (No. SWUF2007014) and the Southwest University Doctoral Foundation of China (No. SWUB2007006).
