New study on neural networks: the essential order of approximation☆
Introduction
Artificial neural networks have been extensively applied in various fields of science and engineering, mainly because feedforward neural networks (FNNs) have the universal approximation capability (Attali and Pagès, 1997, Cardaliaguet and Euvrard, 1992, Chen, 1994, Chen and Chen, 1995, Chui and Li, 1992, Chui and Li, 1993, Cybenko, 1989, Funahashi, 1989, Hornik et al., 1989, Hornik et al., 1990, Leshno et al., 1993). A typical universal approximation assertion states that for any given continuous function defined on a compact set of $\mathbb{R}^d$, there exists a three-layer FNN that approximates the function arbitrarily well. A three-layer FNN with one hidden layer, $d$ inputs and one output can be mathematically expressed as
$$N_n(x_1,\ldots,x_d)=\sum_{i=1}^{n} c_i\,\sigma\Big(\sum_{j=1}^{d} a_{ij}x_j+\theta_i\Big), \tag{1.1}$$
where $\theta_i$ are the thresholds, $a_{ij}$ are the connection weights of neuron $i$ in the hidden layer with the input neurons, $c_i$ are the connection strengths of neuron $i$ with the output neuron, and $\sigma$ is the activation function used in the network. The activation function is normally of sigmoidal type, that is, it satisfies $\sigma(t)\to 1$ as $t\to+\infty$ and $\sigma(t)\to 0$ as $t\to-\infty$. Eq. (1.1) can be further expressed in vector form as
$$N_n(x)=\sum_{i=1}^{n} c_i\,\sigma(\langle a_i,x\rangle+\theta_i),\qquad a_i=(a_{i1},\ldots,a_{id}),\; x=(x_1,\ldots,x_d).$$
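For concreteness, a network of the form (1.1) with a logistic sigmoid activation can be evaluated as in the following minimal sketch; the weights, thresholds and input are arbitrary illustrative values, not taken from the paper:

```python
import numpy as np

def sigmoid(t):
    # Logistic sigmoid: tends to 1 as t -> +infinity and to 0 as t -> -infinity
    return 1.0 / (1.0 + np.exp(-t))

def fnn(x, A, theta, c):
    # Three-layer FNN of the form (1.1):
    #   N(x) = sum_i c_i * sigma(<a_i, x> + theta_i),
    # where row i of A is the weight vector a_i of hidden neuron i.
    return c @ sigmoid(A @ x + theta)

# A toy network with d = 2 inputs and n = 3 hidden neurons
A = np.array([[ 1.0, -0.5],
              [ 0.3,  0.8],
              [-1.2,  0.4]])
theta = np.array([0.1, -0.2, 0.0])
c = np.array([0.5, -1.0, 0.7])

y = fnn(np.array([0.2, 0.3]), A, theta, c)
# Since 0 < sigma < 1, |N(x)| is bounded by sum_i |c_i|
```

Note that the output is a single scalar, matching the one-output architecture assumed throughout.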
The approximation of multivariate functions by the FNNs (1.1) has been widely studied in past years, with various significant results concerning density or complexity. For instance, it was proven in Cybenko (1989) that, under very mild conditions on the sigmoidal activation function $\sigma$, any continuous function defined on a compact set of $\mathbb{R}^d$ can be approximated arbitrarily well by the FNNs (1.1). Later, various density and complexity results on the approximation of functions by the FNNs (1.1) were established by many authors, using different approaches and in more or less general settings (Chen, 1994, Funahashi, 1989, Hornik et al., 1990, Yoshifusa, 1991). All of this research is qualitative in nature. From the perspective of applications, however, quantitative research on the approximation power of neural networks is more helpful. In particular, one would like to know the degree to which neural networks can approximate a given class of functions, and how fast they approximate. One would also like to know how the approximation capability of a neural network is related to the topology of the network (say, how many hidden neurons are needed for the network to reach a predetermined approximation precision?). Many authors have studied these problems (Barron, 1993, Cao and Xu, 2001, Kůrková et al., 1997, Maiorov and Meir, 1998, Mhaskar, 1996, Mhaskar and Khachikyan, 1995, Mhaskar and Micchelli, 1992, Mhaskar and Micchelli, 1994, Mhaskar and Micchelli, 1995, Ritter, 1994). The results obtained have basically dealt with neural networks whose activation functions are of logistic (including sigmoidal) type rather than the Heaviside step function. They have offered, however, only certain kinds of upper bound estimates on the approximation by neural networks.
Among these studies, Suzuki (1998) obtained, by a constructive approach, an upper bound estimate on the approximation by neural networks and explicitly calculated the number of hidden neurons needed to guarantee a predetermined approximation precision.
Upper bound estimates imply convergence of the neural networks to the function being approximated, and also provide a quantitative measure of how accurately the networks approximate. Such estimates cannot, however, precisely characterize the essential order of approximation of the networks, that is, they cannot reveal the highest approximation accuracy the networks can achieve (Xu & Cao, 2004). To obtain the essential order of approximation of a neural network, besides an upper bound estimate, a lower bound estimate characterizing the worst approximation precision of the network is also needed. The essential order of approximation is obtained if and only if the upper and lower bound estimates are of the same order. Clearly, obtaining the essential order of a neural network is difficult, but it is important and significant. In Xu and Cao (2004), this problem was tackled for the neural networks (1.1) when the activation function is sigmoidal and satisfies some additional conditions. In the present paper, our aim is to tackle the same problem for a broader class of neural networks, in which the activation function is nearly exponential. The class of nearly exponential functions was introduced by Ritter (1994); these are the functions that approximate the exponential function arbitrarily well on the negative half-line through appropriate affine transformations in the domain and in the target space (for the precise definition, see the next section). It includes the usual sigmoidal and exponential functions as special cases. A neural network of the form (1.1) will henceforth be called a nearly exponential FNN (neFNN for short) whenever its activation function is nearly exponential.
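For instance, the logistic sigmoid $\sigma(t)=1/(1+e^{-t})$ is nearly exponential: for $t\le 0$ and $c$ very negative, $\sigma(t+c)\approx e^{t+c}$, so the affine shift $t\mapsto t+c$ in the domain together with the scaling $u\mapsto e^{-c}u$ in the target space makes $e^{-c}\sigma(t+c)$ approximate $e^t$ uniformly on the negative half-line, with error at most $e^{c}$. The following numerical check only illustrates this property; it is not the construction used in the paper:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

c = -10.0                             # shift; the approximation sharpens as c -> -infinity
t = np.linspace(-20.0, 0.0, 2001)     # negative half-line (truncated grid)
approx = np.exp(-c) * sigmoid(t + c)  # affine transforms in domain and target space
err = np.max(np.abs(approx - np.exp(t)))
# err is bounded by e^c (about 4.5e-5 for c = -10), since
# e^t - e^{-c} sigmoid(t + c) = e^t * e^{t+c} / (1 + e^{t+c}) <= e^{2t+c} <= e^c for t <= 0
```

Taking $c$ more negative drives the uniform error to zero, which is exactly the nearly exponential property.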
In Ritter (1999), an upper bound estimate on the approximation by neFNNs was developed, but no lower bound estimate was offered. Consequently, the essential order of approximation of the neFNNs remained open. In Xu and Wang (2006), we resolved the problem by first developing a more precise upper bound estimate on the approximation by the FNNs, and then providing a lower bound estimate. Finally, we characterized the conditions under which the upper and lower bounds have the same order, from which the essential order of the neFNNs is revealed. In this paper, we accurately characterize the topology selection and the essential order of neFNNs under some conditions; the results obtained not only sharpen those developed in Ritter (1999), but also clarify the relationship between the approximation speed (precision) and the number of hidden neurons needed by the networks.
The remainder of this paper is organized as follows. In Section 2, we present some notation and basic definitions, state the main results, and briefly discuss their significance. Section 3 summarizes some of our previous work and offers two fundamental results used in proving our results. In Section 4, the proofs of the main results are given by means of techniques from approximation theory. Section 5 provides some concluding remarks.
Notation and main results
We begin with some comments concerning notation. The symbols $\mathbb{N}$, $\mathbb{R}$ and $\mathbb{R}_+$ stand respectively for the sets of nonnegative integers, real numbers, and nonnegative real numbers. We write $\mathbb{R}^d$ for $d$-dimensional Euclidean space ($d\ge 1$). All operations on vectors are taken componentwise. For instance, for $x=(x_1,\ldots,x_d)$ and $y=(y_1,\ldots,y_d)$ we define $xy=(x_1y_1,\ldots,x_dy_d)$, and whenever no component of $y$ vanishes, we define $x/y=(x_1/y_1,\ldots,x_d/y_d)$.
We use and which respectively
Uniform approximation of continuous functions by polynomials
In approximation theory, the well-known Jackson-type theorems (Jackson, 1912) describe the precision and speed with which polynomials approximate a continuous function defined on a real interval, in terms of the modulus of smoothness of the function. Multivariate extensions of theorems of this type are due to Feinerman and Newman (1974), Nikol'skii (1975), Soardi (1984) and Ditzian and Totik (1987), who introduced a new modulus of smoothness of a function, nowadays known as the Ditzian–Totik modulus of smoothness.
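The second-order modulus of smoothness appearing in such theorems, $\omega_2(f,t)=\sup_{0<h\le t}\sup_x|f(x+h)-2f(x)+f(x-h)|$, can be estimated numerically by a crude grid search. The sketch below uses the classical (non-weighted) modulus and an arbitrary test function, purely for illustration; it is not the Ditzian–Totik modulus itself:

```python
import numpy as np

def omega2(f, a, b, t, m=2001):
    # Grid estimate of the second-order modulus of smoothness of f on [a, b]:
    # sup over 0 < h <= t and x with [x - h, x + h] in [a, b] of
    # |f(x + h) - 2 f(x) + f(x - h)|.
    best = 0.0
    for h in np.linspace(t / 20, t, 20):
        x = np.linspace(a + h, b - h, m)
        best = max(best, np.max(np.abs(f(x + h) - 2 * f(x) + f(x - h))))
    return best

f = np.abs                     # f(x) = |x|: Lipschitz but not differentiable at 0
w = omega2(f, -1.0, 1.0, 0.1)
# For f(x) = |x| the supremum is attained at x = 0, where the second
# difference equals 2h, so omega2(f, t) = 2t (here 0.2)
```

The faster $\omega_2(f,t)$ decays as $t\to 0$, the smoother $f$ is, and the faster the Jackson-type polynomial (and, in this paper, neural network) approximation rates become.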
Proof of Theorem 1
Firstly, we prove the estimate (2.1). Let $P$ denote the Euclidean projection onto the compact set $K$, and let $f$ be a continuous function defined on $K$. Then $f\circ P$ is a continuous extension of $f$ to $\mathbb{R}^d$ with the same modulus of smoothness as that of $f$. Using Theorem 3, we know that there is an exponential polynomial such that … Since $\sigma$ is nearly exponential, we know that the exponential function is uniformly
Conclusions
In this work, the essential approximation order of nearly exponential type neural networks has been studied. In terms of the second-order modulus of smoothness of a function, upper and lower bound estimates on the approximation precision and speed of the neural networks are developed simultaneously. Under certain assumptions on the neFNNs, we present matching upper and lower bounds on the degree of approximation. Our research reveals that the approximation precision and speed
References
- Attali, J.-G., & Pagès, G. (1997). Approximation of functions by a multilayer perceptron: A new approach. Neural Networks.
- Cardaliaguet, P., & Euvrard, G. (1992). Approximation of a function and its derivatives with a neural network. Neural Networks.
- Chui, C. K., & Li, X. (1992). Approximation by ridge functions and neural networks with one hidden layer. Journal of Approximation Theory.
- Funahashi, K. (1989). On the approximate realization of continuous mappings by neural networks. Neural Networks.
- Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer feedforward networks are universal approximators. Neural Networks.
- Hornik, K., Stinchcombe, M., & White, H. (1990). Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks. Neural Networks.
- Leshno, M., Lin, V. Ya., Pinkus, A., & Schocken, S. (1993). Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Networks.
- Mhaskar, H. N., & Micchelli, C. A. (1992). Approximation by superposition of a sigmoidal function. Advances in Applied Mathematics.
- Mhaskar, H. N., & Micchelli, C. A. (1995). Degree of approximation by neural networks with a single hidden layer. Advances in Applied Mathematics.
- Suzuki, S. (1998). Constructive function approximation by three-layer artificial neural networks. Neural Networks.
- Barron, A. R. (1993). Universal approximation bounds for superpositions of a sigmoidal function. IEEE Transactions on Information Theory.
- Cao, F. L., & Xu, Z. B. (2001). Neural network approximation for multivariate periodic functions: Estimates on approximation order. Chinese Journal of Computers.
- Chen, T. (1994). Approximation problems in system identification with neural networks. Science in China, Series A.
- Chen, T., & Chen, H. (1995). Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its application to a dynamic system. IEEE Transactions on Neural Networks.
- Chui, C. K., & Li, X. (1993). Neural networks with one hidden layer.
- Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems.
- ☆
Supported by the National Basic Research Program of China (Grant No. 2007CB311000), the Natural Science Foundation of China (Grant Nos. 10726040 and 10701062), the Key Project of the Chinese Ministry of Education (No. 108176), the China Postdoctoral Science Foundation (No. 20080431237), the Natural Science Foundation Project of CQ CSTC (No. CSTC 2009BB2306), the Southwest University Development Foundation of China (No. SWUF2007014) and the Southwest University Doctoral Foundation of China (No. SWUB2007006).