Approximation by series of sigmoidal functions with applications to neural networks

Costarelli, Danilo; Spigler, Renato

doi:10.1007/s10231-013-0378-y

Published: 11 September 2013

Volume 194, pages 289–306, (2015)

Annali di Matematica Pura ed Applicata (1923 -)

Abstract

In this paper, we develop a constructive theory for approximating absolutely continuous functions by series of certain sigmoidal functions. Estimates for the approximation error are also derived. The relation with neural networks approximation is discussed. The connection between sigmoidal functions and the scaling functions of $r$-regular multiresolution approximations is investigated. In this setting, we show that the approximation error for $C^1$-functions decreases as $2^{-j}$, as $j \rightarrow + \infty $. Examples with sigmoidal functions of several kinds, such as logistic, hyperbolic tangent, and Gompertz functions, are given.

1 Introduction

In this paper, we develop a new theory for the uniform approximation of functions in a certain class by series of sigmoidal functions, i.e., functions $\sigma : \mathbb{R }\rightarrow \mathbb{R }$ such that $\lim _{x\rightarrow -\infty } \sigma (x) = 0$ and $\lim _{x\rightarrow +\infty }\sigma (x) = 1$. The idea is to start from appropriate real-valued functions, $\phi $, normalized so that $\int _{\mathbb{R }}\phi (t) \, dt = 1$, and to construct sigmoidal functions having the integral form $\sigma _{\phi }(x) := \int _{-\infty }^x \phi (t) \, dt,\,x \in \mathbb{R }$. In this way, we can define the operators

$$\begin{aligned} (S_w^{\sigma _{\phi }} f)(x) := \sum _{k \in \mathbb{Z }} \left[ \int \limits _a^b \phi (w y - k) f'(y) \, dy \right] \sigma _{\phi }(w x - k) + f(a), \end{aligned}$$

(I)

$x \in [a,b]$, where $f$ is an absolutely continuous function on $[a,b] \subset \mathbb{R }$, and $w > 0$ (note that (I) becomes trivial for constants $f$).

We can show that the family $(S_w^{\sigma _{\phi }} f)_{w>0}$ converges to $f$ uniformly on $[a,b]$ as $w \rightarrow +\infty $. Moreover, we derive estimates for the approximation error and the truncation error of the series.

A remarkable result is obtained when $\phi $ is the real-valued wavelet-scaling function associated with an $r$-regular multiresolution approximation of $L^2(\mathbb{R })$, constructed by a suitable procedure, see [11, 17, 29, 30]. In this setting, we replace the weights $w$ with $2^j,\,j \in \mathbb{N }^+$, as it seems more natural in view of the relation that $\phi $ has with the multiresolution approximation. Also in this case, we can show that the family of operators $(S_{j}^{\sigma _{\phi }} f)_{j \in \mathbb{N }^{+}}$ converges to $f$ as $j \rightarrow +\infty $, uniformly on $[a,b]$. For $C^1$-functions, we obtain an approximation error that decreases to zero as $2^{-j}$ when $j \rightarrow + \infty $.

The approximation procedures based on sigmoidal functions find applications, for instance, in the theory of neural networks (NNs). NNs arise as a practical technique, successfully adopted to model a number of real-world problems. They are often used in Approximation Theory as “universal approximators” and have the form

$$\begin{aligned} \sum _{k = 1}^N \alpha _k \, \sigma (x \cdot w_k - \theta _k), \quad x, \, w_k \in \mathbb{R }^n, \quad \alpha _k, \, \theta _k \in \mathbb{R }, \end{aligned}$$

(II)

where $x \cdot w_k := \sum _{i=1}^n x_i w_{k_i}$ denotes the inner product in $\mathbb{R }^n$, the $w_k$’s are the weights, the $\theta _k$’s are threshold values, and $\sigma $ is a sigmoidal activation function.
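As a minimal sketch of how a network of the form (II) is evaluated, the following Python snippet computes the sum for a given input. The logistic activation, the function names, and the two-neuron parameter values are assumptions chosen only for illustration; they are not taken from the paper.

```python
import math

def logistic(t):
    """Logistic sigmoidal function: tends to 0 at -inf and to 1 at +inf."""
    return 1.0 / (1.0 + math.exp(-t))

def network(x, alphas, weights, thetas, sigma=logistic):
    """Evaluate sum_k alpha_k * sigma(x . w_k - theta_k) for x in R^n."""
    total = 0.0
    for alpha, w, theta in zip(alphas, weights, thetas):
        inner = sum(xi * wi for xi, wi in zip(x, w))  # inner product x . w_k
        total += alpha * sigma(inner - theta)
    return total

# Hypothetical two-neuron network on R^2 (values chosen only for illustration).
alphas = [1.0, -0.5]
weights = [(1.0, 0.0), (0.0, 2.0)]
thetas = [0.0, 1.0]
value = network((0.0, 0.5), alphas, weights, thetas)
```

Both neurons receive the argument 0 for this input, so each activation equals 1/2 and the output is the corresponding combination of the coefficients $\alpha _k$.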

A theory for approximating functions by NNs, defined by (II), was developed by Cybenko in [16], and its feasibility was established by nonconstructive arguments. Often, $\sigma $ is either the well-known logistic function, or the sigmoidal function generated by the hyperbolic tangent, see [1, 2, 8]. The theory of NNs is mainly multivariate in nature, but useful constructive approximation results have been obtained also for univariate functions, see, e.g., [1, 2, 9, 14, 19, 22, 33]. Basic results on NNs were established by Li, Lenze, Mhaskar, Micchelli and Pinkus in [23, 26, 27, 31, 32, 34]. For results concerning the order of approximation, see [3, 10, 13, 15, 20, 24, 25]. One-dimensional NNs also play a role in numerical analysis. For instance, they have been used to solve ordinary differential equations [28], or to solve Fredholm or Volterra integral equations of the second kind [7, 12]. In this context, available constructive approximation algorithms are fundamental.

The theory for approximating certain functions by series of sigmoidal functions proposed in this paper can be exploited to obtain some kind of NNs approximation. Such an approach is completely new and allows us to obtain a constructive approximation algorithm based on a new class of sigmoidal functions.

Such a theory, in the present form, however, does not cover the important cases of NNs activated by either logistic, hyperbolic tangent or Gompertz sigmoidal functions. Therefore, in Sect. 5, we propose an extension of the theory previously developed, which includes such cases, also providing estimates for the approximation errors for functions belonging to the Lipschitz class.

2 Approximation by series of sigmoidal functions

In what follows, we denote by $C[a,b]$ and $AC[a,b]$ the sets of all continuous and absolutely continuous functions, $f: [a,b] \rightarrow \mathbb{R }$, on the bounded closed nonempty interval $[a,b]$, respectively; $\Vert \cdot \Vert _{\infty }$ is the usual sup norm $\Vert f \Vert _{\infty } := \max _{x \in [a,b]}|f(x)|$. Moreover, ${\widehat{C}}^n[a,b],\,n \in \mathbb{N }^+$, will denote the set of all functions $f \in C^n(a',b')$, for some open real interval $(a',b')$, such that $[a,b] \subset (a',b')$.

Let us introduce the class of functions we will work with.

Definition 2.1

The function $\phi :\mathbb{R }\rightarrow \mathbb{R }^+_0$ is said to belong to the class $\Phi $, if it satisfies the following conditions:

$(\varphi 1)\, \phi $ is continuous on $\mathbb{R }$ and there exists $C > 0$ such that
$$\begin{aligned} \phi (x) \le C (1 + |x|)^{-\alpha }, \end{aligned}$$
for every $x \in \mathbb{R }$, and for some $\alpha \ge 2$;
$(\varphi 2)$ $\sum _{k \in \mathbb{Z }}\phi (x - k) = 1$, for every $x \in \mathbb{R }$.
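The two conditions can be checked numerically on a simple candidate. The sketch below uses the hat function $\max \{0, 1 - |x|\}$ as an assumed example (it coincides with the central B-spline of order 2 discussed later in the paper); the constants $C = 4$ and $\alpha = 2$ in the decay test are illustrative choices, not values from the paper.

```python
def hat(x):
    """Hat function max(0, 1 - |x|): continuous, nonnegative, supported on [-1, 1]."""
    return max(0.0, 1.0 - abs(x))

def shifted_sum(phi, x, k_max=50):
    """Sum of phi(x - k) over |k| <= k_max (exact for the hat function on [-3, 3])."""
    return sum(phi(x - k) for k in range(-k_max, k_max + 1))

samples = [i / 7.0 - 3.0 for i in range(43)]  # grid on [-3, 3]

# Condition (phi 2): the integer translates sum to 1 at every sample point.
partition_errors = [abs(shifted_sum(hat, x) - 1.0) for x in samples]

# Condition (phi 1) with the illustrative constants C = 4, alpha = 2.
decay_ok = all(hat(x) <= 4.0 * (1.0 + abs(x)) ** -2 for x in samples)
```

A check on a finite grid is of course only a sanity test; for the hat function both conditions can be verified by hand.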

Remark 2.2

The condition $(\varphi 2)$ is equivalent to

$$\begin{aligned} {\widehat{\phi }}(2 \pi k) = \left\{ \begin{array}{l} 0, \quad k \in \mathbb{Z }\setminus \left\{ 0 \right\} , \\ 1, \quad k = 0, \end{array} \right. \end{aligned}$$

where ${\widehat{\phi }}(v) := \int _{\mathbb{R }} \phi (t) \, e^{-ivt} \, dt,\,v \in \mathbb{R }$, is the Fourier transform of $\phi $; see [6]. In particular, it turns out that ${\widehat{\phi }}(0) = \int _{\mathbb{R }} \phi (t) \, dt = 1$.

For any fixed $\phi \in \Phi $, the function $K_{\phi }: \mathbb{R }^2\rightarrow \mathbb{R }^+_0$, defined by

$$\begin{aligned} K_{\phi }(x,y) := \sum _{k \in \mathbb{Z }} \phi (x - k) \, \phi (y - k), \quad (x,y) \in \mathbb{R }^2, \end{aligned}$$

(1)

will be called the kernel associated to $\phi $. Clearly, it follows from condition $(\varphi 2)$ and by Remark 2.2 that

$$\begin{aligned} \int \limits _{\mathbb{R }} K_{\phi }(x,y) \, dy = 1, \quad \text{ for } \text{ every } \, x \in \mathbb{R }. \end{aligned}$$

(2)

Moreover, using $(\varphi 1)$, it is easy to see that

$$\begin{aligned} K_{\phi }(x,y) \le L \, (1 + |x - y|)^{-\alpha }, \quad \text{ for } \text{ every } x,\, y \in \mathbb{R }, \end{aligned}$$

(3)

for some positive constant $L$. Under the previous assumptions on $K_{\phi }$, the following lemma, which will turn out to be useful later, can be established. Its proof is classical and can be found in [30].

Lemma 2.3

Let $(T_w)_{w>0}$ be the family of operators defined explicitly by

$$\begin{aligned} (T_w f)(x) := w \int \limits _{\mathbb{R }} K(w x, w y) \, f(y) \, dy, \quad x \in \mathbb{R }, \end{aligned}$$

for $f: \mathbb{R }\rightarrow \mathbb{R }$ (or $\mathbb{C }$), and where the kernel $K: \mathbb{R }^2 \rightarrow \mathbb{R }$ (or $\mathbb{C }$) meets the conditions (2) and (3). Then, for any uniformly continuous and bounded function $f$, we have

$$\begin{aligned} \lim _{w \rightarrow +\infty } \Vert T_w f - f\Vert _{\infty } = 0. \end{aligned}$$

Moreover, for every $f \in L^p(\mathbb{R }),\,1 \le p < +\infty $, we have

$$\begin{aligned} \lim _{w \rightarrow +\infty } \Vert T_w f - f \Vert _p = 0. \end{aligned}$$
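The uniform convergence stated in the lemma can be observed numerically. The following sketch assumes the kernel $K(x,y) = \sum _k \phi (x-k) \phi (y-k)$ built from the hat function and takes $f = \cos $ as a bounded, uniformly continuous test function; the midpoint quadrature, the sample grid on $[-2,2]$, and the values of $w$ are illustrative choices.

```python
import math

def hat(x):
    return max(0.0, 1.0 - abs(x))

def local_average(f, w, k, n=400):
    """Midpoint rule for w * integral of hat(w*y - k) * f(y) over [(k-1)/w, (k+1)/w]."""
    lo, h = (k - 1) / w, 2.0 / (w * n)
    return w * h * sum(hat(w * (lo + (i + 0.5) * h) - k) * f(lo + (i + 0.5) * h)
                       for i in range(n))

def T(f, w, x):
    """(T_w f)(x) for the hat-function kernel; at most two indices k contribute."""
    k0 = math.floor(w * x)
    return sum(hat(w * x - k) * local_average(f, w, k) for k in (k0, k0 + 1))

f = math.cos  # bounded and uniformly continuous on R
grid = [i / 10.0 for i in range(-20, 21)]
errors = {w: max(abs(T(f, w, x) - f(x)) for x in grid) for w in (2, 8, 32)}
```

The dictionary `errors` holds the maximal deviation on the grid for each $w$; the values shrink as $w$ grows, in line with the first statement of the lemma.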

Recall now the following

Definition 2.4

A function $\sigma : \mathbb{R }\rightarrow \mathbb{R }$ is called a “sigmoidal function”, whenever $ \lim _{x \rightarrow -\infty } \sigma (x) = 0$ and $\lim _{x \rightarrow +\infty } \sigma (x) = 1$.

Sometimes, boundedness, continuity and/or monotonicity are prescribed in addition. Let now $\phi \in \Phi $ be fixed and define the function $\sigma _{\phi }:\mathbb{R }\rightarrow \mathbb{R }^+_0$ as

$$\begin{aligned} \sigma _{\phi }(x) := \int \limits _{-\infty }^x \phi (t) \, dt, \quad x \in \mathbb{R }. \end{aligned}$$

(4)

Clearly, from condition $(\varphi 2)$ and Remark 2.2, such a function $\sigma _{\phi }$ is a sigmoidal function. We can now give the following

Definition 2.5

For every fixed function $\phi \in \Phi $, we define the family of operators $(S_w^{\sigma _{\phi }})_{w>0}$ by

$$\begin{aligned} (S_w^{\sigma _{\phi }} f)(x) := \sum \limits _{k \in \mathbb{Z }} \left[ \int \limits _a^b \phi (w y - k) \, f'(y) \, dy\right] \sigma _{\phi }(w x - k) + f(a), \quad x \in [a,b], \end{aligned}$$

for every $f \in AC[a,b]$ and $w > 0$. We call $S_w^{\sigma _{\phi }} f$ the “series of sigmoidal functions for $f$, based on $\phi $”, for the given value of $w > 0$.
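To make the definition concrete, here is a minimal sketch that evaluates $S_w^{\sigma _{\phi }} f$ at one point. It assumes that $\phi $ is the hat function $\max \{0, 1-|x|\}$, for which $\sigma _{\phi }$ has a simple closed form, and it uses the illustrative target $f(x) = x^2 + 1$ on $[0,1]$; the coefficients are computed with a midpoint rule. None of these choices comes from the paper.

```python
import math

def hat(x):
    """phi = hat function (an assumed member of the class Phi)."""
    return max(0.0, 1.0 - abs(x))

def sigma_hat(x):
    """sigma_phi(x) = integral of the hat function over (-inf, x], in closed form."""
    if x <= -1.0:
        return 0.0
    if x <= 0.0:
        return 0.5 * (1.0 + x) ** 2
    if x <= 1.0:
        return 1.0 - 0.5 * (1.0 - x) ** 2
    return 1.0

def coefficient(df, w, k, a, b, n=2000):
    """Midpoint rule for the integral of hat(w*y - k) * f'(y) over [a, b]."""
    h = (b - a) / n
    return h * sum(hat(w * (a + (i + 0.5) * h) - k) * df(a + (i + 0.5) * h)
                   for i in range(n))

def S(f, df, w, x, a, b):
    """(S_w f)(x); the hat function has support [-1, 1], so finitely many k contribute."""
    ks = range(math.floor(w * a) - 1, math.ceil(w * b) + 2)
    return sum(coefficient(df, w, k, a, b) * sigma_hat(w * x - k) for k in ks) + f(a)

f = lambda x: x * x + 1.0   # illustrative target, not from the paper
df = lambda y: 2.0 * y      # its derivative
approx = S(f, df, 16, 0.5, 0.0, 1.0)
```

With $w = 16$ the value `approx` is already close to $f(0.5) = 1.25$. The index range in `S` covers every $k$ for which the support of $\phi (w y - k)$ meets $[a,b]$.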

Clearly, when $f$ is a constant function, Definition 2.5 becomes trivial, since $f' = 0$ and hence $S_w^{\sigma _{\phi }} f = f(a) = f$. Now, we can prove the following

Theorem 2.6

Let $\phi \in \Phi $ be fixed. For any given $f \in AC[a,b]$, the family $(S_w^{\sigma _{\phi }} f)_{w>0}$ converges uniformly to $f$ on $[a,b]$, i.e.,

$$\begin{aligned} \lim _{w \rightarrow \infty } \Vert S_w^{\sigma _{\phi }} f - f \Vert _{\infty } = 0. \end{aligned}$$

Moreover, if $f \in {\widehat{C}}^1[a,b]$, we have

$$\begin{aligned} \Vert S_w^{\sigma _{\phi }} f - f \Vert _{\infty } \le {\widetilde{C}} w^{-1}, \end{aligned}$$

for some positive constant ${\widetilde{C}}$ and for every $w > 0$.

Proof

Since $f \in AC[a,b],\,f(x)= \int ^x_a f'(z)\, dz\, + f(a)$ for every $x \in [a,b]$. Then, setting ${\widetilde{f}}'(z) = f'(z)$ for $z \in [a,b]$ and ${\widetilde{f}}'(z) = 0$ for $z \notin [a,b]$, we obtain

$$\begin{aligned} |(S_w^{\sigma _{\phi }} f)(x) - f(x)|&= \left| \sum _{k \in \mathbb{Z }} \left[ \int \limits _a^b \phi (w y - k) \, f'(y) \, dy \right] \sigma _{\phi }(w x - k) - \int \limits _a^x f'(z) \, dz \right| \\&= \left| \sum _{k \in \mathbb{Z }} \left[ \int \limits _{\mathbb{R }} \phi (w y - k) \, {\widetilde{f}}'(y) \, dy \right] \int \limits _{-\infty }^{w x - k} \phi (t) \, dt - \int \limits _{-\infty }^x {\widetilde{f}}'(z) \, dz \right| . \end{aligned}$$

Changing variable, by setting $t = w z - k$, we get

$$\begin{aligned}&|(S_w^{\sigma _{\phi }} f)(x) - f(x)| \nonumber \\&\quad \le \int \limits _{-\infty }^x \left| \sum _{k \in \mathbb{Z }} \left[ w \int \limits _{\mathbb{R }} \phi (w y - k) \, {\widetilde{f}}'(y) \, dy \right] \phi (w z - k) - {\widetilde{f}}'(z) \right| \, dz \nonumber \\&\quad = \int \limits _{-\infty }^x \left| w \int \limits _{\mathbb{R }} K_{\phi }(w z, w y) \, {\widetilde{f}}'(y) \, dy - {\widetilde{f}}'(z) \right| \, dz \nonumber \\&\quad \le \int \limits _{-\infty }^{+\infty } \left| w \int \limits _{\mathbb{R }} K_{\phi }(w z, w y) \, {\widetilde{f}}'(y) \, dy - {\widetilde{f}}'(z) \right| \, dz. \end{aligned}$$

(5)

Since ${\widetilde{f}}' \in L^1(\mathbb{R })$, we obtain by Lemma 2.3 and inequality (5)

$$\begin{aligned} \lim _{w \rightarrow +\infty } \Vert S_w^{\sigma _{\phi }} f - f \Vert _{\infty } \le \lim _{w \rightarrow +\infty } \Vert T_w {\widetilde{f}}' - {\widetilde{f}}' \Vert _1 = 0, \end{aligned}$$

which completes the proof of the first part of the theorem.

Consider now $f \in {\widehat{C}}^1[a,b]$. Note that, by conditions $(\varphi 2)$ and (2), we have

$$\begin{aligned} w \int \limits _{\mathbb{R }} K_{\phi }(w z, w y) \, dy = 1, \quad \hbox { for every } z \in \mathbb{R }\hbox { and } w > 0. \end{aligned}$$

Then, again from inequality (5), we obtain

$$\begin{aligned}&|(S_w^{\sigma _{\phi }} f)(x) - f(x)| \nonumber \\&\quad \le \int \limits _{\mathbb{R }} \left| w \int \limits _{\mathbb{R }} K_{\phi }(w z, w y) \, {\widetilde{f}}'(y) \, dy - {\widetilde{f}}'(z) w \int \limits _{\mathbb{R }} K_{\phi }(w z, w y) \, dy \right| \, dz \nonumber \\&\quad \le w \int \limits _{\mathbb{R }} \int \limits _{\mathbb{R }} K_{\phi }(w z, w y) \, | {\widetilde{f}}'(y) -{\widetilde{f}}'(z)| \, dy \, dz \nonumber \\&\quad \le 2 w \Vert f' \Vert _{\infty } \int \limits _{\mathbb{R }} \int \limits _{\mathbb{R }} K_{\phi }(w z, w y) \, dy \, dz. \end{aligned}$$

(6)

Changing the variables $z$ and $y$ in the last integral in (6) with $z_1/w$ and $y_1/w$, respectively, we obtain, in view of condition (3),

$$\begin{aligned} \Vert S_w^{\sigma _{\phi }} f - f \Vert _{\infty }&\le \, 2 w^{-1} \Vert f' \Vert _{\infty } \int \limits _{\mathbb{R }} \int \limits _{\mathbb{R }} K_{\phi }(z_1, y_1) \, dy_1 \, dz_1 \\&\le 2 w^{-1} \Vert f' \Vert _{\infty } \, L \int \limits _{\mathbb{R }} \int \limits _{\mathbb{R }} (1 + |z_1 - y_1|)^{-\alpha } \, dy_1 \, dz_1 =: \widetilde{C} w^{-1}, \end{aligned}$$

for every $w > 0$, for some $\widetilde{C}>0$, and where $\alpha \ge 2$ is the constant of condition $(\varphi 1)$. This completes the proof of the second part of the theorem. $\square $
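The $w^{-1}$ rate of Theorem 2.6 can be probed numerically. The sketch below is only an experiment under stated assumptions: $\phi $ is the hat function $\max \{0, 1-|x|\}$, the target is $f = \exp $ on $[0,1]$, the coefficients are computed with a midpoint rule, and the sup norm is replaced by a maximum over 51 equally spaced points.

```python
import math

def hat(x):
    return max(0.0, 1.0 - abs(x))

def sigma_hat(x):
    """Closed form of the integral of the hat function over (-inf, x]."""
    x = min(1.0, max(-1.0, x))
    return 0.5 * (1.0 + x) ** 2 if x <= 0.0 else 1.0 - 0.5 * (1.0 - x) ** 2

def coefficients(df, w, a, b, n=2000):
    """Midpoint-rule values of the integrals of hat(w*y - k) * f'(y) over [a, b]."""
    h = (b - a) / n
    nodes = [a + (i + 0.5) * h for i in range(n)]
    ks = range(math.floor(w * a) - 1, math.ceil(w * b) + 2)
    return {k: h * sum(hat(w * y - k) * df(y) for y in nodes) for k in ks}

def sup_error(f, df, w, a, b, m=50):
    """Maximum of |S_w f - f| over m + 1 equally spaced points of [a, b]."""
    c = coefficients(df, w, a, b)
    xs = [a + (b - a) * i / m for i in range(m + 1)]
    return max(abs(sum(ck * sigma_hat(w * x - k) for k, ck in c.items()) + f(a) - f(x))
               for x in xs)

a, b = 0.0, 1.0
errors = {w: sup_error(math.exp, math.exp, w, a, b) for w in (4, 8, 16)}
scaled = {w: w * e for w, e in errors.items()}  # should stay bounded if error = O(1/w)
```

In this experiment the errors decrease as $w$ doubles while the products `w * error` stay bounded, which is the behaviour predicted by the second part of the theorem. The largest deviations occur at the endpoints of $[a,b]$, where the extension ${\widetilde{f}}'$ jumps to zero.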

Examples of functions $\phi \in \Phi $ will be given in the next sections.

3 Application to neural networks

Here, we give some applications of the theory developed in the previous sections to NNs of the form (II). Below, we will study NNs of the type in (II) in a univariate setting and activated by the sigmoidal functions generated by (4). We will denote by $\Phi _C$ the subset of $\Phi $ of functions having a compact support.

Let $\phi \in \Phi _C$ be fixed, and let $M_1,\,M_2 > 0$ such that $\text{ supp } \, \phi \subseteq [-M_1,M_2]$. In this case, we have for any $f \in AC[a,b]$ and $w > 0$,

$$\begin{aligned} \int \limits _a^b \phi (w y - k) f'(y) \, dy = 0, \end{aligned}$$

for every $k < w a - M_2$ or $k > w b + M_1,\,k \in \mathbb{Z }$, since for these values of $k,\,[w a - k,w b - k] \cap [-M_1,M_2] = \emptyset $. Then, the series appearing in the definition of the operator $S_w^{\sigma _{\phi }} f$ reduces to a finite sum, i.e.,

$$\begin{aligned} (S_w^{\sigma _{\phi }} f)(x) = \sum ^{\left\lceil wb+M_1 \right\rceil }_{k=\left\lfloor wa-M_2 \right\rfloor } \left[ \int \limits _a^b \phi (w y - k) \, f'(y) \, dy \right] \sigma _{\phi }(w x - k) + f(a), \end{aligned}$$

(7)

for every $x \in [a,b]$, where the functions $\left\lceil x \right\rceil $ and $\left\lfloor x \right\rfloor $ denote the upper and the lower integer part of $x \in \mathbb{R }$, respectively. Now, we introduce the following modification in Definition 2.5 for the case $\phi \in \Phi _C$. For any $f \in AC[a,b]$, set

$$\begin{aligned} (G_w^{\sigma _{\phi }} f)(x) \!:=\! \sum _{k=\left\lfloor wa-M_2 \right\rfloor }^{\left\lceil wb\!+\!M_1 \right\rceil } \left[ \int \limits _a^b \phi (w y \!-\! k) \, f'(y) \, dy \right] \sigma _{\phi }(w x - k) \!+\! f(a) \, \sigma _{\phi }(w(x - a + 1)), \end{aligned}$$

for every $x \in [a,b]$ and $w > 0$. The $G_w^{\sigma _{\phi }} f$’s are a kind of NNs. They approximate $f$, uniformly on $[a,b]$, as $w \rightarrow +\infty $. The proof of this claim follows from the same arguments made in Theorem 2.6, taking into account that

$$\begin{aligned} \sup _{x \in [a,b]} |f(a)||1 - \sigma _{\phi }(w(x - a + 1))| \le |f(a)||1 - \sigma _{\phi }(w)|\ =\ 0, \end{aligned}$$

(8)

for $w>0$ sufficiently large. Indeed, by the definition of $\sigma _{\phi }$, for every $w > M_2$ we have

$$\begin{aligned} \sigma _{\phi }(w) = \int \limits _{-\infty }^{w}\phi (x)\, dx = \int \limits _{\mathbb{R }} \phi (x)\, dx = 1. \end{aligned}$$

(9)

Moreover, again by Theorem 2.6, if $f \in {\widehat{C}}^1[a,b]$ we obtain the convergence rate given by $\Vert G_w^{\sigma _{\phi }} f - f \Vert _{\infty } \le \widetilde{C} w^{-1}$, for some positive constant $\widetilde{C}$ and for every sufficiently large $w > 0$.

Our work provides a unified approach for NNs approximations. In addition, our proofs are constructive in nature and allow us to determine explicitly the form of the NN. In particular, we show that the set of NNs $G_w^{\sigma _{\phi }} f$ is dense in the set $AC[a,b]$, with respect to the uniform norm.
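The finite network $G_w^{\sigma _{\phi }} f$ can be written out explicitly as a list of neurons. The sketch below assumes the hat function as $\phi $ (so that $M_1 = M_2 = 1$), approximates the coefficient integrals with a midpoint rule, and uses $f = \sin $ on $[1,2]$ as an illustrative target; the helper names `build_network` and `evaluate` are hypothetical.

```python
import math

def hat(x):
    return max(0.0, 1.0 - abs(x))

def sigma_hat(x):
    """Closed form of the integral of the hat function over (-inf, x]."""
    x = min(1.0, max(-1.0, x))
    return 0.5 * (1.0 + x) ** 2 if x <= 0.0 else 1.0 - 0.5 * (1.0 - x) ** 2

def build_network(f, df, w, a, b, M1=1.0, M2=1.0, n=2000):
    """Return neurons (alpha, weight, theta) with G_w f(x) = sum alpha * sigma(weight*x - theta)."""
    h = (b - a) / n
    nodes = [a + (i + 0.5) * h for i in range(n)]
    neurons = []
    for k in range(math.floor(w * a - M2), math.ceil(w * b + M1) + 1):
        alpha = h * sum(hat(w * y - k) * df(y) for y in nodes)  # midpoint rule
        neurons.append((alpha, w, float(k)))
    # Extra neuron carrying f(a): sigma(w (x - a + 1)) = sigma(w x - w (a - 1)).
    neurons.append((f(a), w, w * (a - 1.0)))
    return neurons

def evaluate(neurons, x):
    return sum(alpha * sigma_hat(wt * x - theta) for alpha, wt, theta in neurons)

net = build_network(math.sin, math.cos, 8, 1.0, 2.0)
value = evaluate(net, 1.5)
```

All neurons share the same weight $w$ and differ only in their thresholds, which here are the integers $k$ together with $w(a-1)$ for the neuron that carries $f(a)$.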

Now, we show that we can obtain NNs also starting from functions $\phi \in \Phi $ which are not necessarily compactly supported. Let us first prove the following

Lemma 3.1

The series $\sum _{k \in \mathbb{Z }}\phi (wx-k)$ converges uniformly on the compact subsets of $\mathbb{R }$, for every fixed $w > 0$.

In particular, we have for every $[a,b] \subset \mathbb{R }$

$$\begin{aligned} \sup _{x \in [a,b]} \sum _{|k| > N}\phi (wx-k) \le \overline{C} \left\{ (N-wb+1)^{-(\alpha -1)} + (N+wa+1)^{-(\alpha -1)} \right\} , \end{aligned}$$

for some $\overline{C}>0$, for every $N > w \max \left\{ |a|, |b| \right\} ,\,N \in \mathbb{N }^+$, where $\alpha \ge 2$ is the constant of condition $(\varphi 1)$.

Proof

Let $[a,b] \subset \mathbb{R }$ be fixed. By condition $(\varphi 1)$ and for $N > w \max \left\{ |a|, |b| \right\} $ we have

$$\begin{aligned}&\sup _{x \in [a,b]} \sum _{|k| > N}\phi (wx-k) \le C \sup _{x \in [a,b]} \sum _{|k| > N} (1+|wx-k|)^{-\alpha }\\&\quad = C \left\{ \sup _{x \in [a,b]} \sum _{k > N} (1+|wx-k|)^{-\alpha } + \sup _{x \in [a,b]} \sum _{k > N} (1 + |wx+k|)^{-\alpha } \right\} \\&\quad \le C \left\{ \sum _{k > N} (1+k-wb)^{-\alpha } + \sum _{k > N} (1+wa+k)^{-\alpha } \right\} \le C \left\{ \int \limits ^{+\infty }_N (1+x -wb)^{-\alpha } dx \right. \\&\qquad + \left. \int \limits ^{+\infty }_N (1+wa + x)^{-\alpha } dx \right\} =: \overline{C} \left\{ (N-wb+1)^{-(\alpha -1)} + (N+wa+1)^{-(\alpha -1)} \right\} . \end{aligned}$$

The proof then follows. $\square $
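For completeness, the two integrals in the last step can be evaluated in closed form, which makes one admissible value of the constant explicit. Since $N > w \max \left\{ |a|, |b| \right\} $, both $N - wb + 1$ and $N + wa + 1$ are greater than $1$, and for $\alpha > 1$

$$\begin{aligned} \int \limits ^{+\infty }_N (1+x -wb)^{-\alpha } dx = \frac{(N-wb+1)^{-(\alpha -1)}}{\alpha - 1}, \qquad \int \limits ^{+\infty }_N (1+wa + x)^{-\alpha } dx = \frac{(N+wa+1)^{-(\alpha -1)}}{\alpha - 1}, \end{aligned}$$

so that one may take, for instance, $\overline{C} = C/(\alpha - 1)$, where $C$ is the constant of condition $(\varphi 1)$.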

We can now establish the following

Theorem 3.2

(i)
For any $f \in AC[a,b]$, we denote by
$$\begin{aligned} (G_{N,w}^{\sigma _{\phi }} f)(x)&:= \sum _{k=-N}^N \left[ \int \limits _a^b \phi (w y - k) \, f'(y) \, dy \right] \sigma _{\phi }(w x - k) \nonumber \\&+ f(a) \, \sigma _{\phi }(w(x - a + 1)), \end{aligned}$$
(10)
for $x \in [a,b],\, w > 0$, and $N \in \mathbb{N }^+$. Then, for every $\varepsilon > 0$ there exist $w > 0$ and $N \in \mathbb{N }^+$ such that
$$\begin{aligned} \Vert G_{N,w}^{\sigma _{\phi }} f - f \Vert _{\infty } < \varepsilon . \end{aligned}$$
(ii)
Moreover, for any $f \in {\widehat{C}}^1[a,b]$ we have
$$\begin{aligned} \Vert G_{N,w}^{\sigma _{\phi }} f - f \Vert _{\infty }&\le C_1 \left\{ (N-wb+1)^{-(\alpha -1)} + (N+wa+1)^{-(\alpha -1)}\right\} \\&+ C_2 \, w^{-1} + C_3 \, w^{-(\alpha -1)}, \end{aligned}$$
for some constants $C_1,\,C_2,\,C_3 \!>\! 0$, and for every $w \!>\! 0$ with $N \!>\! w \max \left\{ |a|,\, |b| \right\} ,\,N \!\in \! \mathbb{N }^+$, where $\alpha \ge 2$ is the constant appearing in condition $(\varphi 1)$.

Proof

(i)
Let $\varepsilon > 0$ be fixed. For every $x \in [a,b]$ we have
$$\begin{aligned}&|(G_{N,w}^{\sigma _{\phi }} f)(x) - f (x)| \le |(G_{N,w}^{\sigma _{\phi }} f)(x) - (S_w^{\sigma _{\phi }} f)(x)| + |(S_w^{\sigma _{\phi }} f)(x) - f (x)| \nonumber \\&\quad \le \sum _{|k|>N} \left[ \int \limits _a^b \phi (w y - k) \, |f'(y)| \, dy \right] \sigma _{\phi }(w x - k) + |f(a)||1 - \sigma _{\phi }(w(x - a + 1))|\nonumber \\&\qquad + \Vert S_w^{\sigma _{\phi }} f - f \Vert _{\infty } =: S_1 + S_2 + S_3. \end{aligned}$$
(11)

Proceeding as in (8) and using $(\varphi 1)$, we can write

$$\begin{aligned} S_2&\le |f(a)| |1 - \sigma _{\phi }(w)| = |f(a)| \int \limits ^{+\infty }_w \phi (x)\, dx \nonumber \\&\le |f(a)| C \int \limits ^{+\infty }_w (1+x)^{-\alpha } dx =: \underline{C} \, (1 + w)^{-(\alpha -1)}, \end{aligned}$$

(12)

where $\alpha \ge 2$ is the constant appearing in condition $(\varphi 1)$, and $\underline{C} > 0$. Hence, $S_2 < \varepsilon $ for $w > 0$ sufficiently large. Moreover, we obtain from Theorem 2.6 that $S_3 < \varepsilon $ for $w > 0$ sufficiently large. Finally, we can estimate $S_1$. Since $\Vert \sigma _{\phi } \Vert _{\infty } \le 1$, we obtain for $S_1$

$$\begin{aligned} S_1\,&\le \, \Vert \sigma _{\phi } \Vert _{\infty } \sum _{|k|>N} \left[ \int \limits _a^b \phi (w y - k) \, |f'(y)| \, dy \right] \nonumber \\&\le \left[ \sup _{y \in [a,b]} \sum _{|k|>N} \phi (w y - k) \right] \int \limits _a^b |f'(y)| \, dy. \end{aligned}$$

(13)

We have by Lemma 3.1, for every fixed and sufficiently large $w > 0$,

$$\begin{aligned} \sup _{y \in [a,b]} \sum _{|k|>N} \phi (wy-k) \le \overline{C} \left\{ (N-wb+1)^{-(\alpha -1)} + (N+wa+1)^{-(\alpha -1)} \right\} , \end{aligned}$$

(14)

for some constant $\overline{C} > 0$ and for every $N > w \max \left\{ |a|, |b| \right\} $ with $N \in \mathbb{N }^+$. Then, for $N$ sufficiently large, we obtain $S_1 < \varepsilon $. This completes the proof of (i).

(ii)
For any $f \in {\widehat{C}}^1[a,b]$, Theorem 2.6 shows that $S_3 \le {\widetilde{C}} w^{-1}$ uniformly with respect to $x \in [a,b]$, for every $w > 0$. Moreover, we obtain by (12) and (14)
$$\begin{aligned} S_1 + S_2 + S_3&\le \overline{C} \left[ \int \limits _a^b |f'(y)| \, dy \right] \left\{ (N-wb+1)^{-(\alpha -1)} \right. \\&\quad + \left. (N+wa+1)^{-(\alpha -1)} \right\} + \widetilde{C} \, w^{-1} + \underline{C} \, (1 + w)^{-(\alpha -1)}\\&\le C_1 \left\{ (N-wb+1)^{-(\alpha -1)} + (N+wa+1)^{-(\alpha -1)} \right\} \\&\quad + C_2 \, w^{-1} + C_3 \, w^{-(\alpha -1)}, \end{aligned}$$
uniformly with respect to $x \in [a,b]$, for some constants $C_1,\,C_2,\,C_3 > 0$, and for $w > 0$ sufficiently large, with $N > w \max \left\{ |a|, \,|b|\right\} $. $\square $

Remark 3.3

Setting $C_3 = 0$ in Theorem 3.2 (ii), we also obtain an estimate for the truncation error for the series of sigmoidal functions introduced in Sect. 2. Note that, when the weight, $w$, increases, we need a higher number of neurons, $N$, which depends on $w$.
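As an illustration of how $N$ may be coupled to $w$, consider the choice $N \ge 2 w M$, with $M := \max \left\{ |a|, |b| \right\} > 0$. This particular coupling is an assumption made here only for the sake of example. Then $N - wb + 1 \ge wM + 1$ and $N + wa + 1 \ge wM + 1$, and the estimate in Theorem 3.2 (ii) gives

$$\begin{aligned} \Vert G_{N,w}^{\sigma _{\phi }} f - f \Vert _{\infty } \le 2 \, C_1 \, (1 + wM)^{-(\alpha -1)} + C_2 \, w^{-1} + C_3 \, w^{-(\alpha -1)}. \end{aligned}$$

Since $\alpha \ge 2$, the right-hand side is $\mathcal{O }(w^{-1})$ as $w \rightarrow +\infty $, while the number of neurons, $2N + 2$, grows only linearly with $w$.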

We now construct a few examples of sigmoidal functions, $\sigma _{\phi }$, providing first some examples of functions $\phi \in \Phi _C$ satisfying all hypotheses of our theory. Recall that the “central B-splines” of order $n \in \mathbb{N }^+$, are defined as

$$\begin{aligned} M_n(x) := \frac{1}{(n-1)!} \sum _{i=0}^n (-1)^i \left( {\begin{array}{c}n\\ i\end{array}}\right) \left( \frac{n}{2} + x - i \right) ^{n-1}_+, \end{aligned}$$

where $(x)_+ := \max \left\{ x, 0 \right\} $ is the positive part of $x \in \mathbb{R }$ [5]. The Fourier transform of $M_n$ is given by

$$\begin{aligned} \widehat{M_n}(v) := \text{ sinc }^n\left( \frac{v}{2 \pi } \right) , \quad v \in \mathbb{R }, \end{aligned}$$

where the $sinc$ function is defined by

$$\begin{aligned} \text{ sinc }(x) := \left\{ \begin{array}{l} \displaystyle \frac{\sin (\pi x)}{\pi x}, \quad x \in \mathbb{R }\setminus \left\{ 0 \right\} , \\ 1, \quad x = 0. \end{array} \right. \end{aligned}$$

The $M_n$’s are bounded and continuous on $\mathbb{R }$ for all $n \in \mathbb{N }^+$, and are compactly supported on $[-n/2,n/2]$. This implies that $M_n \in L^1(\mathbb{R })$ and satisfies condition $(\varphi 1)$ for every $\alpha \ge 2$. Finally, condition $(\varphi 2)$ holds, in view of Remark 2.2, hence, $M_n \in \Phi _C$ for every $n \in \mathbb{N }^+$. Therefore, we can construct explicitly the NNs $G_w^{\sigma _{M_n}} f,\,n \in \mathbb{N }^+$.
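The truncated-power formula for $M_n$ translates directly into code. The sketch below evaluates $M_n$ for orders $n \ge 2$ and checks on a small grid that the integer translates sum to $1$; the function names, the grid, and the tested orders are illustrative choices.

```python
import math

def central_bspline(n, x):
    """Central B-spline M_n(x) of order n >= 2 via the truncated-power formula."""
    total = 0.0
    for i in range(n + 1):
        base = n / 2.0 + x - i
        if base > 0.0:  # positive part (.)_+
            total += (-1) ** i * math.comb(n, i) * base ** (n - 1)
    return total / math.factorial(n - 1)

def shifted_sum(n, x):
    """Sum of M_n(x - k) over all k with |x - k| <= n/2, padded by one on each side."""
    k_lo, k_hi = math.floor(x - n / 2.0) - 1, math.ceil(x + n / 2.0) + 1
    return sum(central_bspline(n, x - k) for k in range(k_lo, k_hi + 1))

# Largest deviation of the shifted sum from 1 on a grid in [-1.5, 1.5].
checks = {n: max(abs(shifted_sum(n, j / 10.0) - 1.0) for j in range(-15, 16))
          for n in (2, 3, 4)}
```

For $n = 2$ the function reduces to the hat function $\max \{0, 1 - |x|\}$, and for $n = 4$ it is the familiar cubic B-spline with $M_4(0) = 2/3$.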

As an example of function $\phi \in \Phi $ which is not compactly supported, consider the continuous function

$$\begin{aligned} F(x) := \frac{1}{2 \pi }\text{ sinc }^2\left( \frac{x}{2 \pi }\right) , \qquad x \in \mathbb{R }. \end{aligned}$$

Clearly, $F(x) = \mathcal{O }(x^{-2})$ as $x \rightarrow \pm \infty $, hence, $F$ satisfies condition $(\varphi 1)$ with $\alpha =2$, see [5]. Moreover, its Fourier transform is

$$\begin{aligned} {\widehat{F}}(v) := \left\{ \begin{array}{l} 1 - |v|, \quad |v| \le 1, \\ 0, \quad |v|>1, \end{array} \right. \end{aligned}$$

(see [5] again). By Remark 2.2, $F$ satisfies also condition $(\varphi 2)$, and then $F \in \Phi $.
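Because $F$ decays only like $x^{-2}$, the series $\sum _{k} F(x-k)$ converges slowly, and a numerical check of condition $(\varphi 2)$ needs many terms. The following sketch truncates the sum at $|k| \le K$; the cutoff $K = 20000$ and the sample points are illustrative assumptions.

```python
import math

def F(x):
    """F(x) = (1/(2 pi)) sinc^2(x/(2 pi)), where sinc(t) = sin(pi t)/(pi t)."""
    if x == 0.0:
        return 1.0 / (2.0 * math.pi)
    s = math.sin(x / 2.0) / (x / 2.0)  # equals sinc(x / (2 pi))
    return s * s / (2.0 * math.pi)

def truncated_sum(x, K):
    """Sum of F(x - k) over |k| <= K; the neglected tail is of order 1/K."""
    return sum(F(x - k) for k in range(-K, K + 1))

deviations = [abs(truncated_sum(x, 20000) - 1.0) for x in (0.0, 0.3, 0.5)]
```

The deviations from $1$ are of the size of the neglected tail, which is consistent with Lemma 3.1 for $\alpha = 2$.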

Remark 3.4

Note that the theory developed in this section cannot be applied to the case of NNs activated by the logistic functions, $\sigma _{\ell }(x) := (1 + e^{-x})^{-1}$ (see [4, 21], e.g., for applications to Demography and Economics), or to the hyperbolic tangent sigmoidal functions, $\sigma _h(x) := \frac{1}{2} + \frac{1}{2} \tanh (x) = \frac{1}{2} + \frac{e^{2x} - 1}{2 (e^{2x} + 1)}$, [1, 2, 8]. In fact, $\sigma _{\ell }$ and $\sigma _h$ can be generated by (4) from $\phi _{\ell }(x) := e^{-x} (1 + e^{-x})^{-2}$ and $\phi _h(x) := 2 e^{2x} (e^{2x} + 1)^{-2}$, respectively. However, $\widehat{\phi _{\ell }}(v) = \pi v /\sinh (\pi v)$ and $\widehat{\phi _{h}}(v) = \pi v/(2 \sinh (\pi v/2))$, respectively, which do not meet the condition in Remark 2.2, i.e., do not satisfy condition $(\varphi 2)$. In Sect. 5 below, an extension of the theory developed above is proposed, which allows one to use NNs activated by $\sigma _{\ell }$ or $\sigma _h$.
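The failure of condition $(\varphi 2)$ for $\phi _{\ell }$ and $\phi _h$ is small in size but visible numerically. By the Poisson summation formula, $\sum _k \phi (x-k) = \sum _m {\widehat{\phi }}(2 \pi m) e^{i 2 \pi m x}$, so the sum oscillates around $1$, and the difference between its values at $x = 0$ and $x = 1/2$ is, up to much smaller terms, $4 {\widehat{\phi }}(2\pi )$. The sketch below compares this prediction with a direct summation; the truncation $|k| \le 60$ and the variable names are illustrative assumptions.

```python
import math

def phi_logistic(x):
    """Derivative of the logistic function, e^{-x} / (1 + e^{-x})^2 (an even function)."""
    e = math.exp(-abs(x))  # using |x| avoids overflow for large negative x
    return e / (1.0 + e) ** 2

def phi_tanh(x):
    """Derivative of 1/2 + tanh(x)/2, i.e. 2 e^{2x} / (e^{2x} + 1)^2 (an even function)."""
    e = math.exp(-2.0 * abs(x))
    return 2.0 * e / (1.0 + e) ** 2

def shifted_sum(phi, x, K=60):
    """Sum of phi(x - k) over |k| <= K; the tails are negligible (exponential decay)."""
    return sum(phi(x - k) for k in range(-K, K + 1))

gap_tanh = shifted_sum(phi_tanh, 0.0) - shifted_sum(phi_tanh, 0.5)
gap_logistic = shifted_sum(phi_logistic, 0.0) - shifted_sum(phi_logistic, 0.5)

# Predictions 4 * phi_hat(2 pi) from the Fourier transforms quoted in the remark.
predicted_tanh = 4.0 * math.pi ** 2 / math.sinh(math.pi ** 2)
predicted_logistic = 8.0 * math.pi ** 2 / math.sinh(2.0 * math.pi ** 2)
```

Both gaps are nonzero, so neither family of translates sums to a constant, although the deviation is tiny for the logistic case.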

4 Sigmoidal functions and multiresolution approximation

In this section, we will show a connection between the theory of multiresolution approximation and our theory for approximating functions by series of sigmoidal functions. We first recall some basic facts concerning the multiresolution approximation. For the detailed theory, see [11, 17, 29, 30, 36]. We start recalling the following

Definition 4.1

A multiresolution approximation of $L^2(\mathbb{R })$ is an increasing sequence, $V_j,\,j \in \mathbb{Z }$, of linear closed subspaces of $L^2(\mathbb{R })$, enjoying the following properties:

$$\begin{aligned} \bigcap _{j \in \mathbb{Z }} V_j = \left\{ 0 \right\} , \qquad \bigcup _{j \in \mathbb{Z }} V_j \hbox { is dense in } L^2(\mathbb{R }); \end{aligned}$$

(15)

for all $f \in L^2(\mathbb{R })$ and all $j \in \mathbb{Z }$,

$$\begin{aligned} f(x) \in V_j \Longleftrightarrow f(2x) \in V_{j+1}; \end{aligned}$$

(16)

for all $f \in L^2(\mathbb{R })$ and all $k \in \mathbb{Z }$,

$$\begin{aligned} f(x) \in V_0 \Longleftrightarrow f(x-k) \in V_0; \end{aligned}$$

(17)

there exists a function, $h(x) \in V_0$, such that the sequence

$$\begin{aligned} (h(x - k))_{k \in \mathbb{Z }} \hbox { is a Riesz basis of } V_0. \end{aligned}$$

(18)

Recall that a sequence of functions $(h_k)_{k \in \mathbb{Z }}$ is a Riesz basis of a Hilbert space, $H \subseteq L^2(\mathbb{R })$, if there exist two constants, $C_1$ and $C_2$, with $C_1 > C_2 > 0$, such that, for every sequence of real or complex numbers $(a_k)_{k \in \mathbb{Z }} \in l^2(\mathbb{Z })$, it turns out that

$$\begin{aligned} C_2 \left( \sum _{k \in \mathbb{Z }} |a_k|^2 \right) ^{1/2} \le \left\| \sum _{k \in \mathbb{Z }} a_k h_k \right\| _{L^2(\mathbb{R })} \le C_1\left( \sum _{k \in \mathbb{Z }} |a_k|^2 \right) ^{1/2}, \end{aligned}$$

and the vector space of finite linear combinations of $h_k$, is dense in $H$.

Definition 4.2

A multiresolution approximation, $V_j,\,j \in \mathbb{Z }$, is called r-regular $(r \in \mathbb{N }^+)$, if the function $h$ in (18) is such that $h \in C^r(\mathbb{R })$ and

$$\begin{aligned} |h^{(i)}(x)| \le C_{m} (1 + |x|)^{-m}, \quad x \in \mathbb{R }, \end{aligned}$$

(19)

for each integer $m \in \mathbb{N }^+$ and for every nonnegative index $i \le r$.

For every $r$-regular multiresolution approximation $V_j,\,j \in \mathbb{Z }$, we can define the function $\phi \in L^2(\mathbb{R })$, called scaling function, as

$$\begin{aligned} {\widehat{\phi }}(v) := {\widehat{h}}(v) \left( \sum _{k \in \mathbb{Z }} |{\widehat{h}}(v + 2 \pi k)|^2 \right) ^{-1/2}, \quad v \in \mathbb{R }. \end{aligned}$$

(20)

In [30, Ch. 2], it is proved that $\sum _{k \in \mathbb{Z }} |{\widehat{h}}(v + 2 \pi k)|^2 \ge c > 0$, hence, $\phi $ is well-defined. Moreover, by the regularity of $h$, we have, as a consequence of Sobolev’s embedding theorem, that $\sum _{k \in \mathbb{Z }} |{\widehat{h}}(v + 2 \pi k)|^2$ is a $C^{\infty }(\mathbb{R })$ function. Furthermore, the family $(\phi (x-k))_{k \in \mathbb{Z }}$ turns out to be an orthonormal basis of $V_0$, [17, 30], and from (16) and (17), we obtain by a simple change of scale that $(2^{j/2} \phi (2^jx-k))_{k \in \mathbb{Z }}$ forms an orthonormal basis of $V_j$.

Now, by smoothness and periodicity of $\left( \sum _{k \in \mathbb{Z }} |{\widehat{h}}(v + 2 \pi k)|^2 \right) ^{-1/2}$, the latter can be written by means of its Fourier series $\sum _{k \in \mathbb{Z }} \alpha _k e^{i k v}$, where the coefficients $\alpha _k$ decrease rapidly. We thus obtain ${\widehat{\phi }}(v) = \left( \sum _{k \in \mathbb{Z }} \alpha _k e^{i k v} \right) {\widehat{h}}(v)$ which gives $\phi (x) = \sum _{k \in \mathbb{Z }} \alpha _k \, h(x+k)$, and then it follows that the scaling function $\phi $ satisfies the estimates in (19). In particular, we have

$$\begin{aligned} |\phi (x)| \le \widetilde{C}_{\alpha } \, (1+|x|)^{-\alpha }, \qquad x \in \mathbb{R }, \end{aligned}$$

(21)

for some $\widetilde{C}_{\alpha } > 0$, for every integer $\alpha \in \mathbb{N }^+$, i.e., $\phi $ satisfies condition $(\varphi 1)$ for every $\alpha \in \mathbb{N }^+$.

Let now $E_j$ be the orthogonal projection of $L^2(\mathbb{R })$ onto $V_j$, given by

$$\begin{aligned} (E_j f)(x) := \sum _{k \in \mathbb{Z }} \left[ 2^j \int \limits _{\mathbb{R }} f(y) \, \overline{\phi }(2^j y - k) \, dy \right] \phi (2^jx-k), \quad f \in L^2(\mathbb{R }), \end{aligned}$$

(22)

where $\overline{\phi }$ is the complex conjugate of $\phi $. Let us define $E(x,y) := \sum _{k \in \mathbb{Z }} \overline{\phi }(y-k) \, \phi (x-k)$, the kernel of the projection operator $E_0$; hence, $2^j \, E(2^jx,2^jy),\,j \in \mathbb{Z }$, will be the kernel of the projection operator $E_j$.

Again in [30], the following remarkable property of the kernel $E$ is proved:

$$\begin{aligned} \int \limits _{\mathbb{R }} E(x,y) \, y^{\alpha } \, dy = x^{\alpha }, \quad \hbox {for every } x \in \mathbb{R }, \end{aligned}$$

(23)

for every integer $\alpha \in \mathbb{N }$ and $\alpha \le r$. From (23) with $\alpha = 0$, the integral property

$$\begin{aligned} \int \limits _{\mathbb{R }} E(x,y) \, dy = 1, \quad \hbox {for every } x \in \mathbb{R }, \end{aligned}$$

follows. Moreover, since $\phi $ satisfies (21), it is easy to see that

$$\begin{aligned} |E(x,y)| \le \overline{C}_{\alpha } (1 + |x - y|)^{-\alpha }, \quad \forall \, (x,y) \in \mathbb{R }^2, \hbox { and }\quad \forall \, \alpha \in \mathbb{N }^+, \end{aligned}$$

where $\overline{C}_{\alpha }>0$. Hence, $E$ is a bivariate kernel satisfying conditions (2) and (3). Then, by Lemma 2.3, we infer that $\Vert E_j f - f \Vert _p \rightarrow 0$ as $j \rightarrow + \infty $, for every $f \in L^p(\mathbb{R })$ and $1 \le p < \infty $. Moreover, exploiting the properties of the projection operators $E_j$, the quantity

$$\begin{aligned} \Sigma (x,v) := \sum _{k \in \mathbb{Z }} e^{i 2 \pi k x} \, {\widehat{\phi }}(v + 2 k \pi ) \, \overline{{\widehat{\phi }}}(v), \quad x, \, v \in \mathbb{R }, \end{aligned}$$

can be defined, which satisfies the condition $\Sigma (x,0) = 1$, for every $x \in \mathbb{R }$, [30]. This yields

$$\begin{aligned} \sum _{k \in \mathbb{Z }} e^{i 2 \pi k x} {\widehat{\phi }}(2 k \pi ) \, \overline{{\widehat{\phi }}}(0) = 1. \end{aligned}$$

(24)

Now, we can adjust the scaling function $\phi $ merely by multiplying ${\widehat{\phi }}$ by a suitable constant of modulus $1$, so that ${\widehat{\phi }}(0) = \int _{\mathbb{R }} \phi (t) \, dt = 1$, while preserving all the other properties [30]. By the regularity of $\phi $, the Poisson summation formula holds, and from (24), we obtain

$$\begin{aligned} 1 = \sum _{k \in \mathbb{Z }} e^{i 2 \pi k x} \, {\widehat{\phi }}(2 k \pi ) = \sum _{k \in \mathbb{Z }} \phi (x + k) = \sum _{k \in \mathbb{Z }} \phi (x - k), \quad x \in \mathbb{R }, \end{aligned}$$

i.e., the scaling function $\phi $ satisfies condition $(\varphi 2)$. Using (4), we can now consider the function $\sigma _{\phi }$ constructed by the scaling function $\phi $. Clearly, if $\phi $ is real valued, $\sigma _{\phi }$ turns out to be a sigmoidal function. Then, we have the following

Theorem 4.3

Let $\phi $ be a real-valued scaling function like that constructed above, associated with an $r$-regular multiresolution approximation of $L^2(\mathbb{R })$.

(i)
Then, for any $f \in AC[a,b]$, the sequence of operators $(S_j^{\sigma _{\phi }} f)_{j \in \mathbb{N }^+}$, defined by
$$\begin{aligned} (S_j^{\sigma _{\phi }} f)(x) := \sum _{k \in \mathbb{Z }} \left[ \int \limits _a^b \phi (2^j y - k) \, f'(y) \, dy \right] \sigma _{\phi }(2^j x - k) + f(a), \end{aligned}$$
for every $x \in [a,b]$, converges uniformly to $f$ on $[a,b]$. In particular, if $f \in {\widehat{C}}^1[a,b]$, we have
$$\begin{aligned} \Vert S_j^{\sigma _{\phi }} f - f \Vert _{\infty } \le C \, 2^{-j}, \end{aligned}$$
for some positive constant $C$ and for every positive integer $j$.
(ii)
Denote by $S_{N, j}^{\sigma _{\phi }} f,\,N \in \mathbb{N }^+$, the truncated series $S_j^{\sigma _{\phi }} f$, i.e.,
$$\begin{aligned} (S_{N,j}^{\sigma _{\phi }} f)(x) := \sum _{k = -N}^N \left[ \int \limits _a^b \phi (2^j y - k) \, f'(y) \, dy \right] \sigma _{\phi }(2^j x - k) + f(a). \end{aligned}$$
Then, for every $f \in {\widehat{C}}^1[a,b]$, we have
$$\begin{aligned} \Vert S_{N,j}^{\sigma _{\phi }} f - f \Vert _{\infty } \le C_1 \, 2^{-j} + C_{2, \alpha } \left\{ (N-2^jb+1)^{-(\alpha -1)} + (N+2^j a+1)^{-(\alpha -1)}\right\} , \end{aligned}$$
for some positive constants $C_1$ and $C_{2,\alpha }$, for every $j \in \mathbb{N }^+$, and $N > 2^j \max \left\{ |a|, |b| \right\} $, where $\alpha \in \mathbb{N }^+$ is an arbitrary integer.

The proof of Theorem 4.3 (i) follows that of Theorem 2.6, taking into account that the sequence $(E_j f)_{j \in \mathbb{Z }}$, $f \in L^1(\mathbb{R })$, converges to $f$ in $L^1(\mathbb{R })$. Moreover, the proof of Theorem 4.3 (ii) follows that of Theorem 3.2 (ii), using condition (21) and Lemma 3.1, with $2^j$ in place of $w$.

Remark 4.4

Note that, in the special setting of $r$-regular multiresolution approximations, we are able to prove that the real-valued scaling functions $\phi $ constructed above are such that $\phi \in \Phi $. Moreover, condition (16) in Definition 4.1 allows us to take the weights in the basis $(2^{j/2} \phi (2^jx-k))_{k \in \mathbb{Z }}$, and hence in the series $S_j^{\sigma _{\phi }}f$, equal to $2^j$, i.e., the weights increase exponentially with respect to $j$. Then, the error of approximation of $C^1$-functions decreases as $2^{-j}$. Moreover, conditions (19) and (21) are crucial to prove that the truncation error also decreases rapidly.

Examples of $r$-regular multiresolution analysis satisfying the conditions above can be given, assuming $h$ to be generated by spline wavelets of order $r+1$. These are defined by

$$\begin{aligned} h_r(x) := \frac{1}{r!} \sum _{i=0}^{r+1} (-1)^i \left( {\begin{array}{c}r+1\\ i\end{array}}\right) \left( x - i \right) ^{r}_+, \quad x \in \mathbb{R }, \end{aligned}$$

(25)

which can be viewed simply as shifted central B-splines $M_n$. Generally speaking, $h_r$ is defined in terms of convolution, i.e., as the convolution of $r+1$ characteristic functions of the interval $[0,1)$, see [35]. Note that the central B-splines can also be defined similarly, in terms of convolutions of the characteristic functions of the interval $[-1/2,1/2)$, see [5]. The Fourier transform of $h_r$ can be easily obtained as

$$\begin{aligned} {\widehat{h}}_r(v) := e^{-iv(r + 1)/2} \, \text{ sinc }^{r+1}\left( \frac{v}{2 \pi } \right) , \quad v \in \mathbb{R }. \end{aligned}$$

The scaling function $\phi $ associated with the spline wavelet multiresolution approximation can be obtained using (20) and the normalization procedure described above, see [18, 30, 35].
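
The truncated-power formula (25) can be evaluated directly. The following is a minimal Python sketch, assuming $r \ge 1$ (so that the convention for $(x-i)^0_+$ at $x = i$ is not needed); the function names `h` and `shifted_sum` are illustrative and do not come from the paper. It also checks numerically the standard fact that integer shifts of a B-spline sum to $1$.

```python
from math import comb, factorial


def h(r: int, x: float) -> float:
    """Evaluate h_r(x) from (25) via truncated powers (assumes r >= 1)."""
    if x <= 0.0 or x >= r + 1:
        return 0.0  # h_r is supported on [0, r + 1]
    total = 0.0
    for i in range(r + 2):
        t = x - i
        if t > 0.0:  # truncated power (x - i)_+^r
            total += (-1) ** i * comb(r + 1, i) * t ** r
    return total / factorial(r)


def shifted_sum(r: int, x: float, k_max: int = 20) -> float:
    """Sum of h_r(x - k) over |k| <= k_max (enough terms when |x| is small)."""
    return sum(h(r, x - k) for k in range(-k_max, k_max + 1))


if __name__ == "__main__":
    # h_1 is the hat function supported on [0, 2]
    print(h(1, 0.5), h(1, 1.0), h(1, 1.5))
    print(shifted_sum(3, 0.3))
```

For $r = 1$ the sketch reproduces the hat function on $[0,2]$; the explicit support check avoids the rounding noise that the alternating sum would otherwise produce outside $[0, r+1]$.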

5 An extension of the theory for neural networks approximation

The theory developed in the previous sections, concerning the approximation by means of series of sigmoidal functions based on $\sigma _{\phi }$, is beset by the technical difficulty of checking that $\phi $ satisfies condition $(\varphi 2)$. For this purpose, we could use the condition given in Remark 2.2. However, this does not simplify the problem. In fact, evaluating the Fourier transform of a given function is often a difficult task. Moreover, as noted in Remark 3.4, the sigmoidal functions most commonly used for NN approximation do not satisfy $(\varphi 2)$. Below, we propose an extension of the theory developed in the previous sections, aiming at obtaining approximations with NNs activated by sigmoidal functions $\sigma _{\phi }$, without assuming that condition $(\varphi 2)$ be satisfied by $\phi $.

Throughout this section, we consider functions $\phi : \mathbb{R }\rightarrow \mathbb{R }^+_0$, with $\int _{\mathbb{R }} \phi (t)\, dt = 1$ and satisfying condition $(\varphi 1)$ with $\alpha > 2$. Moreover, we set

$$\begin{aligned} \psi _{\phi }(t) := \sigma _{\phi }(t + 1) - \sigma _{\phi }(t) > 0, \quad t \in \, \mathbb{R }, \end{aligned}$$

and assume in addition that $\psi _{\phi }$ satisfies:

$$\begin{aligned} (\Psi 1) \qquad \psi _{\phi }(t) \le A (1 + |t|)^{-\alpha }, \end{aligned}$$

for every $t \in \mathbb{R }$ and some $A > 0$. We denote by $\mathcal{T }$ the set of all functions $\phi $ satisfying such conditions. We can now prove the following

Lemma 5.1

For any given $\phi \in \mathcal{T }$, the relation

$$\begin{aligned} \sum _{k \in \mathbb{Z }} \psi _{\phi }(x - k) = 1, \quad x \in \mathbb{R }\end{aligned}$$

holds.

Proof

Let $x \in \mathbb{R }$ be fixed. Then,

$$\begin{aligned} \sum _{k=-N}^N \psi _{\phi }(x - k) = \sum _{k=-N}^N [\sigma _{\phi }(x - k + 1) - \sigma _{\phi }(x - k)] = \sigma _{\phi }(x + N + 1) - \sigma _{\phi }(x - N), \end{aligned}$$

since the sum is telescopic. Passing to the limit for $N \rightarrow +\infty $, we obtain immediately

$$\begin{aligned} \sum _{k=-\infty }^{+\infty } \psi _{\phi }(x - k) = \lim _{N \rightarrow +\infty } [\sigma _{\phi }(x + N + 1) -\sigma _{\phi }(x - N)] = 1. \end{aligned}$$

$\square $
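
The telescoping argument can be checked numerically. The sketch below assumes the logistic function $\sigma (x) = 1/(1+e^{-x})$ as the sigmoidal function; the helper names are hypothetical. It compares the symmetric partial sums of $\psi _{\phi }(x-k)$ with the closed form $\sigma _{\phi }(x+N+1) - \sigma _{\phi }(x-N)$ used in the proof.

```python
import math


def sigma(t: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large |t|."""
    if t >= 0.0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def psi(t: float) -> float:
    """psi(t) = sigma(t + 1) - sigma(t)."""
    return sigma(t + 1.0) - sigma(t)


def partial_sum(x: float, n: int) -> float:
    """Symmetric partial sum of psi(x - k) for k = -n, ..., n."""
    return sum(psi(x - k) for k in range(-n, n + 1))


def telescoped(x: float, n: int) -> float:
    """Closed form of the same partial sum after telescoping."""
    return sigma(x + n + 1.0) - sigma(x - n)


if __name__ == "__main__":
    for n in (2, 5, 10, 50):
        print(n, partial_sum(0.37, n), telescoped(0.37, n))
```

As $N$ grows, both quantities approach $1$, in agreement with the lemma.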

Let us now introduce the bivariate kernel

$$\begin{aligned} K_{\phi , \psi }(x,y) := \sum _{k \in \mathbb{Z }} \psi _{\phi }(x - k) \, \phi (y - k), \quad (x,y) \in \mathbb{R }^2. \end{aligned}$$

As done in Sect. 2 for the kernel $K_{\phi }$, we can show, using Lemma 5.1 and conditions $(\varphi 1)$ and $(\Psi 1)$, that $K_{\phi , \psi }$ satisfies both (2) and (3). Now, for any given $\phi \in \mathcal{T }$, we consider the family of operators $(F_w^{\phi })_{w>0}$, defined by

$$\begin{aligned} (F_w^{\phi } f)(x)&:= \sum _{k \in \mathbb{Z }} w \left[ \int \limits _{\mathbb{R }} \phi (w y - k) f(y) \, dy\right] \psi _{\phi }(w x - k) \\&= w \int \limits _{\mathbb{R }} K_{\phi , \psi }(wx,wy) f(y) \, dy, \quad x \in \mathbb{R }, \end{aligned}$$

for every bounded $f: \mathbb{R }\rightarrow \mathbb{R }$ and every $w > 0$.

Remark 5.2

Note that, by Lemma 2.3, for every uniformly continuous and bounded function $f$, the family of operators $(F_w^{\phi }f)_{w>0}$ converges uniformly to $f$ on $\mathbb{R }$, as $w\rightarrow +\infty $.

To study the order of approximation for the operators above, we define the Lipschitz class of the Zygmund type we will work with. Let us define

$$\begin{aligned} \text{ Lip }(\nu ) := \left\{ f: \mathbb{R }\rightarrow \mathbb{R }: \Vert f(\cdot ) - f(\cdot + t)\Vert _{\infty } = \mathcal{O }(|t|^{\nu }) \hbox { as } t\, \rightarrow 0 \right\} , \end{aligned}$$

for every $0 < \nu \le 1$. We can now prove the following lemma concerning the order of approximation of $(F_w^{\phi }f)_{w>0}$ to $f$:

Lemma 5.3

Let $f \in \text{ Lip }(\nu ),\,0 < \nu \le 1$, be a fixed bounded function. Then, there exist $C_1 > 0$ and $C_2 > 0$ such that

$$\begin{aligned} \sup _{x \in \mathbb{R }} |(F_w^{\phi }f)(x) - f(x)| \le C_1 \, w^{- \nu } + C_2 \, w^{-(\alpha -1)}, \end{aligned}$$

for every sufficiently large $w > 0$.

Proof

Let $x \in \mathbb{R }$ be fixed. Since $f \in \text{ Lip }(\nu )$, there exist $M > 0$ and $\gamma > 0$ such that

$$\begin{aligned} \Vert f(\cdot ) - f(\cdot + t) \Vert _{\infty } \le M \, |t|^{\nu }, \end{aligned}$$

for every $|t| \le \gamma $. Moreover, we infer from condition (2)

$$\begin{aligned} w \int \limits _{\mathbb{R }} K_{\phi , \psi }(wx,wy) \, dy = 1, \quad x \in \mathbb{R }, \end{aligned}$$

(26)

and then we can write

$$\begin{aligned} |(F_w^{\phi }f)(x) - f(x)|&\le w \int \limits _{\mathbb{R }} K_{\phi , \psi }(wx,wy) \, |f(y)-f(x)| \, dy \\&= \left[ \int \limits _{|y-x| \le \gamma } + \int \limits _{|y-x| > \gamma } \right] w \, K_{\phi , \psi }(wx,wy) \, |f(y) - f(x)| \, dy =: J_1 + J_2. \end{aligned}$$

Let us first estimate $J_1$. From (3) and (26), by the change of variable $y = (t/w) + x$, and since $f \in \text{ Lip }(\nu )$, we obtain for $w > 0$ sufficiently large

$$\begin{aligned} J_1&= \int \limits _{w^{-1}|t| \le \gamma } K_{\phi , \psi }(wx,t+wx) \, |f\left( x+t/w\right) - f(x)| \, dt \\&\le M \left[ \int \limits _{|t| \le w \, \gamma } K_{\phi , \psi }(wx,t+wx) \left| \frac{t}{w} \right| ^{\nu } \, dt \right] \le \widetilde{L} w^{-\nu } \int \limits _{\mathbb{R }}(1+|t|)^{-\alpha } \, |t|^{\nu } \, dt, \end{aligned}$$

where $\widetilde{L} > 0$ is a suitable constant. Now, since $\alpha > 2$ and $\nu \le 1$, the integrand decays at least like $|t|^{-(\alpha - \nu )}$ with $\alpha - \nu > 1$, so that $\widetilde{L} \int _{\mathbb{R }}(1+|t|)^{-\alpha } \, |t|^{\nu } \, dt =: C_1 < +\infty $, and hence $J_1 \le C_1 \, w^{-\nu }$ for $w>0$ sufficiently large. Moreover, setting $t = w y$ and using again condition (3), we have

$$\begin{aligned} J_2&= \int \limits _{|t-wx| > w \gamma }K_{\phi , \psi }(wx,t) |f(t/w) - f(x)| \, dt\\&\le 2 \Vert f \Vert _{\infty } \int \limits _{|t-wx| > w \gamma } K_{\phi , \psi }(wx,t) \, dt \le \overline{L} \int \limits _{|t-wx| > w \gamma } (1 + |t - w x|)^{-\alpha } \, dt, \end{aligned}$$

where $\overline{L}$ is a suitable positive constant. Changing now the variable $t$ into $z$, setting $z = t - w x$ in the last integral, we obtain

$$\begin{aligned} J_2 \le \overline{L} \int \limits _{|z| > w \gamma } (1 + |z|)^{-\alpha } \, dz \le C_2 \, w^{-(\alpha -1)}, \end{aligned}$$

for every $w > 0$. This completes the proof. $\square $
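
For completeness, the last inequality can be made explicit by direct integration. This is a worked step with $\gamma $ and $\overline{L}$ as in the proof; the explicit value of $C_2$ given below is one admissible choice and is not stated in the original argument:

$$\begin{aligned} \int \limits _{|z| > w \gamma } (1 + |z|)^{-\alpha } \, dz = 2 \int \limits _{w \gamma }^{+\infty } (1 + z)^{-\alpha } \, dz = \frac{2}{\alpha - 1} \, (1 + w \gamma )^{-(\alpha -1)} \le \frac{2 \, \gamma ^{-(\alpha -1)}}{\alpha - 1} \, w^{-(\alpha -1)}, \end{aligned}$$

where the last step uses $1 + w \gamma \ge w \gamma $ and $\alpha - 1 > 0$. Hence one may take $C_2 := 2 \, \overline{L} \, \gamma ^{-(\alpha -1)} / (\alpha - 1)$.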

We can now prove the following

Theorem 5.4

Let $\phi \in \mathcal{T }$ be fixed. Define the NNs

$$\begin{aligned} (N_{N,w}^{\phi } f)(x) := \sum _{k = - N }^{N} w \left[ \int \limits _{\mathbb{R }} \phi (w y - k) \, f(y) \, dy \right] \psi _{\phi }(w x - k), \quad x \in \mathbb{R }, \end{aligned}$$

where $w > 0,\,N \in \mathbb{N }^+$, and $f: \mathbb{R }\rightarrow \mathbb{R }$ is a bounded function on $\mathbb{R }$.

(i)
Let $f \in C[a,b]$ be fixed. Then, for every $\varepsilon > 0$, there exist $w > 0$ and $N > w \max \left\{ |a|,\, |b| \right\} $, such that
$$\begin{aligned} \Vert N_{N,w}^{\phi } {\widetilde{f}} - f \Vert _{\infty } = \sup _{x \in [a,b]}|(N_{N,w}^{\phi } {\widetilde{f}})(x) - f(x)| < \varepsilon , \end{aligned}$$
where ${\widetilde{f}}$ is a continuous extension of $f$ such that ${\widetilde{f}}$ has compact support and ${\widetilde{f}} = f$ on $[a,b]$.
(ii)
Let $f \in Lip(\nu ),\,0<\nu \le 1$, and $[a,b] \subset \mathbb{R }$ be fixed. Then, we have
$$\begin{aligned}&\Vert N_{N,w}^{\phi } f - f \Vert _{\infty } = \sup _{x \in [a,b]}|(N_{N,w}^{\phi } f)(x)-f(x)| \\&\quad \le C_1 \, w^{- \nu } + C_2 \, w^{-(\alpha -1)} + C_3 \left\{ (N-wb+1)^{-(\alpha -1)} + (N+wa+1)^{-(\alpha -1)} \right\} , \end{aligned}$$
for every sufficiently large $w>0$ and $N > w \max \left\{ |a|, |b|\right\} $, for some positive constants $C_1,\,C_2$, and $C_3$.

Proof

(i)
Suppose for the sake of simplicity that $\Vert f\Vert _{\infty } = \Vert {\widetilde{f}}\Vert _{\infty }$, and note that ${\widetilde{f}}$ is uniformly continuous. Let now $\varepsilon > 0$ and $x \in [a,b]$ be fixed. We can write
$$\begin{aligned} |(N_{N,w}^{\phi } {\widetilde{f}})(x) - f(x)|&\le |f(x) - (F_{w}^{\phi } {\widetilde{f}})(x)| + |(F_{w}^{\phi } {\widetilde{f}})(x) - (N_{N,w}^{\phi } {\widetilde{f}})(x)| \\&=: I_1 + I_2. \end{aligned}$$
By Remark 5.2 we have $I_1 < \varepsilon $ for $w > 0$ sufficiently large. Moreover,
$$\begin{aligned} I_2 \le \sum _{|k|> N} w \left[ \int \limits _{\mathbb{R }} \phi (w y - k) \, |{\widetilde{f}}(y)| \, dy \right] \psi _{\phi }(w x - k). \end{aligned}$$
Now, $w \int \limits _{\mathbb{R }} \phi (w y - k)\, dy = 1$ and, since $(\Psi 1)$ holds, $\psi _{\phi }$ satisfies the same estimate given in Lemma 3.1 for $\phi $. Hence, for every fixed sufficiently large $w > 0$, we have
$$\begin{aligned} I_2&\le \Vert f \Vert _{\infty } \sup _{x \in [a,b]} \sum _{|k|> N} \left[ w \int \limits _{\mathbb{R }} \phi (w y - k) \, dy \right] \psi _{\phi }(w x - k)\nonumber \\&= \Vert f \Vert _{\infty } \left[ \sup _{x \in [a,b]} \sum _{|k|>N} \psi _{\phi }(w x - k) \right] \nonumber \\&< \Vert f \Vert _{\infty } \widetilde{C} \left\{ (N-wb+1)^{-(\alpha -1)} + (N+wa+1)^{-(\alpha -1)} \right\} < \varepsilon , \end{aligned}$$
(27)
for some positive constant $\widetilde{C}$ and every sufficiently large $N \in \mathbb{N }^+$ with $N > w \max \left\{ |a|, |b|\right\} $. Since $\varepsilon >0$ is arbitrary, (i) is proved.
(ii)
Let now $f \in \text{ Lip }(\nu )$ be fixed. By Lemma 5.3, we have
$$\begin{aligned} I_1 \le C_1 \, w^{- \nu } + C_2 \, w^{-(\alpha -1)}, \end{aligned}$$
for every sufficiently large $w > 0$ and for some positive constants $C_1$ and $C_2$. Moreover, we obtain from (27)
$$\begin{aligned} I_2 \le C_3 \left\{ (N-wb+1)^{-(\alpha -1)} + (N+wa+1)^{-(\alpha -1)} \right\} , \end{aligned}$$
for a suitable constant $C_3 > 0$. Then, the second part of the theorem is proved.

$\square $
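
To make the operators $N_{N,w}^{\phi }$ concrete, here is a rough numerical sketch. It assumes the logistic density $\phi (t) = e^{-t}(1+e^{-t})^{-2}$, the test function $f = \sin $ (bounded and in $\text{ Lip }(1)$), the interval $[a,b] = [-1,1]$, and a plain trapezoidal rule with an ad hoc cutoff for the coefficient integrals. All function names and numerical parameters are illustrative choices, not part of the theory above.

```python
import math


def sigma(t):
    """Logistic sigmoid (overflow-safe)."""
    if t >= 0.0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def phi(t):
    """Logistic density e^{-t} / (1 + e^{-t})^2, an even function of t."""
    e = math.exp(-abs(t))
    return e / (1.0 + e) ** 2


def psi(t):
    """psi(t) = sigma(t + 1) - sigma(t)."""
    return sigma(t + 1.0) - sigma(t)


def coefficient(f, w, k, half_width=40.0, step=0.05):
    """Trapezoidal approximation of w * int phi(w y - k) f(y) dy.

    With t = w y - k the integral becomes int phi(t) f((t + k) / w) dt;
    the cutoff |t| <= half_width and the step size are ad hoc choices.
    """
    n = int(round(2.0 * half_width / step))
    total = 0.0
    for i in range(n + 1):
        t = -half_width + i * step
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * phi(t) * f((t + k) / w)
    return total * step


def nn_operator(f, w, n_terms):
    """Return the function x -> (N_{N,w} f)(x) with N = n_terms."""
    ks = range(-n_terms, n_terms + 1)
    coeffs = [coefficient(f, w, k) for k in ks]

    def approx(x):
        return sum(c * psi(w * x - k) for c, k in zip(coeffs, ks))

    return approx


def sup_error(f, w, n_terms, a=-1.0, b=1.0, samples=41):
    """Maximum of |N_{N,w} f - f| over a uniform grid on [a, b]."""
    approx = nn_operator(f, w, n_terms)
    xs = [a + (b - a) * i / (samples - 1) for i in range(samples)]
    return max(abs(approx(x) - f(x)) for x in xs)


if __name__ == "__main__":
    for w in (5.0, 10.0, 20.0):
        print(w, sup_error(math.sin, w, 60))
```

With $N = 60$ the truncation term is negligible for these values of $w$, so the printed errors mainly reflect the $w^{-\nu }$ term of the estimate in (ii); the grid maximum is only a proxy for the supremum norm.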

As a first example, we can consider the case of the logistic function, $\sigma _{\ell }$ (see, e.g., [8]), generated by $\phi _{\ell }(x) := e^{-x} (1 + e^{-x})^{-2}$. Clearly, conditions $(\varphi 1)$ and $(\Psi 1)$ are fulfilled, since $\phi _{\ell }$ and

$$\begin{aligned} \psi _{\ell }(x) := \sigma _{\ell }(x + 1) - \sigma _{\ell }(x) = \frac{(e - 1) \, e^{-x-1}}{(1 + e^{-x-1})(1 + e^{-x})}, \end{aligned}$$

decay exponentially as $x \rightarrow \pm \infty $. A second example is given by the hyperbolic tangent sigmoidal function (see, e.g., [1, 2]),

$$\begin{aligned} \sigma _h(x) := \frac{1}{2} + \frac{1}{2} \tanh (x) = \frac{1}{2} + \frac{e^{2 x} - 1}{2 (e^{2 x} + 1)}. \end{aligned}$$

This can be generated by $\phi _h(x) = 2 \, e^{2 x} \, (e^{2 x} + 1)^{-2}$, whose associated function $\psi _{h}$ is

$$\begin{aligned} \psi _h(x) = \frac{(e^2 - 1) \, e^{2 x}}{(e^{2x+2} + 1)(e^{2x} + 1)}. \end{aligned}$$

It can be easily checked that such a function $\phi _h$ belongs to $\mathcal{T }$.
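
The formulas for the hyperbolic tangent case can be verified numerically. The sketch below uses hypothetical helper names; it compares the closed form of $\psi _h$ with the difference $\sigma _h(x+1) - \sigma _h(x)$, compares $\phi _h$ with a central finite difference of $\sigma _h$, and checks the elementary bound $\psi _h(x) \le (e^2 - 1) \, e^{-2|x|}$, which follows from the closed form and is one way to see the exponential decay behind $(\Psi 1)$.

```python
import math


def sigma_h(x):
    """Hyperbolic tangent sigmoidal function 1/2 + tanh(x)/2."""
    return 0.5 + 0.5 * math.tanh(x)


def phi_h(x):
    """Generating function 2 e^{2x} / (e^{2x} + 1)^2."""
    e = math.exp(2.0 * x)
    return 2.0 * e / (e + 1.0) ** 2


def psi_h_closed(x):
    """Closed form (e^2 - 1) e^{2x} / ((e^{2x+2} + 1)(e^{2x} + 1))."""
    e = math.exp(2.0 * x)
    e2 = math.exp(2.0)
    return (e2 - 1.0) * e / ((e2 * e + 1.0) * (e + 1.0))


def psi_h_difference(x):
    """Definition psi_h(x) = sigma_h(x + 1) - sigma_h(x)."""
    return sigma_h(x + 1.0) - sigma_h(x)


def central_difference(g, x, step=1e-5):
    """Second-order finite-difference approximation of g'(x)."""
    return (g(x + step) - g(x - step)) / (2.0 * step)


if __name__ == "__main__":
    for x in (-2.0, -0.5, 0.0, 1.0, 3.0):
        print(x, psi_h_closed(x), psi_h_difference(x),
              phi_h(x), central_difference(sigma_h, x))
```

The arguments are kept in a moderate range because `math.exp` raises `OverflowError` for very large inputs.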

Finally, we recall that another remarkable example of sigmoidal functions is provided by the class of Gompertz functions, defined by

$$\begin{aligned} \sigma _{\alpha \beta }(x) := e^{-\alpha \, e^{-\beta x}}, \quad x \in \mathbb{R }, \end{aligned}$$

for $\alpha ,\,\beta > 0$. Gompertz functions are widely used in fields such as demography and the modeling of tumor growth.
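
As an illustration, the Gompertz function can be differentiated to obtain $\phi _{\alpha \beta }(x) = \alpha \beta \, e^{-\beta x} \, e^{-\alpha e^{-\beta x}}$. The sketch below, with hypothetical function names and illustrative parameter values, checks this derivative by finite differences and checks the telescoping identity of Lemma 5.1 for $\psi (x) = \sigma _{\alpha \beta }(x+1) - \sigma _{\alpha \beta }(x)$. It does not verify membership of $\phi _{\alpha \beta }$ in $\mathcal{T }$; here `alpha` and `beta` are the Gompertz parameters, not the decay exponent of $(\varphi 1)$.

```python
import math


def gompertz(x, alpha=1.0, beta=1.0):
    """Gompertz function exp(-alpha * exp(-beta * x))."""
    return math.exp(-alpha * math.exp(-beta * x))


def gompertz_density(x, alpha=1.0, beta=1.0):
    """Derivative of the Gompertz function with respect to x."""
    e = math.exp(-beta * x)
    return alpha * beta * e * math.exp(-alpha * e)


def psi(x, alpha=1.0, beta=1.0):
    """psi(x) = gompertz(x + 1) - gompertz(x)."""
    return gompertz(x + 1.0, alpha, beta) - gompertz(x, alpha, beta)


def shifted_sum(x, n, alpha=1.0, beta=1.0):
    """Symmetric partial sum of psi(x - k) for k = -n, ..., n."""
    return sum(psi(x - k, alpha, beta) for k in range(-n, n + 1))


def central_difference(g, x, step=1e-5):
    """Second-order finite-difference approximation of g'(x)."""
    return (g(x + step) - g(x - step)) / (2.0 * step)


if __name__ == "__main__":
    print(shifted_sum(0.2, 60))
    print(gompertz_density(0.5), central_difference(gompertz, 0.5))
```

Unlike the logistic and hyperbolic tangent cases, the Gompertz function is not symmetric about its inflection point: it decays doubly exponentially as $x \rightarrow -\infty $ and approaches $1$ only exponentially as $x \rightarrow +\infty $.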

Remark 5.5

In closing, note that approximating functions by the NNs $G^{\sigma _{\phi }}_{N,w}$ requires only half the number of sigmoidal functions needed by the NNs $N^{\phi }_{N,w}$. The theory developed in this section, however, can be applied to important sigmoidal functions to which the theory discussed earlier in Sects. 2 and 3 cannot be applied.

References

1. Anastassiou, G.A.: Univariate hyperbolic tangent neural network approximation. Math. Comput. Model. 53(5–6), 1111–1132 (2011)
2. Anastassiou, G.A.: Multivariate hyperbolic tangent neural network approximation. Comput. Math. Appl. 61(4), 809–821 (2011)
3. Barron, A.R.: Universal approximation bounds for superpositions of a sigmoidal function. IEEE Trans. Inf. Theory 39(3), 930–945 (1993)
4. Brauer, F., Castillo Chavez, C.: Mathematical Models in Population Biology and Epidemiology. Springer, New York (2001)
5. Butzer, P.L., Nessel, R.J.: Fourier Analysis and Approximation. Birkhäuser Verlag, Basel and Academic Press, New York (1971)
6. Butzer, P.L., Splettstößer, W., Stens, R.L.: The sampling theorem and linear prediction in signal analysis. Jahresber. Deutsch. Math.-Verein 90, 1–70 (1988)
7. Buzhabadi, R., Effati, S.: A neural network approach for solving Fredholm integral equations of the second kind. Neural Comput. Appl. 21, 843–852 (2012)
8. Cao, F., Chen, Z.: The approximation operators with sigmoidal functions. Comput. Math. Appl. 58(4), 758–765 (2009)
9. Chen, H., Chen, T., Liu, R.: A constructive proof and an extension of Cybenko’s approximation theorem. In: Computing Science and Statistics, pp. 163–168. Springer, New York (1992)
10. Chen, D.: Degree of approximation by superpositions of a sigmoidal function. Approx. Theory Appl. 9(3), 17–28 (1993)
11. Chui, C.K.: An Introduction to Wavelets, Wavelet Analysis and its Applications 1. Academic Press Inc., Boston (1992)
12. Costarelli, D., Spigler, R.: Solving Volterra integral equations of the second kind by sigmoidal functions approximations, to appear in J. Integral Eq. Appl. 25(2) (2013)
13. Costarelli, D., Spigler, R.: Constructive approximation by superposition of sigmoidal functions. Anal. Theory Appl. 29(2), 169–196 (2013)
14. Costarelli, D., Spigler, R.: Approximation results for neural network operators activated by sigmoidal functions. Neural Netw. 44, 101–106 (2013)
15. Costarelli, D., Spigler, R.: Multivariate neural network operators with sigmoidal activation functions. Neural Netw. 48, 72–77 (2013)
16. Cybenko, G.: Approximation by superpositions of a sigmoidal function. Math. Control Signals Syst. 2, 303–314 (1989)
17. Daubechies, I.: Ten Lectures on Wavelets, Regional Conference Series in Applied Mathematics 61. Society for Industrial and Applied Mathematics, SIAM, Philadelphia (1992)
18. De Boor, C.: A Practical Guide to Splines, Applied Mathematical Sciences 27. Springer, New York (2001)
19. Gao, B., Xu, Y.: Univariant approximation by superpositions of a sigmoidal function. J. Math. Anal. Appl. 178, 221–226 (1993)
20. Hahm, N., Hong, B.: Approximation order to a function in $\overline{C}({\mathbb{R}})$ by superposition of a sigmoidal function. Appl. Math. Lett. 15, 591–597 (2002)
21. Hritonenko, N., Yatsenko, Y.: Mathematical Modelling in Economics, Ecology and the Environment. Science Press, Beijing (2006)
22. Jones, L.K.: Constructive approximations for neural networks by sigmoidal functions, Technical Report Series 7. University of Lowell, Dep. of Mathematics (1988)
23. Lenze, B.: Constructive multivariate approximation with sigmoidal functions and applications to neural networks. In: Numer. Methods Approx. Theory, Birkhäuser Verlag, Basel-Boston-Berlin, pp. 155–175 (1992)
24. Lewicki, G., Marino, G.: Approximation by superpositions of a sigmoidal function. Z. Anal. Anwendungen J. Anal. Appl. 22(2), 463–470 (2003)
25. Lewicki, G., Marino, G.: Approximation of functions of finite variation by superpositions of a sigmoidal function. Appl. Math. Lett. 17, 1147–1152 (2004)
26. Li, X.: Simultaneous approximations of multivariate functions and their derivatives by neural networks with one hidden layer. Neurocomputing 12, 327–343 (1996)
27. Li, X., Micchelli, C.A.: Approximation by radial bases and neural networks. Numer. Algorithms 25, 241–262 (2000)
28. Malek, A., Shekari Beidokhti, R.: Numerical solution for high order differential equations using a hybrid neural network—optimization method. Appl. Math. Comput. 183, 260–271 (2006)
29. Mallat, S.G.: Multiresolution approximations and wavelet orthonormal bases of $L^2({\mathbb{R}})$. Trans. Am. Math. Soc. 315, 69–87 (1989)
30. Meyer, Y.: Wavelets and Operators. Cambridge Studies in Advanced Mathematics 37, Cambridge (1992)
31. Mhaskar, H.N., Micchelli, C.A.: Approximation by superposition of sigmoidal and radial basis functions. Adv. Appl. Math. 13, 350–373 (1992)
32. Mhaskar, H.N., Micchelli, C.A.: Degree of approximation by neural and translation networks with a single hidden layer. Adv. Appl. Math. 16, 151–183 (1995)
33. Mhaskar, H.N.: Neural networks for optimal approximation of smooth and analytic functions. Neural Comput. 8, 164–177 (1996)
34. Pinkus, A.: Approximation theory of the MLP model in neural networks. Acta Numer. 8, 143–195 (1999)
35. Unser, M.: Ten good reasons for using spline wavelets. Wavelets Appl. Signal Image Process. 3169(5), 422–431 (1997)
36. Xiehua, S.: On the degree of approximation by wavelet expansions. Approx. Theory Appl. 14(1), 81–90 (1998)

Acknowledgments

This work was supported, in part, by the GNAMPA and the GNFM of the Italian INdAM.

Author information

Authors and Affiliations

Dipartimento di Matematica e Fisica, Sezione di Matematica, Università “Roma Tre”, 1, Largo S. Leonardo Murialdo, 00146 Rome, Italy
Danilo Costarelli & Renato Spigler

Authors

Danilo Costarelli
Renato Spigler

Corresponding author

Correspondence to Renato Spigler.

About this article

Cite this article

Costarelli, D., Spigler, R. Approximation by series of sigmoidal functions with applications to neural networks. Annali di Matematica 194, 289–306 (2015). https://doi.org/10.1007/s10231-013-0378-y

Received: 06 March 2013
Accepted: 28 August 2013
Published: 11 September 2013
Issue Date: February 2015
DOI: https://doi.org/10.1007/s10231-013-0378-y
