1 Introduction
Recent years have seen a dynamic development in applications of deep neural networks (DNNs for short) in expressing high-dimensional input–output relations. This development was driven mainly by the need for quantitative modelling of input–output relationships subject to large sets of observation data. Rather naturally, therefore, DNNs have found a large number of applications in computational finance and financial engineering. We refer to the survey by Ruf and Wang [
47] and the references there. Without going into details, we only state that the majority of activity addresses techniques to employ DNNs in demanding tasks in computational finance. The often strikingly efficient computational performance of DNN-based algorithms naturally raises the question of a theoretical, in particular mathematical, underpinning of successful algorithms. Recent years have seen progress, in particular in the context of option pricing for Black–Scholes-type models, for DNN-based numerical approximation of diffusion models on possibly large baskets (see e.g. Berner et al. [
9], Elbrächter et al. [
22] and Ito et al. [
34], Reisinger and Zhang [
45] for game-type options). These references prove that DNN-based approximations of option prices on possibly large baskets of risky assets can overcome the so-called curse of dimensionality in the context of affine diffusion models for the dynamics of the (log-)prices of the underlying risky assets. These results could be viewed also as particular instances of DNN expression rates of certain PDEs on high-dimensional state spaces, and indeed corresponding DNN expressive power results have been shown for their solution sets in Grohs et al. [
29], Gonon et al. [
27] and the references there.
Since the turn of the century, models beyond the classical diffusion setting have been employed increasingly in financial engineering. In particular, Lévy processes and their non-stationary generalisations such as Feller–Lévy processes (see e.g. Böttcher et al. [
11, Chap. 2] and the references there) have received wide attention. This can in part be explained by their ability to account for heavy tails of financial data and by Lévy-based models constituting
hierarchies of models, comprising in particular classical diffusion (“Black–Scholes”) models with constant volatility that are still widely used in computational finance as a benchmark. Therefore, all results for geometric Lévy processes in the present paper apply in particular to the Black–Scholes model.
The “Feynman–Kac correspondence” which relates conditional expectations of sufficiently regular functionals over diffusions to (viscosity) solutions of corresponding Kolmogorov PDEs extends to multivariate Lévy processes. We mention only Nualart and Schoutens [
41], Cont and Tankov [
16, Sect. 12.2], Cont and Voltchkova [
18], Glau [
26], Eberlein and Kallsen [
21, Chap. 5.4] and the references there. The Kolmogorov PDE (“Black–Scholes equation”) in the diffusion case is then replaced by a so-called
partial integro-differential equation (PIDE) where the fractional integro-differential operator accounting for the jumps is related in a one-to-one fashion with the Lévy measure \(\nu ^{d}\) of the \(\mathbb{R}^{d}\)-valued Lévy process \(X^{d}\). In particular, Lévy-type models for (log-)returns of risky assets result in
nonlocal partial integro-differential equations for the option price which generalise the linear parabolic differential equations which arise in classical diffusion models. We refer to Bertoin [
10, Chap. 1], Sato [
48, Chaps. 1–5] for fundamentals on Lévy processes and to Böttcher et al. [
11, Chap. 2] for extensions to certain non-stationary settings. For the use of Lévy processes in financial modelling, we refer to Cont and Tankov [
16, Chap. 11], Eberlein and Kallsen [
21, Sect. 8.1] and the references there. We refer to Cont and Voltchkova [
18,
17], Matache et al. [
40], Hilber et al. [
32, Chap. 14] for a presentation and for numerical methods for option pricing in Lévy models.
The results on DNNs in the context of option pricing mentioned above are exclusively concerned with models with continuous price processes. This naturally raises the question of whether DNN-based approximations are still capable of overcoming the curse of dimensionality in high-dimensional financial models with jumps, which have a much richer mathematical structure. This question is precisely the subject of this article. We study the expression rates of DNNs for prices of options (and the associated PIDEs) written on possibly large baskets of risky assets whose log-returns are modelled by a multivariate Lévy process with general correlation structure of jumps. In particular, we establish sufficient conditions on the characteristic triplet of the Lévy process \(X^{d}\) that ensure an expression error \(\varepsilon \) for DNN-expressed option prices with DNNs of size \({\mathcal{O}}(\varepsilon ^{-2})\), and with constants implied in \({\mathcal{O}}(\, \cdot \, )\) which grow polynomially with respect to \(d\). This shows that DNNs are capable of overcoming the curse of dimensionality also for general exponential Lévy models.
Let us outline the scope of our results. The DNN expression rate results proved here give a theoretical justification for neural-network-based non-parametric option pricing methods. These have become very popular recently; see for instance the recent survey by Ruf and Wang [
47]. Our results show that if option prices result from an exponential Lévy model, as described e.g. in Eberlein and Kallsen [
21, Chap. 3.7], these prices can under mild conditions on the Lévy triplets be expressed efficiently by (ReLU) neural networks, also for high dimensions. The result covers in particular rather general, multivariate correlation structure in the jump part of the Lévy process, for example parametrised by a so-called
Lévy copula; see Kallsen and Tankov [
36], Farkas et al. [
24], Eberlein and Kallsen [
21, Chap. 8.1] and the references there. This extends, at least to some extent, the theoretical foundation of the widely used neural-network-based non-parametric option pricing methodologies to market models with jumps.
We prove two types of results on DNN expression rate bounds for European options in exponential Lévy models, with one probabilistic and one “deterministic” proof. The former is based on concepts from statistical learning theory and provides for relevant payoffs (baskets, call on max, …) an expression error \({\mathcal{O}}(\varepsilon )\) with DNN sizes of \({\mathcal{O}}(\varepsilon ^{-2})\) and with constants implied in \({\mathcal{O}}(\, \cdot \, )\) which grow polynomially in \(d\), thereby overcoming the curse of dimensionality. The latter bound is based on parabolic smoothing of the Kolmogorov equation and allows us to prove exponential expressivity of prices for positive maturities, i.e., an expression error \({\mathcal{O}}(\varepsilon )\) with DNN sizes of \({\mathcal{O}}(|\log \varepsilon |^{a})\) for some \(a>0\), albeit with constants implied in \({\mathcal{O}}(\, \cdot \, )\) possibly growing exponentially in \(d\).
For the latter approach, a certain non-degeneracy is required for the symbol of the underlying Lévy process. The probabilistic proof of the DNN approximation rate results, on the other hand, does not require any such assumptions. It only relies on the additive structure of the semigroup associated to the Lévy process and on existence of moments. Thus the results proved here are specifically tailored to the class of option pricing functions (or more generally expectations of exponential Lévy processes) under European-style, plain vanilla payoffs.
The structure of this paper is as follows. In Sect.
2, we review terminology, basic results and financial modelling with exponential Lévy processes. In particular, we also recapitulate the corresponding fractional, partial integro-differential Kolmogorov equations which generalise the classical Black–Scholes equations to Lévy models. Section
3 recapitulates notation and basic terminology for deep neural networks to the extent required in the ensuing expression rate analysis. We focus mainly on so-called ReLU DNNs, but add that corresponding definitions and also results hold for more general activation functions. In Sect.
4, we present a first set of DNN expression rate results, still in the univariate case. This serves, on the one hand, presentation purposes, as this setting allows lighter notation, and, on the other hand, introduces mathematical concepts which will be used subsequently also for contracts on possibly large baskets of Lévy-driven risky assets. We also present an application of the results to neural-network-based call option pricing. Section
5 then has the main results of the present paper: expression rate bounds for ReLU DNNs for multivariate, exponential Lévy models. We identify sufficient conditions to obtain expression rates which are free from the curse of dimensionality via mathematical tools from statistical learning theory. We also develop a second argument based on parabolic Gevrey-regularity with quantified derivative bounds, which even yields exponential expressivity of ReLU DNNs, albeit with constants that generally depend on the basket size in a possibly exponential way. Finally, we develop an argument based on quantified sparsity in polynomial chaos expansions and corresponding ReLU expression rates from Schwab and Zech [
49] to prove high algebraic expression rates for ReLU DNNs with constants that are independent of the basket size. We also provide a brief discussion of recent, related results. We conclude in Sect.
6 and indicate several possible generalisations of the present results.
3 Deep neural networks (DNNs)
This article is concerned with establishing expression rate bounds of deep neural networks (DNNs) for prices of options (and the associated PIDEs) written on possibly large baskets of risky assets whose log-returns are modelled by a multivariate Lévy process with general correlation structure of jumps. The term “expression rate” denotes the rate of convergence to 0 of the error between the option price and its DNN approximation. This rate can be directly translated to quantify the DNN size required to achieve a given approximation accuracy. For instance, in Theorem
5.1 below, an expression rate of
\(\mathfrak{q}^{-1}\) is established and one may even choose
\(\mathfrak{q} = 2\) in many relevant cases. We now give a brief introduction to DNNs.
Roughly speaking, a deep neural network (DNN for short) is a function built by multiple concatenations of affine transformations with a (typically nonlinear) activation function. This gives rise to a parametrised family of nonlinear maps; see for example Petersen and Voigtlaender [
44] or Buehler et al. [
14, Sect. 4.1] and the references there.
Here we follow current practice and refer to the collection of parameters
\(\Phi \) as
“the neural network” and denote by
\(\mathrm{R}(\Phi )\) its realisation, that is, the function defined by these parameters. More specifically, we use the following terminology (see for example Opschoor et al. [
42, Sect. 2]). We first fix a function \(\varrho \colon \mathbb{R}\to \mathbb{R}\)
(referred to as the activation function) which is applied componentwise to vector-valued inputs.
We refer to Opschoor et al. [
42, Sect. 2] for further details.
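To fix ideas, here is a minimal Python sketch of this terminology (an illustration only; the concrete two-layer parameters below are hypothetical): \(\Phi \) is stored as a list of weight–bias pairs, \(\mathrm{R}(\Phi )\) applies the affine maps with the ReLU activation \(\varrho (x)=\max \{x,0\}\) between them, and \(M(\Phi )\) counts the non-zero weights.

```python
import numpy as np

def relu(x):
    # ReLU activation, applied componentwise
    return np.maximum(x, 0.0)

# "The neural network" Phi: the collection of parameters, i.e., a list
# of (weight matrix, bias vector) pairs, one pair per affine layer.
Phi = [
    (np.array([[1.0], [-1.0]]), np.zeros(2)),   # layer 1: R -> R^2
    (np.array([[1.0, 1.0]]), np.array([0.5])),  # layer 2: R^2 -> R
]

def realisation(Phi, x):
    """R(Phi): the function defined by the parameters Phi.
    The activation acts after every layer except the last one."""
    for W, b in Phi[:-1]:
        x = relu(W @ x + b)
    W, b = Phi[-1]
    return W @ x + b

def M(Phi):
    # number of non-zero weights (entries of all matrices and biases)
    return sum(np.count_nonzero(W) + np.count_nonzero(b) for W, b in Phi)

print(realisation(Phi, np.array([0.3])), M(Phi))  # R(Phi)(0.3) = |0.3| + 0.5
```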
The following lemma shows that concatenating
\(n\) affine transformations with distinct neural networks and taking their weighted average can itself be represented as a neural network. The number of non-zero weights in the resulting neural network can be controlled by the number of non-zero weights in the original neural networks. The proof of the lemma is based on a simple extension of the
full parallelisation operation for neural networks (see [
42, Proposition 2.5]) and refines Grohs et al. [
29, Lemma 3.8].
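A minimal sketch of the construction behind the lemma, assuming for simplicity that all networks have equal depth and scalar output (the general case uses the full parallelisation of [42, Proposition 2.5]): the networks are stacked block-diagonally, the affine pre-transformations are absorbed into the first layer, and the weighted average is absorbed into the last (affine) layer, so no depth is added and the weight count is controlled by the summed weight counts of the inputs.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def realisation(Phi, x):
    for W, b in Phi[:-1]:
        x = relu(W @ x + b)
    W, b = Phi[-1]
    return W @ x + b

def block_diag(*mats):
    # block-diagonal stacking of 2-d arrays
    out = np.zeros((sum(m.shape[0] for m in mats), sum(m.shape[1] for m in mats)))
    r = c = 0
    for m in mats:
        out[r:r + m.shape[0], c:c + m.shape[1]] = m
        r += m.shape[0]; c += m.shape[1]
    return out

def average_network(nets, affines, weights):
    """One network realising sum_i weights[i] * R(nets[i])(A_i x + c_i)."""
    # first layer: absorb the affine maps x -> A_i x + c_i
    W1 = np.vstack([net[0][0] @ A for net, (A, c) in zip(nets, affines)])
    b1 = np.concatenate([net[0][0] @ c + net[0][1] for net, (A, c) in zip(nets, affines)])
    layers = [(W1, b1)]
    # remaining layers: block-diagonal parallelisation
    for l in range(1, len(nets[0])):
        layers.append((block_diag(*[net[l][0] for net in nets]),
                       np.concatenate([net[l][1] for net in nets])))
    # absorb the weighted average into the affine output layer (no extra depth)
    avg = np.asarray(weights).reshape(1, -1)
    W, b = layers[-1]
    layers[-1] = (avg @ W, avg @ b)
    return layers

# example: average two copies of a network realising x -> |x| + 0.5
net = [(np.array([[1.0], [-1.0]]), np.zeros(2)),
       (np.array([[1.0, 1.0]]), np.array([0.5]))]
Psi = average_network([net, net], [(np.eye(1), np.zeros(1))] * 2, [0.5, 0.5])
x = np.array([0.3])
print(realisation(Psi, x), realisation(net, x))  # both equal |0.3| + 0.5
```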
4 DNN approximations for univariate Lévy models
We study DNN expression rates for option prices under (geometric) Lévy models for asset prices, initially here in one spatial dimension. We present two expression rate estimates for ReLU DNNs, which are based on distinct mathematical arguments; the first, probabilistic argument builds on ideas used in recent works by Gonon et al. [
27], Beck et al. [
7] and the references there. However, for the key step of the proof, a different technique is used, which is based on the Ledoux–Talagrand contraction principle (see Ledoux and Talagrand [
39, Theorem 4.12]) and statistical learning. This new approach is not only technically less involved (in comparison to e.g. the techniques used in [
27]), but also allows for weaker assumptions on the activation function; see Proposition
4.1 below. Alternatively, under stronger hypotheses on the activation function, one can also rely on [
27, Lemma 2.16]; see Proposition
4.4 below. The probabilistic arguments result in, essentially, an expression error
\({\mathcal{O}}(\varepsilon )\) with DNN sizes of
\({\mathcal{O}}(\varepsilon ^{-2})\). The second argument draws on parabolic (analytic) regularity furnished by the corresponding Kolmogorov equations and results in far stronger, exponential expression rates, i.e., with an expression error
\({\mathcal{O}}(\varepsilon )\) with DNN sizes which are polylogarithmic with respect to
\(0< {\varepsilon }< 1\). As we shall see in the next section, however, the latter argument is in general subject to the curse of dimensionality.
4.1 DNN expression rates: probabilistic argument
We fix
\(0< a < b < \infty \) and measure the approximation error in the uniform norm on
\([a,b]\). Recall that
\(M(\Phi )\) denotes the number of (non-zero) weights of a neural network
\(\Phi \) and
\(\mathrm{R}(\Phi )\) is the realisation of
\(\Phi \). Consider the following exponential integrability condition on the Lévy measure
\(\nu \): for some
\(p\geq 2\),
$$ \int _{\{|y|>1\}} e^{py} \nu (d y) < \infty . $$
(4.1)
Furthermore, for any function
\(g\), we denote by
\(\mathrm{Lip}(g)\) the best Lipschitz constant for
\(g\).
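As an illustration (an example added for orientation; it is not part of the standing assumptions), condition (4.1) holds for every \(p \geq 2\) in Merton's jump-diffusion model, whose Lévy measure is a scaled Gaussian density with jump intensity \(\lambda > 0\):
$$ \nu (dy) = \lambda \, \frac{e^{-(y-\mu _{J})^{2}/(2\sigma _{J}^{2})}}{\sqrt{2\pi \sigma _{J}^{2}}} \, dy , \qquad \int _{\{|y|>1\}} e^{py} \, \nu (dy) \leq \lambda \, e^{p\mu _{J} + p^{2}\sigma _{J}^{2}/2} < \infty . $$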
4.2 DNN expression of European calls
In this section, we illustrate how the results of Proposition
4.1 can be used to bound DNN expression rates of call options on exponential Lévy models.
Suppose we observe call option prices for a fixed maturity
\(T\) and
\(N\) different strikes
\(K_{1},\ldots ,K_{N}>0\). Denote these prices by
\(\hat{C}(T,K_{1}),\ldots ,\hat{C}(T,K_{N})\). A task frequently encountered in practice is to extrapolate from these prices to prices corresponding to unobserved maturities or to learn a non-parametric option pricing function. A widely used approach is to solve
$$ \min _{\phi \in \mathcal{H}} \frac{1}{N} \sum _{i=1}^{N} \bigg( \frac{\hat{C}(T,K_{i})}{K_{i}}-\phi (S_{0}/K_{i})\bigg)^{2}. $$
(4.15)
Here ℋ is a suitable collection of (realisations of) neural networks, for example all networks with an a-priori fixed architecture. In fact, many of the papers listed in the recent review by Ruf and Wang [
47] use this approach or a variation of it, where for example an absolute value is inserted instead of a square or
\(\hat{C}(T,K_{i})/K_{i}\) is replaced by
\(\hat{C}(T,K_{i})\) and
\(S_{0}/K_{i}\) by
\(K_{i}\).
In this section, we assume that the observed call prices are generated from an (assumed unknown) exponential Lévy model and ℋ consists of ReLU networks. Then we show that the error in (
4.15) can be controlled and that we can give bounds on the number of non-zero parameters of the minimising neural network. The following result is a direct consequence of Proposition
4.1. It shows that
\({\mathcal{O}}(\varepsilon ^{-1})\) weights suffice to achieve an error of at most
\(\varepsilon \) in (
4.15).
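The following Python sketch illustrates the fit (4.15) under stated assumptions: the "observed" prices are generated from the Black–Scholes model (a geometric Lévy model with \(\nu = 0\)), ℋ is a one-hidden-layer ReLU network of fixed width, and plain full-batch gradient descent is used. Network width, learning rate and iteration count are illustrative choices, not prescribed by the theory.

```python
import numpy as np
from scipy.stats import norm

# Synthetic "observed" call prices from the Black-Scholes model with r = 0,
# so the target pricing function is known in closed form.
S0, T, sigma = 1.0, 0.5, 0.2
K = np.linspace(0.7, 1.3, 40)                      # strikes K_1, ..., K_N
d1 = (np.log(S0 / K) + 0.5 * sigma**2 * T) / (sigma * np.sqrt(T))
C_hat = S0 * norm.cdf(d1) - K * norm.cdf(d1 - sigma * np.sqrt(T))

x = S0 / K                                         # moneyness inputs
y = C_hat / K                                      # normalised prices

# One-hidden-layer ReLU network phi(x) = w2 . relu(w1 x + b1) + b2,
# trained by gradient descent on the objective (4.15).
rng = np.random.default_rng(0)
m = 32
w1, b1 = rng.normal(size=m), rng.normal(size=m)
w2, b2 = rng.normal(size=m) / m, 0.0
lr = 0.1

for step in range(5000):
    z = np.outer(x, w1) + b1                       # (N, m) pre-activations
    h = np.maximum(z, 0.0)                         # ReLU
    pred = h @ w2 + b2
    err = pred - y                                 # residuals of (4.15)
    # backpropagation through the single hidden layer
    g_pred = 2 * err / len(x)
    g_w2, g_b2 = h.T @ g_pred, g_pred.sum()
    g_h = np.outer(g_pred, w2) * (z > 0)
    g_w1, g_b1 = x @ g_h, g_h.sum(axis=0)
    w1 -= lr * g_w1; b1 -= lr * g_b1
    w2 -= lr * g_w2; b2 -= lr * g_b2

print("objective (4.15):", np.mean(err**2))
```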
4.3 ReLU DNN exponential expressivity
We now develop a second argument for bounding the expressivity of ReLU DNNs for the option price
\(u(\tau ,s)\) solving (
2.4) with initial condition
\(u(0,s) = \varphi (s)\). In particular,
in this subsection, we choose
\(\varrho (x)=\max \{x,0\}\) as activation function.
As in the preceding first, probabilistic argument, we consider the DNN expression error in a bounded interval
\([a,b]\) with
\(0< a< s< b<\infty \). The second argument is based on
parabolic smoothing of the linear parabolic PIDE (
2.4). This in turn ensures smoothness of
\(s\mapsto u(\tau ,s)\) at positive times
\(\tau >0\), i.e., smoothness in the “spatial” variable
\(s\in [a,b]\) resp. in the log-return variable
\(x=\log s \in [\log a,\log b ]\), even for non-smooth payoff functions
\(\varphi \) (so in particular, binary options with discontinuous payoffs
\(\varphi \) are admissible, albeit at the cost of non-uniformity of derivative bounds at
\(\tau \downarrow 0\)). It is a classical result that this implies spectral, possibly exponential convergence of
polynomial approximations of
\(u(\tau ,\, \cdot \,)|_{[a,b]}\) in
\(L^{\infty }([a,b])\). As observed in Opschoor et al. [
43, Sect. 3.2], this exponential polynomial convergence rate implies also exponential expressivity of ReLU DNNs of
\(u(\tau , \, \cdot \,)|_{[a,b]}\) in
\(L^{\infty }([a,b])\) for any
\(\tau >0\).
To ensure smoothing properties of the solution operator of the PIDE, we require additional assumptions (see (
4.17) below) on the Lévy triplet
\((\sigma ^{2},\gamma ,\nu )\). To formulate these, we recall the Lévy symbol
\(\psi \) of the ℝ-valued LP
\(X\) as
$$ \psi (\xi ) = \frac{\sigma ^{2}}{2} \xi ^{2} - i \gamma \xi + \int _{\mathbb{R}} \big( 1 - e^{i\xi y} + i \xi y \mathbf{1}_{\{|y|\leq 1\}} \big) \nu (dy), \qquad \xi \in \mathbb{R}. $$
(4.16)
The proof proceeds in several steps. First, we apply the change of variables
\(x = \log s\) in order to leverage the stationarity of the LP
\(X\) for obtaining a constant coefficient Kolmogorov PIDE. Assumption (
4.17) then ensures well-posedness of the PIDE in a suitable variational framework. We then exploit that stationarity of the LP
\(X\) facilitates the use of Fourier transformation; the lower bound on
\(\psi \) in (
4.17) will allow deriving sharp, explicit bounds on high spatial derivatives of (variational) solutions of the PIDE which imply Gevrey-regularity of these solutions on bounded intervals
\([a,b]\subseteq (0,\infty )\). Gevrey-regularity in turn implies exponential rates of convergence of polynomial and deep ReLU NN approximations of
\(s\mapsto u(\tau ,s)\) for
\(\tau >0\), whence we obtain the assertion of the theorem. We recall that for
\(\delta \geq 1\), a smooth function
\(x\mapsto f(x)\) is
Gevrey-\(\delta \)-regular in an open subset \(D \subseteq \mathbb{R}^{d}\) if
\(f\in C^{\infty }(D)\) and for every compact set
\(\kappa \subseteq D\), there exists
\(C_{\kappa }> 0\) such that for all
\(\alpha \in \mathbb{N}_{0}^{d}\) and every
\(x\in \kappa \), we have
\(|D_{x}^{\alpha }f(x)| \leq C_{\kappa }^{|\alpha |+1} (\alpha !)^{\delta }\). Note that
\(\delta =1\) implies that
\(f\) is real analytic in
\(\kappa \). We refer to Rodino [
46, Sect. 1.4] for details, examples and further references.
We change coordinates to
\(x = \log s \in (-\infty ,\infty )\) so that
\(v(\tau ,x)=u(\tau ,e^{x})\). Then the PIDE (
2.4) takes the form (see e.g. Matache et al. [
40, Sect. 3], Lamberton and Mikou [
38, Sect. 3.1])
$$ \partial _{\tau }v(\tau ,x) - \frac{\sigma ^{2}}{2} \frac{\partial ^{2} v}{\partial x^{2}}(\tau ,x) - \gamma \frac{\partial v}{\partial x}(\tau ,x) + A[v(\tau ,\, \cdot \,)](x) = 0, \qquad \tau \in (0,T],\ x \in \mathbb{R}, $$
(4.18)
where
\(A\) denotes the integro-differential operator
$$ A[f](x) = - \int _{\mathbb{R}} \big( f(x+y) - f(x) - y f'(x) \mathbf{1}_{\{|y|\leq 1\}} \big) \nu (dy), $$
together with the initial condition
$$ v(0,x)= \varphi (e^{x}) = (\varphi \circ \exp )(x). $$
(4.19)
Then
\(C(t, s)= v(T-t, \log s)\) satisfies
(4.20)
Conversely, if
\(C(t,s)\) in (
4.20) is sufficiently regular, then
\(v(\tau , x) = C(T-\tau , e^{x})\) is a solution of (
4.18), (
4.19) (recall that we assume
\(r=0\) for notational simplicity).
The Lévy–Khintchine formula describes the ℝ-valued LP
\(X\) by the log-characteristic function
\(\psi \) of the random variable
\(X_{1}\). From the time-homogeneity of the LP
\(X\),
$$ E[e^{i\xi X_{t}}] = e^{-t\psi (\xi )}, \qquad \xi \in \mathbb{R},\ t \geq 0. $$
(4.21)
The Lévy exponent
\(\psi \) of the LP
\(X\) admits the explicit representation (
4.16).
The Lévy exponent
\(\psi \) is the symbol of the pseudo-differential operator
\(-{\mathcal{L}}\), where ℒ is the infinitesimal generator of the semi-group of the LP
\(X\). Here
\({\mathcal{A}}=-{\mathcal{L}}\) is the spatial operator in (
4.18) given by
$$ {\mathcal{A}}[f](x) = - \frac{\sigma ^{2}}{2} \frac{d^{2} f}{dx^{2}}(x) - \gamma \frac{df}{dx}(x) + A[f](x). $$
(4.22)
With the operator \({\mathcal{A}}\), we associate the bilinear form \(a(\, \cdot \, , \, \cdot \,)\).
The translation invariance of
\(\mathcal{A}\) (implied by stationarity of the LP
\(X\)) in (
4.22) and Parseval’s equality (see Hilber et al. [
32, Remark 10.4.1]) imply that
\(\psi \) is the symbol of
\(\mathcal{A}\), i.e., the form \(a(\, \cdot \, , \, \cdot \,)\) acts in the Fourier variable as multiplication by \(\psi \). Here \(\hat{f} = F_{x\to \xi }f\) denotes the Fourier transform of \(f\).
4.17) on
\(\psi \) implies continuity and coercivity of the bilinear form
\(a(\, \cdot \, , \, \cdot \,)\) on \(H^{\rho }(\mathbb{R})\) so that for initial data \(\varphi \circ \exp \in L^{2}(\mathbb{R})\), there exists a unique variational solution \(v\) of the PIDE (
4.18) with initial condition (
4.19); see e.g. Eberlein and Glau [
20].
Fix
\(0<\tau \leq T < \infty \) and set \(v_{0} = \varphi \circ \exp \in L^{2}(\mathbb{R})\). The variational solution
\(v\) of (4.18), (4.19) satisfies \(\hat{v}(\tau ,\xi ) = e^{-\tau \psi (\xi )} \hat{v}_{0}(\xi )\). For every
\(k \in \mathbb{N}\), Parseval's equality implies with the lower bound in (4.17) that
$$ \| \partial _{x}^{k} v(\tau ,\, \cdot \,) \|_{L^{2}(\mathbb{R})}^{2} \leq \sup _{\xi \in \mathbb{R}} \big( |\xi |^{2k} e^{-2\tau C_{1} |\xi |^{2\rho }} \big) \, \| v_{0} \|_{L^{2}(\mathbb{R})}^{2} . $$
An elementary calculation shows that for any
\(m,\kappa ,\mu >0\), we have
$$ \max _{\eta >0} \big( \eta ^{m} \exp (-\kappa \eta ^{\mu }) \big) = \bigg(\frac{m}{\kappa \mu e} \bigg)^{m/\mu } . $$
(4.23)
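For completeness, the one-line derivation behind (4.23): setting \(f(\eta ) = \eta ^{m} \exp (-\kappa \eta ^{\mu })\) and solving \((\log f)'(\eta ) = m/\eta - \kappa \mu \eta ^{\mu -1} = 0\) gives the maximiser \(\eta _{*} = (m/(\kappa \mu ))^{1/\mu }\), and
$$ f(\eta _{*}) = \Big( \frac{m}{\kappa \mu } \Big)^{m/\mu } \exp \Big( -\frac{m}{\mu } \Big) = \Big( \frac{m}{\kappa \mu e} \Big)^{m/\mu } . $$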
We use (
4.23) with
\(m=2k\),
\(\kappa =2\tau C_{1}\),
\(\mu = 2\rho \) and
\(\eta = |\xi |\) to obtain
$$ \| \partial _{x}^{k} v(\tau ,\, \cdot \,) \|_{L^{2}(\mathbb{R})}^{2} \leq \bigg( \frac{k}{2\tau C_{1} \rho e} \bigg)^{k/\rho } \| v_{0} \|_{L^{2}(\mathbb{R})}^{2} . $$
Taking square roots and using the (rough) Stirling bound
\(k^{k} \leq k! \, e^{k}\), valid for all \(k \in \mathbb{N}\), we obtain
$$ \| \partial _{x}^{k} v(\tau ,\, \cdot \,) \|_{L^{2}(\mathbb{R})} \leq (k!)^{1/(2\rho )} \big( (2\tau C_{1}\rho )^{-1/(2\rho )} \big)^{k} \, \| v_{0} \|_{L^{2}(\mathbb{R})} . $$
(4.24)
This implies with the Sobolev embedding theorem that for any bounded interval \(I = [x_{-},x_{+}]\),
\(-\infty < x_{-} < x_{+} < \infty \), and for every fixed
\(\tau >0\), there exist constants
\(C = C(x_{+}, x_{-})>0\) and
\(A(\tau ,\rho )>0\) such that
$$ \| \partial _{x}^{k} v(\tau ,\, \cdot \,) \|_{L^{\infty }(I)} \leq C A(\tau ,\rho )^{k} \, (k!)^{\delta }, \qquad k \in \mathbb{N}_{0},\ \delta = 1/\min \{1, 2\rho \} . $$
This means that
\(v(\tau , \, \cdot \,)|_{I}\) is Gevrey-
\(\delta \)-regular with
\(\delta = 1/\min \{1, 2\rho \}\).
To construct the DNNs
\(\psi ^{u}_{\varepsilon }\) in the claim, we proceed in several steps. We first use an (analytic, in the bounded interval \([\log a, \log b]\)) change of variables
\(s = \exp (x)\) and the fact that Gevrey-regularity is preserved under analytic changes of variables to infer Gevrey-\(\delta \)-regularity in \([a,b]\) of
\(s\mapsto u(\tau ,s)\), for every fixed
\(\tau >0\). This in turn implies the existence of a sequence
\((u_{p}(s))_{p\geq 1}\) of polynomials of degree \(p\) in
\([a,b]\) converging in
\(W^{1,\infty }([a,b])\) to
\(u(\tau , \, \cdot \,)\) for
\(\tau >0\) at rate
\(\exp (-b'p^{1/\delta })\) for some constant
\(b'> 0\) depending on
\(a\),
\(b\) and on
\(\delta \geq 1\), but independent of
\(p\). The asserted DNNs are then obtained by approximately expressing the
\(u_{p}\) through ReLU DNNs, again at exponential rates, via Opschoor et al. [
43]. The details are as follows.
The interval
\(s\in [a,b]\) in the assertion of the proposition corresponds to the interval
\(x\in [\log a,\log b]\) under the analytic (in the bounded interval
\([a,b]\)) change of variables
\(x=\log s\). As Gevrey-regularity is known to be preserved under analytic changes of variables (see e.g. Rodino [
46, Proposition 1.4.6]), also
\(u(\tau ,s)|_{s \in [a,b]}\) is Gevrey-
\(\delta \)-regular, with the same index
\(\delta = 1/ \min \{1, 2\rho \} \geq 1\) and with constants in the derivative bounds which depend on
\(0 < a < b < \infty \),
\(\rho \in (0,1]\),
\(\tau > 0\). In particular, for
\(\rho \geq 1/2\),
\(u(\tau ,s)|_{s \in [a,b]}\) is real analytic in
\([a,b]\).
With Gevrey-
\(\delta \)-regularity of
\(s\mapsto u(\tau ,s)\) for
\(s \in [a,b]\) established, we may invoke expression rate bounds for deep ReLU NNs for such functions. In Opschoor et al. [
43, Proposition 4.1], it was shown that for such functions in space dimension
\(d=1\), there exist constants
\(C'>0\),
\(\beta '>0\) such that for every \({\mathcal{N}} \in \mathbb{N}\), there exists a deep ReLU NN
\(\tilde{u}_{\mathcal{N}}\) with
$$\begin{aligned} M(\tilde{u}_{\mathcal{N}}) \leq {\mathcal{N}}, \qquad L(\tilde{u}_{ \mathcal{N}}) &\leq C' {\mathcal{N}}^{\min \{\frac{1}{2}, \frac{1}{d+1/\delta } \}}\log {\mathcal{N}}, \\ \left \| u - \mathrm{R}(\tilde{u}_{\mathcal{N}}) \right \| _{W^{1, \infty }([-1,1]^{d})} &\leq C' \exp \big(- \beta '{\mathcal{N}}^{\min \{\frac{1}{2\delta },\frac{1}{d\delta +1} \}} \big). \end{aligned}$$
This implies that for every
\(0<{\varepsilon }<1/2\), a pointwise error of
\({\mathcal{O}}({\varepsilon })\) in
\([a,b]\) can be achieved by some ReLU NN
\(\psi ^{u}_{\varepsilon }\) of depth
\({\mathcal{O}}(|\log {\varepsilon }|^{\delta } |\log (|\log { \varepsilon }|)|)\) and of size
\({\mathcal{O}}(|\log {\varepsilon }|^{2\delta })\). This completes the proof. □
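The following Python sketch illustrates the mechanism behind the proof numerically (an illustration only, not part of the proof): as an analytic stand-in for \(s\mapsto u(\tau ,s)\) we use the Black–Scholes call price at \(\tau >0\) (the special case \(\nu = 0\)), interpolate at Chebyshev nodes on \([a,b]\), and observe the roughly exponential decay of the sup-error in the degree \(p\). All parameter values are illustrative.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb
from scipy.stats import norm

# Exponential polynomial convergence on [a, b] for an analytic price:
# Black-Scholes call at tau > 0, standing in for s -> u(tau, s).
a, b, K, sigma, tau = 0.5, 2.0, 1.0, 0.3, 0.25

def u(s):
    d1 = (np.log(s / K) + 0.5 * sigma**2 * tau) / (sigma * np.sqrt(tau))
    return s * norm.cdf(d1) - K * norm.cdf(d1 - sigma * np.sqrt(tau))

s_test = np.linspace(a, b, 2001)
t_test = 2 * (s_test - a) / (b - a) - 1           # map [a, b] -> [-1, 1]
for p in [2, 4, 8, 16, 32]:
    # interpolate at the p + 1 Chebyshev-Lobatto points mapped to [a, b]
    nodes = (a + b) / 2 + (b - a) / 2 * np.cos(np.pi * np.arange(p + 1) / p)
    coef = cheb.chebfit(2 * (nodes - a) / (b - a) - 1, u(nodes), p)
    err = np.max(np.abs(u(s_test) - cheb.chebval(t_test, coef)))
    print(f"degree {p:3d}: sup error {err:.2e}")  # decays like exp(-b' p)
```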
4.4 Summary and discussion
For prices of derivative contracts on one risky asset whose log-returns are modelled by an LP
\(X\), we have analysed the expression rate of deep ReLU NNs. We have provided two mathematically distinct approaches to the analysis of the expressive power of deep ReLU NNs. The first, probabilistic approach furnishes algebraic expression rates, i.e., pointwise accuracy
\(\varepsilon >0\) on a bounded interval
\([a,b]\) is furnished with DNNs of size
\({\mathcal{O}}(\varepsilon ^{-q})\) with suitable
\(q\geq 0\). The argument is based on approximating the option price by Monte Carlo sampling, estimating the uniform error on
\([a,b]\) and then emulating the resulting average by a DNN. The second, “analytic” approach, leverages regularity of (variational) solutions of the corresponding Kolmogorov partial integro-differential equations and furnishes exponential DNN expression rates. That is, an expression error
\(\varepsilon > 0\) is achieved with DNNs of size
\({\mathcal{O}}(|\log \varepsilon |^{a})\) for suitable
\(a>0\). Key in the second approach were stronger conditions (
4.17) on the characteristic exponent of the LP
\(X\), which imply, as we showed, Gevrey-
\(\delta \)-regularity of the map
\(s\mapsto u(\tau ,s)\) for suitable
\(\tau >0\). This regularity implies in turn exponential rates of polynomial approximation (in the uniform norm on
\([a,b]\)) of
\(s\mapsto u(\tau ,s)\), which is a result of independent interest, and subsequently, by emulation of polynomials with deep ReLU NNs, the corresponding exponential rates.
We remark that in the particular case
\(\delta = 1\), the derivative bounds (
4.24) imply analyticity of the map
\(s\mapsto u(\tau ,s)\) for
\(s\in [a,b]\) which implies the assertion also with the exponential expression rate bound for analytic functions in Opschoor et al. [
43].
We also remark that the smoothing of the solution operator in Proposition
4.8 accommodates payoff functions which belong merely to
\(L^{2}\), as they arise e.g. in particular binary contracts. This is a consequence of the assumption (
4.17), which on the other hand excludes Lévy processes with one-sided jumps. Such processes are covered by Proposition
4.1.
5 DNN approximation rates for multivariate Lévy models
We now turn to DNN expression rates for multivariate geometric Lévy models. This is a typical situation when option prices on baskets of
\(d\) risky assets are of interest, whose log-returns are modelled by multivariate Lévy processes. We admit rather general jump measures, in particular with fully correlated jumps in the marginals, as provided for example by so-called Lévy copula constructions in Kallsen and Tankov [
36].
As in the univariate case, we prove two results on ReLU DNN expression rates of option prices for European-style contracts. The first argument is developed in Sect.
5.1 below and overcomes in particular the curse of dimensionality. Its proof is again based on probabilistic arguments from statistical learning theory. As exponential LPs
\(X^{d}\) generalise geometric Brownian motions, Theorem
5.1 generalises several results from the classical Black–Scholes setting, and we comment on the relation of Theorem
5.1 to these recent results in Sect.
5.2. Owing to the method of proof, the DNN expression rate in Theorem
5.1 delivers an
\(\varepsilon \)-complexity of
\({\mathcal{O}}(\varepsilon ^{-2})\), achieved with potentially shallow DNNs; see Remark
4.5.
The second argument is based on parabolic regularity of the deterministic Kolmogorov PIDE associated to the LP
\(X^{d}\). We show in Theorem
5.4 that polylogarithmic in
\(\varepsilon \) expression rate bounds can be achieved by allowing DNN depth to increase essentially as
\({\mathcal{O}}(|\log \varepsilon |)\). The result in Theorem
5.4 is, however, prone to the curse of dimensionality: the constants implied in the
\({\mathcal{O}}(\, \cdot \, )\) bounds may (and in general will) depend exponentially on
\(d\). We also show that under a hypothesis of sufficiently large time
\(t>0\), parabolic smoothing allows overcoming the curse of dimensionality, with dimension-independent expression rates which are possibly larger than the rate furnished by the probabilistic argument (which is, however, valid uniformly for all
\(t>0\)).
5.1 DNN expression rate bounds via probabilistic arguments
We start by remarking that in this subsection, there is no need to assume ReLU activation.
The following result proves that neural networks are capable of approximating option prices in multivariate exponential Lévy models without the curse of dimensionality if the corresponding Lévy triplets \((A^{d},\gamma ^{d},\nu ^{d})\) are bounded uniformly with respect to the dimension \(d\).
For any dimension \(d \in \mathbb{N}\), we assume given a payoff function \(\varphi _{d}\) and a
\(d\)-variate LP
\(X^{d}\), and we denote the option price in time-to-maturity by
$$ u_{d}(s) = E\big[ \varphi _{d}( s e^{X_{T}^{d}} ) \big], \qquad s \in (0,\infty )^{d} . $$
(5.1)
We refer to Sato [
48, Chap. 2] for more details on multivariate Lévy processes and to Cont and Tankov [
16, Chap. 11], Eberlein and Kallsen [
21, Sect. 8.1] for more details on multivariate geometric Lévy models in finance.
The next theorem is a main result of the present paper. It states that DNNs can efficiently express prices on possibly large baskets of risky assets whose dynamics are driven by multivariate Lévy processes with general jump correlation structure. The expression rate bounds are polynomial in the number
\(d\) of assets and therefore not prone to the curse of dimensionality. This result partially generalises earlier work on DNN expression rates for diffusion models in Elbrächter et al. [
22], Grohs et al. [
29].
Let
\(\varepsilon \in (0,1]\) be the given target accuracy and consider
\(\bar{\varepsilon } \in (0,1]\) (to be selected later). To simplify notation, we write for
\(s \in [a,b]^{d}\) $$ s e^{X_{T}^{d}} = \big(s_{1} \exp (X_{T,1}^{d}),\ldots ,s_{d} \exp (X_{T,d}^{d}) \big). $$
The proof consists of four steps:
– Step 1 bounds the error that arises when the payoff \(\varphi _{d}\) is replaced by the neural network approximation \(\phi _{\bar{\varepsilon },d}\). As a part of Step 1, we also prove that the \(p\)th exponential moments of the components \(X_{T,i}^{d}\) of the Lévy process are bounded uniformly in the dimension \(d\).
– Step 2 is a technical step that is required for Step 3; it bounds the error that arises when the Lévy process is capped at a threshold \(D>0\). If we were to assume in addition that the output of the neural network \(\phi _{\bar{\varepsilon },d}\) were bounded (this is for example the case if the activation function \(\varrho \) is bounded), then Step 2 could be omitted.
– Step 3 is the key step in the proof. We introduce \(n\) i.i.d. copies of (the capped version of) \(X_{T}^{d}\) and use statistical learning techniques (symmetrisation, Gaussian and Rademacher complexities) to estimate the expected maximum difference between the option price (with neural network payoff) and its sample average. This is then used to construct the approximating neural networks; a small numerical sketch of this sample-average construction follows the list.
– Step 4 combines the estimates from Steps 1–3 and concludes the proof.
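Here is the announced sketch of Step 3's sample-average construction, in the illustrative special case \(\nu ^{d} = 0\) (so that \(X_{T}^{d}\) is Gaussian) and for a basket-call payoff, which is itself exactly a ReLU network. All model parameters are hypothetical, and the truncation of Step 2 is omitted in this toy setting.

```python
import numpy as np

# Step 3 in miniature: the candidate network realises the sample average
#   s -> (1/n) sum_k R(phi)(s * exp(X_k)),
# which is itself (the realisation of) a single network by Lemma 3.2.
rng = np.random.default_rng(1)
d, T, K, n = 10, 1.0, 1.0, 4000
A = 0.04 * np.eye(d)                       # diffusion matrix A^d
gamma = -0.5 * np.diag(A)                  # drift making each exp(X_i) a martingale

def payoff(s):                             # phi_d(s) = max(mean(s) - K, 0)
    return np.maximum(np.mean(s, axis=-1) - K, 0.0)

def sample_X(m):
    return gamma * T + rng.multivariate_normal(np.zeros(d), T * A, size=m)

X = sample_X(n)                            # the n i.i.d. copies X_1, ..., X_n

def network_average(s):                    # realisation of the averaged network
    return np.mean(payoff(s[None, :] * np.exp(X)), axis=0)

# crude check of the error at a few points of [a, b]^d = [0.8, 1.2]^d
X_ref = sample_X(400_000)
for _ in range(5):
    s = rng.uniform(0.8, 1.2, size=d)
    ref = np.mean(payoff(s[None, :] * np.exp(X_ref)))
    print(f"price {ref:.4f}  avg-net {network_average(s):.4f}")
```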
Step 1: Assumption (
5.2) and Hölder’s inequality yield for all
\(s \in [a,b]^{d}\) that
(5.6)
with a constant \(c_{1}\) independent of \(d\), where we used
\(| \cdot | \leq | \cdot |_{1}\) in the last step. To see that
\(c_{1}\) is indeed finite, note that (
5.5) and Sato [
48, Theorem 25.17] (with the vector in that result being
\(pe_{i}\)) imply that for any \(d \in \mathbb{N}\),
\(i=1,\ldots ,d\), the exponential moment \(E[e^{p X_{T,i}^{d}}]\) can be bounded in terms of \(p\), \(T\) and \(B\) only; the second inequality in the corresponding estimate uses that
\(|e^{z}-1-z| \leq z^{2} e^{p}\) for all
\(z \in [-p,p]\), which can be seen e.g. from the (mean value form of the) Taylor remainder formula.
Step 2: Before proceeding with the key step of the proof, we need to introduce a cut-off in order to ensure that the neural network output is bounded. Let
\(D>0\) and consider the random variable
\(X_{T}^{d,D} = \min (X_{T}^{d},D)\), where the minimum is understood componentwise. Then the Lipschitz property (
5.4) implies that
where
\(\tilde{c}_{1} = 2 b c \exp (5TpB + 2Te^{p}pB)\) and we used
\(| \cdot | \leq | \cdot |_{1}\), Hölder’s inequality, Chernoff’s bound and finally again Hölder’s inequality and (
5.7).
Step 3: Let
\(X_{1},\ldots ,X_{n}\) denote
\(n\) i.i.d. copies of the random vector
\(X_{T}^{d,D}\) and
\(Z_{1},\ldots ,Z_{n}\) i.i.d. standard normal variables, independent of
\(X_{1},\ldots ,X_{n}\). For any separable class of functions \(\mathcal{F}\), define the random variable
\(\hat{G}_{n}(\mathcal{F})\), the so-called
empirical Gaussian complexity.
Consider now for
\(i=1,\ldots ,d\) the function classes
$$ \mathcal{H}_{i} = \{(-\infty ,D]^{d} \ni x \mapsto s \exp (x_{i}) \colon s \in [a,b] \} $$
and, with the notation
\(s \exp (x)=(s_{1} \exp (x_{1}),\ldots ,s_{d}\exp (x_{d}))\), the class
$$ \mathcal{H}=\big\{ (-\infty ,D]^{d} \ni x \mapsto \mathrm{R}(\phi _{ \bar{\varepsilon },d})\big(s \exp (x)\big)-\mathrm{R}(\phi _{ \bar{\varepsilon },d})(0) \colon s \in [a,b]^{d} \big\} . $$
Denoting by \(\tilde{\mathcal{H}}\) the direct sum of
\(\mathcal{H}_{1},\ldots ,\mathcal{H}_{d}\), we have that
$$ \mathcal{H}= \phi (\tilde{\mathcal{H}}), $$
where
\(\phi = \mathrm{R}(\phi _{\bar{\varepsilon },d})(\, \cdot \, )- \mathrm{R}(\phi _{\bar{\varepsilon },d})(0)\) is a Lipschitz function with Lipschitz constant
\(cd^{\tilde{q}}\) (due to the hypothesis on the Lipschitz constant of the neural network (
5.4)), satisfies
\(\phi (0)=0\) and is bounded on the range of
\(\tilde{\mathcal{H}}\) (which is contained in
\([0,b\exp (D)]^{d}\)). Consequently, Bartlett and Mendelson [
6, Theorem 14] implies that
$$ \hat{G}_{n}(\mathcal{H}) \leq 2 c d^{\tilde{q}} \sum _{i=1}^{d} \hat{G}_{n}(\mathcal{H}_{i}). $$
(5.9)
Let
\(\varepsilon _{1},\ldots ,\varepsilon _{n}\) be an independent collection of Rademacher random variables. We then estimate
(5.10)
Here, the first inequality follows by symmetrisation (see for example Boucheron et al. [
12, Lemma 11.4]), the second follows from the comparison results on Gaussian and Rademacher complexities (see for instance Bartlett and Mendelson [
6, Lemma 4]) with some absolute constant
\(\tilde{c}_{2}\) and the third uses (
5.9).
We note that the constant
\(\tilde{c}_{2}\) in (
5.10) may be chosen as
. Indeed, setting
\(\mathcal{G}=\sigma (\varepsilon _{1},\ldots ,\varepsilon _{n},X_{1}, \ldots ,X_{n})\) and using independence yields
To further simplify (
5.10), we now apply Jensen’s inequality and use independence to derive for
\(i=1,\ldots ,d\) that
Combining this with (
5.10) and (
5.7), we obtain that
with
\(c_{2} = 4 \sqrt{\pi /2} c b \exp (5BT p/2 + BTp e^{p} )\). By applying Markov’s inequality (see (
4.10) and (
4.11)), this proves that there exists
\(\omega \in \Omega \) with
Now we observe that
\(s \mapsto \frac{1}{n} \sum _{k=1}^{n} \mathrm{R}(\phi _{ \bar{\varepsilon },d})(se^{X_{k}(\omega )})\) is the realisation of a neural network
\(\tilde{\psi }_{\bar{\varepsilon },d}\) with
\(M(\tilde{\psi }_{\bar{\varepsilon },d}) \leq n M(\phi _{ \bar{\varepsilon },d})\) (see Lemma
3.2). We have therefore proved that for arbitrary \(n \in \mathbb{N}\), there exists a neural network
\(\tilde{\psi }_{\bar{\varepsilon },d}\) with
(5.11)
Step 4: In the final step, we now provide appropriate choices of the hyperparameters. We select
\(\bar{\varepsilon } = \varepsilon (c_{1} d^{{\tilde{q}}+\frac{1}{2}p+ \frac{1}{2}}+2)^{-1}\), choose
\(n = \lceil (2 c_{2} d^{{\tilde{q}}+1} \bar{\varepsilon }^{-1})^{2} \rceil \),
\(D= \log (\bar{\varepsilon }^{-1}d^{{\tilde{q}}+1} \tilde{c}_{1})\) and set
\(\psi _{\varepsilon ,d} = \tilde{\psi }_{\bar{\varepsilon },d}\). Then the total number of parameters of the approximating neural network can be estimated, using assumption (
5.3), as
$$\begin{aligned} M(\psi _{\varepsilon ,d}) &= M(\tilde{\psi }_{\bar{\varepsilon },d}) \\ & \leq n M(\phi _{\bar{\varepsilon },d}) \\ & \leq \big(1+(2 c_{2} d^{{\tilde{q}}+1} \bar{\varepsilon }^{-1})^{2} \big) c d^{\tilde{q}} \bar{\varepsilon }^{-q} \\ & \leq (1+4 c_{2}^{2}) c d^{3{\tilde{q}}+2} \bar{\varepsilon }^{-2-q} \\ & \leq \big((1+4 c_{2}^{2}) c(c_{1} +2)^{2+q}\big) d^{({\tilde{q}}+ \frac{1}{2}p+\frac{1}{2})(2+q)+3{\tilde{q}}+2} \varepsilon ^{-2-q}. \end{aligned}$$
(5.12)
Thus the number of weights is bounded polynomially in
\(d\) and
\(\varepsilon ^{-1}\), as claimed. Finally, we combine (
5.6), (
5.8) and (
5.11) to estimate the approximation error as
as claimed. □
The proof of Theorem
5.1 is very similar to the proof of Proposition
4.1. Steps 1 and 4 in the proof of Theorem
5.1 are essentially identical in both proofs. The key difference is in Step 3: in the
\(d\)-dimensional case we cannot use the comparison theorem for Rademacher complexities in Ledoux and Talagrand [
39, Theorem 4.12], but instead need to use a comparison result for Gaussian complexities from Bartlett and Mendelson [
6, Theorem 14]. In the
\(d\)-dimensional case, the truncation with
\(D\) in Step 2 is needed to guarantee that the hypotheses of [
6, Theorem 14] are satisfied; in the 1-dimensional case, this is not required for [
39, Theorem 4.12].
As recently there have been several results on DNN expression rates in high-dimensional diffusion models, a discussion on the relation of the multivariate DNN expression rate result in Theorem
5.1 to other recent mathematical results on DNN expression rate bounds is in order. Given that geometric diffusion models are particular cases of the presently considered models (corresponding to
\(\nu ^{d} = 0\) in the Lévy triplet), it is of interest to consider to which extent the DNN expression error bound in Theorem
5.1 relates to these results.
Firstly, we note that with the exception of Gonon et al. [
27] and Elbrächter et al. [
22], previous results in the literature which are concerned with DNN approximation rates for Kolmogorov equations for diffusion processes (see e.g. Grohs et al. [
30], Berner et al. [
9], Grohs et al. [
29], Reisinger and Zhang [
45] and the references therein) study approximation with respect to the
\(L^{p}\)-norm (
\(p<\infty \)), whereas in Theorem
5.1 we study approximation with respect to the
\(L^{\infty }\)-norm, which requires entirely different techniques. While the results in [
22] rely on a specific structure of the payoff, the proof of the expression rates in [
27] has some similarities with the proof of Theorem
5.1. However, the novelty in the proof of Theorem
5.1 is the use of statistical learning techniques (symmetrisation, Gaussian and Rademacher complexities) which allow weaker assumptions on the activation function than in [
27]. In addition, the class of PDEs considered in [
27] (heat equation and related) is different from the one considered in Theorem
5.1 (Black–Scholes PDE and Lévy PIDE).
Secondly, Theorem
5.1 is the first result on ReLU DNN expression rates for option prices in models with jumps or, equivalently, for
partial integro-differential equations in non-divergence form
$$ \partial _{\tau }v_{d}(\tau ,x) - \frac{1}{2} \sum _{i,j=1}^{d} A^{d}_{i,j} \frac{\partial ^{2} v_{d}}{\partial x_{i} \partial x_{j}}(\tau ,x) - \sum _{i=1}^{d} \gamma ^{d}_{i} \frac{\partial v_{d}}{\partial x_{i}}(\tau ,x) $$
$$ {}- \int _{\mathbb{R}^{d}} \Big( v_{d}(\tau ,x+y) - v_{d}(\tau ,x) - \textstyle\sum _{i=1}^{d} y_{i} \frac{\partial v_{d}}{\partial x_{i}}(\tau ,x) \mathbf{1}_{\{|y|\leq 1\}} \Big) \nu ^{d}(dy) = 0 $$
(5.13)
for \(x \in \mathbb{R}^{d}\),
\(\tau > 0\), or, when transformed from log-price variables
\(x_{i}\) to actual price variables
\(s_{i}\) via
\((s_{1},\ldots ,s_{d})=(\exp (x_{1}),\ldots ,\exp (x_{d}))\) (and with the convention
\(s e^{y} = (s_{1}e^{y_{1}},\ldots ,s_{d} e^{y_{d}})\)),
(5.14)
for
\(s \in (0,\infty )^{d}\),
\(\tau > 0\), and with the correspondingly transformed integro-differential operator (see for instance Hilber et al. [
31, Theorem 4.1]). As in our assumptions also
\(A^{d} = 0\) is admissible under suitable conditions on
\(\nu ^{d}\), the present ReLU DNN expression rates are not mere generalisations of the diffusion case, but cover indeed the case of arbitrary pure jump models for both finite and infinite activity Lévy processes satisfying (
5.5).
In the case of
\(X\) being a diffusion with drift, i.e., for
\(\nu ^{d}=0\), the Lévy PIDE reduces to a Black–Scholes PDE. In this particular case, we may compare the result in Theorem
5.1 to the recent results e.g. in Grohs et al. [
29]. The results in the latter article are specialised to the Black–Scholes case in [
29, Sect. 4], where Setting 4.1 specifies the coefficients
\(A^{d}_{i,j}\) (in our notation) as
\(\beta _{i}^{d} \beta _{j}^{d} (B^{d} (B^{d})^{\top })_{i,j}\) for some \(\beta ^{d} \in \mathbb{R}^{d}\) and \(B^{d} \in \mathbb{R}^{d\times d}\) satisfying
\((B^{d} (B^{d})^{\top })_{k,k} = 1\) for all \(d \in \mathbb{N}\),
\(i,j,k=1,\ldots ,d\) and
\(\sup _{d,i} |\beta _{i}^{d}| < \infty \). The coefficient
\(\gamma ^{d}\) is chosen as
\(\alpha ^{d}\) satisfying
\(\sup _{d,i} | \alpha ^{d}_{i}| < \infty \). Using that
\(\Sigma = B^{d} (B^{d})^{\top }\) is symmetric and positive definite, we obtain
\(\Sigma _{i,j} \leq \sqrt{\Sigma _{i,i}\Sigma _{j,j}} = 1\) and hence these assumptions imply that (
5.5) is satisfied. Therefore, the DNN expression rate results from [
29, Sect. 4] can also be deduced from our Theorem
5.1 in the case when the probability measure used to quantify the
\(L^{p}\)-error in [
29] is compactly supported, as in that case the
\(L^{\infty }\)-bounds proved here imply the
\(L^{p}\)-bounds proved in [
29].
5.3 Exponential ReLU DNN expression rates via PIDEs
We now extend the univariate case discussed in Sect.
4.3, and prove an exponential expression rate bound similar to Proposition
4.8 for baskets of
\(d\geq 2\) Lévy-driven assets.
In this subsection, we assume the ReLU activation function
\(\varrho (x)=\max \{x,0\}\). As in Sect.
5.1, we admit a general correlation structure for the marginal processes’ jumps. To prove DNN expression rate bounds, we exploit once more the fact that the stationarity and homogeneity of the
\(\mathbb{R}^{d}\)-valued LP
\(X^{d}\) imply that the Kolmogorov equation (
5.13) has constant coefficients. Under the provision that the initial datum in (5.13) belongs to \(L^{2}(\mathbb{R}^{d})\), this allows writing for every
\(\tau >0\) the Fourier transform
\(F_{x\to \xi }v_{d}(\tau , \, \cdot \,) = \hat{v}_{d}(\tau ,\xi )\) as
$$ \hat{v}_{d}(\tau ,\xi ) = \exp \big( -\tau \psi (\xi ) \big) \hat{v}_{d}(0,\xi ). $$
(5.15)
Here, for \(\xi \in \mathbb{R}^{d}\), the symbol is given by
\(\psi (\xi ) = \exp (-ix^{\top }\xi ) {\mathcal{A}}(\partial _{x}) \exp (i x^{\top }\xi )\) with
\({\mathcal{A}}(\partial _{x})\) denoting the constant coefficient spatial integro-differential operator in (
5.13) by Courrège’s second theorem (see e.g. Applebaum [
1, Theorem 3.5.5]), and (
4.21) becomes
(5.16)
In fact,
\(\psi \) can be expressed in terms of the characteristic triplet
\((A^{d},\gamma ^{d},\nu ^{d})\) of the LP
\(X^{d}\) as
$$ \psi (\xi ) = \frac{1}{2} \xi ^{\top }A^{d} \xi - i (\gamma ^{d})^{\top }\xi + \int _{\mathbb{R}^{d}} \big( 1 - e^{i\xi ^{\top }y} + i \xi ^{\top }y \mathbf{1}_{\{|y|\leq 1\}} \big) \nu ^{d}(dy) . $$
(5.17)
We impose again the strong ellipticity assumption (
4.17), but now with
\(|\xi |\) understood as
\(|\xi |^{2} = \xi ^{\top }\xi \) for \(\xi \in \mathbb{R}^{d}\). Then, reasoning exactly as in the proof of Proposition
4.8, we obtain with
\(C_{1}>0\) as in (
4.17) for every
\(\tau >0\) for the variational solution
\(v_{d}\) of (
5.13) the bound
$$ \| D^{k}_{x} v_{d}(\tau ,\, \cdot \,) \|_{L^{2}(\mathbb{R}^{d})} \leq (k!)^{1/(2\rho )} \big( (2\tau C_{1}\rho )^{-1/(2\rho )} \big)^{k} \, \| v_{d}(0,\, \cdot \,) \|_{L^{2}(\mathbb{R}^{d})} . $$
(5.18)
Here,
\(D^{k}_{x}\) denotes any weak derivative of total order \(k \in \mathbb{N}_{0}\) with respect to \(x \in \mathbb{R}^{d}\).
With the Sobolev embedding theorem, we again obtain for any bounded cube \(I^{d} = [x_{-},x_{+}]^{d}\) with
\(-\infty < x_{-} < x_{+} < \infty \) and for every fixed
\(\tau >0\) that there exist constants
\(C(d)>0\) and
\(A(\tau ,\rho ) > 0\) such that
$$ \| D^{k}_{x} v_{d}(\tau ,\, \cdot \,) \|_{L^{\infty }(I^{d})} \leq C(d) A(\tau ,\rho )^{k} \, (k!)^{\delta }, \qquad k \in \mathbb{N}_{0},\ \delta = 1/\min \{1,2\rho \} . $$
(5.19)
In (
5.19), the constant
\(C(d)\) is independent of
\(x_{-},x_{+}\), but depends in general exponentially on the basket size (respectively the dimension)
\(d\geq 2\), and the constant
\(A( \tau ,\rho ) = (2\tau C_{1}\rho )^{-1/(2\rho )}\) denotes the constant from (
5.18) and Stirling’s bound. If
\(\rho =1\) (which corresponds to the case of a non-degenerate diffusion part) and if
\(\tau >0\) is sufficiently large (so that
\((2 \tau C_{1})^{1/(2\rho )} \geq 1\)), the constant is bounded uniformly with respect to the dimension
\(d\). The derivative bound (
5.19) implies that
\(v_{d}(\tau , \, \cdot \,)|_{I^{d}}\) is Gevrey-
\(\delta \)-regular with
\(\delta = 1/\min \{1, 2\rho \}\).
In particular, for
\(\delta = 1\), i.e., when
\(\rho \geq 1/2\), for every fixed
\(\tau >0\), the function
\(x\mapsto v_{d}(\tau ,x)\) is real analytic in
\(I^{d}\), which is the case we consider first. In this case, we perform an affine change of coordinates to transform
\(v_{d}(\tau , \, \cdot \,)\) to the real analytic function
\([-1,1]^{d} \ni \hat{x} \mapsto v_{c}(\tau ,\hat{x})\). This function admits a holomorphic extension to some open set
\(O \subseteq \mathbb{C}^{d}\) containing
\([-1,1]^{d}\). By choosing
\(\bar{\varrho } > 1\) (the “semiaxis sums”) sufficiently close to 1, we obtain that
\({\mathcal{E}}_{\bar{\varrho }} \subseteq O\), i.e.,
\(v_{c}(\tau , \, \cdot \,)\) admits a holomorphic extension to
\({\mathcal{E}}_{\bar{\varrho }}\), where the Bernstein polyellipse \({\mathcal{E}}_{\bar{\varrho }} \subseteq \mathbb{C}^{d}\) is defined as the
\(d\)-fold Cartesian product of the Bernstein ellipse \(\{ (z+z^{-1})/2 : 1 \leq |z| \leq \bar{\varrho }\} \subseteq \mathbb{C}\). More precisely,
\(x\mapsto v_{d}(\tau ,x)\) admits, with respect to each co-ordinate
\(x_{i} \in [x_{-},x_{+}]\) of
\(x\), a holomorphic extension to an open neighborhood of
\([x_{-},x_{+}]\) in ℂ (see e.g. Krantz and Parks [
37, Sect. 1.2]). By Hartogs’ theorem (see e.g. Hörmander [
33, Theorem 2.2.8]), for every fixed
\(\tau >0\), the function
\(x\mapsto v_{d}(\tau ,x)\) admits a holomorphic extension to a polyellipse in
\(\mathbb{C}^{d}\) with foci at
\(x_{-},x_{+}\) or, in normalised coordinates
$$ \hat{x}_{i} = \big(T^{-1}(x)\big)_{i} = 2[x_{i} - (x_{-} + x_{+})/2] / (x_{+}-x_{-}),\qquad i=1,\dots ,d, $$
(5.20)
the map
\([-1,1]^{d} \ni \hat{x} \mapsto v_{d}(\tau ,T(\hat{x})) = v_{c}(\tau , \hat{x})\) admits a holomorphic extension to a Bernstein polyellipse
with foci at
\(\hat{x}_{i} = \pm 1\) and semiaxis sums
\(1<\bar{\varrho } = { \mathcal{O}}(A(\tau ,\rho )^{-1})\). As
\(\tau \mapsto A(\tau ,\rho )^{-1}\) is increasing for every fixed value of
\(\rho \), parabolic smoothing increases for
\(\rho \geq 1/2\) the domain of holomorphy with
\(\tau \).
In the general case
\(\delta = 1/\min \{1,2\rho \}\) with
\(\rho >0\) as in (
4.17), ReLU DNN expression rates of multivariate holomorphic (if
\(\rho \geq 1/2\)) and Gevrey-regular (if
\(0<\rho <1/2\)) functions such as
\(\hat{x} \mapsto v_{c}(\tau ,\hat{x})\) have been studied in Opschoor et al. [
43]. The holomorphy or Gevrey-
\(\delta \)-regularity of the map
\(\hat{x} \mapsto {v_{c}}(\tau ,\hat{x})\) implies with [
43, Theorem 3.6, Proposition 4.1] that there exist constants
\(\beta '=\beta '(\bar{\varrho },d)>0\) and
\(C = C(u_{d},\bar{\varrho },d) > 0\) and for every \({\mathcal{N}} \in \mathbb{N}\) a ReLU DNN \(\tilde{u}_{\mathcal{N}}\) such that
$$ M(\tilde{u}_{\mathcal{N}}) \leq {\mathcal{N}}, \qquad L(\tilde{u}_{ \mathcal{N}}) \le C {\mathcal{N}}^{\min \{\frac{1}{2}, \frac{1}{d+1/\delta }\}} \log {\mathcal{N}}$$
(5.21)
and such that the error bound
$$\begin{aligned} \|{v_{c}}(\tau , \, \cdot \,) - \tilde{u}_{{\mathcal{N}}}(\, \cdot \, ) \|_{W^{1,\infty }([-1,1]^{d})} \leq C\exp ( -\beta ' {\mathcal{N}}^{ \min \{ \frac{1}{2\delta }, \frac{1}{\delta d+1}\}} ) \end{aligned}$$
(5.22)
holds. Reversing the affine change of variables (
5.20) in the input layer, we obtain the following result on the
\(\varepsilon \)-complexity of the ReLU DNN expression error for
\(x \mapsto v_{d}(\tau ,x)\) at fixed
\(0<\tau \leq T\).
5.4 Breaking the curse of dimensionality
The result in Theorem
5.1 gives an
\({\varepsilon }\) expression error for DNNs whose depth and size are bounded polynomially in terms of
\({\varepsilon }^{-1}d\), for European-style options in multivariate exponential Lévy models. In particular, in Theorem
5.1, the curse of dimensionality is proved to be overcome for a market model with jumps: a DNN expression rate is shown which is algebraic in terms of the target accuracy
\({\varepsilon }>0\) with constants that depend polynomially on the dimension
\(d\). The rates
\(\mathfrak{p},\mathfrak{q} \in [0,\infty )\) can be read off from the proof of Theorem
5.1; however, these exponents could be large, thereby affording only low DNN expression rates.
Theorem
5.4, on the other hand, states
exponential expressivity of deep ReLU NNs, i.e., a maximum expression error at time
\(\tau >0\) with accuracy
\(\varepsilon > 0\) can be attained by a deep ReLU NN of size and depth which grow polylogarithmically with respect to
\(|\log \varepsilon |\). This exponential expression rate bound is, however, still prone to the curse of dimensionality (CoD).
In the present section, we further address alternative mathematical arguments on how DNNs can overcome the CoD in the presently considered Lévy models. Specifically, two mathematical arguments in addition to the probabilistic arguments in Sect.
5.1 are presented. Both exploit stationarity of the LP
\(X^{d}\) which implies (
5.15), (
5.16), to obtain DNN expression rates free from the curse of dimensionality.
5.4.1 Barron space analysis
The first alternative approach to Theorem
5.1 is based on verifying, using (
5.15), (
5.16), regularity of option prices in the so-called
Barron space introduced in the fundamental work of Barron [
5]. It provides DNN expression error bounds with explicit values for
\(\mathfrak{p}\) and
\(\mathfrak{q}\), however,
in [5] only for DNNs with sigmoidal activation functions \(\varrho \); similar results for ReLU activations are asserted in E and Wojtowytsch [
19]. For simplicity, we consider here a subset ℬ of the Barron space. An integrable function
\(f \colon \mathbb{R}^{d} \to \mathbb{R}\) belongs to ℬ if
$$ \| f \|_{{\mathcal{B}}} := \int _{\mathbb{R}^{d}} (1 + |\xi |) \, |\hat{f}(\xi )| \, d\xi < \infty . $$
(5.24)
Recall that
\(\hat{f}\) denotes the Fourier transform of
\(f\). The explicit appearance of
\(\hat{f}\) renders the norm
\(\| \cdot \|_{{\mathcal{B}}}\) in (
5.24) particularly suitable for our purposes due to (
5.15)–(
5.17). As was pointed out in [
5,
19], the relevance of the Barron norm
\(\| \cdot \|_{{\mathcal{B}}}\) stems from it being sufficient for dimension-robust DNN approximation rates. For \(m \in \mathbb{N}\), consider the two-layer neural networks
\(f_{m}\) given by
$$ f_{m}(x) = \sum _{i=1}^{m} a_{i} \, \varrho ( w_{i}^{\top }x + b_{i} ) $$
(5.25)
with parameters \((a_{i}, w_{i}, b_{i}) \in \mathbb{R}\times \mathbb{R}^{d} \times \mathbb{R}\), \(i=1,\ldots ,m\). Their relevance stems from the following result: If
\(\varrho \) is sigmoidal, i.e., bounded, measurable and
\(\varrho (z) \to 1\) as
\(z \to \infty \),
\(\varrho (z) \to 0\) as
\(z \to -\infty \), then for
\(f\in {\mathcal{B}}\) and for every
\(R>0\), every probability measure
\(\pi \) on
\([-R,R]^{d}\) and every
\(m \in \mathbb{N}\), there exist parameters
\(\{ (a_{i},w_{i},b_{i}) \}_{i=1}^{m}\) such that for the corresponding DNN
\(f_{m}\) in (
5.25), we have
$$ \textstyle\begin{array}{c} \| f - f_{m} \|_{L^{2}([-R,R]^{d}, \pi )} \leq \max \{1,R\} m^{-1/2} \| f \|_{{\mathcal{B}}}. \end{array} $$
(5.26)
The bound (
5.26) follows from Barron [
5, Theorem 1], and was generalised in E and Wojtowytsch [
19, Eq. (1.3)] to ReLU activation.
The bound in (
5.26) is free from the CoD: the number
\(N\) of parameters in the DNN grows as
\({\mathcal{O}}(md)\) so that
\(m^{-1/2} \leq CN^{-1/2} d^{1/2}\) with an absolute constant
\(C>0\).
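A minimal Python sketch of the two-layer architecture (5.25), with sigmoidal \(\varrho \) and randomly chosen (hypothetical) parameters, illustrating the parameter count \(N = m(d+2) = {\mathcal{O}}(md)\) used above; the approximation result of course concerns optimised, not random, parameters.

```python
import numpy as np

def sigmoid(z):
    # a sigmoidal activation: bounded, with limits 0 and 1 at -/+ infinity
    return 1.0 / (1.0 + np.exp(-z))

def two_layer(x, a, w, b):
    """f_m(x) = sum_i a_i * rho(w_i . x + b_i), the form (5.25)."""
    return sigmoid(x @ w.T + b) @ a

d, m = 50, 200
rng = np.random.default_rng(2)
a = rng.normal(size=m) / m                # outer weights a_i
w = rng.normal(size=(m, d))               # inner weights w_i
b = rng.normal(size=m)                    # inner biases b_i

x = rng.uniform(-1.0, 1.0, size=(5, d))   # a few points of [-R, R]^d, R = 1
print(two_layer(x, a, w, b))
print("N =", m * (d + 2))                 # parameter count, O(m d)
```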
With (
5.15), (
5.16), for every
\(\tau \geq 0\), sufficient conditions for
\(x\mapsto v_{d}(\tau ,x)\) to belong to ℬ can be verified. With (
5.26), DNN mean-square expression rate bounds of option prices that are free from the CoD follow.
Pointwise,
\(L^{\infty }\)-norm error bounds can be obtained by using [
19, Eq. (1.4)].
5.4.2 Parabolic smoothing and sparsity of chaos expansions
The second non-probabilistic approach to Theorem
5.1 towards DNN expression error rates not subject to the CoD is based on dimension-explicit derivative bounds of option prices, which allow in turn establishing summability bounds for
generalised polynomial chaos (gpc for short) expansions of these prices. Good summability of gpc coefficient sequences is well known to imply high, dimension-independent rates of approximation by sparse, multivariate polynomials. This in turn implies corresponding expression rates by suitable DNNs; see Schwab and Zech [
49, Theorem 3.9]. Key in this approach is to exploit
parabolic smoothing of the Kolmogorov PDE. The corresponding dimension-independent expression rate results are in general higher than those based on probabilistic or Barron space analysis, but hold only
for sufficiently large \(\tau >0\).
We start by discussing more precisely the dependence of the constants in the proof of Theorem
5.4 on the dimension
\(d\).
The constant
\(C>0\) in the exponential expression rate bounds established in Theorem
5.4 depends in general exponentially on the basket size
\(d\), resp. on the dimension of the solution space of the PIDE (
5.13), due to the reliance on the ReLU DNN expression rate analysis in Opschoor et al. [
43]. Furthermore, the DNN size grows polylogarithmically with respect to the dimension
\(d\), in terms of
\(|\log \varepsilon |\). Considering exponential expression rate bounds, this exponential dependence on
\(d\) in terms of
\(|\log \varepsilon |\) seems in general not avoidable, as can be seen from [
43, Theorem 3.5]. Nevertheless, in Remark
5.8, we already hinted at parabolic smoothing implying sufficient regularity (under the
\(d\)-dependent provision (
5.29) on
\(\tau \)) for constants in DNN expression rate bounds which are polynomial with respect to
\(d\).
In the following paragraphs, we settle for
algebraic DNN expression rates and overcome exponential dependence on
\(d\) in ReLU DNN expression error bounds under certain
sparsity assumptions on polynomial chaos expansions, as shown in Schwab and Zech [
49], Cohen et al. [
15] and the references there. We develop a variation of the results in [
49] in the present context.
We impose the following hypothesis, which takes the place of the lower bound in (
4.17). We still impose
\(| \psi (\xi ) | \leq C_{2} | \xi |^{2\rho } + C_{3}\), i.e., the second condition in (
4.17) holds for each \(d \in \mathbb{N}\) (but
\(C_{2}\),
\(C_{3}\) and
\(\rho \) in that condition are allowed to depend on
\(d\)).
In comparison to the lower bound in (
4.17), the condition (
5.30) is restricted to the case
\(\rho >\frac{1}{2}\). On the other hand, different exponents
\(\rho _{j}\) are allowed along each component. Furthermore, note that Assumption
5.9 imposes that
\(C_{1}\) does not depend on the dimension
\(d\).
As we shall see below, Assumption
5.9 ensures good “separation” and “anisotropy” properties of the symbol (
5.17) of the corresponding Lévy process
\(X^{d}\).
For
\(\tau >0\) satisfying (
5.29), we now analyse the regularity of
\(x\mapsto v_{d}(\tau ,x)\). From Assumption
5.9, we find that for every
\(\tau >0\),
\(x\mapsto v_{d}(\tau ,x)\) is in \(L^{2}(\mathbb{R}^{d})\) and that its Fourier transform has the explicit form
$$ \hat{v}_{d}(\tau ,\xi ) = F_{x\to \xi }v_{d}(\tau , \, \cdot \,) = \exp \big(-\tau \psi _{X^{d}}(\xi )\big) \hat{v}_{d}(0,\xi ). $$
(5.32)
For a multi-index \({\boldsymbol{\nu }} \in \mathbb{N}_{0}^{d}\), denote by
\(\partial _{x}^{\boldsymbol{\nu }}\) the mixed partial derivative of total order
\(|{\boldsymbol{\nu }}|=\nu _{1}+ \cdots +\nu _{d}\) with respect to \(x = (x_{1},\ldots ,x_{d})\).
5.32) and Assumption
5.9 can be used to show that for every
\(\tau >0\),
\(x\mapsto v_{d}(\tau ,x)\) is analytic at any
\(x \in \mathbb{R}^{d}\). This is of course the well-known smoothing property of the generator of certain non-degenerate Lévy processes. To address the curse of dimensionality, we quantify the smoothing effect in a
\(d\)-explicit fashion.
To this end, with Assumption
5.9, we calculate for any \({\boldsymbol{\nu }} \in \mathbb{N}_{0}^{d}\) at
\(x=0\) (by stationarity, the same bounds hold for the Taylor coefficients at any \(x_{0} \in \mathbb{R}^{d}\)) that
We use (
4.23) with
\(m:= \nu _{j}\),
\(\kappa := C_{1} \tau \),
\(\mu =2 \rho _{j}\) to bound the product as
$$ \prod _{j=1}^{d} |\xi _{j}|^{\nu _{j}} \exp (-\tau C_{1} |\xi _{j}|^{2 \rho _{j}}) \leq \prod _{j=1}^{d} \bigg( \frac{\nu _{j}}{2 \rho _{j} \tau C_{1} e} \bigg)^{\nu _{j}/(2\rho _{j})} . $$
For the Taylor coefficient of order \({\boldsymbol{\nu }}\) of
\(v_{d}(\tau ,\,\cdot \,)\) at
\(x=0\), we arrive at the bound
(5.33)
Stirling’s inequality \({\boldsymbol{\nu }}^{\boldsymbol{\nu }} \leq {\boldsymbol{\nu }}! \, e^{|{\boldsymbol{\nu }}|}\) implies in (
5.33) the bound
(5.34)
Here,
\(\rho ' = 1-\frac{1}{2\rho } > 0\) and the positive weight sequence
\(b= (b_{j})_{j\geq 1}\) is given by
\(b_{j}= (2 \rho _{j} \tau C_{1})^{-1/(2\rho _{j} \rho ')}\),
\(j=1,2,\dots \), and multi-index notation is employed: we write
\({\boldsymbol{\nu }}^{-{\boldsymbol{\nu }}} = (\nu _{1}^{\nu _{1}} \nu _{2}^{\nu _{2}}\cdots )^{-1}\),
\(b^{\boldsymbol{\nu }}=b_{1}^{\nu _{1}}b_{2}^{\nu _{2}}\cdots \) and
\({\boldsymbol{\nu }}! = \nu _{1}! \, \nu _{2} ! \, \ldots \), with the convention
\(0!=1\) and
\(0^{0} = 1\). We raise (
5.34) to a power
\(q>0\) with
\(q < 1/\rho '\) and sum the resulting inequality over all \({\boldsymbol{\nu }} \in \mathbb{N}_{0}^{d}\) to estimate (generously)
To obtain the estimate (
5.34), one could also use the
\(L^{2}\)-bound with the explicit constant derived in (
5.27), (
5.28).
Under hypothesis (
5.30) and for
\(\tau >0\) satisfying (
5.29),
\(q\)-summability of the Taylor coefficients follows; indeed,
Using that
\(|{\boldsymbol{\nu }}|! \geq {\boldsymbol{\nu }}!\) and that
\(1 \geq q \rho ' >0 \), the multinomial theorem yields
Hence, provided that
$$ \tau > \tau _{0}(d) := \frac{d^{2\rho /q}}{2 \rho C_{1}}, $$
(5.35)
it follows that
(5.36)
Therefore, we have proved
\(q\)-summability of the Taylor coefficients of the map
\(x\mapsto v_{d}(\tau ,x)\) at
\(x=0\) for any
\(\tau > \tau _{0}(d)\) as in (
5.35). The
\(q\)-norm \(\| (t_{\boldsymbol{\nu }}) \|_{\ell ^{q}}\) is bounded independently of
\(d\) provided that
\(\tau > \tau _{0}(d)\) and the \(L^{2}(\mathbb{R}^{d})\)-norm of the initial datum is bounded independently of
\(d\).
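For orientation, a worked instance of (5.35) with illustrative parameter values: for the non-degenerate diffusion scaling \(\rho _{j} \equiv \rho = 1\) and the summability exponent \(q = \frac{1}{2}\) (so that the approximation rate below is \(r = 1/q - 1 = 1\)),
$$ \tau _{0}(d) = \frac{d^{2\rho /q}}{2\rho C_{1}} = \frac{d^{4}}{2 C_{1}} , $$
so with \(C_{1} = 1\) and \(d = 10\), the expansion-based bound applies for maturities \(\tau > 5000\); the provision (5.35) is thus a genuine large-time condition.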
The
\(q\)-summability (
5.36) of the Taylor coefficients of
\(x\mapsto v_{d}(\tau ,x)\) at
\(x=0\) with
\(q=1\) implies for
\(\tau > \tau _{0}(d)\) absolute, pointwise convergence in the cube
\([-1,1]^{d}\) of the Taylor series
$$ v_{d}(\tau ,x) = \sum _{{\boldsymbol{\nu }} \in \mathbb{N}_{0}^{d}} t_{\boldsymbol{\nu }} x^{\boldsymbol{\nu }} . $$
(5.37)
Furthermore, as was shown in Schwab and Zech [
49, Lemma 2.8], the fact that the sequence
\(( t_{\boldsymbol{\nu }})\) is
\(q\)-summable for some
\(0 < q < 1\) and the coefficient bound (
5.34) imply that for
\(\tau > \tau _{0}(d)\) with
\(\tau _{0}(d)\) as defined in (
5.35), there exists a sequence \((\Lambda _{n})_{n \in \mathbb{N}}\) of
nested, downward closed (i.e., if
\({\boldsymbol{e}}_{j}\in \Lambda _{n}\), then
\({\boldsymbol{e}}_{i}\in \Lambda _{n}\) for all
\(0\leq i\leq j\))
multi-index sets with
\(\#(\Lambda _{n}) \leq n\) such that general polynomial chaos (gpc) approximations given by the partial sums
$$ v_{d}^{\Lambda _{n}}(\tau ,x) = \sum _{{\boldsymbol{\nu }}\in \Lambda _{n}} t_{\boldsymbol{\nu }}x^{\boldsymbol{\nu }}$$
converge at the dimension-independent rate
\(r=1/q-1\) (see e.g. Cohen et al. [
15, Lemma 5.5]), i.e., \(\sup _{x \in [-1,1]^{d}} | v_{d}(\tau ,x) - v_{d}^{\Lambda _{n}}(\tau ,x) | \leq C n^{-(1/q-1)}\).
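A small Python sketch of the objects in this statement, under loudly flagged assumptions: the coefficients \(t_{\boldsymbol{\nu }}\) below are hypothetical stand-ins decaying as in (5.34) (in the theorem they are the Taylor coefficients of \(v_{d}\)), and \(\Lambda _{n}\) is built by taking the \(n\) largest coefficients, which yields a downward closed set here because the stand-in decays monotonically in every coordinate.

```python
import math
from itertools import product
import numpy as np

d, nu_max, n = 4, 3, 30
b = np.array([1.5, 2.0, 2.5, 3.0])        # weight sequence b_j (illustrative)

def t(nu):
    # hypothetical Taylor coefficient, decaying as in the bound (5.34)
    return float(np.prod(b ** (-np.asarray(nu, dtype=float)))) \
        / math.prod(math.factorial(k) for k in nu)

candidates = list(product(range(nu_max + 1), repeat=d))
Lambda_n = sorted(candidates, key=t, reverse=True)[:n]
# t decreases strictly in every coordinate, so every index dominated by a
# member of Lambda_n sorts strictly earlier: the prefix is downward closed,
# and the sets are nested as n grows.

def partial_sum(x):
    # the gpc partial sum v_d^{Lambda_n}(tau, x) = sum_{nu in Lambda_n} t_nu x^nu
    x = np.asarray(x, dtype=float)
    return sum(t(nu) * float(np.prod(x ** np.asarray(nu))) for nu in Lambda_n)

print(Lambda_n[:5])
print(partial_sum([0.5, -0.5, 0.25, 0.1]))
```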
The summability (
5.36) of the coefficients in the Taylor gpc expansion (
5.37) also implies quantitative bounds on the expression rates of ReLU DNNs. With [
49, Theorem 2.7, (ii)], we find that there exists a constant
\(C>0\) independent of
\(d\) such that
$$ \sup _{{\boldsymbol{\nu }}\in \Lambda _{n}} |{\boldsymbol{\nu }}|_{1} \leq C (1+\log n ) . $$
We now refer to [
49, Theorem 3.9] (with
\(q\) in place of
\(p\) in the statement of that result) and, observing that in the proof of that theorem, only the
\(p\)-summability of the Taylor coefficient sequence
\(( t_{\boldsymbol{\nu }})\) was used, we conclude that for
\(\tau >0\) satisfying (
5.35), there exists a constant
\(C>0\) that is independent of
\(d\) and for every \(n \in \mathbb{N}\), there exists a ReLU DNN
\(\tilde{v}_{d}^{n}\) with input dimension
\(d\) such that
$$\begin{aligned} M(\tilde{v}_{d}^{n}) &\leq C \big(1+n\log n \log (\log n )\big), \\ L(\tilde{v}_{d}^{n}) &\leq C\big(1+\log n \log (\log n )\big), \\ \sup _{x \in [-1,1]^{d}}| v_{d}(\tau ,x) - \mathrm{R}(\tilde{v}_{d}^{n})(x) | &\leq C n^{-(1/q-1)} . \end{aligned}$$
(5.38)
6 Conclusion and generalisations
We have proved that prices of European-style derivative contracts on baskets of
\(d\geq 1\) assets in exponential Lévy models can be expressed by ReLU DNNs to accuracy
\(\varepsilon > 0\) with DNN size polynomially growing in
\(\varepsilon ^{-1}\) and
\(d\), thereby overcoming the curse of dimensionality. The technique of proof is based on probabilistic arguments and provides expression rate bounds that scale algebraically in terms of the DNN size. We have then also provided an alternative, analytic argument that allows proving
exponential expressivity of ReLU DNNs of the option price, i.e., of the map
\(s\mapsto u(t,s)\) at any fixed time
\(0< t< T\), with the DNN size growing polynomially with respect to
\(|\log \varepsilon |\) to achieve accuracy
\(\varepsilon > 0\). For sufficiently large
\(t>0\), based on analytic arguments involving parabolic smoothing and sparsity of generalised polynomial chaos expansions, we have established in (
5.38) a second, algebraic expression rate bound for ReLU DNNs that is free from the curse of dimensionality. In forthcoming work (Gonon and Schwab [
28]), we address PIDEs (
5.13) with non-constant coefficients. In addition, the main result of the present paper, Theorem
5.1, could be extended in the following directions.
First, the expression rates are almost certainly not optimal in general; for high-dimensional diffusions, which are a particular case with
\(A^{d} = I\) and
\(\nu ^{d} = 0\), we have established in Elbrächter et al. [
22] for particular payoff functions a spectral expression rate in terms of the DNN size, free from the curse of dimensionality.
Next, solving Hamilton–Jacobi partial integro-differential equations (HJPIDEs for short) by DNNs: It is classical that the Kolmogorov equation for the exponential LP
\(X^{d}\) in Sect.
2.2 is in fact a special case of an HJPIDE (see e.g. Barles et al. [
2], Barles and Imbert [
3]). In forthcoming work [
28], we aim at proving that the expression rate bounds obtained in Sect.
5 imply corresponding expression rate bounds for ReLU DNNs which are free from the curse of dimensionality for viscosity solutions of general HJPIDEs associated to the LP
\(X^{d}\) and for its exponential counterparts.
Barriers: We have considered payoff functions corresponding to European-style contracts. Here, the stationarity of the LP
\(X^{d}\) and exponential Lévy modelling have allowed us to reduce our analysis to Cauchy problems of the Kolmogorov equations of
\(X^{d}\) in
\(\mathbb{R}^{d}\). In the presence of barriers, option prices in Lévy models in general exhibit singularities at the barriers. More involved versions of the Fourier-transform-based representations are available (involving a so-called Wiener–Hopf factorisation of the Fourier symbol; see e.g. Boyarchenko and Levendorskiĭ [
13]). For LPs
\(X^{d}\) with bounded exponential moments, the present regularity analysis may be localised to compact subsets, well separated from the barriers, subject to an exponentially small localisation error term; see Hilber et al. [
32, Chap. 10.5]. Here, the semiheavy tails of the LPs
\(X^{d}\) enter crucially in the analysis. We therefore expect the present DNN expression rate bounds to remain valid also for barrier contracts, at least far from the barriers, for the LPs
\(X^{d}\) considered here.
Dividends: We have assumed throughout that contracts do not pay dividends. However, including a dividend stream (with constant rate over
\((0,T]\)) on the underlying does not change the mathematical arguments; we refer to Lamberton and Mikou [
38, Sect. 3.1] for a complete statement of exponential Lévy models with constant dividend payment rate
\(\delta > 0\), and for the corresponding pricing of European- and American-style contracts for such models.
American-style contracts: Deep-learning-based algorithms for the numerical solution of optimal stopping problems for Markovian models have been recently proposed in Becker et al. [
8]. For the particular case of American-style contracts in exponential Lévy models, [
38] provide an analysis in the univariate case and establish qualitative properties of the exercise boundary
\(\{ (b(t),t): 0< t< T \}\). Here, for geometric Lévy models, in certain situations (
\(d=1\), i.e., single risky asset, monotonic, piecewise analytic payoff function), the option price as a function of
\(s\) at fixed
\(0< t< T\) is shown in [
38] to be a piecewise analytic function which is globally Hölder-continuous with a possibly algebraic singularity at the exercise boundary
\(b(t)\). This holds likewise for the price expressed in the logarithmic coordinate
\(x=\log s \). The ReLU DNN expression rate of such functions has been analysed in Opschoor et al. [
42, Sect. 5.4]. In higher dimensions
\(d>1\), recently also higher Hölder-regularity of the price in symmetric, stable Lévy models has been obtained for smooth payoffs in Barrios et al. [
4].