1 Introduction
Recent years have seen a dynamic development in applications of deep neural networks (DNNs for short) in expressing high-dimensional input–output relations. This development was driven mainly by the need for quantitative modelling of input–output relationships subject to large sets of observation data. Rather naturally, therefore, DNNs have found a large number of applications in computational finance and financial engineering. We refer to the survey by Ruf and Wang [
47] and the references there. Without going into details, we only state that the majority of activity addresses techniques to employ DNNs in demanding tasks in computational finance. The often strikingly efficient computational performance of DNN-based algorithms naturally raises the question of a theoretical, in particular mathematical, underpinning of successful algorithms. Recent years have seen progress, in particular in the context of option pricing for Black–Scholes-type models, for DNN-based numerical approximation of diffusion models on possibly large baskets (see e.g. Berner et al. [
9], Elbrächter et al. [
22] and Ito et al. [
34], Reisinger and Zhang [
45] for game-type options). These references prove that DNN-based approximations of option prices on possibly large baskets of risky assets can overcome the so-called curse of dimensionality in the context of affine diffusion models for the dynamics of the (log-)prices of the underlying risky assets. These results could be viewed also as particular instances of DNN expression rates of certain PDEs on high-dimensional state spaces, and indeed corresponding DNN expressive power results have been shown for their solution sets in Grohs et al. [
29], Gonon et al. [
27] and the references there.
Since the turn of the century, models beyond the classical diffusion setting have been employed increasingly in financial engineering. In particular, Lévy processes and their non-stationary generalisations such as Feller–Lévy processes (see e.g. Böttcher et al. [
11, Chap. 2] and the references there) have received wide attention. This can in part be explained by their ability to account for heavy tails of financial data and by Lévy-based models constituting
hierarchies of models, comprising in particular classical diffusion (“Black–Scholes”) models with constant volatility that are still widely used in computational finance as a benchmark. Therefore, all results for geometric Lévy processes in the present paper apply in particular to the Black–Scholes model.
The “Feynman–Kac correspondence” which relates conditional expectations of sufficiently regular functionals over diffusions to (viscosity) solutions of corresponding Kolmogorov PDEs extends to multivariate Lévy processes. We mention only Nualart and Schoutens [
41], Cont and Tankov [
16, Sect. 12.2], Cont and Voltchkova [
18], Glau [
26], Eberlein and Kallsen [
21, Chap. 5.4] and the references there. The Kolmogorov PDE (“Black–Scholes equation”) in the diffusion case is then replaced by a so-called
partial integro-differential equation (PIDE) where the fractional integro-differential operator accounting for the jumps is related in a one-to-one fashion with the Lévy measure \(\nu ^{d}\) of the \(\mathbb{R}^{d}\)-valued Lévy process \(X^{d}\). In particular, Lévy-type models for (log-)returns of risky assets result in
nonlocal partial integro-differential equations for the option price which generalise the linear parabolic differential equations which arise in classical diffusion models. We refer to Bertoin [
10, Chap. 1], Sato [
48, Chaps. 1–5] for fundamentals on Lévy processes and to Böttcher et al. [
11, Chap. 2] for extensions to certain non-stationary settings. For the use of Lévy processes in financial modelling, we refer to Cont and Tankov [
16, Chap. 11], Eberlein and Kallsen [
21, Sect. 8.1] and the references there. We refer to Cont and Voltchkova [
18,
17], Matache et al. [
40], Hilber et al. [
32, Chap. 14] for a presentation and for numerical methods for option pricing in Lévy models.
The results on DNNs in the context of option pricing mentioned above are exclusively concerned with models with continuous price processes. This naturally raises the question of whether DNN-based approximations are still capable of overcoming the curse of dimensionality in high-dimensional financial models with jumps, which have a much richer mathematical structure. This question is precisely the subject of this article. We study the expression rates of DNNs for prices of options (and the associated PIDEs) written on possibly large baskets of risky assets whose log-returns are modelled by a multivariate Lévy process with general correlation structure of jumps. In particular, we establish sufficient conditions on the characteristic triplet of the Lévy process \(X^{d}\) that ensure an expression error \(\varepsilon \) for DNN-expressed option prices with DNNs of size \({\mathcal{O}}(\varepsilon ^{-2})\), and with constants implied in \({\mathcal{O}}(\, \cdot \, )\) which grow polynomially with respect to \(d\). This shows that DNNs are capable of overcoming the curse of dimensionality also for general exponential Lévy models.
Let us outline the scope of our results. The DNN expression rate results proved here give a theoretical justification for neural-network-based non-parametric option pricing methods. These have become very popular recently; see for instance the recent survey by Ruf and Wang [
47]. Our results show that if option prices result from an exponential Lévy model, as described e.g. in Eberlein and Kallsen [
21, Chap. 3.7], these prices can under mild conditions on the Lévy triplets be expressed efficiently by (ReLU) neural networks, also for high dimensions. The result covers in particular rather general, multivariate correlation structure in the jump part of the Lévy process, for example parametrised by a so-called
Lévy copula; see Kallsen and Tankov [
36], Farkas et al. [
24], Eberlein and Kallsen [
21, Chap. 8.1] and the references there. This extends, at least to some extent, the theoretical foundation of the widely used neural-network-based non-parametric option pricing methodologies to market models with jumps.
We prove two types of results on DNN expression rate bounds for European options in exponential Lévy models, with one probabilistic and one “deterministic” proof. The former is based on concepts from statistical learning theory and provides for relevant payoffs (baskets, call on max, …) an expression error \({\mathcal{O}}(\varepsilon )\) with DNN sizes of \({\mathcal{O}}(\varepsilon ^{-2})\) and with constants implied in \({\mathcal{O}}(\, \cdot \, )\) which grow polynomially in \(d\), thereby overcoming the curse of dimensionality. The latter bound is based on parabolic smoothing of the Kolmogorov equation and allows us to prove exponential expressivity of prices for positive maturities, i.e., an expression error \({\mathcal{O}}(\varepsilon )\) with DNN sizes of \({\mathcal{O}}(|\log \varepsilon |^{a})\) for some \(a>0\), albeit with constants implied in \({\mathcal{O}}(\, \cdot \, )\) possibly growing exponentially in \(d\).
For the latter approach, a certain non-degeneracy is required for the symbol of the underlying Lévy process. The probabilistic proof of the DNN approximation rate results, on the other hand, does not require any such assumptions. It only relies on the additive structure of the semigroup associated to the Lévy process and on existence of moments. Thus the results proved here are specifically tailored to the class of option pricing functions (or more generally expectations of exponential Lévy processes) under European-style, plain vanilla payoffs.
The structure of this paper is as follows. In Sect.
2, we review terminology, basic results and financial modelling with exponential Lévy processes. In particular, we also recapitulate the corresponding fractional, partial integro-differential Kolmogorov equations which generalise the classical Black–Scholes equations to Lévy models. Section
3 recapitulates notation and basic terminology for deep neural networks to the extent required in the ensuing expression rate analysis. We focus mainly on so-called ReLU DNNs, but add that corresponding definitions and also results hold for more general activation functions. In Sect.
4, we present a first set of DNN expression rate results, still in the univariate case. This serves, on the one hand, presentation purposes, as this setting allows lighter notation, and, on the other hand, introduces mathematical concepts which will be used subsequently also for contracts on possibly large baskets of Lévy-driven risky assets. We also present an application of the results to neural-network-based call option pricing. Section
5 then has the main results of the present paper: expression rate bounds for ReLU DNNs for multivariate, exponential Lévy models. We identify sufficient conditions to obtain expression rates which are free from the curse of dimensionality via mathematical tools from statistical learning theory. We also develop a second argument based on parabolic Gevrey-regularity with quantified derivative bounds, which even yields exponential expressivity of ReLU DNNs, albeit with constants that generally depend on the basket size in a possibly exponential way. Finally, we develop an argument based on quantified sparsity in polynomial chaos expansions and corresponding ReLU expression rates from Schwab and Zech [
49] to prove high algebraic expression rates for ReLU DNNs with constants that are independent of the basket size. We also provide a brief discussion of recent, related results. We conclude in Sect.
6 and indicate several possible generalisations of the present results.
3 Deep neural networks (DNNs)
This article is concerned with establishing expression rate bounds of deep neural networks (DNNs) for prices of options (and the associated PIDEs) written on possibly large baskets of risky assets whose log-returns are modelled by a multivariate Lévy process with general correlation structure of jumps. The term “expression rate” denotes the rate of convergence to 0 of the error between the option price and its DNN approximation. This rate can be directly translated to quantify the DNN size required to achieve a given approximation accuracy. For instance, in Theorem
5.1 below, an expression rate of
\(\mathfrak{q}^{-1}\) is established and one may even choose
\(\mathfrak{q} = 2\) in many relevant cases. We now give a brief introduction to DNNs.
Roughly speaking, a deep neural network (DNN for short) is a function built by multiple concatenations of affine transformations with a (typically nonlinear) activation function. This gives rise to a parametrised family of nonlinear maps; see for example Petersen and Voigtlaender [
44] or Buehler et al. [
14, Sect. 4.1] and the references there.
Here we follow current practice and refer to the collection of parameters
\(\Phi \) as
“the neural network” and denote by
\(\mathrm{R}(\Phi )\) its realisation, that is, the function defined by these parameters. More specifically, we use the following terminology (see for example Opschoor et al. [
42, Sect. 2]). We first fix a function \(\varrho \colon \mathbb{R}\to \mathbb{R}\)
(referred to as the activation function) which is applied componentwise to vector-valued inputs.
We refer to Opschoor et al. [
42, Sect. 2] for further details.
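To fix ideas, here is a minimal Python sketch of this terminology (an illustration only; the concrete two-layer parameters below are hypothetical): \(\Phi \) is stored as a list of weight–bias pairs, \(\mathrm{R}(\Phi )\) applies the affine maps with the ReLU activation \(\varrho (x)=\max \{x,0\}\) between them, and \(M(\Phi )\) counts the non-zero weights.

```python
import numpy as np

def relu(x):
    # ReLU activation, applied componentwise
    return np.maximum(x, 0.0)

# "The neural network" Phi: the collection of parameters, i.e., a list
# of (weight matrix, bias vector) pairs, one pair per affine layer.
Phi = [
    (np.array([[1.0], [-1.0]]), np.zeros(2)),   # layer 1: R -> R^2
    (np.array([[1.0, 1.0]]), np.array([0.5])),  # layer 2: R^2 -> R
]

def realisation(Phi, x):
    """R(Phi): the function defined by the parameters Phi.
    The activation acts after every layer except the last one."""
    for W, b in Phi[:-1]:
        x = relu(W @ x + b)
    W, b = Phi[-1]
    return W @ x + b

def M(Phi):
    # number of non-zero weights (entries of all matrices and biases)
    return sum(np.count_nonzero(W) + np.count_nonzero(b) for W, b in Phi)

print(realisation(Phi, np.array([0.3])), M(Phi))  # R(Phi)(0.3) = |0.3| + 0.5
```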
The following lemma shows that concatenating
\(n\) affine transformations with distinct neural networks and taking their weighted average can itself be represented as a neural network. The number of non-zero weights in the resulting neural network can be controlled by the number of non-zero weights in the original neural networks. The proof of the lemma is based on a simple extension of the
full parallelisation operation for neural networks (see [
42, Proposition 2.5]) and refines Grohs et al. [
29, Lemma 3.8].
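A minimal sketch of the construction behind the lemma, assuming for simplicity that all networks have equal depth and scalar output (the general case uses the full parallelisation of [42, Proposition 2.5]): the networks are stacked block-diagonally, the affine pre-transformations are absorbed into the first layer, and the weighted average is absorbed into the last (affine) layer, so no depth is added and the weight count is controlled by the summed weight counts of the inputs.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def realisation(Phi, x):
    for W, b in Phi[:-1]:
        x = relu(W @ x + b)
    W, b = Phi[-1]
    return W @ x + b

def block_diag(*mats):
    # block-diagonal stacking of 2-d arrays
    out = np.zeros((sum(m.shape[0] for m in mats), sum(m.shape[1] for m in mats)))
    r = c = 0
    for m in mats:
        out[r:r + m.shape[0], c:c + m.shape[1]] = m
        r += m.shape[0]; c += m.shape[1]
    return out

def average_network(nets, affines, weights):
    """One network realising sum_i weights[i] * R(nets[i])(A_i x + c_i)."""
    # first layer: absorb the affine maps x -> A_i x + c_i
    W1 = np.vstack([net[0][0] @ A for net, (A, c) in zip(nets, affines)])
    b1 = np.concatenate([net[0][0] @ c + net[0][1] for net, (A, c) in zip(nets, affines)])
    layers = [(W1, b1)]
    # remaining layers: block-diagonal parallelisation
    for l in range(1, len(nets[0])):
        layers.append((block_diag(*[net[l][0] for net in nets]),
                       np.concatenate([net[l][1] for net in nets])))
    # absorb the weighted average into the affine output layer (no extra depth)
    avg = np.asarray(weights).reshape(1, -1)
    W, b = layers[-1]
    layers[-1] = (avg @ W, avg @ b)
    return layers

# example: average two copies of a network realising x -> |x| + 0.5
net = [(np.array([[1.0], [-1.0]]), np.zeros(2)),
       (np.array([[1.0, 1.0]]), np.array([0.5]))]
Psi = average_network([net, net], [(np.eye(1), np.zeros(1))] * 2, [0.5, 0.5])
x = np.array([0.3])
print(realisation(Psi, x), realisation(net, x))  # both equal |0.3| + 0.5
```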
4 DNN approximations for univariate Lévy models
We study DNN expression rates for option prices under (geometric) Lévy models for asset prices, initially here in one spatial dimension. We present two expression rate estimates for ReLU DNNs, which are based on distinct mathematical arguments; the first, probabilistic argument builds on ideas used in recent works by Gonon et al. [
27], Beck et al. [
7] and the references there. However, for the key step of the proof, a different technique is used, which is based on the Ledoux–Talagrand contraction principle (see Ledoux and Talagrand [
39, Theorem 4.12]) and statistical learning. This new approach is not only technically less involved (in comparison to e.g. the techniques used in [
27]), but also allows for weaker assumptions on the activation function; see Proposition
4.1 below. Alternatively, under stronger hypotheses on the activation function, one can also rely on [
27, Lemma 2.16]; see Proposition
4.4 below. The probabilistic arguments result in, essentially, an expression error
\({\mathcal{O}}(\varepsilon )\) with DNN sizes of
\({\mathcal{O}}(\varepsilon ^{-2})\). The second argument draws on parabolic (analytic) regularity furnished by the corresponding Kolmogorov equations and results in far stronger, exponential expression rates, i.e., with an expression error
\({\mathcal{O}}(\varepsilon )\) with DNN sizes which are polylogarithmic with respect to
\(0< {\varepsilon }< 1\). As we shall see in the next section, however, the latter argument is in general subject to the curse of dimensionality.
4.1 DNN expression rates: probabilistic argument
We fix
\(0< a < b < \infty \) and measure the approximation error in the uniform norm on
\([a,b]\). Recall that
\(M(\Phi )\) denotes the number of (non-zero) weights of a neural network
\(\Phi \) and
\(\mathrm{R}(\Phi )\) is the realisation of
\(\Phi \). Consider the following exponential integrability condition on the Lévy measure
\(\nu \): for some
\(p\geq 2\),
$$ \int _{\{|y|>1\}} e^{py} \nu (d y) < \infty . $$
(4.1)
Furthermore, for any function
\(g\), we denote by
\(\mathrm{Lip}(g)\) the best Lipschitz constant for
\(g\).
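As an illustration (an example added for orientation; it is not part of the standing assumptions), condition (4.1) holds for every \(p \geq 2\) in Merton's jump-diffusion model, whose Lévy measure is a scaled Gaussian density with jump intensity \(\lambda > 0\):
$$ \nu (dy) = \lambda \, \frac{e^{-(y-\mu _{J})^{2}/(2\sigma _{J}^{2})}}{\sqrt{2\pi \sigma _{J}^{2}}} \, dy , \qquad \int _{\{|y|>1\}} e^{py} \, \nu (dy) \leq \lambda \, e^{p\mu _{J} + p^{2}\sigma _{J}^{2}/2} < \infty . $$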
4.2 DNN expression of European calls
In this section, we illustrate how the results of Proposition
4.1 can be used to bound DNN expression rates of call options on exponential Lévy models.
Suppose we observe call option prices for a fixed maturity
\(T\) and
\(N\) different strikes
\(K_{1},\ldots ,K_{N}>0\). Denote these prices by
\(\hat{C}(T,K_{1}),\ldots ,\hat{C}(T,K_{N})\). A task frequently encountered in practice is to extrapolate from these prices to prices corresponding to unobserved maturities or to learn a non-parametric option pricing function. A widely used approach is to solve
$$ \min _{\phi \in \mathcal{H}} \frac{1}{N} \sum _{i=1}^{N} \bigg( \frac{\hat{C}(T,K_{i})}{K_{i}}-\phi (S_{0}/K_{i})\bigg)^{2}. $$
(4.15)
Here ℋ is a suitable collection of (realisations of) neural networks, for example all networks with an a-priori fixed architecture. In fact, many of the papers listed in the recent review by Ruf and Wang [
47] use this approach or a variation of it, where for example an absolute value is inserted instead of a square or
\(\hat{C}(T,K_{i})/K_{i}\) is replaced by
\(\hat{C}(T,K_{i})\) and
\(S_{0}/K_{i}\) by
\(K_{i}\).
In this section, we assume that the observed call prices are generated from an (assumed unknown) exponential Lévy model and ℋ consists of ReLU networks. Then we show that the error in (
4.15) can be controlled and that we can give bounds on the number of non-zero parameters of the minimising neural network. The following result is a direct consequence of Proposition
4.1. It shows that
\({\mathcal{O}}(\varepsilon ^{-1})\) weights suffice to achieve an error of at most
\(\varepsilon \) in (
4.15).
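The following Python sketch illustrates the fit (4.15) under stated assumptions: the "observed" prices are generated from the Black–Scholes model (a geometric Lévy model with \(\nu = 0\)), ℋ is a one-hidden-layer ReLU network of fixed width, and plain full-batch gradient descent is used. Network width, learning rate and iteration count are illustrative choices, not prescribed by the theory.

```python
import numpy as np
from scipy.stats import norm

# Synthetic "observed" call prices from the Black-Scholes model with r = 0,
# so the target pricing function is known in closed form.
S0, T, sigma = 1.0, 0.5, 0.2
K = np.linspace(0.7, 1.3, 40)                      # strikes K_1, ..., K_N
d1 = (np.log(S0 / K) + 0.5 * sigma**2 * T) / (sigma * np.sqrt(T))
C_hat = S0 * norm.cdf(d1) - K * norm.cdf(d1 - sigma * np.sqrt(T))

x = S0 / K                                         # moneyness inputs
y = C_hat / K                                      # normalised prices

# One-hidden-layer ReLU network phi(x) = w2 . relu(w1 x + b1) + b2,
# trained by gradient descent on the objective (4.15).
rng = np.random.default_rng(0)
m = 32
w1, b1 = rng.normal(size=m), rng.normal(size=m)
w2, b2 = rng.normal(size=m) / m, 0.0
lr = 0.1

for step in range(5000):
    z = np.outer(x, w1) + b1                       # (N, m) pre-activations
    h = np.maximum(z, 0.0)                         # ReLU
    pred = h @ w2 + b2
    err = pred - y                                 # residuals of (4.15)
    # backpropagation through the single hidden layer
    g_pred = 2 * err / len(x)
    g_w2, g_b2 = h.T @ g_pred, g_pred.sum()
    g_h = np.outer(g_pred, w2) * (z > 0)
    g_w1, g_b1 = x @ g_h, g_h.sum(axis=0)
    w1 -= lr * g_w1; b1 -= lr * g_b1
    w2 -= lr * g_w2; b2 -= lr * g_b2

print("objective (4.15):", np.mean(err**2))
```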
4.3 ReLU DNN exponential expressivity
We now develop a second argument for bounding the expressivity of ReLU DNNs for the option price
\(u(\tau ,s)\) solving (
2.4) with initial condition
\(u(0,s) = \varphi (s)\). In particular,
in this subsection, we choose
\(\varrho (x)=\max \{x,0\}\) as activation function.
As in the preceding first, probabilistic argument, we consider the DNN expression error in a bounded interval
\([a,b]\) with
\(0< a< s< b<\infty \). The second argument is based on
parabolic smoothing of the linear parabolic PIDE (
2.4). This in turn ensures smoothness of
\(s\mapsto u(\tau ,s)\) at positive times
\(\tau >0\), i.e., smoothness in the “spatial” variable
\(s\in [a,b]\) resp. in the log-return variable
\(x=\log s \in [\log a,\log b ]\), even for non-smooth payoff functions
\(\varphi \) (so in particular, binary options with discontinuous payoffs
\(\varphi \) are admissible, albeit at the cost of non-uniformity of derivative bounds at
\(\tau \downarrow 0\)). It is a classical result that this implies spectral, possibly exponential convergence of
polynomial approximations of
\(u(\tau ,\, \cdot \,)|_{[a,b]}\) in
\(L^{\infty }([a,b])\). As observed in Opschoor et al. [
43, Sect. 3.2], this exponential polynomial convergence rate implies also exponential expressivity of ReLU DNNs of
\(u(\tau , \, \cdot \,)|_{[a,b]}\) in
\(L^{\infty }([a,b])\) for any
\(\tau >0\).
To ensure smoothing properties of the solution operator of the PIDE, we require additional assumptions (see (
4.17) below) on the Lévy triplet
\((\sigma ^{2},\gamma ,\nu )\). To formulate these, we recall the Lévy symbol
\(\psi \) of the ℝ-valued LP
\(X\) as
$$ \psi (\xi ) = \frac{\sigma ^{2}}{2} \xi ^{2} - i \gamma \xi + \int _{\mathbb{R}} \big( 1 - e^{i\xi y} + i \xi y \mathbf{1}_{\{|y|\leq 1\}} \big) \nu (dy), \qquad \xi \in \mathbb{R}. $$
(4.16)
The proof proceeds in several steps. First, we apply the change of variables
\(x = \log s\) in order to leverage the stationarity of the LP
\(X\) for obtaining a constant coefficient Kolmogorov PIDE. Assumption (
4.17) then ensures well-posedness of the PIDE in a suitable variational framework. We then exploit that stationarity of the LP
\(X\) facilitates the use of Fourier transformation; the lower bound on
\(\psi \) in (
4.17) will allow deriving sharp, explicit bounds on high spatial derivatives of (variational) solutions of the PIDE which imply Gevrey-regularity of these solutions on bounded intervals
\([a,b]\subseteq (0,\infty )\). Gevrey-regularity in turn implies exponential rates of convergence of polynomial and deep ReLU NN approximations of
\(s\mapsto u(\tau ,s)\) for
\(\tau >0\), whence we obtain the assertion of the theorem. We recall that for
\(\delta \geq 1\), a smooth function
\(x\mapsto f(x)\) is
Gevrey-\(\delta \)-regular in an open subset \(D \subseteq \mathbb{R}^{d}\) if
\(f\in C^{\infty }(D)\) and for every compact set
\(\kappa \subseteq D\), there exists
\(C_{\kappa }> 0\) such that for all
\(\alpha \in \mathbb{N}_{0}^{d}\) and every
\(x\in \kappa \), we have
\(|D_{x}^{\alpha }f(x)| \leq C_{\kappa }^{|\alpha |+1} (\alpha !)^{\delta }\). Note that
\(\delta =1\) implies that
\(f\) is real analytic in
\(\kappa \). We refer to Rodino [
46, Sect. 1.4] for details, examples and further references.
We change coordinates to
\(x = \log s \in (-\infty ,\infty )\) so that
\(v(\tau ,x)=u(\tau ,e^{x})\). Then the PIDE (
2.4) takes the form (see e.g. Matache et al. [
40, Sect. 3], Lamberton and Mikou [
38, Sect. 3.1])
$$ \partial _{\tau }v(\tau ,x) - \frac{\sigma ^{2}}{2} \frac{\partial ^{2} v}{\partial x^{2}}(\tau ,x) - \gamma \frac{\partial v}{\partial x}(\tau ,x) + A[v(\tau ,\, \cdot \,)](x) = 0, \qquad \tau \in (0,T],\ x \in \mathbb{R}, $$
(4.18)
where
\(A\) denotes the integro-differential operator
$$ A[f](x) = - \int _{\mathbb{R}} \big( f(x+y) - f(x) - y f'(x) \mathbf{1}_{\{|y|\leq 1\}} \big) \nu (dy), $$
together with the initial condition
$$ v(0,x)= \varphi (e^{x}) = (\varphi \circ \exp )(x). $$
(4.19)
Then
\(C(t, s)= v(T-t, \log s)\) satisfies
(4.20)
Conversely, if
\(C(t,s)\) in (
4.20) is sufficiently regular, then
\(v(\tau , x) = C(T-\tau , e^{x})\) is a solution of (
4.18), (
4.19) (recall that we assume
\(r=0\) for notational simplicity).
The Lévy–Khintchine formula describes the ℝ-valued LP
\(X\) by the log-characteristic function
\(\psi \) of the random variable
\(X_{1}\). From the time-homogeneity of the LP
\(X\),
$$ E[e^{i\xi X_{t}}] = e^{-t\psi (\xi )}, \qquad \xi \in \mathbb{R},\ t \geq 0. $$
(4.21)
The Lévy exponent
\(\psi \) of the LP
\(X\) admits the explicit representation (
4.16).
The Lévy exponent
\(\psi \) is the symbol of the pseudo-differential operator
\(-{\mathcal{L}}\), where ℒ is the infinitesimal generator of the semi-group of the LP
\(X\). Here
\({\mathcal{A}}=-{\mathcal{L}}\) is the spatial operator in (
4.18) given by
$$ {\mathcal{A}}[f](x) = - \frac{\sigma ^{2}}{2} \frac{d^{2} f}{dx^{2}}(x) - \gamma \frac{df}{dx}(x) + A[f](x). $$
(4.22)
With the operator \({\mathcal{A}}\), we associate the bilinear form \(a(\, \cdot \, , \, \cdot \,)\).
The translation invariance of
\(\mathcal{A}\) (implied by stationarity of the LP
\(X\)) in (
4.22) and Parseval’s equality (see Hilber et al. [
32, Remark 10.4.1]) imply that
\(\psi \) is the symbol of
\(\mathcal{A}\), i.e., the form \(a(\, \cdot \, , \, \cdot \,)\) acts in the Fourier variable as multiplication by \(\psi \). Here \(\hat{f} = F_{x\to \xi }f\) denotes the Fourier transform of \(f\).
4.17) on
\(\psi \) implies continuity and coercivity of the bilinear form
\(a(\, \cdot \, , \, \cdot \,)\) on \(H^{\rho }(\mathbb{R})\) so that for initial data \(\varphi \circ \exp \in L^{2}(\mathbb{R})\), there exists a unique variational solution \(v\) of the PIDE (
4.18) with initial condition (
4.19); see e.g. Eberlein and Glau [
20].
Fix
\(0<\tau \leq T < \infty \) and set \(v_{0} = \varphi \circ \exp \in L^{2}(\mathbb{R})\). The variational solution
\(v\) of (4.18), (4.19) satisfies \(\hat{v}(\tau ,\xi ) = e^{-\tau \psi (\xi )} \hat{v}_{0}(\xi )\). For every
\(k \in \mathbb{N}\), Parseval's equality implies with the lower bound in (4.17) that
$$ \| \partial _{x}^{k} v(\tau ,\, \cdot \,) \|_{L^{2}(\mathbb{R})}^{2} \leq \sup _{\xi \in \mathbb{R}} \big( |\xi |^{2k} e^{-2\tau C_{1} |\xi |^{2\rho }} \big) \, \| v_{0} \|_{L^{2}(\mathbb{R})}^{2} . $$
An elementary calculation shows that for any
\(m,\kappa ,\mu >0\), we have
$$ \max _{\eta >0} \big( \eta ^{m} \exp (-\kappa \eta ^{\mu }) \big) = \bigg(\frac{m}{\kappa \mu e} \bigg)^{m/\mu } . $$
(4.23)
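For completeness, the one-line derivation behind (4.23): setting \(f(\eta ) = \eta ^{m} \exp (-\kappa \eta ^{\mu })\) and solving \((\log f)'(\eta ) = m/\eta - \kappa \mu \eta ^{\mu -1} = 0\) gives the maximiser \(\eta _{*} = (m/(\kappa \mu ))^{1/\mu }\), and
$$ f(\eta _{*}) = \Big( \frac{m}{\kappa \mu } \Big)^{m/\mu } \exp \Big( -\frac{m}{\mu } \Big) = \Big( \frac{m}{\kappa \mu e} \Big)^{m/\mu } . $$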
We use (
4.23) with
\(m=2k\),
\(\kappa =2\tau C_{1}\),
\(\mu = 2\rho \) and
\(\eta = |\xi |\) to obtain
$$ \| \partial _{x}^{k} v(\tau ,\, \cdot \,) \|_{L^{2}(\mathbb{R})}^{2} \leq \bigg( \frac{k}{2\tau C_{1} \rho e} \bigg)^{k/\rho } \| v_{0} \|_{L^{2}(\mathbb{R})}^{2} . $$
Taking square roots and using the (rough) Stirling bound
\(k^{k} \leq k! \, e^{k}\), valid for all \(k \in \mathbb{N}\), we obtain
$$ \| \partial _{x}^{k} v(\tau ,\, \cdot \,) \|_{L^{2}(\mathbb{R})} \leq (k!)^{1/(2\rho )} \big( (2\tau C_{1}\rho )^{-1/(2\rho )} \big)^{k} \, \| v_{0} \|_{L^{2}(\mathbb{R})} . $$
(4.24)
This implies with the Sobolev embedding theorem that for any bounded interval \(I = [x_{-},x_{+}]\),
\(-\infty < x_{-} < x_{+} < \infty \), and for every fixed
\(\tau >0\), there exist constants
\(C = C(x_{+}, x_{-})>0\) and
\(A(\tau ,\rho )>0\) such that
$$ \| \partial _{x}^{k} v(\tau ,\, \cdot \,) \|_{L^{\infty }(I)} \leq C A(\tau ,\rho )^{k} \, (k!)^{\delta }, \qquad k \in \mathbb{N}_{0},\ \delta = 1/\min \{1, 2\rho \} . $$
This means that
\(v(\tau , \, \cdot \,)|_{I}\) is Gevrey-
\(\delta \)-regular with
\(\delta = 1/\min \{1, 2\rho \}\).
To construct the DNNs
\(\psi ^{u}_{\varepsilon }\) in the claim, we proceed in several steps. We first use an (analytic, in the bounded interval \([\log a, \log b]\)) change of variables
\(s = \exp (x)\) and the fact that Gevrey-regularity is preserved under analytic changes of variables to infer Gevrey-\(\delta \)-regularity in \([a,b]\) of
\(s\mapsto u(\tau ,s)\), for every fixed
\(\tau >0\). This in turn implies the existence of a sequence
\((u_{p}(s))_{p\geq 1}\) of polynomials of degree \(p\) in
\([a,b]\) converging in
\(W^{1,\infty }([a,b])\) to
\(u(\tau , \, \cdot \,)\) for
\(\tau >0\) at rate
\(\exp (-b'p^{1/\delta })\) for some constant
\(b'> 0\) depending on
\(a\),
\(b\) and on
\(\delta \geq 1\), but independent of
\(p\). The asserted DNNs are then obtained by approximately expressing the
\(u_{p}\) through ReLU DNNs, again at exponential rates, via Opschoor et al. [
43]. The details are as follows.
The interval
\(s\in [a,b]\) in the assertion of the proposition corresponds to the interval
\(x\in [\log a,\log b]\) under the analytic (in the bounded interval
\([a,b]\)) change of variables
\(x=\log s\). As Gevrey-regularity is known to be preserved under analytic changes of variables (see e.g. Rodino [
46, Proposition 1.4.6]), also
\(u(\tau ,s)|_{s \in [a,b]}\) is Gevrey-
\(\delta \)-regular, with the same index
\(\delta = 1/ \min \{1, 2\rho \} \geq 1\) and with constants in the derivative bounds which depend on
\(0 < a < b < \infty \),
\(\rho \in (0,1]\),
\(\tau > 0\). In particular, for
\(\rho \geq 1/2\),
\(u(\tau ,s)|_{s \in [a,b]}\) is real analytic in
\([a,b]\).
With Gevrey-
\(\delta \)-regularity of
\(s\mapsto u(\tau ,s)\) for
\(s \in [a,b]\) established, we may invoke expression rate bounds for deep ReLU NNs for such functions. In Opschoor et al. [
43, Proposition 4.1], it was shown that for such functions in space dimension
\(d=1\), there exist constants
\(C'>0\),
\(\beta '>0\) such that for every \({\mathcal{N}} \in \mathbb{N}\), there exists a deep ReLU NN
\(\tilde{u}_{\mathcal{N}}\) with
$$\begin{aligned} M(\tilde{u}_{\mathcal{N}}) \leq {\mathcal{N}}, \qquad L(\tilde{u}_{ \mathcal{N}}) &\leq C' {\mathcal{N}}^{\min \{\frac{1}{2}, \frac{1}{d+1/\delta } \}}\log {\mathcal{N}}, \\ \left \| u - \mathrm{R}(\tilde{u}_{\mathcal{N}}) \right \| _{W^{1, \infty }([-1,1]^{d})} &\leq C' \exp \big(- \beta '{\mathcal{N}}^{\min \{\frac{1}{2\delta },\frac{1}{d\delta +1} \}} \big). \end{aligned}$$
This implies that for every
\(0<{\varepsilon }<1/2\), a pointwise error of
\({\mathcal{O}}({\varepsilon })\) in
\([a,b]\) can be achieved by some ReLU NN
\(\psi ^{u}_{\varepsilon }\) of depth
\({\mathcal{O}}(|\log {\varepsilon }|^{\delta } |\log (|\log { \varepsilon }|)|)\) and of size
\({\mathcal{O}}(|\log {\varepsilon }|^{2\delta })\). This completes the proof. □
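The following Python sketch illustrates the mechanism behind the proof numerically (an illustration only, not part of the proof): as an analytic stand-in for \(s\mapsto u(\tau ,s)\) we use the Black–Scholes call price at \(\tau >0\) (the special case \(\nu = 0\)), interpolate at Chebyshev nodes on \([a,b]\), and observe the roughly exponential decay of the sup-error in the degree \(p\). All parameter values are illustrative.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb
from scipy.stats import norm

# Exponential polynomial convergence on [a, b] for an analytic price:
# Black-Scholes call at tau > 0, standing in for s -> u(tau, s).
a, b, K, sigma, tau = 0.5, 2.0, 1.0, 0.3, 0.25

def u(s):
    d1 = (np.log(s / K) + 0.5 * sigma**2 * tau) / (sigma * np.sqrt(tau))
    return s * norm.cdf(d1) - K * norm.cdf(d1 - sigma * np.sqrt(tau))

s_test = np.linspace(a, b, 2001)
t_test = 2 * (s_test - a) / (b - a) - 1           # map [a, b] -> [-1, 1]
for p in [2, 4, 8, 16, 32]:
    # interpolate at the p + 1 Chebyshev-Lobatto points mapped to [a, b]
    nodes = (a + b) / 2 + (b - a) / 2 * np.cos(np.pi * np.arange(p + 1) / p)
    coef = cheb.chebfit(2 * (nodes - a) / (b - a) - 1, u(nodes), p)
    err = np.max(np.abs(u(s_test) - cheb.chebval(t_test, coef)))
    print(f"degree {p:3d}: sup error {err:.2e}")  # decays like exp(-b' p)
```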
4.4 Summary and discussion
For prices of derivative contracts on one risky asset whose log-returns are modelled by an LP
\(X\), we have analysed the expression rate of deep ReLU NNs. We have provided two mathematically distinct approaches to the analysis of the expressive power of deep ReLU NNs. The first, probabilistic approach furnishes algebraic expression rates, i.e., pointwise accuracy
\(\varepsilon >0\) on a bounded interval
\([a,b]\) is furnished with DNNs of size
\({\mathcal{O}}(\varepsilon ^{-q})\) with suitable
\(q\geq 0\). The argument is based on approximating the option price by Monte Carlo sampling, estimating the uniform error on
\([a,b]\) and then emulating the resulting average by a DNN. The second, “analytic” approach, leverages regularity of (variational) solutions of the corresponding Kolmogorov partial integro-differential equations and furnishes exponential DNN expression rates. That is, an expression error
\(\varepsilon > 0\) is achieved with DNNs of size
\({\mathcal{O}}(|\log \varepsilon |^{a})\) for suitable
\(a>0\). Key in the second approach were stronger conditions (
4.17) on the characteristic exponent of the LP
\(X\), which imply, as we showed, Gevrey-
\(\delta \)-regularity of the map
\(s\mapsto u(\tau ,s)\) for suitable
\(\tau >0\). This regularity implies in turn exponential rates of polynomial approximation (in the uniform norm on
\([a,b]\)) of
\(s\mapsto u(\tau ,s)\), which is a result of independent interest, and subsequently, by emulation of polynomials with deep ReLU NNs, the corresponding exponential rates.
We remark that in the particular case
\(\delta = 1\), the derivative bounds (
4.24) imply analyticity of the map
\(s\mapsto u(\tau ,s)\) for
\(s\in [a,b]\) which implies the assertion also with the exponential expression rate bound for analytic functions in Opschoor et al. [
43].
We also remark that the smoothing of the solution operator in Proposition
4.8 accommodates payoff functions which belong merely to
\(L^{2}\), as they arise e.g. in particular binary contracts. This is a consequence of the assumption (
4.17), which on the other hand excludes Lévy processes with one-sided jumps. Such processes are covered by Proposition
4.1.
5 DNN approximation rates for multivariate Lévy models
We now turn to DNN expression rates for multivariate geometric Lévy models. This is a typical situation when option prices on baskets of
\(d\) risky assets are of interest, whose log-returns are modelled by multivariate Lévy processes. We admit rather general jump measures, in particular with fully correlated jumps in the marginals, as provided for example by so-called Lévy copula constructions in Kallsen and Tankov [
36].
As in the univariate case, we prove two results on ReLU DNN expression rates of option prices for European-style contracts. The first argument is developed in Sect.
5.1 below and overcomes in particular the curse of dimensionality. Its proof is again based on probabilistic arguments from statistical learning theory. As exponential LPs
\(X^{d}\) generalise geometric Brownian motions, Theorem
5.1 generalises several results from the classical Black–Scholes setting, and we comment on the relation of Theorem
5.1 to these recent results in Sect.
5.2. Owing to the method of proof, the DNN expression rate in Theorem
5.1 delivers an
\(\varepsilon \)-complexity of
\({\mathcal{O}}(\varepsilon ^{-2})\), achieved with potentially shallow DNNs; see Remark
4.5.
The second argument is based on parabolic regularity of the deterministic Kolmogorov PIDE associated to the LP
\(X^{d}\). We show in Theorem
5.4 that polylogarithmic in
\(\varepsilon \) expression rate bounds can be achieved by allowing DNN depth to increase essentially as
\({\mathcal{O}}(|\log \varepsilon |)\). The result in Theorem
5.4 is, however, prone to the curse of dimensionality: the constants implied in the
\({\mathcal{O}}(\, \cdot \, )\) bounds may (and in general will) depend exponentially on
\(d\). We also show that under a hypothesis of sufficiently large time
\(t>0\), parabolic smoothing allows overcoming the curse of dimensionality, with dimension-independent expression rates which are possibly larger than the rate furnished by the probabilistic argument (which is, however, valid uniformly for all
\(t>0\)).
5.1 DNN expression rate bounds via probabilistic arguments
We start by remarking that in this subsection, there is no need to assume ReLU activation.
The following result proves that neural networks are capable of approximating option prices in multivariate exponential Lévy models without the curse of dimensionality if the corresponding Lévy triplets \((A^{d},\gamma ^{d},\nu ^{d})\) are bounded uniformly with respect to the dimension \(d\).
For any dimension \(d \in \mathbb{N}\), we assume given a payoff function \(\varphi _{d}\) and a
\(d\)-variate LP
\(X^{d}\), and we denote the option price in time-to-maturity by
$$ u_{d}(s) = E\big[ \varphi _{d}( s e^{X_{T}^{d}} ) \big], \qquad s \in (0,\infty )^{d} . $$
(5.1)
We refer to Sato [
48, Chap. 2] for more details on multivariate Lévy processes and to Cont and Tankov [
16, Chap. 11], Eberlein and Kallsen [
21, Sect. 8.1] for more details on multivariate geometric Lévy models in finance.
The next theorem is a main result of the present paper. It states that DNNs can efficiently express prices on possibly large baskets of risky assets whose dynamics are driven by multivariate Lévy processes with general jump correlation structure. The expression rate bounds are polynomial in the number
\(d\) of assets and therefore not prone to the curse of dimensionality. This result partially generalises earlier work on DNN expression rates for diffusion models in Elbrächter et al. [
22], Grohs et al. [
29].
Let
\(\varepsilon \in (0,1]\) be the given target accuracy and consider
\(\bar{\varepsilon } \in (0,1]\) (to be selected later). To simplify notation, we write for
\(s \in [a,b]^{d}\) $$ s e^{X_{T}^{d}} = \big(s_{1} \exp (X_{T,1}^{d}),\ldots ,s_{d} \exp (X_{T,d}^{d}) \big). $$
The proof consists of four steps:
– Step 1 bounds the error that arises when the payoff \(\varphi _{d}\) is replaced by the neural network approximation \(\phi _{\bar{\varepsilon },d}\). As a part of Step 1, we also prove that the \(p\)th exponential moments of the components \(X_{T,i}^{d}\) of the Lévy process are bounded uniformly in the dimension \(d\).
– Step 2 is a technical step that is required for Step 3; it bounds the error that arises when the Lévy process is capped at a threshold \(D>0\). If we were to assume in addition that the output of the neural network \(\phi _{\bar{\varepsilon },d}\) were bounded (this is for example the case if the activation function \(\varrho \) is bounded), then Step 2 could be omitted.
– Step 3 is the key step in the proof. We introduce \(n\) i.i.d. copies of (the capped version of) \(X_{T}^{d}\) and use statistical learning techniques (symmetrisation, Gaussian and Rademacher complexities) to estimate the expected maximum difference between the option price (with neural network payoff) and its sample average. This is then used to construct the approximating neural networks; a small numerical sketch of this sample-average construction follows the list.
– Step 4 combines the estimates from Steps 1–3 and concludes the proof.
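Here is the announced sketch of Step 3's sample-average construction, in the illustrative special case \(\nu ^{d} = 0\) (so that \(X_{T}^{d}\) is Gaussian) and for a basket-call payoff, which is itself exactly a ReLU network. All model parameters are hypothetical, and the truncation of Step 2 is omitted in this toy setting.

```python
import numpy as np

# Step 3 in miniature: the candidate network realises the sample average
#   s -> (1/n) sum_k R(phi)(s * exp(X_k)),
# which is itself (the realisation of) a single network by Lemma 3.2.
rng = np.random.default_rng(1)
d, T, K, n = 10, 1.0, 1.0, 4000
A = 0.04 * np.eye(d)                       # diffusion matrix A^d
gamma = -0.5 * np.diag(A)                  # drift making each exp(X_i) a martingale

def payoff(s):                             # phi_d(s) = max(mean(s) - K, 0)
    return np.maximum(np.mean(s, axis=-1) - K, 0.0)

def sample_X(m):
    return gamma * T + rng.multivariate_normal(np.zeros(d), T * A, size=m)

X = sample_X(n)                            # the n i.i.d. copies X_1, ..., X_n

def network_average(s):                    # realisation of the averaged network
    return np.mean(payoff(s[None, :] * np.exp(X)), axis=0)

# crude check of the error at a few points of [a, b]^d = [0.8, 1.2]^d
X_ref = sample_X(400_000)
for _ in range(5):
    s = rng.uniform(0.8, 1.2, size=d)
    ref = np.mean(payoff(s[None, :] * np.exp(X_ref)))
    print(f"price {ref:.4f}  avg-net {network_average(s):.4f}")
```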
Step 1: Assumption (
5.2) and Hölder’s inequality yield for all
\(s \in [a,b]^{d}\) that
(5.6)
with a constant \(c_{1}\) independent of \(d\), where we used
\(| \cdot | \leq | \cdot |_{1}\) in the last step. To see that
\(c_{1}\) is indeed finite, note that (
5.5) and Sato [
48, Theorem 25.17] (with the vector in that result being
\(pe_{i}\)) imply that for any \(d \in \mathbb{N}\),
\(i=1,\ldots ,d\), the exponential moment \(E[e^{p X_{T,i}^{d}}]\) can be bounded in terms of \(p\), \(T\) and \(B\) only; the second inequality in the corresponding estimate uses that
\(|e^{z}-1-z| \leq z^{2} e^{p}\) for all
\(z \in [-p,p]\), which can be seen e.g. from the (mean value form of the) Taylor remainder formula.
Step 2: Before proceeding with the key step of the proof, we need to introduce a cut-off in order to ensure that the neural network output is bounded. Let
\(D>0\) and consider the random variable
\(X_{T}^{d,D} = \min (X_{T}^{d},D)\), where the minimum is understood componentwise. Then the Lipschitz property (
5.4) implies that
where
\(\tilde{c}_{1} = 2 b c \exp (5TpB + 2Te^{p}pB)\) and we used
\(| \cdot | \leq | \cdot |_{1}\), Hölder’s inequality, Chernoff’s bound and finally again Hölder’s inequality and (
5.7).
Step 3: Let
\(X_{1},\ldots ,X_{n}\) denote
\(n\) i.i.d. copies of the random vector
\(X_{T}^{d,D}\) and
\(Z_{1},\ldots ,Z_{n}\) i.i.d. standard normal variables, independent of
\(X_{1},\ldots ,X_{n}\). For any separable class of functions \(\mathcal{F}\), define the random variable
\(\hat{G}_{n}(\mathcal{F})\), the so-called
empirical Gaussian complexity.
Consider now for
\(i=1,\ldots ,d\) the function classes
$$ \mathcal{H}_{i} = \{(-\infty ,D]^{d} \ni x \mapsto s \exp (x_{i}) \colon s \in [a,b] \} $$
and, with the notation
\(s \exp (x)=(s_{1} \exp (x_{1}),\ldots ,s_{d}\exp (x_{d}))\), the class
$$ \mathcal{H}=\big\{ (-\infty ,D]^{d} \ni x \mapsto \mathrm{R}(\phi _{ \bar{\varepsilon },d})\big(s \exp (x)\big)-\mathrm{R}(\phi _{ \bar{\varepsilon },d})(0) \colon s \in [a,b]^{d} \big\} . $$
Denoting by \(\tilde{\mathcal{H}}\) the direct sum of
\(\mathcal{H}_{1},\ldots ,\mathcal{H}_{d}\), we have that
$$ \mathcal{H}= \phi (\tilde{\mathcal{H}}), $$
where
\(\phi = \mathrm{R}(\phi _{\bar{\varepsilon },d})(\, \cdot \, )- \mathrm{R}(\phi _{\bar{\varepsilon },d})(0)\) is a Lipschitz function with Lipschitz constant
\(cd^{\tilde{q}}\) (due to the hypothesis on the Lipschitz constant of the neural network (
5.4)), satisfies
\(\phi (0)=0\) and is bounded on the range of
\(\tilde{\mathcal{H}}\) (which is contained in
\([0,b\exp (D)]^{d}\)). Consequently, Bartlett and Mendelson [
6, Theorem 14] implies that
$$ \hat{G}_{n}(\mathcal{H}) \leq 2 c d^{\tilde{q}} \sum _{i=1}^{d} \hat{G}_{n}(\mathcal{H}_{i}). $$
(5.9)
Let
\(\varepsilon _{1},\ldots ,\varepsilon _{n}\) be an independent collection of Rademacher random variables. We then estimate
(5.10)
Here, the first inequality follows by symmetrisation (see for example Boucheron et al. [
12, Lemma 11.4]), the second follows from the comparison results on Gaussian and Rademacher complexities (see for instance Bartlett and Mendelson [
6, Lemma 4]) with some absolute constant
\(\tilde{c}_{2}\) and the third uses (
5.9).
We note that the constant
\(\tilde{c}_{2}\) in (
5.10) may be chosen as
. Indeed, setting
\(\mathcal{G}=\sigma (\varepsilon _{1},\ldots ,\varepsilon _{n},X_{1}, \ldots ,X_{n})\) and using independence yields
To further simplify (
5.10), we now apply Jensen’s inequality and use independence to derive for
\(i=1,\ldots ,d\) that
Combining this with (
5.10) and (
5.7), we obtain that
with
\(c_{2} = 4 \sqrt{\pi /2} c b \exp (5BT p/2 + BTp e^{p} )\). By applying Markov’s inequality (see (
4.10) and (
4.11)), this proves that there exists
\(\omega \in \Omega \) with
Now we observe that
\(s \mapsto \frac{1}{n} \sum _{k=1}^{n} \mathrm{R}(\phi _{ \bar{\varepsilon },d})(se^{X_{k}(\omega )})\) is the realisation of a neural network
\(\tilde{\psi }_{\bar{\varepsilon },d}\) with
\(M(\tilde{\psi }_{\bar{\varepsilon },d}) \leq n M(\phi _{ \bar{\varepsilon },d})\) (see Lemma
3.2). We have therefore proved that for arbitrary \(n \in \mathbb{N}\), there exists a neural network
\(\tilde{\psi }_{\bar{\varepsilon },d}\) with
(5.11)
Step 4: In the final step, we now provide appropriate choices of the hyperparameters. We select
\(\bar{\varepsilon } = \varepsilon (c_{1} d^{{\tilde{q}}+\frac{1}{2}p+ \frac{1}{2}}+2)^{-1}\), choose
\(n = \lceil (2 c_{2} d^{{\tilde{q}}+1} \bar{\varepsilon }^{-1})^{2} \rceil \),
\(D= \log (\bar{\varepsilon }^{-1}d^{{\tilde{q}}+1} \tilde{c}_{1})\) and set
\(\psi _{\varepsilon ,d} = \tilde{\psi }_{\bar{\varepsilon },d}\). Then the total number of parameters of the approximating neural network can be estimated, using assumption (
5.3), as
$$\begin{aligned} M(\psi _{\varepsilon ,d}) &= M(\tilde{\psi }_{\bar{\varepsilon },d}) \\ & \leq n M(\phi _{\bar{\varepsilon },d}) \\ & \leq \big(1+(2 c_{2} d^{{\tilde{q}}+1} \bar{\varepsilon }^{-1})^{2} \big) c d^{\tilde{q}} \bar{\varepsilon }^{-q} \\ & \leq (1+4 c_{2}^{2}) c d^{3{\tilde{q}}+2} \bar{\varepsilon }^{-2-q} \\ & \leq \big((1+4 c_{2}^{2}) c(c_{1} +2)^{2+q}\big) d^{({\tilde{q}}+ \frac{1}{2}p+\frac{1}{2})(2+q)+3{\tilde{q}}+2} \varepsilon ^{-2-q}. \end{aligned}$$
(5.12)
Thus the number of weights is bounded polynomially in
\(d\) and
\(\varepsilon ^{-1}\), as claimed. Finally, we combine (
5.6), (
5.8) and (
5.11) to estimate the approximation error as
as claimed. □
The proof of Theorem
5.1 is very similar to the proof of Proposition
4.1. Steps 1 and 4 in the proof of Theorem
5.1 are essentially identical in both proofs. The key difference is in Step 3: in the
\(d\)-dimensional case we cannot use the comparison theorem for Rademacher complexities in Ledoux and Talagrand [
39, Theorem 4.12], but instead need to use a comparison result for Gaussian complexities from Bartlett and Mendelson [
6, Theorem 14]. In the
\(d\)-dimensional case, the truncation with
\(D\) in Step 2 is needed to guarantee that the hypotheses of [
6, Theorem 14] are satisfied; in the 1-dimensional case, this is not required for [
39, Theorem 4.12].
As recently there have been several results on DNN expression rates in high-dimensional diffusion models, a discussion on the relation of the multivariate DNN expression rate result in Theorem
5.1 to other recent mathematical results on DNN expression rate bounds is in order. Given that geometric diffusion models are particular cases of the presently considered models (corresponding to
\(\nu ^{d} = 0\) in the Lévy triplet), it is of interest to consider to which extent the DNN expression error bound in Theorem
5.1 relates to these results.
Firstly, we note that with the exception of Gonon et al. [
27] and Elbrächter et al. [
22], previous results in the literature which are concerned with DNN approximation rates for Kolmogorov equations for diffusion processes (see e.g. Grohs et al. [
30], Berner et al. [
9], Grohs et al. [
29], Reisinger and Zhang [
45] and the references therein) study approximation with respect to the
\(L^{p}\)-norm (
\(p<\infty \)), whereas in Theorem
5.1 we study approximation with respect to the
\(L^{\infty }\)-norm, which requires entirely different techniques. While the results in [
22] rely on a specific structure of the payoff, the proof of the expression rates in [
27] has some similarities with the proof of Theorem
5.1. However, the novelty in the proof of Theorem
5.1 is the use of statistical learning techniques (symmetrisation, Gaussian and Rademacher complexities) which allow weaker assumptions on the activation function than in [
27]. In addition, the class of PDEs considered in [
27] (heat equation and related) is different from the one considered in Theorem
5.1 (Black–Scholes PDE and Lévy PIDE).
Secondly, Theorem
5.1 is the first result on ReLU DNN expression rates for option prices in models with jumps or, equivalently, for
partial integro-differential equations in non-divergence form
$$ \partial _{\tau }v_{d}(\tau ,x) - \frac{1}{2} \sum _{i,j=1}^{d} A^{d}_{i,j} \frac{\partial ^{2} v_{d}}{\partial x_{i} \partial x_{j}}(\tau ,x) - \sum _{i=1}^{d} \gamma ^{d}_{i} \frac{\partial v_{d}}{\partial x_{i}}(\tau ,x) $$
$$ {}- \int _{\mathbb{R}^{d}} \Big( v_{d}(\tau ,x+y) - v_{d}(\tau ,x) - \textstyle\sum _{i=1}^{d} y_{i} \frac{\partial v_{d}}{\partial x_{i}}(\tau ,x) \mathbf{1}_{\{|y|\leq 1\}} \Big) \nu ^{d}(dy) = 0 $$
(5.13)
for \(x \in \mathbb{R}^{d}\),
\(\tau > 0\), or, when transformed from log-price variables
\(x_{i}\) to actual price variables
\(s_{i}\) via
\((s_{1},\ldots ,s_{d})=(\exp (x_{1}),\ldots ,\exp (x_{d}))\) (and with the convention
\(s e^{y} = (s_{1}e^{y_{1}},\ldots ,s_{d} e^{y_{d}})\)),
(5.14)
for
\(s \in (0,\infty )^{d}\),
\(\tau > 0\), and with the correspondingly transformed integro-differential operator (see for instance Hilber et al. [
31, Theorem 4.1]). As in our assumptions also
\(A^{d} = 0\) is admissible under suitable conditions on
\(\nu ^{d}\), the present ReLU DNN expression rates are not mere generalisations of the diffusion case, but cover indeed the case of arbitrary pure jump models for both finite and infinite activity Lévy processes satisfying (
5.5).
In the case of
\(X\) being a diffusion with drift, i.e., for
\(\nu ^{d}=0\), the Lévy PIDE reduces to a Black–Scholes PDE. In this particular case, we may compare the result in Theorem
5.1 to the recent results e.g. in Grohs et al. [
29]. The results in the latter article are specialised to the Black–Scholes case in [
29, Sect. 4], where Setting 4.1 specifies the coefficients
\(A^{d}_{i,j}\) (in our notation) as
\(\beta _{i}^{d} \beta _{j}^{d} (B^{d} (B^{d})^{\top })_{i,j}\) for some \(\beta ^{d} \in \mathbb{R}^{d}\) and \(B^{d} \in \mathbb{R}^{d\times d}\) satisfying
\((B^{d} (B^{d})^{\top })_{k,k} = 1\) for all \(d \in \mathbb{N}\),
\(i,j,k=1,\ldots ,d\) and
\(\sup _{d,i} |\beta _{i}^{d}| < \infty \). The coefficient
\(\gamma ^{d}\) is chosen as
\(\alpha ^{d}\) satisfying
\(\sup _{d,i} | \alpha ^{d}_{i}| < \infty \). Using that
\(\Sigma = B^{d} (B^{d})^{\top }\) is symmetric and positive definite, we obtain
\(\Sigma _{i,j} \leq \sqrt{\Sigma _{i,i}\Sigma _{j,j}} = 1\) and hence these assumptions imply that (
5.5) is satisfied. Therefore, the DNN expression rate results from [
29, Sect. 4] can also be deduced from our Theorem
5.1 in the case when the probability measure used to quantify the
\(L^{p}\)-error in [
29] is compactly supported, as in that case the
\(L^{\infty }\)-bounds proved here imply the
\(L^{p}\)-bounds proved in [
29].
5.3 Exponential ReLU DNN expression rates via PIDEs
We now extend the univariate case discussed in Sect.
4.3, and prove an exponential expression rate bound similar to Proposition
4.8 for baskets of
\(d\geq 2\) Lévy-driven assets.
In this subsection, we assume the ReLU activation function
\(\varrho (x)=\max \{x,0\}\). As in Sect.
5.1, we admit a general correlation structure for the marginal processes’ jumps. To prove DNN expression rate bounds, we exploit once more the fact that the stationarity and homogeneity of the
\(\mathbb{R}^{d}\)-valued LP
\(X^{d}\) imply that the Kolmogorov equation (
5.13) has constant coefficients. Under the provision that the initial datum in (5.13) belongs to \(L^{2}(\mathbb{R}^{d})\), this allows writing for every
\(\tau >0\) the Fourier transform
\(F_{x\to \xi }v_{d}(\tau , \, \cdot \,) = \hat{v}_{d}(\tau ,\xi )\) as
$$ \hat{v}_{d}(\tau ,\xi ) = \exp \big( -\tau \psi (\xi ) \big) \hat{v}_{d}(0,\xi ). $$
(5.15)
Here, for \(\xi \in \mathbb{R}^{d}\), the symbol is given by
\(\psi (\xi ) = \exp (-ix^{\top }\xi ) {\mathcal{A}}(\partial _{x}) \exp (i x^{\top }\xi )\) with
\({\mathcal{A}}(\partial _{x})\) denoting the constant coefficient spatial integro-differential operator in (
5.13) by Courrège’s second theorem (see e.g. Applebaum [
1, Theorem 3.5.5]), and (
4.21) becomes
(5.16)
In fact,
\(\psi \) can be expressed in terms of the characteristic triplet
\((A^{d},\gamma ^{d},\nu ^{d})\) of the LP
\(X^{d}\) as
$$ \psi (\xi ) = \frac{1}{2} \xi ^{\top }A^{d} \xi - i (\gamma ^{d})^{\top }\xi + \int _{\mathbb{R}^{d}} \big( 1 - e^{i\xi ^{\top }y} + i \xi ^{\top }y \mathbf{1}_{\{|y|\leq 1\}} \big) \nu ^{d}(dy) . $$
(5.17)
We impose again the strong ellipticity assumption (
4.17), but now with
\(|\xi |\) understood as
\(|\xi |^{2} = \xi ^{\top }\xi \) for \(\xi \in \mathbb{R}^{d}\). Then, reasoning exactly as in the proof of Proposition
4.8, we obtain with
\(C_{1}>0\) as in (
4.17) for every
\(\tau >0\) for the variational solution
\(v_{d}\) of (
5.13) the bound
$$ \| D^{k}_{x} v_{d}(\tau ,\, \cdot \,) \|_{L^{2}(\mathbb{R}^{d})} \leq (k!)^{1/(2\rho )} \big( (2\tau C_{1}\rho )^{-1/(2\rho )} \big)^{k} \, \| v_{d}(0,\, \cdot \,) \|_{L^{2}(\mathbb{R}^{d})} . $$
(5.18)
Here,
\(D^{k}_{x}\) denotes any weak derivative of total order \(k \in \mathbb{N}_{0}\) with respect to \(x \in \mathbb{R}^{d}\).
With the Sobolev embedding theorem, we again obtain for any bounded cube \(I^{d} = [x_{-},x_{+}]^{d}\) with
\(-\infty < x_{-} < x_{+} < \infty \) and for every fixed
\(\tau >0\) that there exist constants
\(C(d)>0\) and
\(A(\tau ,\rho ) > 0\) such that
$$ \| D^{k}_{x} v_{d}(\tau ,\, \cdot \,) \|_{L^{\infty }(I^{d})} \leq C(d) A(\tau ,\rho )^{k} \, (k!)^{\delta }, \qquad k \in \mathbb{N}_{0},\ \delta = 1/\min \{1,2\rho \} . $$
(5.19)
In (
5.19), the constant
\(C(d)\) is independent of
\(x_{-},x_{+}\), but depends in general exponentially on the basket size (respectively the dimension)
\(d\geq 2\), and the constant
\(A( \tau ,\rho ) = (2\tau C_{1}\rho )^{-1/(2\rho )}\) denotes the constant from (
5.18) and Stirling’s bound. If
\(\rho =1\) (which corresponds to the case of a non-degenerate diffusion part) and if
\(\tau >0\) is sufficiently large (so that
\((2 \tau C_{1})^{1/(2\rho )} \geq 1\)), the constant is bounded uniformly with respect to the dimension
\(d\). The derivative bound (
5.19) implies that
\(v_{d}(\tau , \, \cdot \,)|_{I^{d}}\) is Gevrey-
\(\delta \)-regular with
\(\delta = 1/\min \{1, 2\rho \}\).
In particular, for
\(\delta = 1\), i.e., when
\(\rho \geq 1/2\), for every fixed
\(\tau >0\), the function
\(x\mapsto v_{d}(\tau ,x)\) is real analytic in
\(I^{d}\), which is the case we consider first. In this case, we perform an affine change of coordinates to transform
\(v_{d}(\tau , \, \cdot \,)\) to the real analytic function
\([-1,1]^{d} \ni \hat{x} \mapsto v_{c}(\tau ,\hat{x})\). This function admits a holomorphic extension to some open set
\(O \subseteq \mathbb{C}^{d}\) containing
\([-1,1]^{d}\). By choosing
\(\bar{\varrho } > 1\) (the “semiaxis sums”) sufficiently close to 1, we obtain that
\({\mathcal{E}}_{\bar{\varrho }} \subseteq O\), i.e.,
\(v_{c}(\tau , \, \cdot \,)\) admits a holomorphic extension to
\({\mathcal{E}}_{\bar{\varrho }}\), where the Bernstein polyellipse \({\mathcal{E}}_{\bar{\varrho }} \subseteq \mathbb{C}^{d}\) is defined as the
\(d\)-fold Cartesian product of the Bernstein ellipse \(\{ (z+z^{-1})/2 : 1 \leq |z| \leq \bar{\varrho }\} \subseteq \mathbb{C}\). More precisely,
\(x\mapsto v_{d}(\tau ,x)\) admits, with respect to each co-ordinate
\(x_{i} \in [x_{-},x_{+}]\) of
\(x\), a holomorphic extension to an open neighborhood of
\([x_{-},x_{+}]\) in ℂ (see e.g. Krantz and Parks [
37, Sect. 1.2]). By Hartogs’ theorem (see e.g. Hörmander [
33, Theorem 2.2.8]), for every fixed
\(\tau >0\), the function
\(x\mapsto v_{d}(\tau ,x)\) admits a holomorphic extension to a polyellipse in
\(\mathbb{C}^{d}\) with foci at
\(x_{-},x_{+}\) or, in normalised coordinates
$$ \hat{x}_{i} = \big(T^{-1}(x)\big)_{i} = 2[x_{i} - (x_{-} + x_{+})/2] / (x_{+}-x_{-}),\qquad i=1,\dots ,d, $$
(5.20)
the map
\([-1,1]^{d} \ni \hat{x} \mapsto v_{d}(\tau ,T(\hat{x})) = v_{c}(\tau , \hat{x})\) admits a holomorphic extension to a Bernstein polyellipse
with foci at
\(\hat{x}_{i} = \pm 1\) and semiaxis sums
\(1<\bar{\varrho } = { \mathcal{O}}(A(\tau ,\rho )^{-1})\). As
\(\tau \mapsto A(\tau ,\rho )^{-1}\) is increasing for every fixed value of
\(\rho \), parabolic smoothing increases for
\(\rho \geq 1/2\) the domain of holomorphy with
\(\tau \).
In the general case
\(\delta = 1/\min \{1,2\rho \}\) with
\(\rho >0\) as in (
4.17), ReLU DNN expression rates of multivariate holomorphic (if
\(\rho \geq 1/2\)) and Gevrey-regular (if
\(0<\rho <1/2\)) functions such as
\(\hat{x} \mapsto v_{c}(\tau ,\hat{x})\) have been studied in Opschoor et al. [
43]. The holomorphy or Gevrey-
\(\delta \)-regularity of the map
\(\hat{x} \mapsto {v_{c}}(\tau ,\hat{x})\) implies with [
43, Theorem 3.6, Proposition 4.1] that there exist constants
\(\beta '=\beta '(\bar{\varrho },d)>0\) and
\(C = C(u_{d},\bar{\varrho },d) > 0\) and for every \({\mathcal{N}} \in \mathbb{N}\) a ReLU DNN \(\tilde{u}_{\mathcal{N}}\) such that
$$ M(\tilde{u}_{\mathcal{N}}) \leq {\mathcal{N}}, \qquad L(\tilde{u}_{ \mathcal{N}}) \le C {\mathcal{N}}^{\min \{\frac{1}{2}, \frac{1}{d+1/\delta }\}} \log {\mathcal{N}}$$
(5.21)
and such that the error bound
$$\begin{aligned} \|{v_{c}}(\tau , \, \cdot \,) - \tilde{u}_{{\mathcal{N}}}(\, \cdot \, ) \|_{W^{1,\infty }([-1,1]^{d})} \leq C\exp ( -\beta ' {\mathcal{N}}^{ \min \{ \frac{1}{2\delta }, \frac{1}{\delta d+1}\}} ) \end{aligned}$$
(5.22)
holds. Reversing the affine change of variables (
5.20) in the input layer, we obtain the following result on the
\(\varepsilon \)-complexity of the ReLU DNN expression error for
\(x \mapsto v_{d}(\tau ,x)\) at fixed
\(0<\tau \leq T\).
5.4 Breaking the curse of dimensionality
The result in Theorem
5.1 gives an
\({\varepsilon }\) expression error for DNNs whose depth and size are bounded polynomially in terms of
\({\varepsilon }^{-1}d\), for European-style options in multivariate exponential Lévy models. In particular, in Theorem
5.1, the curse of dimensionality is proved to be overcome for a market model with jumps: a DNN expression rate is shown which is algebraic in terms of the target accuracy
\({\varepsilon }>0\) with constants that depend polynomially on the dimension
\(d\). The rates
\(\mathfrak{p},\mathfrak{q} \in [0,\infty )\) can be read off from the proof of Theorem
5.1; however, these exponents could be large, thereby affording only low DNN expression rates.
Theorem
5.4, on the other hand, states
exponential expressivity of deep ReLU NNs, i.e., a maximum expression error at time
\(\tau >0\) with accuracy
\(\varepsilon > 0\) can be attained by a deep ReLU NN of size and depth which grow polylogarithmically with respect to
\(|\log \varepsilon |\). This exponential expression rate bound is, however, still prone to the curse of dimensionality (CoD).
In the present section, we further address alternative mathematical arguments on how DNNs can overcome the CoD in the presently considered Lévy models. Specifically, two mathematical arguments in addition to the probabilistic arguments in Sect.
5.1 are presented. Both exploit stationarity of the LP
\(X^{d}\) which implies (
5.15), (
5.16), to obtain DNN expression rates free from the curse of dimensionality.
5.4.1 Barron space analysis
The first alternative approach to Theorem
5.1 is based on verifying, using (
5.15), (
5.16), regularity of option prices in the so-called
Barron space introduced in the fundamental work of Barron [
5]. It provides DNN expression error bounds with explicit values for
\(\mathfrak{p}\) and
\(\mathfrak{q}\), however,
in [5] only for DNNs with sigmoidal activation functions \(\varrho \); similar results for ReLU activations are asserted in E and Wojtowytsch [
19]. For simplicity, we consider here a subset ℬ of the Barron space. An integrable function
\(f \colon \mathbb{R}^{d} \to \mathbb{R}\) belongs to ℬ if
$$ \| f \|_{{\mathcal{B}}} := \int _{\mathbb{R}^{d}} (1 + |\xi |) \, |\hat{f}(\xi )| \, d\xi < \infty . $$
(5.24)
Recall that
\(\hat{f}\) denotes the Fourier transform of
\(f\). The explicit appearance of
\(\hat{f}\) renders the norm
\(\| \cdot \|_{{\mathcal{B}}}\) in (
5.24) particularly suitable for our purposes due to (
5.15)–(
5.17). As was pointed out in [
5,
19], the relevance of the Barron norm
\(\| \cdot \|_{{\mathcal{B}}}\) stems from it being sufficient for dimension-robust DNN approximation rates. For \(m \in \mathbb{N}\), consider the two-layer neural networks
\(f_{m}\) given by
$$ f_{m}(x) = \sum _{i=1}^{m} a_{i} \, \varrho ( w_{i}^{\top }x + b_{i} ) $$
(5.25)
with parameters \((a_{i}, w_{i}, b_{i}) \in \mathbb{R}\times \mathbb{R}^{d} \times \mathbb{R}\), \(i=1,\ldots ,m\). Their relevance stems from the following result: If
\(\varrho \) is sigmoidal, i.e., bounded, measurable and
\(\varrho (z) \to 1\) as
\(z \to \infty \),
\(\varrho (z) \to 0\) as
\(z \to -\infty \), then for
\(f\in {\mathcal{B}}\) and for every
\(R>0\), every probability measure
\(\pi \) on
\([-R,R]^{d}\) and every
\(m \in \mathbb{N}\), there exist parameters
\(\{ (a_{i},w_{i},b_{i}) \}_{i=1}^{m}\) such that for the corresponding DNN
\(f_{m}\) in (
5.25), we have
$$ \textstyle\begin{array}{c} \| f - f_{m} \|_{L^{2}([-R,R]^{d}, \pi )} \leq \max \{1,R\} m^{-1/2} \| f \|_{{\mathcal{B}}}. \end{array} $$
(5.26)
The bound (
5.26) follows from Barron [
5, Theorem 1], and was generalised in E and Wojtowytsch [
19, Eq. (1.3)] to ReLU activation.
The bound in (
5.26) is free from the CoD: the number
\(N\) of parameters in the DNN grows as
\({\mathcal{O}}(md)\) so that
\(m^{-1/2} \leq CN^{-1/2} d^{1/2}\) with an absolute constant
\(C>0\).
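A minimal Python sketch of the two-layer architecture (5.25), with sigmoidal \(\varrho \) and randomly chosen (hypothetical) parameters, illustrating the parameter count \(N = m(d+2) = {\mathcal{O}}(md)\) used above; the approximation result of course concerns optimised, not random, parameters.

```python
import numpy as np

def sigmoid(z):
    # a sigmoidal activation: bounded, with limits 0 and 1 at -/+ infinity
    return 1.0 / (1.0 + np.exp(-z))

def two_layer(x, a, w, b):
    """f_m(x) = sum_i a_i * rho(w_i . x + b_i), the form (5.25)."""
    return sigmoid(x @ w.T + b) @ a

d, m = 50, 200
rng = np.random.default_rng(2)
a = rng.normal(size=m) / m                # outer weights a_i
w = rng.normal(size=(m, d))               # inner weights w_i
b = rng.normal(size=m)                    # inner biases b_i

x = rng.uniform(-1.0, 1.0, size=(5, d))   # a few points of [-R, R]^d, R = 1
print(two_layer(x, a, w, b))
print("N =", m * (d + 2))                 # parameter count, O(m d)
```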
With (
5.15), (
5.16), for every
\(\tau \geq 0\), sufficient conditions for
\(x\mapsto v_{d}(\tau ,x)\) to belong to ℬ can be verified. With (
5.26), DNN mean-square expression rate bounds of option prices that are free from the CoD follow.
Pointwise,
\(L^{\infty }\)-norm error bounds can be obtained by using [
19, Eq. (1.4)].
5.4.2 Parabolic smoothing and sparsity of chaos expansions
The second non-probabilistic approach to Theorem
5.1 towards DNN expression error rates not subject to the CoD is based on dimension-explicit derivative bounds of option prices, which allow in turn establishing summability bounds for
generalised polynomial chaos (gpc for short) expansions of these prices. Good summability of gpc coefficient sequences is well known to imply high, dimension-independent rates of approximation by sparse, multivariate polynomials. This in turn implies corresponding expression rates by suitable DNNs; see Schwab and Zech [
49, Theorem 3.9]. Key in this approach is to exploit
parabolic smoothing of the Kolmogorov PDE. The corresponding dimension-independent expression rate results are in general higher than those based on probabilistic or Barron space analysis, but hold only
for sufficiently large \(\tau >0\).
We start by discussing more precisely the dependence of the constants in the proof of Theorem
5.4 on the dimension
\(d\).
The constant
\(C>0\) in the exponential expression rate bounds established in Theorem
5.4 depends in general exponentially on the basket size
\(d\), resp. on the dimension of the solution space of the PIDE (
5.13), due to the reliance on the ReLU DNN expression rate analysis in Opschoor et al. [
43]. Furthermore, the DNN size grows polylogarithmically with respect to the dimension
\(d\), in terms of
\(|\log \varepsilon |\). Considering exponential expression rate bounds, this exponential dependence on
\(d\) in terms of
\(|\log \varepsilon |\) seems in general not avoidable, as can be seen from [
43, Theorem 3.5]. Nevertheless, in Remark
5.8, we already hinted at parabolic smoothing implying sufficient regularity (under the
\(d\)-dependent provision (
5.29) on
\(\tau \)) for constants in DNN expression rate bounds which are polynomial with respect to
\(d\).
In the following paragraphs, we settle for
algebraic DNN expression rates and overcome exponential dependence on
\(d\) in ReLU DNN expression error bounds under certain
sparsity assumptions on polynomial chaos expansions, as shown in Schwab and Zech [
49], Cohen et al. [
15] and the references there. We develop a variation of the results in [
49] in the present context.
We impose the following hypothesis, which takes the place of the lower bound in (
4.17). We still impose
\(| \psi (\xi ) | \leq C_{2} | \xi |^{2\rho } + C_{3}\), i.e., the second condition in (
4.17) holds for each \(d \in \mathbb{N}\) (but
\(C_{2}\),
\(C_{3}\) and
\(\rho \) in that condition are allowed to depend on
\(d\)).
In comparison to the lower bound in (
4.17), the condition (
5.30) is restricted to the case
\(\rho >\frac{1}{2}\). On the other hand, different exponents
\(\rho _{j}\) are allowed along each component. Furthermore, note that Assumption
5.9 imposes that
\(C_{1}\) does not depend on the dimension
\(d\).
As we shall see below, Assumption
5.9 ensures good “separation” and “anisotropy” properties of the symbol (
5.17) of the corresponding Lévy process
\(X^{d}\).
For
\(\tau >0\) satisfying (
5.29), we now analyse the regularity of
\(x\mapsto v_{d}(\tau ,x)\). From Assumption
5.9, we find that for every
\(\tau >0\),
\(x\mapsto v_{d}(\tau ,x)\) is in \(L^{2}(\mathbb{R}^{d})\) and that its Fourier transform has the explicit form
$$ \hat{v}_{d}(\tau ,\xi ) = F_{x\to \xi }v_{d}(\tau , \, \cdot \,) = \exp \big(-\tau \psi _{X^{d}}(\xi )\big) \hat{v}_{d}(0,\xi ). $$
(5.32)
For a multi-index \({\boldsymbol{\nu }} \in \mathbb{N}_{0}^{d}\), denote by
\(\partial _{x}^{\boldsymbol{\nu }}\) the mixed partial derivative of total order
\(|{\boldsymbol{\nu }}|=\nu _{1}+ \cdots +\nu _{d}\) with respect to \(x = (x_{1},\ldots ,x_{d})\).
5.32) and Assumption
5.9 can be used to show that for every
\(\tau >0\),
\(x\mapsto v_{d}(\tau ,x)\) is analytic at any
\(x \in \mathbb{R}^{d}\). This is of course the well-known smoothing property of the generator of certain non-degenerate Lévy processes. To address the curse of dimensionality, we quantify the smoothing effect in a
\(d\)-explicit fashion.
To this end, with Assumption
5.9, we calculate for any \({\boldsymbol{\nu }} \in \mathbb{N}_{0}^{d}\) at
\(x=0\) (by stationarity, the same bounds hold for the Taylor coefficients at any \(x_{0} \in \mathbb{R}^{d}\)) that
We use (
4.23) with
\(m:= \nu _{j}\),
\(\kappa := C_{1} \tau \),
\(\mu =2 \rho _{j}\) to bound the product as
$$ \prod _{j=1}^{d} |\xi _{j}|^{\nu _{j}} \exp (-\tau C_{1} |\xi _{j}|^{2 \rho _{j}}) \leq \prod _{j=1}^{d} \bigg( \frac{\nu _{j}}{2 \rho _{j} \tau C_{1} e} \bigg)^{\nu _{j}/(2\rho _{j})} . $$
For the Taylor coefficient of order \({\boldsymbol{\nu }}\) of
\(v_{d}(\tau ,\,\cdot \,)\) at
\(x=0\), we arrive at the bound
(5.33)
Stirling’s inequality \({\boldsymbol{\nu }}^{\boldsymbol{\nu }} \leq {\boldsymbol{\nu }}! \, e^{|{\boldsymbol{\nu }}|}\) implies in (
5.33) the bound
(5.34)
Here,
\(\rho ' = 1-\frac{1}{2\rho } > 0\) and the positive weight sequence
\(b= (b_{j})_{j\geq 1}\) is given by
\(b_{j}= (2 \rho _{j} \tau C_{1})^{-1/(2\rho _{j} \rho ')}\),
\(j=1,2,\dots \), and multi-index notation is employed: we write
\({\boldsymbol{\nu }}^{-{\boldsymbol{\nu }}} = (\nu _{1}^{\nu _{1}} \nu _{2}^{\nu _{2}}\cdots )^{-1}\),
\(b^{\boldsymbol{\nu }}=b_{1}^{\nu _{1}}b_{2}^{\nu _{2}}\cdots \) and
\({\boldsymbol{\nu }}! = \nu _{1}! \, \nu _{2} ! \, \ldots \), with the convention
\(0!=1\) and
\(0^{0} = 1\). We raise (
5.34) to a power
\(q>0\) with
\(q < 1/\rho '\) and sum the resulting inequality over all \({\boldsymbol{\nu }} \in \mathbb{N}_{0}^{d}\) to estimate (generously)
To obtain the estimate (
5.34), one could also use the
\(L^{2}\)-bound with the explicit constant derived in (
5.27), (
5.28).
Under hypothesis (
5.30) and for
\(\tau >0\) satisfying (
5.29),
\(q\)-summability of the Taylor coefficients follows; indeed,
Using that
\(|{\boldsymbol{\nu }}|! \geq {\boldsymbol{\nu }}!\) and that
\(1 \geq q \rho ' >0 \), the multinomial theorem yields
Hence, provided that
$$ \tau > \tau _{0}(d) := \frac{d^{2\rho /q}}{2 \rho C_{1}}, $$
(5.35)
it follows that
(5.36)
Therefore, we have proved
\(q\)-summability of the Taylor coefficients of the map
\(x\mapsto v_{d}(\tau ,x)\) at
\(x=0\) for any
\(\tau > \tau _{0}(d)\) as in (
5.35). The
\(q\)-norm \(\| (t_{\boldsymbol{\nu }}) \|_{\ell ^{q}}\) is bounded independently of
\(d\) provided that
\(\tau > \tau _{0}(d)\) and the \(L^{2}(\mathbb{R}^{d})\)-norm of the initial datum is bounded independently of
\(d\).
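For orientation, a worked instance of (5.35) with illustrative parameter values: for the non-degenerate diffusion scaling \(\rho _{j} \equiv \rho = 1\) and the summability exponent \(q = \frac{1}{2}\) (so that the approximation rate below is \(r = 1/q - 1 = 1\)),
$$ \tau _{0}(d) = \frac{d^{2\rho /q}}{2\rho C_{1}} = \frac{d^{4}}{2 C_{1}} , $$
so with \(C_{1} = 1\) and \(d = 10\), the expansion-based bound applies for maturities \(\tau > 5000\); the provision (5.35) is thus a genuine large-time condition.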
The
\(q\)-summability (
5.36) of the Taylor coefficients of
\(x\mapsto v_{d}(\tau ,x)\) at
\(x=0\) with
\(q=1\) implies for
\(\tau > \tau _{0}(d)\) absolute, pointwise convergence in the cube
\([-1,1]^{d}\) of the Taylor series
$$ v_{d}(\tau ,x) = \sum _{{\boldsymbol{\nu }} \in \mathbb{N}_{0}^{d}} t_{\boldsymbol{\nu }} x^{\boldsymbol{\nu }} . $$
(5.37)
Furthermore, as was shown in Schwab and Zech [
49, Lemma 2.8], the fact that the sequence
\(( t_{\boldsymbol{\nu }})\) is
\(q\)-summable for some
\(0 < q < 1\) and the coefficient bound (
5.34) imply that for
\(\tau > \tau _{0}(d)\) with
\(\tau _{0}(d)\) as defined in (
5.35), there exists a sequence \((\Lambda _{n})_{n \in \mathbb{N}}\) of
nested, downward closed (i.e., if
\({\boldsymbol{e}}_{j}\in \Lambda _{n}\), then
\({\boldsymbol{e}}_{i}\in \Lambda _{n}\) for all
\(0\leq i\leq j\))
multi-index sets with
\(\#(\Lambda _{n}) \leq n\) such that general polynomial chaos (gpc) approximations given by the partial sums
$$ v_{d}^{\Lambda _{n}}(\tau ,x) = \sum _{{\boldsymbol{\nu }}\in \Lambda _{n}} t_{\boldsymbol{\nu }}x^{\boldsymbol{\nu }}$$
converge at the dimension-independent rate
\(r=1/q-1\) (see e.g. Cohen et al. [
15, Lemma 5.5]), i.e., \(\sup _{x \in [-1,1]^{d}} | v_{d}(\tau ,x) - v_{d}^{\Lambda _{n}}(\tau ,x) | \leq C n^{-(1/q-1)}\).
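A small Python sketch of the objects in this statement, under loudly flagged assumptions: the coefficients \(t_{\boldsymbol{\nu }}\) below are hypothetical stand-ins decaying as in (5.34) (in the theorem they are the Taylor coefficients of \(v_{d}\)), and \(\Lambda _{n}\) is built by taking the \(n\) largest coefficients, which yields a downward closed set here because the stand-in decays monotonically in every coordinate.

```python
import math
from itertools import product
import numpy as np

d, nu_max, n = 4, 3, 30
b = np.array([1.5, 2.0, 2.5, 3.0])        # weight sequence b_j (illustrative)

def t(nu):
    # hypothetical Taylor coefficient, decaying as in the bound (5.34)
    return float(np.prod(b ** (-np.asarray(nu, dtype=float)))) \
        / math.prod(math.factorial(k) for k in nu)

candidates = list(product(range(nu_max + 1), repeat=d))
Lambda_n = sorted(candidates, key=t, reverse=True)[:n]
# t decreases strictly in every coordinate, so every index dominated by a
# member of Lambda_n sorts strictly earlier: the prefix is downward closed,
# and the sets are nested as n grows.

def partial_sum(x):
    # the gpc partial sum v_d^{Lambda_n}(tau, x) = sum_{nu in Lambda_n} t_nu x^nu
    x = np.asarray(x, dtype=float)
    return sum(t(nu) * float(np.prod(x ** np.asarray(nu))) for nu in Lambda_n)

print(Lambda_n[:5])
print(partial_sum([0.5, -0.5, 0.25, 0.1]))
```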
The summability (
5.36) of the coefficients in the Taylor gpc expansion (
5.37) also implies quantitative bounds on the expression rates of ReLU DNNs. With [
49, Theorem 2.7, (ii)], we find that there exists a constant
\(C>0\) independent of
\(d\) such that
$$ \sup _{{\boldsymbol{\nu }}\in \Lambda _{n}} |{\boldsymbol{\nu }}|_{1} \leq C (1+\log n ) . $$
We now refer to [
49, Theorem 3.9] (with
\(q\) in place of
\(p\) in the statement of that result) and, observing that in the proof of that theorem, only the
\(p\)-summability of the Taylor coefficient sequence
\(( t_{\boldsymbol{\nu }})\) was used, we conclude that for
\(\tau >0\) satisfying (
5.35), there exists a constant
\(C>0\) that is independent of
\(d\) and for every \(n \in \mathbb{N}\), there exists a ReLU DNN
\(\tilde{v}_{d}^{n}\) with input dimension
\(d\) such that
$$\begin{aligned} M(\tilde{v}_{d}^{n}) &\leq C \big(1+n\log n \log (\log n )\big), \\ L(\tilde{v}_{d}^{n}) &\leq C\big(1+\log n \log (\log n )\big), \\ \sup _{x \in [-1,1]^{d}}| v_{d}(\tau ,x) - \mathrm{R}(\tilde{v}_{d}^{n})(x) | &\leq C n^{-(1/q-1)} . \end{aligned}$$
(5.38)
6 Conclusion and generalisations
We have proved that prices of European-style derivative contracts on baskets of
\(d\geq 1\) assets in exponential Lévy models can be expressed by ReLU DNNs to accuracy
\(\varepsilon > 0\) with DNN size polynomially growing in
\(\varepsilon ^{-1}\) and
\(d\), thereby overcoming the curse of dimensionality. The technique of proof is based on probabilistic arguments and provides expression rate bounds that scale algebraically in terms of the DNN size. We have then also provided an alternative, analytic argument that allows proving
exponential expressivity of ReLU DNNs of the option price, i.e., of the map
\(s\mapsto u(t,s)\) at any fixed time
\(0< t< T\), with the DNN size growing polynomially with respect to
\(|\log \varepsilon |\) to achieve accuracy
\(\varepsilon > 0\). For sufficiently large
\(t>0\), based on analytic arguments involving parabolic smoothing and sparsity of generalised polynomial chaos expansions, we have established in (
5.38) a second, algebraic expression rate bound for ReLU DNNs that is free from the curse of dimensionality. In forthcoming work (Gonon and Schwab [
28]), we address PIDEs (
5.13) with non-constant coefficients. In addition, the main result of the present paper, Theorem
5.1, could be extended in the following directions.
First, the expression rates are almost certainly not optimal in general; for high-dimensional diffusions, which are a particular case with
\(A^{d} = I\) and
\(\nu ^{d} = 0\), we have established in Elbrächter et al. [
22] for particular payoff functions a spectral expression rate in terms of the DNN size, free from the curse of dimensionality.
Next, solving Hamilton–Jacobi partial integro-differential equations (HJPIDEs for short) by DNNs: It is classical that the Kolmogorov equation for the exponential LP
\(X^{d}\) in Sect.
2.2 is in fact a special case of an HJPIDE (see e.g. Barles et al. [
2], Barles and Imbert [
3]). In forthcoming work [
28], we aim at proving that the expression rate bounds obtained in Sect.
5 imply corresponding expression rate bounds for ReLU DNNs which are free from the curse of dimensionality for viscosity solutions of general HJPIDEs associated to the LP
\(X^{d}\) and for its exponential counterparts.
Barriers: We have considered payoff functions corresponding to European-style contracts. Here, the stationarity of the LP
\(X^{d}\) and exponential Lévy modelling have allowed us to reduce our analysis to Cauchy problems of the Kolmogorov equations of
\(X^{d}\) in
\(\mathbb{R}^{d}\). In the presence of barriers, option prices in Lévy models in general exhibit singularities at the barriers. More involved versions of the Fourier-transform-based representations are available (involving a so-called Wiener–Hopf factorisation of the Fourier symbol; see e.g. Boyarchenko and Levendorskiĭ [
13]). For LPs
\(X^{d}\) with bounded exponential moments, the present regularity analysis may be localised to compact subsets, well separated from the barriers, subject to an exponentially small localisation error term; see Hilber et al. [
32, Chap. 10.5]. Here, the semiheavy tails of the LPs
\(X^{d}\) enter crucially in the analysis. We therefore expect the present DNN expression rate bounds to remain valid also for barrier contracts, at least far from the barriers, for the LPs
\(X^{d}\) considered here.
Dividends: We have assumed throughout that contracts do not pay dividends. However, including a dividend stream (with constant rate over
\((0,T]\)) on the underlying does not change the mathematical arguments; we refer to Lamberton and Mikou [
38, Sect. 3.1] for a complete statement of exponential Lévy models with constant dividend payment rate
\(\delta > 0\), and for the corresponding pricing of European- and American-style contracts for such models.
American-style contracts: Deep-learning-based algorithms for the numerical solution of optimal stopping problems for Markovian models have been recently proposed in Becker et al. [
8]. For the particular case of American-style contracts in exponential Lévy models, [
38] provide an analysis in the univariate case and establish qualitative properties of the exercise boundary
\(\{ (b(t),t): 0< t< T \}\). Here, for geometric Lévy models, in certain situations (
\(d=1\), i.e., single risky asset, monotonic, piecewise analytic payoff function), the option price as a function of
\(s\) at fixed
\(0< t< T\) is shown in [
38] to be a piecewise analytic function which is globally Hölder-continuous with a possibly algebraic singularity at the exercise boundary
\(b(t)\). This holds likewise for the price expressed in the logarithmic coordinate
\(x=\log s \). The ReLU DNN expression rate of such functions has been analysed in Opschoor et al. [
42, Sect. 5.4]. In higher dimensions
\(d>1\), recently also higher Hölder-regularity of the price in symmetric, stable Lévy models has been obtained for smooth payoffs in Barrios et al. [
4].