Published in: Calcolo 1/2022

Open Access 01.03.2022

Randomised one-step time integration methods for deterministic operator differential equations

Authors: Han Cheng Lie, Martin Stahn, T. J. Sullivan

Abstract

Uncertainty quantification plays an important role in problems that involve inferring a parameter of an initial value problem from observations of the solution. Conrad et al. (Stat Comput 27(4):1065–1082, 2017) proposed randomisation of deterministic time integration methods as a strategy for quantifying uncertainty due to the unknown time discretisation error. We consider this strategy for systems that are described by deterministic, possibly time-dependent operator differential equations defined on a Banach space or a Gelfand triple. Our main results are strong error bounds on the random trajectories measured in Orlicz norms, proven under a weaker assumption on the local truncation error of the underlying deterministic time integration method. Our analysis establishes the theoretical validity of randomised time integration for differential equations in infinite-dimensional settings.
Notes
The research of HCL and MS has been partially funded by the Deutsche Forschungsgemeinschaft (DFG)—Project-ID 318763901—SFB1294. The authors thank the anonymous reviewer for their constructive and helpful feedback.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

1 Introduction

The numerical solution of deterministic dynamical systems is an important task in many applications where the dynamical system is a spatiotemporal field that satisfies a partial differential equation (PDE). In this case, the field can be viewed as a function u from a finite time interval [0, T] to an infinite-dimensional real separable Banach space \((V,\left| \cdot \right| _V)\), and the dynamical system is described by a deterministic operator differential equation initial value problem on [0, T] for some initial condition \(\vartheta\):
$$\begin{aligned} u(0)=\vartheta ,\quad u'(t)=f(t,u(t)), \quad t\in [0,T]. \end{aligned}$$
Operator differential equations arise, for example, in models of peridynamics and elastic materials [13, 25]. The purpose of this paper is to analyse the error of randomised time integration methods for solving such initial value problems. The methods are of the form
$$\begin{aligned} U_{k+1}:=\psi (h,U_{k})+\xi _k(h),\quad k\in \{0,\ldots ,N-1\}, \end{aligned}$$
where \(\psi (h,U_{k})\) represents the output of a deterministic time integration method with time step h corresponding to the input \(U_{k}\), and \(\xi _k(h)\) is a V-valued random variable whose distribution depends on h. Our motivation for considering these methods comes from Bayesian inverse problems.
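To fix ideas, the following minimal Python sketch implements the recursion above for a scalar test problem; the deterministic step `psi`, the perturbation sampler `sample_xi`, and the test equation \(u'=-u\) are illustrative assumptions, not taken from the paper.
```python
import numpy as np

def randomised_integrator(psi, sample_xi, u0, h, N, rng):
    """Randomised one-step method: U_{k+1} = psi(h, U_k) + xi_k(h)."""
    U = [float(u0)]
    for _ in range(N):
        U.append(psi(h, U[-1]) + sample_xi(h, rng))
    return np.array(U)

# Illustrative use: randomised explicit Euler for u' = -u, u(0) = 1.
rng = np.random.default_rng(0)
psi = lambda h, u: u + h * (-u)                        # explicit Euler step
sample_xi = lambda h, rng: h**1.5 * rng.standard_normal()  # scale h^{p+1}, p = 1/2
trajectory = randomised_integrator(psi, sample_xi, 1.0, 0.01, 100, rng)
```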
In many applications, the initial value problem depends on a parameter \(\theta ^*\)—for example, the initial condition \(\vartheta\), or a parameter appearing in the vector field f—and it is of interest to infer the value of \(\theta ^*\) given some observational data y, where y results from some fixed measurement process. Let \(\varTheta\) and \({\mathcal {Y}}\) denote the set of feasible parameter values and the set of feasible data values respectively. We assume that \(\varTheta\) is a Banach space and \({\mathcal {Y}}\) is a Hilbert space. Let S denote the solution operator that maps every \(\theta '\in \varTheta\) to the solution of the corresponding initial value problem, and let O denote the observation operator that maps every continuous trajectory in V to the corresponding output \({\tilde{y}}\in {\mathcal {Y}}\) of the fixed measurement process. Then the inference problem is to determine the value of the unknown true parameter \(\theta ^*\) given noisy data of the form
$$\begin{aligned} y=O\circ S(\theta ^*)+\eta , \end{aligned}$$
where \(\eta\) is often assumed to be a centred Gaussian random variable with known, positive-definite covariance operator \(\Gamma\). In general, the inverse problem is ill-posed, and one can apply deterministic or statistical approaches to its solution.
In the Bayesian approach to inverse problems, one assumes that \(\varTheta\) can be equipped with a probability measure \(\mu _{0}\), called the ‘prior’. Let \(G:=O\circ S:\varTheta \rightarrow {\mathcal {Y}}\) denote the parameter-to-observable map. The Bayesian solution to the inverse problem is given by the ‘posterior’ probability measure \(\mu ^{y}\) on \(\varTheta\), which satisfies
$$\begin{aligned} \mu ^{y}(\mathrm {d}\theta ')=\frac{1}{Z(y)} \exp \left( -\frac{1}{2}\left\| y-G(\theta ') \right\| _{\Gamma }^{2}\right) \mu _{0}(\mathrm {d}\theta ') \end{aligned}$$
where \(\left\| x \right\| _{\Gamma }^{2}=\left\langle x,\Gamma ^{-1} x \right\rangle _{{\mathcal {Y}}}\) and Z(y) is a normalisation constant. The posterior is important because one can use it to perform uncertainty quantification for the unknown parameter \(\theta ^*\). See [34, Section 2.4] for a presentation of the Bayesian approach to inverse problems posed on vector spaces.
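In finite dimensions, the unnormalised posterior density above is straightforward to evaluate; the sketch below treats \(\Gamma\) as a symmetric positive-definite matrix, an illustrative stand-in for the covariance operator, and all concrete values are assumptions.
```python
import numpy as np

def unnormalised_posterior_weight(y, G_theta, Gamma):
    """exp(-0.5 * ||y - G(theta)||_Gamma^2), where ||x||_Gamma^2 = <x, Gamma^{-1} x>."""
    r = y - G_theta
    return np.exp(-0.5 * r @ np.linalg.solve(Gamma, r))

# Illustrative use with a two-dimensional observation space:
y = np.array([1.0, 0.5])
G_theta = np.array([0.9, 0.6])
Gamma = np.diag([0.01, 0.04])
weight = unnormalised_posterior_weight(y, G_theta, Gamma)
```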
For many differential equations arising in applications, one must approximate the exact solution operator S using another operator \({\tilde{S}}\) that results from a discretisation of the initial value problem. This leads to an approximation \({\tilde{G}}:=O\circ {\tilde{S}}\) of the parameter-to-observable map, which in turn leads to an approximation \({\tilde{\mu }}^{y}\) of the exact posterior \(\mu ^{y}\) defined above. For a fixed data vector y and prior \(\mu _{0}\), the error in \({\tilde{S}}\) is propagated via Bayes’ theorem to an error in \({\tilde{\mu }}^{y}\). Since the posterior is fundamental for performing inference on the unknown parameter \(\theta ^*\), one seeks a principled way to take into account the discretisation error in \({\tilde{S}}\).
Under some assumptions, a bound on the error \(G-{\tilde{G}}\) with respect to some appropriate norm can be used to prove a bound on the error in the posterior, as measured by the Hellinger metric, e.g. [34, Corollary 4.9]. Stability bounds of this type ensure that the approximate posterior \({\tilde{\mu }}^{y}\) converges in the Hellinger metric to the exact posterior \(\mu ^{y}\), in the limit as the discretisation error vanishes. While this property ensures that we can ignore the error in the posterior in the limit of increasingly finer discretisations, it does not indicate how to treat the error in the posterior for a fixed discretisation.
One approach is to ignore the discretisation error. This approach is not ideal from the point of view of statistical inference, because the approximate posterior \({\tilde{\mu }}^{y}\) can be tightly concentrated around the wrong parameter values, even in the small-noise limit. This phenomenon of ‘overconfidence’ is undesirable for uncertainty quantification. See Sect. 1.1 below.
The approach presented in [10] addresses the problem of accounting for the discretisation error by applying the standard procedure of using random variables as proxies for unknown quantities. Let \(\psi (h,v)\) denote the output of applying a time integration method with fixed time step \(h=T/N>0\), \(N\in \mathbb {N}\), to the state v. Consider the error \(u(h)-\psi (h,u(0))\) between the exact solution and the numerical solution, incurred over one time step. Since the one-step error is unknown, we model it using a random variable \(\xi _0(h)\). Thus,
$$\begin{aligned} u(h)\approx \psi (h,u(0))+\xi _0(h) =:U_1. \end{aligned}$$
If we model the one-step error for subsequent steps in a similar way, then this leads to the randomised time integration methods stated at the beginning of this section.

1.1 Illustration of overconfidence phenomenon

Consider the standard heat equation on a bounded domain \(D\subset \mathbb {R}^{d}\) with homogeneous Dirichlet boundary conditions, written as the operator differential equation
$$\begin{aligned} u(0)=\vartheta \in H,\quad u'(t)+A u(t)=0,\quad t\in [0,h], \end{aligned}$$
where A is the negative Laplacian, \(H=L^2(D)\), and \(h>0\). In [34, Section 3.5], one considers the inverse problem of inferring the initial condition \(\vartheta\) from a noisy observation of the solution at a later time. We shall use the assumptions and the approach stated there. The parameter-to-observable map is \(G:H\rightarrow H\), \(v\mapsto e^{-hA}v\). The data y is a realisation of the random variable
$$\begin{aligned} Y=G(\vartheta )+\delta ^{1/2}\eta =G\vartheta +\delta ^{1/2}\eta \end{aligned}$$
where the noise \(\eta\) is a Gaussian random variable with distribution \({\mathcal {N}}(0,\Gamma_{\textup{obs }})\). The noise scaling \(\delta\) is assumed to be known, and the small noise limit corresponds to \(\delta \rightarrow 0\). For the unknown parameter \(\vartheta\), we use the Gaussian prior \(\mu _0={\mathcal {N}}(m_0,\Gamma _{0})\). The positive-definite covariance operators \(\Gamma_{\textup{obs}}\) and \(\Gamma_{0}\) are chosen so that 1) draws from \({\mathcal {N}}(0,\Gamma_{\textup{obs} })\) and from \(\mu _{0}\) are H-valued, almost surely; and 2) \(\Gamma_{0}\) is an appropriate negative fractional power of A. Applying [34, Theorem 6.20] to the jointly Gaussian random variable \((U,G(U)+\delta ^{1/2}\eta )\) with \(U\sim \mu _0\) yields the Gaussian posterior measure \(\mu ^{y}\) with mean and covariance
$$\begin{aligned} m&= m_0+\Gamma_{0} G (\delta \Gamma_{\textup{obs} }+G\Gamma _{0} G )^{-1} (y-Gm_0) \\ {\mathcal {C}}&= \Gamma_{0}-\Gamma_{0}G (\delta \Gamma_{\textup{obs} }+G\Gamma_{0} G )^{-1}G\Gamma_{0}. \end{aligned}$$
In the \(\delta \rightarrow 0\) limit, \(y\rightarrow G\vartheta\). Using this fact and the assumptions on \(\Gamma_{0}\), it follows that \({\mathcal {C}}\rightarrow 0\) and \(m\rightarrow \vartheta\) in the \(\delta \rightarrow 0\) limit. Since Gaussian measures are completely characterised by their mean and covariance, the convergence of \({\mathcal {C}}\) and m implies the weak convergence (in the sense of probability measures) of the posterior measure to the Dirac measure at the true initial condition \(\vartheta\) as \(\delta \rightarrow 0\). This convergence captures the concentration of the posterior \(\mu ^y\) around the true unknown \(\vartheta\), and validates the Bayesian approach to the inverse problem.
Now suppose we approximate G using the map \({\tilde{G}}\) defined by a single step of the implicit Euler method, \({\tilde{G}}:H\rightarrow H\), \(v\mapsto (I+hA)^{-1} v\). Applying [34, Theorem 6.20] as we did earlier with \({\tilde{G}}\) instead of G yields the associated approximate posterior \({\tilde{\mu }}^{y}\), which is Gaussian with mean and covariance
$$\begin{aligned} {\tilde{m}}&= m_0+\Gamma_{0} {\tilde{G}} (\delta \Gamma_{\textup{obs} }+{\tilde{G}}\Gamma_{0} {\tilde{G}} )^{-1} (y-{\tilde{G}}m_0) \\ \tilde{ {\mathcal {C}}}&= \Gamma_{0}-\Gamma _{0}{\tilde{G}} (\delta \Gamma_{\textup{obs} }+{\tilde{G}}\Gamma_{0} {\tilde{G}} )^{-1}{\tilde{G}}\Gamma_{0}. \end{aligned}$$
In the \(\delta \rightarrow 0\) limit, \(\tilde{{\mathcal {C}}}\rightarrow 0\), but \({\tilde{m}}\rightarrow {\tilde{G}}^{-1}G \vartheta \ne \vartheta\). Thus, the approximate posterior \({\tilde{\mu }}^{y}\) converges weakly in the small noise limit to a biased Dirac measure. This demonstrates the overconfidence phenomenon. The bias \({\tilde{G}}^{-1}G\vartheta -\vartheta\) in the limiting Dirac measure is the local truncation error of the implicit Euler method.
To address the overconfidence phenomenon, we use a random variable as a proxy for the unknown bias. Consider the randomised implicit Euler method given by \({\widehat{G}}(v):={\tilde{G}}v+h^{p+1}\zeta\), where \(\zeta \sim {\mathcal {N}}(0,\Gamma_{1})\) is independent of the observation noise \(\eta\), and \(\Gamma_{1}\) is chosen so that draws from \({\mathcal {N}}(0,\Gamma_{1})\) are H-valued almost surely. By rewriting \({\widehat{G}}(U)+\delta ^{1/2}\eta ={\tilde{G}}U+(h^{p+1}\zeta +\delta ^{1/2}\eta )\) and applying [34, Theorem 6.20], it follows that the associated deterministic posterior \({\widehat{\mu }}^{y}\) is Gaussian, with mean and covariance
$$\begin{aligned} {\widehat{m}}&= m_0+\Gamma_{0} {\tilde{G}} (\delta \Gamma_{\textup{obs} }+h^{2p+2}\Gamma_{1}+{\tilde{G}}\Gamma_{0} {\tilde{G}} )^{-1} (y-{\tilde{G}}m_0) \\ \widehat{ {\mathcal {C}}}&= \Gamma_{0}-\Gamma _{0}{\tilde{G}} (\delta \Gamma_{\textup{obs} }+h^{2p+2}\Gamma_{1}+{\tilde{G}}\Gamma_{0} {\tilde{G}} )^{-1}{\tilde{G}}\Gamma_{0}. \end{aligned}$$
In the \(\delta \rightarrow 0\) limit, \(\widehat{{\mathcal {C}}}\) does not converge to zero, because of the additional \(h^{2p+2}\Gamma_{1}\) term. However, in the limit as \(h,\delta \rightarrow 0\), the bias \({\tilde{G}}^{-1}G\vartheta -\vartheta\) associated to \({\tilde{G}}\) vanishes. Hence \({\widehat{m}}\rightarrow \vartheta\) and \(\widehat{{\mathcal {C}}}\rightarrow 0\). For fixed \(h>0\), the additional \(h^{2p+2}\Gamma_{1}\) term ensures that the deterministic approximate posterior \({\widehat{\mu }}^{y}\) associated to the randomised implicit Euler method \({\widehat{G}}\) is more ‘spread out’ than the approximate posterior \({\tilde{\mu }}^{y}\) associated to the non-randomised implicit Euler method \({\tilde{G}}\). In this way, the problem of overconfidence is mitigated.
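The overconfidence phenomenon can be reproduced numerically in a spectral truncation of this example. In the sketch below, all concrete choices (truncation level, h, \(\delta\), the prior, and the true initial condition) are illustrative assumptions; the componentwise formulas are the diagonalised versions of the expressions for m and \({\tilde{m}}\) above, with \(m_0=0\) and \(\Gamma_{\textup{obs}}=I\).
```python
import numpy as np

# Spectral truncation of the 1-D heat equation on D = (0, 1): A has
# eigenvalues lam_j = (pi*j)^2.  All concrete choices below are
# illustrative assumptions, not values taken from the paper.
J, h, delta = 5, 0.05, 1e-20
j = np.arange(1, J + 1)
lam = (np.pi * j) ** 2
G = np.exp(-h * lam)              # exact map e^{-hA}, diagonalised
Gt = 1.0 / (1.0 + h * lam)        # implicit Euler map (I + hA)^{-1}
gamma0 = lam ** -2.0              # prior covariance Gamma_0 = A^{-2}
theta = 1.0 / j                   # coefficients of the true initial condition

y = G * theta                     # noise-free data: the small-noise limit
m = gamma0 * G / (delta + G**2 * gamma0) * y          # mean of mu^y
m_tilde = gamma0 * Gt / (delta + Gt**2 * gamma0) * y  # mean of tilde-mu^y
print(np.abs(m - theta).max())        # ~ 1e-5: concentration at theta
print(np.abs(m_tilde - theta).max())  # O(1): biased limit (G/Gt) * theta
```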

1.2 Main contributions

In this paper, we rigorously prove strong forward error bounds for randomised one-step time integration methods applied to operator differential equations. Our work builds on the approach for proving the error bounds in \(L^2\) of [10, Theorem 2.2] and the error bounds in \(L^R\)—for user-specified \(R\in \mathbb {N}\)—of [20, Theorem 3.5]. These bounds were stated for initial value problems formulated in \(\mathbb {R}^{d}\), where the associated exact flow maps are globally Lipschitz, and where the randomised time integrators are generated using uniform time grids and numerical methods \(\psi\) that satisfy a uniform local truncation error assumption.
The error bounds that we prove in this paper generalise the existing error bounds in multiple aspects. Our bounds are valid for time-dependent vector fields, non-uniform time grids (i.e. variable time steps), and operator differential equations that are formulated on Banach spaces or on Gelfand triples. In Theorem 3.7, we show that one can obtain strong error bounds in \(L^R\) for \(R>1\), without the assumption of uniform local truncation error of the numerical method, and without the assumption that the flow map of the initial value problem is globally Lipschitz. In fact, we show that one can obtain strong error bounds in more general Orlicz norms. The bounds that we prove in this paper demonstrate that the paradigm of randomised time integration extends in a natural way to the time integration of PDEs with time-dependent coefficients. Moreover, the proofs we give for our main results are simpler than the proofs of the corresponding results given in [20].
A related but distinct contribution that we make is to consider the setting where the random variables used in the randomisation are independent and centred. We generalise the \(L^2\) uniform error bound [20, Theorem 3.4] for centred and independent randomisation—which was proven in the setting of ODEs in \(\mathbb {R}^{d}\)—to the setting of operator differential equations on Gelfand triples, under weaker assumptions on the time integration map \(\psi\). We address the question of whether it is possible to obtain better error bounds under these additional assumptions. This question was implicit in the analysis of [20], but was not addressed there.

1.3 Related work

Randomised time integration methods for differential equations have been studied extensively in the context of ‘probabilistic numerics’. For some reviews of research in this area, see [9, 17, 27]. In probabilistic numerics, ODEs have been considered from many perspectives, including structure- or symmetry-preserving methods [1, 40], Bayesian modelling of the unknown solution with Gaussian processes [5, 10, 33, 36, 38, 40], data-based statistical estimation of discretisation error [24, 35], and filtering [19, 38]. The papers [10, 20] cited earlier also belong to this context. For PDEs, methods based on Bayesian inference and Gaussian processes [6, 8, 10, 28, 31, 39], multiscale techniques [29], and random meshes [2] have been studied. The research area of ‘information field dynamics’ [11, 14] also considers probabilistic simulation schemes for PDEs by using Gaussian processes and information theoretic ideas.
Random approximate posteriors arising from randomised solution operators for differential equations have been studied in [21, Section 5] under a strong assumption of exponentially integrable discretisation error \(S-{\tilde{S}}\), and more recently under a weaker square integrability hypothesis in [15].
Two aspects differentiate the problem we consider from the problems considered in numerical methods for stochastic evolution equations. The most important aspect is that the operator differential equation of interest in this paper is deterministic. Thus, our context is fundamentally different from that of numerical integration methods for stochastic differential equations and for random differential equations. The second aspect is that the random variables used in the randomisation need not be constructed using i.i.d. copies of a Wiener process or Lévy process.

1.4 Overview

We introduce notation and some recurring objects in the next section. In Sect. 2, we consider the setting where the initial value problem is formulated on a Banach space. The main result is the strong error bound in Orlicz norm proven in Theorem 2.8 under the assumption of uniform local truncation error of the time integration method \(\psi\).
In Sect. 3, we consider the setting where the initial value problem is formulated on a Gelfand triple, and where \(\psi\) satisfies a weaker local truncation error assumption. This setting is considered in the variational approach to PDEs. We prove strong \(L^2\) error bounds for mutually independent and centred randomisation in Sect. 3.1. In Sect. 3.2, we discuss the feasibility of obtaining \(L^R\) bounds for \(R>2\) that are of the same order in the time step h, under the same assumptions of independence and centredness. In Sect. 3.3, we state in Theorem 3.7 a strong error bound in Orlicz norm without assuming independence or centredness.
In Sect. 4, we show that the assumptions we make in Sect. 3 are reasonable for a class of operator differential equations that includes the heat equation on a \(C^2\) bounded domain.
We conclude in Sect. 5. In the appendices, we collect material that is useful for the main part of the paper.

1.5 Notation and setup

Below, \((V,\left| \cdot \right| _V)\) and \((H,\left\langle \cdot ,\cdot \right\rangle _H)\) denote a real separable Banach space and a real separable Hilbert space respectively. We write \(\left| \cdot \right| _H\) for the Hilbert space norm. All integrals are Bochner integrals unless otherwise stated. We define \(C^1([0,T];V) :=\{ u \in C([0,T];V) \, | \, u' \in C([0,T];V)\}\) and equip it with the norm \(\left\| u \right\| _{1,\infty } = \left\| u \right\| _{\infty } + \left\| u' \right\| _{\infty }\) where \(\left\| u \right\| _{\infty }:=\sup _{t\in [0,T]}\vert u(t) \vert _V\). We define the space \(C^1([0,T]; H)\) analogously.
All random variables will be defined on a common probability space \((\varOmega ,{\mathcal {F}},\mathbb {P})\). We denote expectation with respect to \(\mathbb {P}\) by \(\mathbb {E}[\cdot ]\) and write \(X\sim \mu\) to mean that X has \(\mu\) as its distribution. For a V-valued random variable X and \(R\ge 1\), we shall write \(\left\| X \right\| _{L^R(\varOmega ;V)}:=\mathbb {E}[\vert X \vert _V^R]^{1/R}\). Similarly, if X is H-valued, then \(\left\| X \right\| _{L^R(\varOmega ;H)}:=\mathbb {E}[\left| X\right| _H^R]^{1/R}\). For a Young function \(\varPsi :\mathbb {R}_{\ge 0}\rightarrow \mathbb {R}_{\ge 0}\), the corresponding Orlicz norm \(\left\| \cdot \right\| _{\varPsi }\) of an \(\mathbb {R}\)-valued random variable Z is defined by
$$\begin{aligned} \left\| Z \right\| _{\varPsi }:=\inf \{k\in (0,\infty )\ :\ \mathbb {E}[\varPsi(\vert Z \vert /k)]\le 1\}. \end{aligned}$$
If Z is a V-valued (respectively, H-valued) random variable, then \(\left\| Z \right\| _{\varPsi (\varOmega ;V)}:=\left\| \vert Z \vert _V \right\| _{\varPsi }\) (resp. \(\left\| Z \right\| _{\varPsi (\varOmega ;H)}:=\left\| \vert Z \vert _H \right\| _{\varPsi }\)). The \(\left\| \cdot \right\| _{\varPsi (\varOmega ;V)}\) norm includes as a special case the \(\left\| \cdot \right\| _{L^R(\varOmega ;V)}\) norm when \(R>1\), but not when \(R=1\). The analogous statement holds for the \(\left\| \cdot \right\| _{\varPsi (\varOmega ;H)}\) norm. An important choice of Young function \(\varPsi\) is given by \(\varPsi _2(z):=\exp (z^2)-1\), because finiteness of \(\left\| X \right\| _{\varPsi _2}\) implies that X is sub-Gaussian and hence exponentially square integrable.
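Since the infimum in the definition above ranges over a monotone criterion, an Orlicz norm can be estimated from samples by bisection. The following sketch is a Monte Carlo illustration, under the assumption that i.i.d. draws of \(\vert Z \vert\) are available; it is not part of the theory developed in the paper.
```python
import numpy as np

def orlicz_norm(samples, Psi, k_hi=1e6, k_lo=1e-8):
    """Estimate ||Z||_Psi = inf{k > 0 : E[Psi(|Z|/k)] <= 1} from i.i.d.
    samples of |Z|.  Bisection applies because the map
    k -> E[Psi(|Z|/k)] is non-increasing in k."""
    z = np.abs(np.asarray(samples, dtype=float))
    for _ in range(100):
        k = 0.5 * (k_lo + k_hi)
        if np.mean(Psi(z / k)) <= 1.0:
            k_hi = k
        else:
            k_lo = k
    return k_hi

Psi2 = lambda z: np.expm1(z**2)      # Psi_2(z) = exp(z^2) - 1
rng = np.random.default_rng(1)
est = orlicz_norm(rng.standard_normal(10**5), Psi2)
# For a standard normal, ||Z||_{Psi_2} = sqrt(8/3) ~ 1.633.
```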
We write \(p\wedge q=\min \{p,q\}\) for \(p,q\in \mathbb {R}\). For \(h>0\), \(p\ge 0\), and \(a=a(h)\in \mathbb {R}\), we write \(a={\mathcal {O}}(h^p)\) to mean that \(\vert a \vert \le Ch^p\) for some h-independent constant \(C>0\). Given \(N\in \mathbb {N}\), \([N]:=\{1,\ldots ,N\}\) and \([N]_0:=[N]\cup \{0\}=\{0,1,\ldots , N\}\).
Throughout the paper, we consider the following initial value problem on a deterministic time interval [0, T],
$$\begin{aligned} u(0)=\vartheta ,\quad u'(t)=f(t,u(t)),\quad t\in [0,T] \end{aligned}$$
(1.1)
for fixed \(T>0\) and suitable initial condition \(\vartheta\). We specify the domain and codomain of f in the following sections. We denote by \(\varphi\) the exact flow map associated to (1.1) as follows: for suitable \(h\in [0,T]\), \(t\in [0,T-h]\), and \(u_s\),
$$\begin{aligned} \varphi (h,t,u_s)=u_s+\int _{t}^{t+h}f(\tau ,\varphi (\tau -t,t,u_s))\,\mathrm {d}\tau . \end{aligned}$$
(1.2)
We equip the time interval [0, T] in (1.1) with a time grid \((t_k)_{k\in [N]_0}\), where
$$\begin{aligned} 0=:t_0<t_1<\cdots <t_N:=T,\quad h_{k}:=t_{k+1}-t_k,\quad h:=\max _{k\in [N-1]_0}h_k. \end{aligned}$$
(1.3)
From (1.3) it follows that for any \(\tau \ge 0\),
$$\begin{aligned} \sum _{\ell \in [N-1]_0}h_\ell ^{\tau +1}\le h^\tau \sum _{\ell \in [N-1]_0}h_\ell =h^\tau T. \end{aligned}$$
(1.4)
Given (1.2), the exact sequence \((u(t_k))_{k\in [N]_0}\) associated with the time grid satisfies
$$\begin{aligned} u(t_{k+1})=\varphi (h_k,t_k,u(t_k)),\quad k\in [N-1]_0. \end{aligned}$$
(1.5)
We denote by \(\psi\) the approximate flow map associated to a time integration method, and define a deterministic approximating sequence \((u_k)_{k\in [N]_0}\) by
$$\begin{aligned} u_{k+1} :=\psi (h_k, t_k,u_k),\quad u_0=\vartheta . \end{aligned}$$
Let \((\xi _k)_{k\in \mathbb {N}_0}\) be a sequence of stochastic processes, each indexed by \([0,\infty )\). In Sect. 2 (respectively, Sect. 3), each \(\xi _k\) takes values in the Banach space V (resp. the Hilbert space H). Given the time grid in (1.3), we use \((\xi _k(h_k))_{k\in [N-1]_0}\) as a randomisation sequence in order to define the random approximating sequence \((U_k)_{k\in [N]_0}\) by
$$\begin{aligned} U_{k+1} :=\psi (h_k,t_k,U_k) + \xi _k(h_k),\quad k\in [N-1]_0 \end{aligned}$$
(1.6)
for a given random variable \(U_0\). The sequence of errors \((e_k)_{k\in [N]_0}\) of the random approximating sequence (1.6) with respect to the exact sequence (1.5) is defined by
$$\begin{aligned} e_0=u(0)-U_0,\quad e_{k+1}:=u(t_{k+1})-U_{k+1},\quad k\in [N-1]_0. \end{aligned}$$
By (1.5) and (1.6), we obtain
$$\begin{aligned} e_{k+1}=\varphi (h_k,t_k,u(t_k))-\psi (h_k,t_k,U_k)-\xi _k(h_k),\quad k\in [N-1]_0. \end{aligned}$$
(1.7)
Equation (1.7) will be the starting point for our error analysis.

2 Classical setting

In this section, we prove the generalisation of [10, Theorem 2.2] and [20, Theorem 3.5] to the setting of a time-dependent vector field f on an infinite-dimensional, real, separable Banach space V. We assume that the vector field f in (1.1) satisfies \(f :[0,T] \times V \rightarrow V\). In addition, we assume that for every initial condition \(\vartheta \in V\), there exists a unique classical solution \(u \in C^1([0, T];V)\). For example, if f is continuous and uniformly Lipschitz in the second argument, then this assumption is satisfied, and \(\varphi\) exists [12, Satz 7.2.6].
We state the assumptions needed to prove the main result of this section. The first is a Lipschitz continuity assumption on the exact flow map.
Assumption 2.1
The exact flow map \(\varphi\) admits a constant \(L_\varphi >0\) such that for any \(t\in [0,T]\), for every \(h\ge 0\) such that \(t+h\le T\), and for every \(x,y\in V\),
$$\begin{aligned} \left| \varphi (h,t,x)-\varphi (h,t,y)\right| _V \le (1+L_\varphi h) \left| x-y\right| _V. \end{aligned}$$
If f is uniformly Lipschitz in the second argument, then Assumption 2.1 is satisfied [12, Satz 7.3.4].
Ideally, the deterministic sequence \((u_k)_k\) approximates the exact sequence \((u(t_k))_k\) well. We make this precise by introducing the following uniform local truncation error assumption.
Assumption 2.2
The approximate flow map \(\psi\) admits constants \(0<h^*<\infty\), \(0<C_{\varphi ,\psi }<\infty\), and \(q\ge 0\), such that for all \(0<h\le h^*\),
$$\begin{aligned} \sup _{\begin{array}{c} v \in V\\ t \in [0,T-h] \end{array}} \left| \varphi (h,t,v) - \psi (h,t,v) \right| _V \le C_{\varphi ,\psi } h^{q+1} \; . \end{aligned}$$
The parameter \(h^*\) is included in order to account for implicit time integration methods that provide a unique output whenever the time step is small enough. In order to achieve an order of \(q\ge 1\) for the truncation error, one usually requires higher regularity of f or equivalently higher regularity for the solution u [16, Section III.2, Theorem 2.4]. For classical one-step methods, the corresponding analysis extends to infinite-dimensional Banach spaces; see Appendix B.
The assumptions above are similar to [10, Assumption 2] and [20, Assumption 3.1, 3.2]. Note that Assumption 2.2 is restrictive, because it requires uniformity in t and v. For example, in [10], the analogous assumption is justified under the assumption that \(f :\mathbb {R}^{d}\rightarrow \mathbb {R}^{d}\) is sufficiently smooth and sufficiently many of its derivatives are uniformly bounded. However, Assumption 2.2 is not satisfied in general. For example, in the setting where the operator differential equation is given by \(u'(t)=Au(t)\in H\) for a Hilbert space H and the infinitesimal generator A of an analytic semigroup with domain \(\textup{Dom} (A)\), and \(\psi\) is given by the implicit Euler method, there exists \(C>0\) such that for all \(\vartheta \in \textup{Dom} (A)\), \(n\in \mathbb {N}\), and all sufficiently small \(h>0\),
$$\begin{aligned} \left| \varphi (nh,0,\vartheta )-\psi (nh,0,\vartheta )\right| _H\le C h\left| A\vartheta \right| _H, \end{aligned}$$
see [37, Theorem 7.1].
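For concreteness, a single implicit Euler step \(\psi (h,t,v)=(I+hA)^{-1}v\) amounts to a linear solve. The sketch below uses a finite-difference Dirichlet Laplacian as an illustrative stand-in for A; the discretisation and all constants are assumptions, not taken from [37].
```python
import numpy as np

def implicit_euler_step(h, v, dx):
    """One implicit Euler step psi(h, v) = (I + h*A)^{-1} v, where A is the
    finite-difference Dirichlet Laplacian on len(v) interior grid points
    (an illustrative stand-in for the generator of the semigroup)."""
    n = len(v)
    A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / dx**2
    return np.linalg.solve(np.eye(n) + h * A, v)

n = 99
dx, h = 1.0 / (n + 1), 1e-3
x = dx * np.arange(1, n + 1)
v = np.sin(np.pi * x)              # discrete eigenfunction of A
w = implicit_euler_step(h, v, dx)  # ~ v / (1 + h * pi^2), up to O(dx^2)
```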
For equations of the form (1.1) derived from PDEs and fixed time argument t, the right-hand side f is in many cases not Lipschitz from V to V. Furthermore, one cannot in general expect that (1.1) admits a classical solution \(u\in C^1([0,T];V)\), because a classical solution requires regularity assumptions on the problem data that need not hold in general. In Sect. 3, we will consider vector fields f that do not satisfy the assumptions above. This will lead us to consider variational solutions of (1.1). There are other approaches to generalise the classical setting to problems with less regularity, e.g. mild solutions, but they are outside the scope of this paper.

2.1 Randomisation sequence

Recall the random approximating sequence \((U_k)_k\) defined in (1.6). In this section, we shall assume that each \(\xi _k\) is a V-valued stochastic process indexed by \([0,\infty )\), and we shall assume \(U_0\) is a V-valued random variable. Below, we shall impose the following regularity assumption on the \((\xi _k)_{k\in \mathbb {N}_0}\).
For the remainder of Sect. 2, we shall shorten notation and write \(\left\| Z \right\| _{\varPsi }\) instead of \(\left\| Z \right\| _{\varPsi (\varOmega ;V)}\) for any V-valued random variable Z.
Assumption 2.3
The collection \((\xi _k)_{k\in \mathbb {N}_0}\) admits an Orlicz norm \(\left\| \cdot \right\| _{\varPsi }\) and constants \(p\ge 0\) and \(0<C_\xi <\infty\), such that for all \(k\in \mathbb {N}_0\) and \(t>0\),
$$\begin{aligned} \left\| \xi _k(t) \right\| _{\varPsi }\le C_\xi t^{p+1}. \end{aligned}$$
The assumption allows the stochastic processes to be non-Gaussian, to be probabilistically dependent, and to have different distributions and nonzero means. Furthermore, Assumption 2.3 allows for \(\xi _k(t)\) to have different orders of integrability. The rates at which the absolute moments decrease to zero as t decreases to zero may differ as well. The function \(\varPsi\) quantifies the maximal common order of integrability, and the parameter p quantifies the maximal common decay rate with respect to \(\left\| \cdot \right\| _{\varPsi }\).
Assumption 2.3 generalises [20, Assumption 3.3], which in turn generalised [10, Assumption 1]. The latter two assumptions considered the \(\left\| \cdot \right\| _{R}\) norm for \(R\in \mathbb {N}\) and the \(\left\| \cdot \right\| _{2}\) norm of \(\mathbb {R}^{d}\)-valued random variables respectively.
We recall the motivation given in [10] for the additive random perturbation in (1.6) and in particular for Assumption 2.3. Comparing (1.5) and (1.6) yields
$$\begin{aligned} u(t_1) = u(0)+\int _{0}^{h_0} f(s,u(s))\,\mathrm {d}s\approx \psi (h_0,0,u(0))+\xi _0(h_0) = U_1. \end{aligned}$$
Thus, the random variable \(\xi _0(h_0)\) models the uncertainty in the value of the integral term due to the fact that the value of the solution u over the time interval \([0,h_0]\) is known only at time 0, and not at every time s in the interval \([0,h_0]\).
It is desirable that the approximation above is good with high probability. Given that any reasonable choice of \(\psi\) must satisfy \(\lim _{h_0\rightarrow 0}\psi (h_0,0,u_0)= u_0\), a necessary condition for the approximation above to be good with high probability is that the law of \(\xi _0(h_0)\) concentrates around 0 as \(h_0\rightarrow 0\), because the integral term \(\int _{0}^{h_0}f(s,u(s))\,\mathrm {d}s\rightarrow 0\) as \(h_0\rightarrow 0\). Using Assumption 2.3 with \(\left\| \cdot \right\| _{\varPsi }=\left\| \cdot \right\| _{r}\) for some \(r>1\), together with Markov’s inequality, yields that for every \(\varepsilon >0\),
$$\begin{aligned} \mathbb {P}(\left| \xi _k(t)\right| _V\ge \varepsilon )\le \left( \frac{ C_\xi t^{p+1}}{\varepsilon }\right) ^r. \end{aligned}$$
The inequality above shows that the parameter p quantifies the maximal common rate at which all the laws \((\mathbb {P}\circ (\left| \xi _k(t)\right| _V)^{-1})_{k}\) contract around the Dirac measure at zero, as t decreases to zero.
In [10, 20], the parameter p is chosen in order to ensure that the error of the random approximate solution sequence \((U_k)_{k}\) with respect to the exact sequence \((u(t_k))_{k}\) decreases with h at the same rate as the error of the deterministic approximate solution sequence \((u_k)_{k}\). This choice is motivated by the goal of showing that probabilistic integrators can have the same convergence rate as the underlying deterministic one-step method.
Recall that if V is a separable Banach space and \(\mu\) is a Gaussian measure whose support equals V, then the Cameron–Martin space of \(\mu\) is dense in V, and hence there exists a V-valued Wiener process \((W(t))_{t\ge 0}\) associated to \(\mu\) such that \(W(1)\sim \mu\) [4, Theorem 3.6.1, Proposition 7.2.3]. The next lemma shows that there exists a large class of Gaussian processes that satisfies Assumption 2.3.
Lemma 2.4
Let \(\mu\) be a Gaussian distribution with support equal to V, and let \((W(t))_{t\ge 0}\) be a Wiener process associated to \(\mu\) such that \(W(1)\sim \mu\). Let \(\xi\) be a stochastic process on \([0,\infty )\) defined by \(t\mapsto \xi (t):=t^{p+1/2}W(t)\), and let \((\xi _k)_{k\in \mathbb {N}_0}\) be i.i.d. copies of \(\xi\). Then
$$\begin{aligned} \left\| \xi (t) \right\| _{\varPsi }= \left\| \xi (1) \right\| _{\varPsi }t^{p+1}, \end{aligned}$$
(2.1)
for \(\left\| \cdot \right\| _{\varPsi }=\left\| \cdot \right\| _{R}\), \(R>1\), or \(\left\| \cdot \right\| _{\varPsi }=\left\| \cdot \right\| _{\varPsi _2}\), \(\varPsi _2(z):=\exp (z^2)-1\).
Proof
For \(t>0\), we have \(\left\| \xi (t) \right\| _{\varPsi }=t^{p+1/2}\left\| W(t) \right\| _{\varPsi }=t^{p+1}\left\| W(1) \right\| _{\varPsi }\). The first equation follows from the definition of \(\xi (t)\), and the second equation follows from the scaling property of the Wiener process, i.e. that \(W(t)=t^{1/2}W(1)\) in distribution for every \(t>0\). The conclusion follows since \(W(1)=\xi (1)\) as random variables, and because Gaussian random variables are exponentially square integrable by Fernique’s theorem. \(\square\)
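In a truncated eigenbasis, perturbations of the form covered by Lemma 2.4 are simple to sample: by the scaling property, \(\xi (h)=h^{p+1/2}W(h)\) has the same distribution as \(h^{p+1}\) times a draw from \(\mu\). The sketch below is illustrative; the truncation dimension and the covariance are assumptions.
```python
import numpy as np

def sample_xi(h, p, sqrt_Sigma, rng):
    """Draw xi(h) = h^{p+1/2} W(h) in a truncated basis.  By Brownian
    scaling, W(h) = h^{1/2} * N(0, Sigma) in distribution, so
    xi(h) ~ h^{p+1} * N(0, Sigma), matching (2.1)."""
    d = sqrt_Sigma.shape[1]
    return h ** (p + 1) * (sqrt_Sigma @ rng.standard_normal(d))

# Illustrative truncated covariance with summable eigenvalues:
d, p = 100, 1.0
sqrt_Sigma = np.diag(np.arange(1, d + 1, dtype=float) ** -1.0)
rng = np.random.default_rng(2)
xi_h = sample_xi(0.01, p, sqrt_Sigma, rng)  # perturbation for a step h = 0.01
```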
Remark 2.5
The preceding discussion shows that a collection of i.i.d. copies of the standard Wiener process W satisfies the bound in Assumption 2.3 with \(p=-1/2\), in which case we may set \(\xi _k(h_k)\) in (1.6) to be a centred Gaussian random variable with variance proportional to \(h_k\). This choice yields a time integration method that resembles methods for stochastic differential equations. However, for the error bound in Theorem 2.8 below to imply convergence in probability of \((U_k)_k\) to the exact solution sequence \((u(t_k))_{k}\), we need \(p>0\). This observation highlights an important difference between the type of time integration methods that we analyse in this paper and time integration methods for stochastic differential equations.

2.2 Error bounds

Recall from (1.7) that
$$\begin{aligned} e_{k+1}=\varphi (h_k,t_k,u(t_k))-\psi (h_k,t_k,U_k)-\xi _k(h_k),\quad k\in [N-1]_0. \end{aligned}$$
The following bound is the generalisation of [10, Theorem 2] to our setting.
Lemma 2.6
Suppose that
  • Assumption 2.1 holds with parameter \(L_{\varphi }\),
  • Assumption 2.2 holds with parameters \(h^*\), \(C_{\varphi ,\psi }\) and q,
  • Assumption 2.3 holds with parameters \(\left\| \cdot \right\| _{\varPsi }\), p, and \(C_\xi\), and
  • the initial state \(U_0\) satisfies \(\left\| U_0 \right\| _{\varPsi }<\infty\).
Then for any time grid \((t_k)_k\) such that \(0<h\le h^*\), the corresponding error sequence \((e_k)_k\) satisfies
$$\begin{aligned} \max _k\left\| e_k \right\| _{\varPsi }\le \exp (L_\varphi T)\left\| e_0 \right\| _{\varPsi }+\frac{C_{\varphi ,\psi }+C_\xi }{L_\varphi }\left( \exp (L_\varphi T)-1\right) h^{p\wedge q}. \end{aligned}$$
In particular, if \(\left\| e_0 \right\| _{\varPsi }=0\), then \(\max _k\left\| e_k \right\| _{\varPsi }={\mathcal {O}}(h^{p\wedge q})\).
Proof
It suffices to prove the first statement. Let \(k\in [N-1]_0\). From (1.7) we have
$$\begin{aligned} \left| e_{k+1}\right| _V&\le \left| \varphi (h_k,t_k,u(t_k))-\psi (h_k,t_k,U_k)\right| _V+\left| \xi _k(h_k)\right| _V \nonumber \\&\le \left| \varphi (h_k,t_k,u(t_k))-\varphi (h_k,t_k,U_k)\right| _V+\left| \varphi (h_k,t_k,U_k)-\psi (h_k,t_k,U_k)\right| _V \nonumber \\&\quad +\left| \xi _k(h_k)\right| _V \nonumber \\&\le (1+L_\varphi h_k)\left| e_k\right| _V+C_{\varphi ,\psi }h_k^{q+1}+\left| \xi _k(h_k)\right| _V \end{aligned}$$
(2.2)
where (2.2) follows from Assumptions 2.1 and 2.2. By taking the \(\left\| \cdot \right\| _{\varPsi }\) norm of both sides of (2.2), using the triangle inequality, Assumption 2.3, and the bound \(h_k\le h\) from (1.3), we obtain
$$\begin{aligned} \left\| e_{k+1} \right\| _{\varPsi }\le (1+L_\varphi h)\left\| e_k \right\| _{\varPsi }+(C_{\varphi ,\psi }+C_\xi )h^{(p\wedge q)+1}. \end{aligned}$$
Applying the discrete Gronwall inequality in Lemma C.1 completes the proof. \(\square\)
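The rate \({\mathcal {O}}(h^{p\wedge q})\) of Lemma 2.6 can be checked empirically on a scalar test problem. The sketch below uses randomised explicit Euler (\(q=1\)) with \(p=1\) on \(u'=-u\); all concrete choices are illustrative assumptions, and the Orlicz norm is replaced by a Monte Carlo estimate of the expected maximal error.
```python
import numpy as np

# Empirical check of the O(h^{min(p,q)}) rate of Lemma 2.6 on u' = -u,
# u(0) = 1, with explicit Euler (q = 1) and perturbation scale h^{p+1}, p = 1.
rng = np.random.default_rng(3)
p, T = 1.0, 1.0
for N in [50, 100, 200, 400]:
    h, errs = T / N, []
    for _ in range(200):                 # Monte Carlo over trajectories
        t, U, emax = 0.0, 1.0, 0.0
        for _ in range(N):
            U = U + h * (-U) + h ** (p + 1) * rng.standard_normal()
            t += h
            emax = max(emax, abs(np.exp(-t) - U))
        errs.append(emax)
    print(N, np.mean(errs))              # roughly halves as N doubles
```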
Remark 2.7
In addition to bounds on the strong error \(\left\| e_k \right\| _{\varPsi }\), one can prove bounds on the weak error, i.e. bounds of the form
$$\begin{aligned} \vert {\mathbb {E}}[\varPhi (U^h_n)]-\varPhi (u_n)\vert \le Ch^w, \end{aligned}$$
for all sufficiently smooth \(\mathbb {R}\)-valued functions \(\varPhi\). Such bounds were proven in [10, Theorem 2.4] and [1, Section 3], for example. We focus on strong error bounds in this paper.
To prove Lemma 2.6, we take expectations via the \(\left\| \cdot \right\| _{\varPsi }\) norm before applying the discrete Gronwall inequality in Lemma C.1 to conclude. By reversing the order of these operations and by using a different discrete Gronwall inequality, we can bound \(\left\| \max _k\vert e_k \vert _V \right\| _{\varPsi }\). This yields the result below, which extends [20, Theorem 3.5] to our setting. On one hand, this bound has worse constants than the bound in Lemma 2.6. On the other hand, the bound is stronger, because
$$\begin{aligned} \max _k\left\| e_k \right\| _{\varPsi }\le \left\| \max _k\left| e_k\right| _V \right\| _{\varPsi }, \end{aligned}$$
(2.3)
and because the bound has the same order in h as Lemma 2.6.
Theorem 2.8
Suppose the hypotheses of Lemma 2.6 hold. Then for any time grid \((t_k)_k\) with \(0<h\le h^*\), the corresponding error sequence \((e_k)_k\) satisfies
$$\begin{aligned} \left\| \max _k \left| e_k\right| _V \right\| _{\varPsi }\le \left( \left\| e_0 \right\| _{\varPsi }+ C_{\varphi ,\psi }h^q T+C_\xi h^p T\right) \exp \left( L_\varphi T\right) . \end{aligned}$$
In particular, if \(\left\| e_0 \right\| _{\varPsi }=0\), then \(\left\| \max _k\left| e_k\right| _V \right\| _{\varPsi }={\mathcal {O}}(h^{p\wedge q})\).
Remark 2.9
When \(\varPsi (z)=\exp (z^2)-1\), then the strong error bound given in Theorem 2.8 implies the exponential square integrability of the pathwise error \(\max _k\left| e_k\right| _V^2\). The exponential square integrability of the pathwise error was used in [21, Section 5] to establish local Lipschitz continuity of random approximate posteriors—measured in the Hellinger metric—with respect to the expected error of the randomised time integrator. In [20], exponential integrability was obtained by considering \(\left\| \max _k\left| e_k\right| _V \right\| _{R}\) for all \(R\in \mathbb {N}\) and using the series representation of the exponential function. The use of Orlicz norms allows us to exploit the fact that the random approximating sequence \((U_k)_k\) inherits the integrability properties of the collection \((\xi _k)_k\). This leads to a simpler proof of exponential integrability.
Proof of Theorem 2.8
Using (2.2) and applying the discrete Gronwall inequality in Lemma C.3, we obtain for every \(k\in [N-1]_0\) that
$$\begin{aligned} \left| e_{k+1}\right| _V\le \left( \left| e_0\right| _V+\sum _{j \in [k]_0} \left( C_{\varphi ,\psi }h_j^{q+1}+\left| \xi _j(h_j)\right| _V\right) \right) \exp \left( \sum _{0\le j\le k} L_\varphi h_j\right) . \end{aligned}$$
Since the sum in the exponential increases with k, setting \(k=N-1\) above and using (1.3) to obtain \(\sum _{j\in [N-1]_0}h_j=T\) yields the ‘pathwise’ bound
$$\begin{aligned} \max _k\left| e_k\right| _V\le \left( \left| e_0\right| _V+ C_{\varphi ,\psi }h^{q}T+\sum _{k\in [N-1]_0}\left| \xi _k(h_k)\right| _V\right) \exp \left( L_\varphi T\right) . \end{aligned}$$
(2.4)
By taking the \(\left\| \cdot \right\| _{\varPsi }\) norm of both sides of (2.4), the triangle inequality, Assumption 2.3, and (1.4), we obtain
$$\begin{aligned} \left\| \max _k \left| e_k\right| _V \right\| _{\varPsi } &\le \left( \left\| e_0 \right\| _{\varPsi }+C_{\varphi ,\psi }h^{q}T+\sum _{k\in [N-1]_0}\left\| \xi _k(h_k) \right\| _{\varPsi }\right) \exp \left( L_\varphi T\right) \\& \le \left( \left\| e_0 \right\| _{\varPsi }+C_{\varphi ,\psi }h^qT+C_\xi h^p T\right) \exp \left( L_\varphi T\right) , \end{aligned}$$
which completes the proof. \(\square\)
Remark 2.10
Under the assumptions that \(V=\mathbb {R}^{d}\) and that the randomisation sequence \((\xi _k(h_k))_k\) consists of centred, independent random variables, [10, Theorem 2] and [20, Theorem 3.4] consider the special case where \(\left\| \cdot \right\| _{\varPsi }=\left\| \cdot \right\| _{2}\) in Lemma 2.6 and Theorem 2.8 respectively, and establish \({\mathcal {O}}(h^{q\wedge (p+1/2)})\) bounds on the strong error. The order in these bounds is better than that of the bounds we proved above. However, the proofs of these results exploit both the inner product structure of \(\mathbb {R}^{d}\) and the fact that linear functionals of the \(\xi _k\) appear in the expansion of \(\vert e_{k+1} \vert ^2_{\mathbb {R}^d}\). In the key inequality (2.2), we cannot exploit an inner product even if it were available, because we only consider \(\vert e_{k+1} \vert _{V}\). In Sect. 3, we shall generalise [10, Theorem 2] and [20, Theorem 3.4] from \(\mathbb {R}^{d}\) to general Hilbert spaces.

3 Variational setting

For evolution equations originating from PDEs with possibly non-smooth right-hand sides or non-smooth initial conditions, the classical solution theory that we considered in Sect. 2 might not apply, because the requirement that the operator f in (1.1) satisfies \(f(t,v)\in V\) for every \(v\in V\) and all suitable t might be too strong. For example, this requirement does not hold for the heat equation in Sobolev spaces \(W^{k,p}\). There are several settings that extend the classical setting for such problems. In this section, we focus on the variational setting, because it is suitable for numerical time integration methods. In the variational setting, we consider a Gelfand triple \(V \hookrightarrow H \simeq H' \hookrightarrow V'\), which is a sequence of continuous embeddings of a Banach space V into a Hilbert space H that is identified with its dual space \(H'\), which is then embedded in the dual space \(V'\) of V [41, Proposition 23.13].
In this section, we further specify the operator differential equation (1.1) to be
$$\begin{aligned} u(0)=\vartheta \in H,\quad u'(t) + A(t,u(t)) = b(t)\in V',\quad t\in [0,T] \end{aligned}$$
(3.1)
for a given operator \(A :[0,T] \times V\rightarrow V'\) and \(b \in L^{p'}(0,T;V')\). The equation (3.1) is written in the form that is common in PDE theory instead of the form used in (1.1), where the right-hand side would be defined by \(f(t,u(t)) :=b(t) -A(t,u(t))\). The solution of (3.1) belongs to the space
$$\begin{aligned} {\mathcal {W}}^p (0,T) :=\left\{ u \in L^p (0,T;V) \, \Big | \, u' \in L^{p'}(0,T;V') \, \text {with } \frac{1}{p} + \frac{1}{p'} = 1\right\} , \end{aligned}$$
which is continuously embedded into C([0, T]; H) [12, Satz 8.4.1]. We emphasise that a solution of (3.1) must satisfy the equation only for almost every \(t\in [0,T]\), and not for every t.
There are several conditions—e.g. Lipschitz or one-sided Lipschitz conditions, strong positivity, monotonicity, or coercivity — that one can impose on A and b in order to guarantee the existence of a unique variational solution \(u \in {\mathcal {W}}^p(0,T) \hookrightarrow C([0,T];H)\) [41, Prop. 23.23]. Under stronger assumptions, higher regularity of u can be achieved [12, Satz 8.5.1]. In some cases, the flow map is continuous and even Lipschitz; see [41, Theorem 23.A] for linear problems and [41, Corollary 23.26] for the time-dependent case.
Recall the definition (1.5) of the sequence \((u(t_k))_{k\in [N]_0}\) of states of the exact solution:
$$\begin{aligned} u(t_{k+1})=\varphi (h_k,t_k,u(t_k)),\quad k\in [N-1]_0, \end{aligned}$$
where \(\varphi\) is the flow map associated to the differential equation of interest (3.1). In the variational setting, the flow map \(\varphi\) maps \((h,t,u_s)\) with \(h\in [0,T]\), \(t\in [0, T-h]\), and \(u_s\in H\) to a vector \(\varphi (h,t,u_s)\in H\). Next, recall that \(\psi\) is the approximate flow map associated to a time integration method, and that according to (1.6), we construct the random approximating sequence \((U_k)_{k\in [N]_0}\) according to
$$\begin{aligned} U_{k+1}=\psi (h_k,t_k,U_k)+\xi _k(h_k),\quad k\in [N-1]_0. \end{aligned}$$
In this section, we shall assume that the initial condition \(U_0\) is an H-valued random variable, and that each \(\xi _k\) is an H-valued stochastic process indexed by \([0,\infty )\).
We shall make the following assumptions on \(\psi\).
Assumption 3.1
Let \(h^*>0\), and let \(\psi :[0,h^*]\times [0,T]\times H\rightarrow V\) satisfy the following conditions:
1.
There exists a scalar \(q\ge 0\), a function \(C_{\varphi ,\psi } :[0,T]\times H\rightarrow (0,\infty )\) that is bounded on bounded subsets, and a dense subset \({\mathcal {D}}\subset H\), such that, for every \(h\in [0,h^*]\) and for every \((t,x)\in [0,T-h]\times H\) with \(x=\varphi (s,0,\vartheta ')\) for some \(s\ge 0\) and \(\vartheta '\in {\mathcal {D}}\),
$$\begin{aligned} \left| \varphi (h,t,x)-\psi (h,t,x)\right| _H\le C_{\varphi ,\psi }(t,x)h^{q+1}; \end{aligned}$$
(3.2)
 
2.
There exists a constant \(L_{\psi }>0\) such that for all \((h,t)\in [0,h^*]\times [0,T]\) such that \(h+t\leq T\) and for any \(x,y\in H\),
$$\begin{aligned} \left| \psi (h,t,x)-\psi (h,t,y)\right| _H\le (1+ L_{\psi }h)\left| x-y\right| _H. \end{aligned}$$
(3.3)
 
The first statement of Assumption 3.1 means that the one-step error bound (3.2) holds for any x that lies on some solution \(u\in C([0,T];H)\) of (3.1), where the initial condition \(\vartheta '=u(0)\) belongs to the dense subset \({\mathcal {D}}\). We make the hypothesis of density in order to account for known results concerning error bounds for time integration of PDEs, see e.g. [37, Chapter 7].
The local truncation error (3.2) is a reasonable requirement for any deterministic time integration method \(\psi\) and weakens the uniform local truncation error bound of Assumption 2.2. Given (3.2), we define
$$\begin{aligned} \left\| C_{\varphi ,\psi } \right\| _{\infty }:=\left\| C_{\varphi ,\psi } \right\| _{\infty }(\vartheta):=\sup _{t\in [0,T]}C_{\varphi ,\psi }(t,u(t)), \end{aligned}$$
(3.4)
for any solution u of (3.1) with initial condition \(\vartheta \in {\mathcal {D}}\). Since the solution u of (3.1) belongs to C([0, T]; H), its trajectory \(\{u(t)\,:\,t\in [0,T]\}\) is a bounded subset of H. Hence, the first statement of Assumption 3.1 ensures the finiteness of \(\left\| C_{\varphi ,\psi } \right\| _{\infty }\) for any \(\vartheta \in {\mathcal {D}}\). The second statement of Assumption 3.1 describes a global Lipschitz continuity property of the approximate flow map \(\psi\) with respect to the third argument of the map \(\psi\). For the error bounds that we prove in this section, the bounds (3.2) and (3.3) shall play the roles that Assumptions 2.2 and 2.1 respectively played in the error bounds of Sect. 2.2. Next, we formulate the analogue of Assumption 2.3 for the collection \((\xi _k)_{k\in \mathbb {N}_0}\) of stochastic processes. For the remainder of Sect. 3, we shall simplify notation and write \(\left\| Z \right\| _{\varPsi }\) instead of \(\left\| Z \right\| _{\varPsi (\varOmega ;H)}\) for any H-valued random variable Z.
Assumption 3.2
The collection \((\xi _k)_{k\in \mathbb {N}_0}\) admits an Orlicz norm \(\left\| \cdot \right\| _{\varPsi }\) and constants \(p\ge 0\) and \(0<C_\xi <\infty\), such that for all \(k\in \mathbb {N}_0\) and \(t>0\),
$$\begin{aligned} \left\| \xi _k(t) \right\| _{\varPsi }\le C_\xi t^{p+1}. \end{aligned}$$
The only difference between Assumption 3.2 and Assumption 2.3 is that the stochastic processes are H-valued instead of V-valued.

3.1 \(L^2\)-error bounds for independent and centred randomisation

In this section, we assume that the \((\xi _k)_{k}\) are mutually independent and centred stochastic processes. In particular, for any time grid (1.3), the corresponding random variables \((\xi _k(h_k))_{k\in [N-1]_0}\) are mutually independent and centred. We shall generalise the \(L^2\)-error bounds from [10, Theorem 2] and [20, Theorem 3.4] to the variational setting.
For any time grid \((t_k)_{k\in [N]_0}\) and \(k\in [N-1]_0\), let \({\mathcal {F}}_k:=\sigma (\xi _j(h_j): j\in [ k]_0)\), i.e. \(({\mathcal {F}}_k)_{k\in [N-1]_0}\) is the filtration generated by the randomisation sequence \((\xi _j(h_j))_{j\in [N-1]_0}\).
The following lemma only requires mutual independence of the \((\xi _\ell )_{\ell }\).
Lemma 3.3
Suppose that Assumption 3.1 holds. Let \((t_k)_{k\in [N]_0}\) be an arbitrary time grid. Then for \(j\in [N-1]_0\), \(U_{j+1}\) is a measurable function of \(U_0\) and \(\{\xi _\ell (h_\ell )\ :\ \ell \in [j]_0\}\). In particular, if the \((\xi _\ell )_{\ell }\) are mutually independent, then for every \(j\in [N-1]\), \(\xi _j(h_j)\) and \(U_j\) are independent, and \(\xi _j(h_j)\) is independent of \({\mathcal {F}}_j\).
Proof
It follows from (3.3) in Assumption 3.1 that, for arbitrary (h, t), \(\psi (h,t,z)\) is globally Lipschitz continuous with respect to \(z \in H\). Hence, \(U_{j+1}\) is a measurable function of \(U_{j}\) and \(\xi _j(h_j)\), for every \(j\in [N-1]_0\). This proves the first statement. The second statement follows from the first and the definition of \({\mathcal {F}}_j\). \(\square\)
The following result is the generalisation of [10, Theorem 2.2], which considered the case \(H=\mathbb {R}^{d}\) for \(d\in \mathbb {N}\).
Lemma 3.4
Suppose the following statements are true:
  • Assumption 3.1 holds with parameters \(h^*\), q, \(C_{\varphi ,\psi }\), \({\mathcal {D}}\), and \(L_\psi\),
  • Assumption 3.2 holds with parameters \(\left\| \cdot \right\| _{\varPsi }:=\left\| \cdot \right\| _{2}\), p, and \(C_\xi\),
  • the \((\xi _j)_{j}\) are mutually independent and centred, and
  • the initial condition \(\vartheta\) of (3.1) belongs to \({\mathcal {D}}\), and \(\left\| U_0 \right\| _{2}<\infty\).
Then there exists an \(L'_{\psi }>0\) depending only on \(L_\psi\), such that for any time grid \((t_k)_k\) satisfying \(0<h\le 1\wedge h^*\), the associated error sequence \((e_k)_k\) satisfies
$$\begin{aligned} \max _{k}\left\| e_k \right\| ^2_{2}\le \left( \left\| e_0 \right\| ^2_{2}+ 3 \left\| C_{\varphi ,\psi } \right\| _\infty ^2 T h^{2q}+C_\xi ^2 Th^{2p+1}\right) \exp \left( L'_\psi T\right) . \end{aligned}$$
In particular, if \(\left\| e_0 \right\| _{2}=0\), then \(\max _{k\in [N]_0}\left\| e_k \right\| _{2}={\mathcal {O}}(h^{q\wedge (p+1/2)})\).
We state the proof below, even though it is very similar to the proof of [10, Theorem 2.2]. This is because the proof will be useful later in Sect. 3.2, where we discuss the feasibility of bounding \(\max _{k} \left\| e_k \right\| _{R}\) for \(R> 2\) under similar assumptions as Lemma 3.4. An important difference between our proof and the proof of [10, Theorem 2.2] is that the latter assumes uniform truncation error, e.g. as in Assumption 2.2. Instead, we use Assumption 3.1.
Proof of Lemma 3.4
Let \(k\in [N-1]_0\). By the definition (1.7) of the error sequence \((e_k)_{k\in [N]_0}\),
$$\begin{aligned} \left| e_{k+1}\right| _H^2&= \left| \varphi (h_k,t_k,u(t_k))-\psi (h_k,t_k,U_k)\right| _H^2+\left| \xi _k(h_k)\right| _H^2 \nonumber \\&\quad - 2\left\langle \varphi (h_k,t_k,u(t_k))-\psi (h_k,t_k,U_k), \xi _k(h_k) \right\rangle _H. \end{aligned}$$
(3.5)
Recall the term \(\left\| C_{\varphi ,\psi } \right\| _{\infty }\) from (3.4). We obtain
$$\begin{aligned}&\left| \varphi (h_k,t_k,u(t_k))-\psi (h_k,t_k,U_k)\right| _H^2 \nonumber \\&\quad = \left| \varphi (h_k,t_k,u(t_k))-\psi (h_k,t_k,u(t_k))+\psi (h_k,t_k,u(t_k))-\psi (h_k,t_k,U_k)\right| ^2_H \nonumber \\&\quad \le \left( 1+\left( \tfrac{2}{h_k}\right) \right) C_{\varphi ,\psi }(t_k,u(t_k))^2 h_{k}^{2q+2}+(1+2h_k)(1+L_\psi h_k)^2\left| e_k\right| _H^2 \nonumber \\&\quad \le 3 \left\| C_{\varphi ,\psi } \right\| _{\infty }^2 h_{k}^{2q+1}+(1+2h_k)(1+L_\psi h_k)^2\left| e_k\right| _H^2. \end{aligned}$$
(3.6)
The first inequality follows from the hypothesis that the initial condition \(\vartheta\) of (3.1) belongs to \({\mathcal {D}}\), since we can then apply the local truncation error bound (3.2) of Assumption 3.1 and Young’s inequality. The second inequality follows from the fact that \(h_k\le h\le 1\). By the same fact, there exists \(L'_\psi >0\) that depends only on \(L_\psi\) such that \((1+2h_k)(1+L_\psi h_k)^2\le 1+L'_\psi h_k\). Using this inequality in (3.6) yields
$$\begin{aligned} \left| \varphi (h_k,t_k,u(t_k))-\psi (h_k,t_k,U_k)\right| _H^2\le 3 \left\| C_{\varphi ,\psi } \right\| _{\infty }^2 h_{k}^{2q+1}+(1+L'_\psi h_k)\left| e_k\right| _H^2. \end{aligned}$$
(3.7)
Substituting (3.7) into the bound (3.5) on \(\vert e_{k+1} \vert _H^2\) yields
$$\begin{aligned} \left| e_{k+1}\right| _H^2 & \le \left( 3 \left\| C_{\varphi ,\psi } \right\| _{\infty }^2 h_{k}^{2q+1}+(1+L'_\psi h_k)\left| e_k\right| _H^2\right) +\left| \xi _k(h_k)\right| _H^2 \nonumber \\&\quad - 2\left\langle \varphi (h_k,t_k,u(t_k))-\psi (h_k,t_k,U_k), \xi _k(h_k) \right\rangle _H. \end{aligned}$$
(3.8)
By mutual independence of the \((\xi _j(h_j))_{j \in [N-1]_0}\), it follows from the second statement of Lemma 3.3 that the arguments of the inner product are independent. Taking expectations of (3.8) and using the centredness of the \((\xi _j(h_j))_{j \in [N-1]_0}\), the expectation of the inner product vanishes. By Assumption 3.2, we have
$$\begin{aligned} \left\| e_{k+1} \right\| ^2_2\le (1+L'_\psi h_k)\left\| e_k \right\| ^2_2+3\left\| C_{\varphi ,\psi } \right\| _\infty ^2h_k^{2q+1}+ C_\xi ^2 h_k^{2p+2}. \end{aligned}$$
Using the discrete Gronwall inequality in Lemma C.3 and (1.4) completes the proof. \(\square\)
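The improved rate for centred, independent perturbations can also be checked empirically. The sketch below (all concrete choices are illustrative assumptions, with \(H=\mathbb {R}\)) estimates the \(L^2\) error at the final time for explicit Euler with \(q=1\) and \(p=1/4\): Lemma 2.6 only guarantees \(h^{p\wedge q}=h^{1/4}\), whereas Lemma 3.4 predicts \(h^{q\wedge (p+1/2)}=h^{3/4}\).
```python
import numpy as np

# Empirical check of the improved rate of Lemma 3.4 for centred, independent
# perturbations, on u' = -u with u(0) = 1.  Explicit Euler has q = 1; with
# p = 1/4 the predicted rate is h^{min(q, p + 1/2)} = h^{3/4}.
rng = np.random.default_rng(4)
p, T = 0.25, 1.0
for N in [50, 100, 200, 400]:
    h, sq_errs = T / N, []
    for _ in range(400):                 # Monte Carlo over trajectories
        U, t = 1.0, 0.0
        for _ in range(N):
            U = U + h * (-U) + h ** (p + 1) * rng.standard_normal()
            t += h
        sq_errs.append((np.exp(-t) - U) ** 2)
    print(N, np.sqrt(np.mean(sq_errs)))  # shrinks by ~ 2^{-3/4} ~ 0.59 per halving
```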
We shall use the next result, Lemma 3.5, to prove Proposition 3.6 below. A similar result to Lemma 3.5 was established in the proof of [20, Theorem 3.4], under the assumption that \(\psi\) preserves square integrability of random variables, i.e. that \(\psi (Z)\in L^2(\varOmega ;\mathbb {R}^d)\) for every \(Z\in L^2(\varOmega ;\mathbb {R}^d)\). Lemma 3.5 removes this assumption, by using Lemma 3.4.
Lemma 3.5
Suppose the hypotheses of Lemma 3.4 hold. Then for any time grid \((t_j)_{j\in [N]_0}\) with \(h>0\), the stochastic process \((M_k)_{k\in [N-1]_0}\) defined by
$$\begin{aligned} M_k:=\sum _{j=0}^{k}\left\langle \varphi (h_j,t_j,u(t_j))-\psi (h_j,t_j,U_j),\xi _j(h_j) \right\rangle _H \end{aligned}$$
(3.9)
is an \(\mathbb {R}\)-valued, square-integrable martingale with respect to \(({\mathcal {F}}_k)_{k\in [N-1]_0}\). If in addition the time grid \((t_j)_{j\in [N]_0}\) satisfies \(h\le 1\wedge h^*\), then there exists a universal constant \(\kappa >0\) such that for every \(k\in [N-1]_0\),
$$\begin{aligned} \mathbb {E}\left[ \max _{j\in [k]_0}\left| M_j\right| \right] \le \left\| C_{\varphi ,\psi } \right\| _{\infty }^2h^{2q+1}+\frac{1}{4}\mathbb {E}\left[ \max _{j\in [k]_0}\left| e_j\right| ^2_H\right] +\kappa ^2(1+L'_\psi ) TC_\xi ^2 h^{2p+1}, \end{aligned}$$
(3.10)
for the same \(L'_\psi\) given in Lemma 3.4.
Proof
See Sect. D.1 for the proof. \(\square\)
Next, we use Lemma 3.5 to prove the following error bound, which is stronger than the bound given in Lemma 3.4 because of (2.3).
Proposition 3.6
Suppose the hypotheses of Lemma 3.4 hold. Then for any time grid \((t_k)_{k}\) with \(0<h\le 1\wedge h^*\), the corresponding error sequence \((e_k)_k\) satisfies
$$\begin{aligned}&\left\| \max _{k}\left| e_k\right| _H \right\| _{2}^2 \\&\quad \le 2\left( \left\| e_0 \right\| ^2_{2}+4\left\| C_{\varphi ,\psi } \right\| _{\infty }^2 h^{2q}T+ C_\xi ^2 Th^{2p+1} (1+\kappa ^2(1+L'_\psi ))\right) \exp \left( 2L'_\psi T\right) , \end{aligned}$$
for the universal constant \(\kappa\) in (3.10) and the constant \(L'_\psi\) given in Lemma 3.4. In particular, if \(\left\| e_0 \right\| _{2}=0\), then \(\left\| \max _{k}\left| e_k\right| _H \right\| _{2}={\mathcal {O}}(h^{q\wedge (p+1/2)})\).
Proof
See Sect. D.2 for the proof. \(\square\)

3.2 Error bounds of higher integrability order for independent and centred randomisation

It is natural to ask if one can prove the analogues of Lemma 3.4 or Proposition 3.6 where we use \(\left\| \cdot \right\| _{R}\), \(R>2\), while keeping the same order in h. Suppose that we wish to prove the analogue of Lemma 3.4 for \(R=3\). It follows from the triangle inequality and the definition (1.7) that
$$\begin{aligned} \left| e_{k+1}\right| _H\le \left| \varphi (h_k,t_k,u(t_k))-\psi (h_k,t_k,U_k)\right| _H+\left| \xi _k(h_k)\right| _H. \end{aligned}$$
Thus
$$\begin{aligned} \left| e_{k+1}\right| _H^3\le \left| e_{k+1}\right| _H^2\left( \left| \varphi (h_k,t_k,u(t_k))-\psi (h_k,t_k,U_k)\right| _H+\left| \xi _k(h_k)\right| _H\right) \end{aligned}$$
and substituting (3.5) results in an upper bound on \(\left| e_{k+1}\right| _H^3\) containing the mixed product of an inner product term and a norm term,
$$\begin{aligned} \left\langle \varphi (h_k,t_k,u(t_k))-\psi (h_k,t_k,U_k),\xi _k(h_k) \right\rangle _H\left| \varphi (h_k,t_k,u(t_k))-\psi (h_k,t_k,U_k)\right| _H. \end{aligned}$$
In general, this product does not vanish in expectation, because one can no longer exchange the inner product with the expectation operator; the same is true for every \(R\ge 3\). This is the key difference between the \(R=2\) case proven in Lemma 3.4 and the case \(R\ge 3\): it forces us to bound the mixed products using the Cauchy–Schwarz inequality, which yields
$$\begin{aligned} \left| e_{k+1}\right| _H^3\le \sum _{i=0}^{3}\begin{pmatrix} 3 \\ i\end{pmatrix} \left| \varphi (h_k,t_k,u(t_k))-\psi (h_k,t_k,U_k)\right| _H^i\left| \xi _k(h_k)\right| _H^{3-i}. \end{aligned}$$
We can obtain the same bound by applying the binomial theorem to the bound \(\left| e_{k+1}\right| _H\le \left| \varphi (h_k,t_k,u(t_k))-\psi (h_k,t_k,U_k)\right| _H+\left| \xi _k(h_k)\right| _H\).
If the stochastic processes \((\xi _k)_k\) are mutually independent, then we may use the second statement of Lemma 3.3. Assuming that \(e_0=0\) almost surely and taking expectations of the summand for \(i=2\) yields
$$\begin{aligned}&\mathbb {E}\left[ \left| \varphi (h_k,t_k,u(t_k))-\psi (h_k,t_k,U_k)\right| _H^2\left| \xi _k(h_k)\right| _H\right] \\&\quad = \mathbb {E}\left[ \left| \varphi (h_k,t_k,u(t_k))-\psi (h_k,t_k,U_k)\right| _H^2\right] \mathbb {E}\left[ \left| \xi _k(h_k)\right| _H\right]&\qquad \text {by independence } \\&\quad \le \left( {\mathcal {O}}(h_{k}^{2q+1})+(1+L'_\psi h_k)\mathbb {E}\left[ \left| e_k\right| _H^2\right] \right) C_\xi h_k^{p+1}&\qquad \text {by}\, (3.7), \hbox { Assumption }3.2 \\&\quad \le \left( {\mathcal {O}}(h_{k}^{2q+1})+ {\mathcal {O}}(h^{2q})+{\mathcal {O}}(h^{2p+1})\right) C_\xi h_k^{p+1}&\text { by Lemma } 3.4. \end{aligned}$$
This yields a bound on \(\left\| e_{k+1} \right\| _3^{3}\) by a term that is \({\mathcal {O}}(h^{(2q)\wedge (2p+1)+p+1})\). Applying a discrete Gronwall inequality produces a bound on \(\max _{k\in [N]}\left\| e_k \right\| _{3}^{3}\) that is \({\mathcal {O}}(h^{(2q)\wedge (2p+1)+p})\). Since this upper bound on the exponent arises from the mixed product mentioned above, and since such mixed products will arise in any expansion of \(\vert e_{k+1} \vert _H^R\), we cannot expect to prove that \(\max _{k}\left\| e_k \right\| _{R}={\mathcal {O}}(h^{q\wedge (p+1/2)})\) for \(R>2\) using the techniques that we applied earlier, even if the \((\xi _k)_{k}\) are mutually independent and centred.
For the \(L^3\) analogue of Proposition 3.6, the fact that terms involving inner products do not vanish in expectation also poses a problem. The proof of the \(L^2\) case in Proposition 3.6 relies on the bound (3.10) in Lemma 3.5 on the martingale \((M_k)_k\), which in turn follows from the Burkholder–Davis–Gundy inequality for martingales [32, Chapter IV, §4, Theorem (4.1)]. In the \(L^3\) case, the mixed products described above prevent the martingale \((M_k)_k\) from appearing, so one cannot apply the Burkholder–Davis–Gundy inequality to prove a bound similar to (3.10). Instead, one must apply the Cauchy–Schwarz inequality or the binomial theorem, as we did above. This results in a bound on \(\left\| \max _k\left| e_k\right| _H \right\| _{3}\) that is worse than \({\mathcal {O}}(h^{q\wedge (p+1/2)})\).

3.3 Error bounds of higher integrability order without independence or centredness assumptions

In this section, we prove a strong error bound for a general Orlicz norm instead of for the \(\left\| \cdot \right\| _2\)-norm. We use the same hypotheses as for Lemma 3.4 and Proposition 3.6, except that we do not assume mutual independence or centredness of the stochastic processes \((\xi _k)_{k\in \mathbb {N}_0}\).
Theorem 3.7
Suppose the following statements are true:
  • Assumption 3.1 holds with parameters \(h^*\), q, \(C_{\varphi ,\psi }\), \({\mathcal {D}}\), and \(L_\psi\),
  • Assumption 3.2 holds with parameters \(\left\| \cdot \right\| _{\varPsi }\), p, and \(C_\xi\), and
  • the initial condition \(\vartheta\) of (3.1) belongs to \({\mathcal {D}}\), and \(\left\| U_0 \right\| _{\varPsi }<\infty\).
Then for any time grid \((t_k)_k\) with \(0<h\le h^*\), the corresponding error sequence \((e_k)_k\) satisfies
$$\begin{aligned} \left\| \max _k\left| e_k\right| _H \right\| _{\varPsi } \le \left( \left\| e_0 \right\| _{\varPsi }+\left\| C_{\varphi ,\psi } \right\| _{\infty }h^q T+C_\xi h^p T\right) \exp \left( L_\psi T\right) . \end{aligned}$$
In the results from Sect. 3.1, we required that the maximal time step h associated to the time grid satisfies \(h\le 1\wedge h^*\). In Theorem 3.7, we only require that \(h\le h^*\). The discussion of exponential integrability in Remark 2.9 also applies to Theorem 3.7.
Proof of Theorem 3.7
Recall (1.7):
$$\begin{aligned} e_{k+1}=\varphi (h_k,t_k,u(t_k))-\psi (h_k,t_k,U_k)-\xi _k(h_k),\quad k\in [N-1]_0. \end{aligned}$$
By the triangle inequality, and by (3.2) and (3.3) from Assumption 3.1,
$$\begin{aligned}&\left| \varphi (h_k,t_k,u(t_k))-\psi (h_k,t_k,U_k)\right| _H \\&\quad \le \left| \varphi (h_k,t_k,u(t_k))-\psi (h_k,t_k,u(t_k))\right| _H+\left| \psi (h_k,t_k,u(t_k))-\psi (h_k,t_k,U_k)\right| _H \\&\quad \le \left\| C_{\varphi ,\psi } \right\| _{\infty }h_k^{q+1}+(1+L_\psi h_k)\vert e_k \vert _H. \end{aligned}$$
From this it follows that
$$\begin{aligned} \left| e_{k+1}\right| _H\le \left\| C_{\varphi ,\psi } \right\| _{\infty }h_k^{q+1}+(1+L_\psi h_k)\left| e_k\right| _H+\left| \xi _k(h_k)\right| _H. \end{aligned}$$
(3.11)
Applying Lemma C.3 and using the same arguments that yielded (2.4), we obtain the analogous pathwise bound
$$\begin{aligned} \max _k\left| e_k\right| _H\le \left( \left| e_0\right| _H+\left\| C_{\varphi ,\psi } \right\| _{\infty }h^{q}T+\sum _{k\in [N-1]_0}\left| \xi _k(h_k)\right| _H\right) \exp \left( L_\psi T\right) . \end{aligned}$$
Taking the \(\left\| \cdot \right\| _{\varPsi }\) norm of both sides and applying Assumption 3.2 completes the proof. \(\square\)
Remark 3.8
The inequality (3.11) in the proof of Theorem 3.7 closely resembles the inequality (2.2), which we used to prove Theorem 2.8. The key difference results from adding \(0=\psi (h_k,t_k,u(t_k))-\psi (h_k,t_k,u(t_k))\) before applying the triangle inequality to derive (3.11); for (2.2), we added \(0=\varphi (h_k,t_k,U_k)-\varphi (h_k,t_k,U_k)\) instead. The decomposition we use for (3.11) enables us to exploit the weaker local truncation error bound (3.2) in Assumption 3.1 instead of the uniform local truncation error bound in Assumption 2.2.
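For comparison with the sketch following Proposition 3.6, the companion sketch below (same toy problem and illustrative parameters, our own assumptions) replaces the centred Gaussian perturbations by the deterministic bias \(\xi _k(h)=C_\xi h^{p+1}\), which has the size required by Assumption 3.2 but is not centred. The error then decays only like \(h^{q\wedge p}=h^{0.25}\), consistent with the rate of Theorem 3.7 and suggesting that without centredness the extra half order is lost.
```python
import numpy as np

T, p, C_xi = 1.0, 0.25, 1.0

def max_error(N):
    """Pathwise max_k |e_k| when xi_k(h) = C_xi h^{p+1} is a pure bias."""
    h, U, worst = T / N, 1.0, 0.0
    for k in range(N):
        U = U + h * (-U) + C_xi * h ** (p + 1)   # non-centred perturbation
        worst = max(worst, abs(U - np.exp(-(k + 1) * h)))
    return worst

for N in (64, 128, 256, 512):
    print(N, max_error(N))
# The error now decays only like h^{min(q, p)} = h^{0.25}.
```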

4 Example: heat equation

Consider the heat equation on a \(C^2\) bounded domain \(D\subset \mathbb {R}^{d}\) with homogeneous Dirichlet boundary conditions
$$\begin{aligned} u(0) = u_0, \quad \partial _t u - \text {div} ({\mathcal {E}} \nabla u ) = b \, \text { on } [0,T] \times D, \end{aligned}$$
(4.1)
where \({\mathcal {E}}:[0,T]\times D\rightarrow \mathbb {R}^{d\times d}\) is a sufficiently smooth elliptic diffusion tensor. Multiplying the PDE by a test function and integrating by parts turns the left-hand side of the PDE into a bilinear form a(u(t), v), which allows us to rewrite the problem above as the operator differential equation
$$\begin{aligned} u(0)=u_0\in H, \quad u'(t) + A u(t) = b \in V'\, \end{aligned}$$
(4.2)
with spaces \(H=L^2(D)\), \(V=H^1_0(D)\), and \(V'=H^{-1}(D)\). The bounded, linear operator \(A\, : \, V \rightarrow V'\) is induced by the bilinear form \(a(\cdot ,\cdot )\) on \(V\times V\) according to \(a(u,v) = \left\langle Au, v \right\rangle _{V'\times V}\), where \(\left\langle \cdot ,\cdot \right\rangle _{V'\times V}\) denotes the dual pairing. For the particular PDE considered above, the operator A is strongly positive with constant \(\mu >0\) on \(V\times V\).
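Concretely, the bilinear form arises as follows: multiplying the PDE in (4.1) by a test function \(v\in V\) and integrating by parts over D (the boundary term vanishes because v vanishes on \(\partial D\)) gives
$$\begin{aligned} \left\langle u'(t), v \right\rangle _{V'\times V} + a(u(t),v) = \left\langle b, v \right\rangle _{V'\times V}, \qquad a(u,v) := \int _D ({\mathcal {E}} \nabla u) \cdot \nabla v \,\mathrm {d}x, \end{aligned}$$
for all \(v\in V\), which is (4.2) in weak form.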
In this section, we show that the results that we proved for the variational setting in Sect. 3 are valid for parabolic PDEs and the implicit Euler method, by showing that the conditions (3.2) and (3.3) from Assumption 3.1 are satisfied. We shall consider the more general setting of parabolic PDEs with possibly time-dependent coefficients, because this analysis includes the setting of time-independent coefficients, and hence the heat equation stated above, as a special case.
Let \(L(V,V')\) be the set of all linear mappings from V to \(V'\). Consider a mapping \(a :[0,T] \times V \times V \rightarrow {\mathbb {R}}\) that is bilinear in the second and third arguments. This mapping induces a collection \((A(t))_t\subset L(V,V')\) according to
$$\begin{aligned} \left\langle A(t) u, v \right\rangle _{V'\times V} = a(t,u,v),\quad \forall u,v \in V. \end{aligned}$$
Now we pose the following standard assumptions on a and state their equivalent formulation in terms of A.
Assumption 4.1
1. For fixed t, \(a(t,\cdot , \cdot )\) is a bilinear form, and for fixed \(u,v\in V\), \(a(\cdot , u,v)\) is measurable. Equivalently, for every t, \(A(t)\in L(V,V')\) is linear and \(t\mapsto A(t)\) is measurable.
2. There exists \(\beta >0\) such that for every (t, u, v), \(a(t,u,v) \le \beta \left| u \right| _V \vert v \vert _V\). Equivalently, for every t we have \(\left\| A(t) \right\| _{L(V,V')} \le \beta\).
3. A Gårding inequality holds, i.e. there exist \(\mu >0\), \(\kappa \ge 0\) such that
$$\begin{aligned} a(t,u,u) \ge \mu \left| u\right| _V^2 - \kappa \left| u\right| _H^2, \quad \forall (t,u)\in [0,T]\times V. \end{aligned}$$
(4.3)
Equivalently, for every \(t\in [0,T]\), \(A(t) + \kappa I \in L(V,V')\) is strongly positive.
For the special case of the heat equation (4.1) where \({\mathcal {E}}\) is the identity matrix, the first statement of Assumption 4.1 holds since \({\mathcal {E}}\) is constant. By definition of the bilinear form a and the spaces H and V, the second statement holds with \(\beta =1\), and the third statement holds with equality for \(\kappa =0\) and \(\mu =1\).
Consider the implicit Euler scheme
$$\begin{aligned} \psi (h,t,v):=(I+h {\bar{A}}_{h,t})^{-1}(h{\bar{b}}_{h,t}+v), \end{aligned}$$
(4.4)
for \(0<h\le h^*\), \(0\le t\le T-h\) and \(v\in H\). We specify an interval of suitable values of \(h^*\) in Sect. 4.2. Above, \({\bar{A}}_{h,t}\) and \({\bar{b}}_{h,t}\) denote Steklov time averages of the linear operators \((A(t))_{t}\) and the right-hand side b respectively,
$$\begin{aligned} {\bar{A}}_{h,t}:=\frac{1}{h}\int _{t}^{t+h}A(s)\,\mathrm {d}s,\quad {\bar{b}}_{h,t}:=\frac{1}{h}\int _{t}^{t+h}b(s)\,\mathrm {d}s, \end{aligned}$$
where the integrals in the definitions of \({\bar{A}}_{h,t}\) and \({\bar{b}}_{h,t}\) are Bochner–Lebesgue integrals in \(L(V,V')\) and \(V'\) respectively. The existence of \(\psi (h,t,v) \in V\) for \((h{\bar{b}}_{h,t}+v) \in V'\) is guaranteed by the Lax–Milgram theorem; see e.g. [7, Section 6.2]. For every suitable (ht), the operator \({\bar{A}}_{h,t}\) inherits the properties of A stated in Assumption 4.1.
For the heat equation (4.1), \(t\mapsto A(t)\) and \(t\mapsto b(t)\) are constant. Therefore, \({\bar{A}}_{h,t}=A\) and \({\bar{b}}_{h,t}=b\), and (4.4) simplifies to \(\psi (h,t,v):=(I+h A)^{-1}(hb+v)\).
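To make the scheme concrete, the following sketch (a minimal illustration under our own assumptions: the paper works directly in the variational setting, whereas here A is replaced by a second-order central finite-difference Laplacian on \(D=(0,1)\)) implements one step \(\psi (h,t,v)=(I+hA)^{-1}(hb+v)\) of (4.4) for the heat equation with homogeneous Dirichlet boundary conditions.
```python
import numpy as np

m = 99                                   # interior grid points on D = (0, 1)
dx = 1.0 / (m + 1)
x = dx * np.arange(1, m + 1)

# Standard central-difference approximation of A = -d^2/dx^2 with
# homogeneous Dirichlet boundary conditions (an illustrative stand-in
# for the variational operator A).
A = (2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)) / dx ** 2
b = np.ones(m)                           # constant right-hand side b

def implicit_euler_step(h, v):
    """One step psi(h, t, v) = (I + h A)^{-1}(h b + v) of (4.4)."""
    return np.linalg.solve(np.eye(m) + h * A, h * b + v)

u = np.sin(np.pi * x)                    # initial condition u_0
h = 1e-3
for _ in range(100):
    u = implicit_euler_step(h, u)        # deterministic base method
# A randomised method in the sense of this paper would add an H-valued
# perturbation xi_k(h) after each such step.
```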

4.1 Local truncation error condition

We verify the local truncation error condition (3.2) in Assumption 3.1, for \(\psi\) as given in (4.4). Recall the definition (1.5) of \((u(t_k))_{k}\) and that \((u_k)_{k}\) is defined by \(u_0=\vartheta\), \(u_{k+1} :=\psi (h_k, t_k,u_k)\) for \(k\in [N-1]_0\). Under the assumption that \((b-u')' \in L^2(0,T;V')\), the result [12, Satz 8.3.6] yields for any initial condition \(\vartheta \in H\)
$$\begin{aligned} \left| u_k - u(t_k) \right| _H^2 + \mu \sum _{j=1}^k h_j \left| u_j - u(t_j) \right| _V^2 \le \frac{h^2}{3 \mu } \left| (b-u')'\right| _{L^2(0,T;V')}^2, \end{aligned}$$
where \(\mu\) is the constant from the positivity assumption (4.3) on A. Thus, (3.2) holds with \(q=0\) and \(C_{\varphi ,\psi }(t,x)=(3\mu )^{-1/2}\vert (b-u')' \vert _{L^2(0,T;V')}\) for all (t, x).
One can obtain numerical methods of higher order q by assuming higher regularity of the solution. For example, [22, Theorems 4.2, 4.3, 4.4] assume \(u,u',u'' \in {\mathcal {W}}^2(0,T)\), and show the existence of a numerical method \(\psi\) that satisfies (3.2) with \(q=1\). For a general result dealing with arbitrary regularity \(u^{(k+1)} \in {\mathcal {W}}^2(0,T)\) and a numerical method of order \(q=k\), see [23, Theorem 3.2].

4.2 Lipschitz condition on approximate flow map

Next, we verify the Lipschitz condition (3.3) for \(\psi\) given in (4.4), and determine an interval of suitable values for the upper bound \(h^*\) on the time step of the implicit Euler scheme. Fix \(0<h\le h^*\), \(t\in [0,T-h]\), and \(u_0,v_0\in V\). Set \(w_1:=\psi (h,t,u_0)-\psi (h,t,v_0)\in V\) and \(w_0:=u_0-v_0\in H\), and test the difference of the corresponding scheme equations with \(w_1\). Then
$$\begin{aligned} \frac{1}{2h} (\left| w_1\right| _H^2 -\left| w_0\right| _H^2)& \le \left\langle \frac{w_1-w_0}{h} , w_1 \right\rangle _{H} \le - \left\langle {\bar{A}}_{h,t} w_1 , w_1 \right\rangle _{V'\times V} \\& \le - \mu \left| w_1\right| _V^2 + \kappa \left| w_1\right| _H^2. \end{aligned}$$
The first inequality follows from rearranging \(0\le \vert w_{1}-w_{0} \vert _{H}^{2}\). The second inequality follows since (4.4) is equivalent to \(h^{-1}(\psi (h,t,u_0)-u_0)={\bar{b}}_{h,t}-{\bar{A}}_{h,t} \psi (h,t,u_0)\). The third inequality holds because \({\bar{A}}_{h,t}\) inherits the positivity property (4.3) from A. Using \((2h)^{-1}(\vert w_1 \vert _{H}^{2}-\vert w_0 \vert _{H}^{2})\le -\mu \vert w_1 \vert _{V}^{2}+\kappa \vert w_1 \vert _{H}^{2}\) and the definitions of \(w_1\) and \(w_0\), we obtain
$$\begin{aligned} \left| \psi (h,t,u_0)-\psi (h,t,v_0)\right| _H^2 \le (1-2h\kappa )^{-1}\left| u_0-v_0\right| _H^2. \end{aligned}$$
If \(\kappa \le 0\), then \((1-2h\kappa )^{-1}\le (1+L_\psi h)\) for any \(L_\psi >0\) and \(h>0\). Therefore, suppose that \(\kappa >0\). If the bound above on \(\left| \psi (h,t,u_0)-\psi (h,t,v_0)\right| _H^2\) holds for all \(0<h\le h^*\), then we must have \(h^*<(2\kappa )^{-1}\). In fact, if \(h^*<(2\kappa )^{-1}\), then the definition \(L_\psi :=[(2\kappa )^{-1}-h^*]^{-1}\) can be rearranged to \(h^*=\tfrac{L_\psi -2\kappa }{2\kappa L_\psi }\). In this case, \(0<h\le h^*\) is equivalent to
$$\begin{aligned} h^2(2\kappa L_\psi )\le h(L_\psi -2\kappa )\Leftrightarrow 1\le (1+L_\psi h)(1-2h\kappa )\Leftrightarrow (1-2h\kappa )^{-1} \le 1+L_\psi h. \end{aligned}$$
Hence, the implicit Euler scheme (4.4) satisfies condition (3.3) in Assumption 3.1.
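The algebra above admits a quick numerical sanity check; in the following snippet (the values of \(\kappa\) and \(h^*\) are arbitrary illustrative choices of our own), the choice \(L_\psi =[(2\kappa )^{-1}-h^*]^{-1}\) yields \((1-2h\kappa )^{-1}\le 1+L_\psi h\) on all of \(0<h\le h^*\), with equality attained at \(h=h^*\).
```python
import numpy as np

kappa, h_star = 1.5, 0.2                 # illustrative; note h_star < 1/(2 kappa)
L_psi = 1.0 / (1.0 / (2.0 * kappa) - h_star)
h = np.linspace(1e-6, h_star, 1000)
lhs = 1.0 / (1.0 - 2.0 * h * kappa)      # contraction factor derived above
rhs = 1.0 + L_psi * h                    # Lipschitz bound required by (3.3)
assert np.all(lhs <= rhs + 1e-12)        # equality is attained at h = h_star
```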

5 Conclusion

In this paper, we proved strong error bounds in general Orlicz norms for randomised time integration methods applied to operator differential equations, using possibly non-uniform time grids. Our work builds on the ideas and approaches of [10, 20]. We showed that the proof techniques behind the key error bounds contained therein can be applied in more general settings, where the differential equation is formulated on a possibly infinite-dimensional Banach or Hilbert space, and the numerical time integration method is applied on a possibly non-uniform time grid. Our work has two additional novel aspects relative to [10, 20].
First, we use a different error decomposition to bound the one-step error. Our error decomposition enables us to replace the strong assumption of uniform local truncation error with a weaker assumption on the local truncation error. This is important, because it is known that the strong assumption of uniform local truncation error is invalid even when the linear operator A in the operator differential equation generates an analytic semigroup [37, Theorem 7.1]. For the implicit Euler method, and for a large class of examples that includes the standard heat equation, we showed that our weaker local truncation error assumption is reasonable.
Second, we consider more general Orlicz norms instead of \(L^R\)-norms. Previous results concerning higher-order error bounds, for example [20, Theorem 3.5], were less direct: they involved finding bounds on the \(L^R\) error for each \(R\in \mathbb {N}\) and using the series expansion of the exponential function. The use of Orlicz norms leads to shorter and conceptually simpler proofs of our main results, Theorem 2.8 and Theorem 3.7, by exploiting the fact that the random approximating sequence \((U_k)_k\) inherits the integrability properties of the collection \((\xi _k)_k\).

Declarations

Conflict of interest

The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Appendices

A Taylor expansion in Banach spaces

The following version of Taylor’s theorem in Banach spaces is given in [7, Theorem 7.9-1].
Theorem A.1
Let V and W be normed vector spaces, let U be an open subset of V, let \([a,a+h]\) be a closed segment contained in U, let \(f :U \rightarrow W\) be a given mapping, and let \(m\in {\mathbb {N}}\).
(a) (Taylor–Young) If f is \((m-1)\) times differentiable on U and m times differentiable at \(a \in U\), then
$$\begin{aligned} f(a+h) = f(a) + f'(a) h + \cdots + \frac{1}{m!} f^{(m)} (a) h^m + \left\| h \right\| _V^m \delta (h) \end{aligned}$$
with \(\lim _{h\rightarrow 0} \delta (h) = 0\).
(b) (Integral remainder) If W is a Banach space and f is m times continuously differentiable on U, then
$$\begin{aligned} f(a+h)&= f(a) + f'(a) h + \cdots + \frac{1}{(m-1)!} f^{(m-1)} (a) h^{m-1}\\&\quad + \frac{1}{(m-1)!} \int _0^1 (1-t)^{m-1} \left( f^{(m)}(a+th) h^m \right) \,\mathrm {d}t. \end{aligned}$$
The noteworthy differences from the standard Taylor theorem in \({\mathbb {R}}\) are: 1) the notion of differentiability of f is slightly more involved; 2) the k-th derivative of f at a point is a k-linear mapping from \(V^k\) to W, and \(f^{(k)}(a) h^k\) denotes this mapping applied to the k-tuple \(h^k = (h,\ldots , h) \in V^k\).
In addition, one can derive a Taylor expansion almost everywhere for weakly differentiable functions.

B Additional material for Section 2

In the setting of time integration for ODEs, one usually requires higher regularity of f, or equivalently of the solution u, in order to achieve an order \(q\ge 1\) for the truncation error [16, Section III.2, Theorem 2.4]. The purpose of this section is to show that the same ideas apply in the infinite-dimensional Banach space setting. Below, the function f refers to the vector field in (1.1).
For the next lemma, we consider a general explicit one-step method in a Banach space \((V,\left| \cdot \right| _V)\), given by a map \(\psi :[0,h]\times {\mathbb {R}} \times V \rightarrow V\) associated to some step function \(\varUpsilon\):
$$\begin{aligned} u_{k+1}&:=\psi (h_k,t_k,u_k):=u_k + h_k \varUpsilon (h_k,t_k,u_k) \\&= u_k +h_k \left( a_1 f(t_k,u_k) + a_2 f\left( t_k+b_1 h_k, u_k + b_2 h_k f(t_k,u_k) \right) \right) \nonumber \end{aligned}$$
(B.1)
with \(a_1,a_2,b_1,b_2 \ge 0\).
Lemma B.1
(Lipschitz property of the numerical flow map) Let \(\psi\) be as in (B.1). If \(f :[0,T] \times V \rightarrow V\) is Lipschitz continuous in the second argument, then the approximate flow map \(\psi (h_k,t_k,\cdot )\) is Lipschitz continuous, uniformly in k.
Proof
We use the Lipschitz property of f once for each of the two evaluations of f in (B.1) to get
$$\begin{aligned} \left| \psi (\tau ,t,u) - \psi (\tau ,t,v) \right| _V \le (1+ \tau L (a_1 + a_2 + a_2b_2 L \tau )) \left| u - v\right| _V \; . \end{aligned}$$
(B.2)
\(\square\)
The following theorem is taken from [30, Theorem 7.1.5]; the proof given there covers only the one-dimensional case.
Theorem B.2
If the right-hand side f belongs to \(C^2([0,T]\times V;V)\), or equivalently the solution u is \(C^3\), then any explicit one-step scheme with step function \(\varUpsilon\) as in (B.1) with
$$\begin{aligned} a_1+a_2 = 1 \, , \, a_2 b_1 = \frac{1}{2} \, , \, a_2 b_2 = \frac{1}{2} \end{aligned}$$
satisfies Assumption 2.2 with \(q = 2\).
Proof
We assume that \(f \in C^2([0,T]\times V;V)\) in order to be able to use Taylor expansion. We first expand the step function \(\varUpsilon\):
$$\begin{aligned} \varUpsilon (h,t,u(t)) = \left[ (a_1+a_2) f(t,u(t)) + h \left( a_2 b_1 \frac{\partial f}{\partial t} (t,u(t)) + a_2 b_2 \frac{\partial f}{\partial u } f(t,u(t)) \right) \right] + {\mathcal {O}} (h^2). \end{aligned}$$
Above, \(\tfrac{\partial f}{\partial u}\) refers to the Gateaux derivative of f with respect to u, and \(\tfrac{\partial f}{\partial u } f (t,u(t))\) denotes the linear mapping \(\tfrac{\partial f}{\partial u} (t,u(t))\) from V to V acting on \(f(t,u(t)) \in V\). Expanding u, we have
$$\begin{aligned} u(t+h)&= u(t) + u'(t) h + u''(t) \frac{h^2}{2} + {\mathcal {O}}(h^3)\\&= u(t) + \left[ h f(t,u(t)) + \frac{h^2}{2} \left( \frac{\partial f}{\partial t} (t, u(t)) + \frac{\partial f}{\partial u} (t,u(t)) f(t,u(t)) \right) \right] + {\mathcal {O}} (h^3)\\&= u(t) +\left[ h \varUpsilon (h,t,u(t)) + {\mathcal {O}}(h^3) \right] + {\mathcal {O}}(h^3) \; . \end{aligned}$$
The last equality follows from the conditions on the coefficients \(a_1,b_1,a_2,b_2\). This gives consistency of order 2:
$$\begin{aligned} u (t+h) - (u (t) + h \varUpsilon (h,t,u(t)) ) = \varphi (h,t,u(t)) - \psi (h,t,u(t)) = {\mathcal {O}}(h^3). \end{aligned}$$
Convergence follows from the Lipschitz continuity of f with respect to the second argument [30, Theorem 7.10]. \(\square\)
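As a concrete instance (the test problem is our own choice for illustration), the coefficients \(a_1=a_2=\tfrac{1}{2}\), \(b_1=b_2=1\) satisfy the three conditions above and give Heun's method; on a scalar problem with known solution, the global error of (B.1) then decays like \(h^2\), as the following sketch confirms.
```python
import numpy as np

def f(t, u):
    return -u + np.sin(t)

def psi(h, t, u, a1=0.5, a2=0.5, b1=1.0, b2=1.0):
    """One step of the explicit two-stage scheme (B.1)."""
    k1 = f(t, u)
    return u + h * (a1 * k1 + a2 * f(t + b1 * h, u + b2 * h * k1))

def exact(t):
    # u' = -u + sin t, u(0) = 1 has solution 1.5 e^{-t} + (sin t - cos t)/2.
    return 1.5 * np.exp(-t) + 0.5 * (np.sin(t) - np.cos(t))

T = 2.0
for N in (20, 40, 80, 160):
    h, u = T / N, 1.0
    for k in range(N):
        u = psi(h, k * h, u)
    print(N, abs(u - exact(T)))          # errors drop by ~4 when h is halved
```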

C Discrete Gronwall inequalities

The following statement is given in [26, Lemma 1.6].
Lemma C.1
Let \(T>0\) be fixed. Let \(N\in \mathbb {N}\) and \(h=T/N\). Suppose \((y_k)_k\in [0,\infty )^{\mathbb {N}_0}\) is such that for some \(A,B\ge 0\) and \(p\ge 1\),
$$\begin{aligned} y_{k+1}\le (1+Ah)y_k+Bh^p,\quad k\in [N-1]_0. \end{aligned}$$
Then
$$\begin{aligned} y_k\le e^{AT}y_0+\tfrac{B}{A}(e^{AT}-1)h^{p-1},\quad k\in [N-1]_0 \end{aligned}$$
where for \(A=0\) the quantity \(A^{-1}(e^{AT}-1)\) is interpreted as its limiting value T.
We restate the “special Gronwall inequality” of [18].
Proposition C.2
Let \((y_n)_n,(g_n)_n\in [0,\infty )^{\mathbb {N}_0}\), \(c\ge 0\), and \(N\in \mathbb {N}\) be arbitrary. If
$$\begin{aligned} y_{k+1}\le c+\sum _{0\le j\le k}g_j y_j,\quad k\in [N-1]_0 \end{aligned}$$
then
$$\begin{aligned} y_{k+1}\le c\exp \left( \sum _{0\le j\le k}g_j\right) ,\quad k\in [N-1]_0. \end{aligned}$$
The following lemma is a corollary of Proposition C.2.
Lemma C.3
Let \(T>0\) and \(N\in \mathbb {N}\) be fixed. Let \((h_k)_k,(y_k)_k,(b_k)_k\in [0,\infty )^{\mathbb {N}_0}\) be such that for some \(A\ge 0\),
$$\begin{aligned} y_{k+1}\le (1+Ah_k)y_k+b_k,\quad k\in [N-1]_0. \end{aligned}$$
Then
$$\begin{aligned} y_{k+1}\le \left( y_0+\sum _{ \ell \in [N-1]_0}b_\ell \right) \exp \left( \sum _{0\le j\le k} Ah_j\right) ,\quad k\in [N-1]_0. \end{aligned}$$
Proof
Rewriting the upper bound on \(y_{k+1}\) yields
$$\begin{aligned} y_{j+1}-y_j\le Ah_j y_j +b_j,\quad j\in [N-1]_0. \end{aligned}$$
Summing the differences from \(j=0\) to \(j=k\in [N-1]_0\) yields
$$\begin{aligned} y_{k+1}\le y_0+\sum _{0\le j\le k} \left( Ah_j y_j+b_j\right) \le \left( y_0+\sum _{0\le \ell \le N-1}b_\ell \right) +\sum _{0\le j\le k} Ah_j y_j,\quad k\in [N-1]_0. \end{aligned}$$
Applying Proposition C.2 completes the proof. \(\square\)
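A quick randomised check of Lemma C.3 (with arbitrarily generated non-negative sequences, our own illustration) runs the recursion with equality and compares against the stated bound.
```python
import numpy as np

rng = np.random.default_rng(1)
N, A = 50, 0.7
h = rng.uniform(0.0, 0.1, N)             # non-uniform step sizes h_k
b = rng.uniform(0.0, 0.05, N)
y = np.empty(N + 1)
y[0] = rng.uniform(0.0, 1.0)
for k in range(N):
    y[k + 1] = (1.0 + A * h[k]) * y[k] + b[k]   # recursion with equality
bound = (y[0] + b.sum()) * np.exp(A * np.cumsum(h))
assert np.all(y[1:] <= bound + 1e-12)    # the bound of Lemma C.3 holds
```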

D Proofs for Section 3.1

D.1 Proof of Lemma 3.5

Lemma 3.5 states that the stochastic process \((M_k)_{k\in [N-1]_0}\) defined by (3.9),
$$\begin{aligned} M_k:=\sum _{j=0}^{k}\left\langle \varphi (h_j,t_j,u(t_j))-\psi (h_j,t_j,U_j),\xi _j(h_j) \right\rangle _H, \end{aligned}$$
is a \(\mathbb {R}\)-valued, square-integrable martingale with respect to the filtration \(({\mathcal {F}}_k)_{k\in [N-1]_0}\) generated by the \((\xi _k(h_k))_{k\in [N-1]_0}\), and that there exists a universal constant \(\kappa >0\) such that for every \(k\in [N-1]_0\), the bound (3.10) holds:
$$\begin{aligned} \mathbb {E}\left[ \max _{j\in [k]_0}\left| M_j\right| \right] \le \left\| C_{\varphi ,\psi } \right\| _{\infty }^2h^{2q+1}+\frac{1}{4}\mathbb {E}\left[ \max _{j\in [k]_0}\vert e_j \vert ^2_H\right] +\kappa ^2(1+L'_\psi ) TC_\xi ^2 h^{2p+1}. \end{aligned}$$
Proof of Lemma 3.5
For \((M_k)_k\) to satisfy the definition of a martingale, we must show that for every k, \(M_k\) belongs to \(L^1(\varOmega ;\mathbb {R})\) and is \({\mathcal {F}}_k\)-measurable, and that the martingale property \(\mathbb {E}[M_{k+1}-M_{k}\vert {\mathcal {F}}_k]=0\) holds for \(k\in [N-2]_0\). The measurability of \(M_k\) with respect to \({\mathcal {F}}_k\) follows from the definition of \({\mathcal {F}}_k\) and Lemma 3.3. By the triangle inequality, the Cauchy–Schwarz inequality, the independence of \(\xi _j(h_j)\) and \(U_j\) from Lemma 3.3, (3.7), and Assumption 3.2,
$$\begin{aligned} \left\| M_k \right\| _{L^2(\varOmega ;\mathbb {R})}\le&\sum _{j=0}^{k}\left\| \varphi (h_j,t_j,u(t_j))-\psi (h_j,t_j,U_j) \right\| _{L^2(\varOmega ;H)}\left\| \xi _j(h_j) \right\| _{L^2(\varOmega ;H)} \\ \le&\sum _{j=0}^{k}\left( 3\left\| C_{\varphi ,\psi } \right\| _{\infty }^2 h_{j}^{2q+1}+(1+L'_\psi h_j)\left\| e_j \right\| _{L^2(\varOmega ;H)}^2\right) ^{1/2} C_\xi h_j^{p+1}. \end{aligned}$$
By Lemma 3.4, \(e_k\in L^2(\varOmega ;H)\) for every k, and thus \(M_k\in L^2(\varOmega ;\mathbb {R})\subset L^1(\varOmega ;\mathbb {R})\). Hence, \((M_k)_k\) is a square-integrable martingale.
Next, we prove the martingale property. By Lemma 3.3, \(\xi _{k+1}(h_{k+1})\) is independent of \(U_{k+1}\) and \({\mathcal {F}}_k\). By the definition (3.9) of \(M_k\), the tower property of conditional expectation, and the centredness of the \((\xi _k(h_k))_k\),
$$\begin{aligned} \mathbb {E}[\left\langle \varphi (h_{k+1},t_{k+1},u(t_{k+1}))-\psi (h_{k+1},t_{k+1},U_{k+1}), \xi _{k+1}(h_{k+1}) \right\rangle _H\,\vert \,{\mathcal {F}}_k]=0, \end{aligned}$$
and thus \((M_k)_k\) is a \(({\mathcal {F}}_k)_k\)-martingale.
Finally, we prove the second statement. Since \((M_k)_k\) is a square integrable martingale, the Burkholder–Davis–Gundy inequality ensures that for every \(k\in [N-1]_0\)
$$\begin{aligned} \mathbb {E}\left[ \max _{j\in [k]_0}\left| M_j\right| \right] \le \kappa \mathbb {E}\left[ \left\langle M \right\rangle _{k}^{1/2}\right] \end{aligned}$$
where \(\kappa >0\) is the same universal constant appearing in (3.10). The quadratic variation process \(\left\langle M \right\rangle\) is defined by \(\left\langle M \right\rangle _0:=0\) and \(\left\langle M \right\rangle _{k}:=\sum _{j\in [k]}\mathbb {E}[(M_j-M_{j-1})^2\vert {\mathcal {F}}_{j-1}]\) for \(k\in [N-1]\), see e.g. [32, Chapter I, Definition 2.3]. Using (3.9), the measurability of \(U_j\) with respect to \({\mathcal {F}}_{j-1}\) (cf. Lemma 3.3), the Cauchy–Schwarz inequality, and (3.7),
$$\begin{aligned} \left\langle M \right\rangle _{k}\le&\sum _{j\in [k]} \mathbb {E}\left[ \left\langle \varphi (h_j,t_j,u(t_j))-\psi (h_j,t_j,U_j),\xi _j(h_j) \right\rangle ^2_H \Big \vert {\mathcal {F}}_{j-1}\right] \\ \le&\max _{j\in [k]}\left| \varphi (h_j,t_j,u(t_j))-\psi (h_j,t_j,U_j)\right| _H^2\sum _{j\in [k]} \mathbb {E}\left[ \left| \xi _j(h_j)\right| ^2_H \Big \vert {\mathcal {F}}_{j-1}\right] \\ \le&\max _{j\in [k]}\left( 3 \left\| C_{\varphi ,\psi } \right\| _{\infty }^2 h_{j}^{2q+1}+(1+L'_\psi h_j)\left| e_j\right| _H^2\right) \sum _{j\in [k]} \mathbb {E}\left[ \left| \xi _j(h_j)\right| ^2_H \Big \vert {\mathcal {F}}_{j-1}\right] . \end{aligned}$$
The hypothesis that \(0<h\le 1\) was used to obtain (3.7). Using Young’s inequality with \(s,s'>1\) such that \(s^{-1}+(s')^{-1}=1\)
$$\begin{aligned} ab\le \frac{\delta }{s} a^{s}+\frac{1}{\delta ^{s'/s} s'}b^{s'} \end{aligned}$$
with \(s=2\) and \(\delta =2\kappa (1+L'_\psi h)\), and using \(h_k\le h\le 1\),
$$\begin{aligned} \left\langle M \right\rangle _{k}^{1/2}&\le \frac{1}{\kappa (4+4L'_\psi h)}\max _{j\in [k]}\left( 3 \left\| C_{\varphi ,\psi } \right\| _{\infty }^2 h_{j}^{2q+1}+(1+L'_\psi h_j)\left| e_j\right| _H^2\right) \\&\quad + \kappa (1+L'_\psi h)\sum _{j\in [k]} \mathbb {E}\left[ \left| \xi _j(h_j)\right| ^2_H \Big \vert {\mathcal {F}}_{j-1}\right] \\& \le \frac{\left\| C_{\varphi ,\psi } \right\| _{\infty }^2}{\kappa }h^{2q+1}+\frac{1}{4\kappa }\max _{j\in [k]_0}\vert e_j \vert ^2_H+\kappa (1+L'_\psi ) \sum _{j\in [N-1]_0} \mathbb {E}\left[ \left| \xi _j(h_j)\right| ^2_H \Big \vert {\mathcal {F}}_{j-1}\right] . \end{aligned}$$
By taking expectations, the tower property removes the conditioning on \({\mathcal {F}}_{j-1}\) in each summand. Using (1.4) and the Burkholder–Davis–Gundy inequality completes the proof. \(\square\)

D.2 Proof of Proposition 3.6

Proposition 3.6 states the error bound
$$\begin{aligned}&\left\| \max _{k\in [N]_0}\left| e_k\right| _H \right\| _{2}^2 \\&\quad \le 2\left( \left\| e_0 \right\| ^2_{2}+4\left\| C_{\varphi ,\psi } \right\| _{\infty }^2 h^{2q}T+ C_\xi ^2 Th^{2p+1} (1+\kappa ^2(1+L'_\psi ))\right) \exp \left( 2L'_\psi T\right) \end{aligned}$$
for the same universal constant \(\kappa\) in (3.10).
Proof of Proposition 3.6
Since \(0<h\le 1\), we may use (3.8) to obtain
$$\begin{aligned} \vert e_{k+1} \vert ^2_H-\vert e_k \vert _H^2& \le L'_\psi h_k \vert e_k \vert _H^2+3 \left\| C_{\varphi ,\psi } \right\| _{\infty }^2 h_{k}^{2q+1}+\left| \xi _k(h_k)\right| _H^2 \\&\quad +2\left\langle \varphi (h_k,t_k,u(t_k))-\psi (h_k,t_k,U_k), \xi _k(h_k) \right\rangle _H. \end{aligned}$$
Using that \(\sum _{j\in [k]_0}(\vert e_{j+1} \vert ^2_H-\vert e_j \vert _H^2)=\vert e_{k+1} \vert _H^2-\vert e_0 \vert _H^2\), and using the definition (3.9) of the martingale \((M_k)_k\), we obtain
$$\begin{aligned} \vert e_{k+1} \vert _H^2& \le \vert e_0 \vert _H^2+\sum _{j\in [N-1]_0}\left( 3 \left\| C_{\varphi ,\psi } \right\| _{\infty }^2 h_{j}^{2q+1}+ \left| \xi _j(h_j)\right| _H^2\right) \\&\quad+2 M_k+L'_\psi \sum _{j\in [k]_0}h_j\vert e_j \vert _H^2. \end{aligned}$$
Since only \(M_k\) can attain negative values, the above bound implies
$$\begin{aligned} \max _{j\in [k+1]_0}\vert e_j \vert _H^2& \le \vert e_0 \vert _H^2+\sum _{j\in [N-1]_0}\left( 3 \left\| C_{\varphi ,\psi } \right\| _{\infty }^2 h_{j}^{2q+1}+ \left| \xi _j(h_j)\right| _H^2\right) \\&\quad+2\max _{j\in [k]_0}\vert M_j \vert +L'_\psi \sum _{j\in [k]_0}h_j\max _{\ell \in [j]_0}\vert e_\ell \vert _H^2. \end{aligned}$$
Take expectations, apply Assumption 3.2, apply the bound (3.10) from Lemma 3.5, and use that \(\mathbb {E}[\max _{j\in [k]_0}\vert e_j \vert _H^2] \le \mathbb {E}[\max _{j\in [k+1]_0}\vert e_j \vert _H^2]\) to obtain
$$\begin{aligned} \mathbb {E}\left[ \max _{j\in [k+1]_0}\vert e_j \vert _H^2\right]& \le \mathbb {E}\left[ \vert e_0 \vert _H^2\right] +3 \left\| C_{\varphi ,\psi } \right\| _{\infty }^2 T h^{2q}+ C_\xi ^2 Th^{2p+1} \\&\quad+2\mathbb {E}\left[ \max _{j\in [k]_0}\vert M_j \vert \right] +L'_\psi \sum _{j\in [k]_0}h_j\mathbb {E}\left[ \max _{\ell \in [j]_0}\vert e_\ell \vert _H^2\right] \\& \le \mathbb {E}\left[ \vert e_0 \vert _H^2\right] +4\left\| C_{\varphi ,\psi } \right\| _{\infty }^2 h^{2q}T+ C_\xi ^2 Th^{2p+1}(1+\kappa ^2(1+L'_\psi )) \\&\quad+\frac{1}{2}\mathbb {E}\left[ \max _{j\in [k+1]_0}\vert e_j \vert _H^2\right] +L'_\psi \sum _{j\in [k]_0}h_j\mathbb {E}\left[ \max _{\ell \in [j]_0}\vert e_\ell \vert _H^2\right] . \end{aligned}$$
Subtracting \(\tfrac{1}{2}\mathbb {E}[\max _{j\in [k+1]_0}\vert e_j \vert _H^2]\) from both sides and applying the discrete Gronwall inequality in Proposition C.2 completes the proof. \(\square\)
Footnotes
1. See [3, Chapter 8] for a general introduction to Orlicz spaces and norms.
2. The cited results assume centredness of \(\mu\), but do not require this property.
References
3. Adams, R.A., Fournier, J.J.F.: Sobolev Spaces, Pure and Applied Mathematics (Amsterdam), vol. 140, 2nd edn. Elsevier/Academic Press, Amsterdam (2003)
7. Ciarlet, P.G.: Linear and Nonlinear Functional Analysis with Applications, vol. 130. Society for Industrial and Applied Mathematics, Philadelphia (2013)
8. Cockayne, J., Oates, C., Sullivan, T.J., Girolami, M.: Probabilistic numerical methods for PDE-constrained Bayesian inverse problems. In: Verdoolaege, G. (ed.) Proceedings of the 36th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering, AIP Conference Proceedings, vol. 1853, pp. 060001-1–060001-8 (2017). https://doi.org/10.1063/1.4985359
15. Garegnani, G.: Sampling methods for Bayesian inference involving convergent noisy approximations of forward maps (2021). arXiv:2111.03491
16. Hairer, E., Nørsett, S.P., Wanner, G.: Solving Ordinary Differential Equations I. Nonstiff Problems, Springer Series in Computational Mathematics, vol. 8, 2nd edn. Springer-Verlag, Berlin (1993)
32. Revuz, D., Yor, M.: Continuous Martingales and Brownian Motion, Grundlehren der mathematischen Wissenschaften, vol. 293, 3rd edn., corrected third printing. Springer-Verlag, Berlin (2009)