
Finance and Stochastics


Open Access 04-06-2020

Adapted Wasserstein distances and stability in mathematical finance

Authors: Julio Backhoff-Veraguas, Daniel Bartl, Mathias Beiglböck, Manu Eder

Published in: Finance and Stochastics | Issue 3/2020


Abstract

Assume that an agent models a financial asset through a measure ℚ with the goal of pricing/hedging some derivative or optimising some expected utility. Even if the model ℚ is chosen in the most skilful and sophisticated way, the agent is left with the possibility that ℚ does not provide an exact description of reality. This leads us to the following question: will the hedge still be somewhat meaningful for models in the proximity of ℚ?

If we measure proximity with the usual Wasserstein distance (say), the answer is No. Models which are similar with respect to the Wasserstein distance may provide dramatically different information on which to base a hedging strategy.

Remarkably, this can be overcome by considering a suitable adapted version of the Wasserstein distance which takes the temporal structure of pricing models into account. This adapted Wasserstein distance is most closely related to the nested distance as pioneered by Pflug and Pichler (SIAM J. Optim. 20:1406–1420, 2009, SIAM J. Optim. 22:1–23, 2012, Multistage Stochastic Optimization, 2014). It allows us to establish Lipschitz properties of hedging strategies for semimartingale models in discrete and continuous time. Notably, these abstract results are sharp already for Brownian motion and European call options.

J. Backhoff gratefully acknowledges financial support by the FWF through grant P30750 and by the Vienna University of Technology. D. Bartl has been funded by the Austrian Science Fund (FWF) under Project P28661. M. Beiglböck and M. Eder gratefully acknowledge financial support by the FWF through grant Y782.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

1 Introduction

1.1 Outline

Assume that a reference measure ℙ is used to model the evolution of a financial asset $X$ with the purpose of hedging a financial claim or maximising some expected utility. We do not expect that the model ℙ captures reality in an absolutely accurate way. However, supposing that ℙ is close enough to reality (described by a probability ℚ), we still hope that a strategy which is developed for ℙ leads to reasonable results.

A main goal of this paper is to establish this intuitive idea rigorously based on a new notion of adapted Wasserstein distance $\mathcal{AW}_{p}$ between semimartingale measures. To fix ideas, we provide a first example of the results we are after.

Theorem 1.1

Let ${\mathbb{P}}, {\mathbb{Q}}$ be continuous semimartingale models for the asset price process $X$, and assume that $C(X)$ denotes an $L$-Lipschitz payoff of a (path-dependent) derivative $C$. Assume that a predictable trading strategy $H=(H_{t})$, $|H| \leq k$, and an initial endowment $m\in {\mathbb{R}}$ constitute a ℙ-superhedge of $C(X)$, i.e.,

$$ m+\int _{0}^{T} H_{t}\,dX_{t}\geq C(X)\qquad \mathbb{P}\text{-a.s.} $$

Then there is a predictable $G$ such that $m,G$ constitute an “almost” ℚ-superhedge in the sense that

https://static-content.springer.com/image/art%3A10.1007%2Fs00780-020-00426-3/MediaObjects/780_2020_426_Equ1_HTML.png

(1.1)

While the adapted Wasserstein distance will be defined in abstract terms (see (1.4)), it relates directly to the model parameters for “simple” models. In particular, if ${\mathbb{P}}, {\mathbb{Q}}$ are Brownian models with different volatilities, then the distance between these models is just the difference of the volatilities. Moreover, the bound in (1.1) (as well as further Lipschitz bounds given below) is already sharp in such a simple setting and for $C$ a European call option.
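A short computation indicates where the volatility difference comes from; we include it only as a sketch. Assume that ℙ and ℚ are the laws of $\sigma W$ and $\hat{\sigma }W$ on $[0,T]$ for a standard Brownian motion $W$ and $\sigma ,\hat{\sigma }>0$, and let $\pi $ be the synchronous coupling, i.e., the joint law of $(\sigma W,\hat{\sigma }W)$. It is bi-causal, since $Y=(\hat{\sigma }/\sigma )X$ is an adapted map of $X$ with adapted inverse. Both models have zero drift, so inserting $\pi $ into the cost of (1.4) gives

$$\begin{aligned} M^{X}-M^{Y} &= (\sigma -\hat{\sigma })W, \qquad A^{X}=A^{Y}=0, \qquad [M^{X}-M^{Y}]_{T} = (\sigma -\hat{\sigma })^{2}\,T, \\ \mathcal{AW}_{p}(\mathbb{P},\mathbb{Q}) &\leq \mathbb{E}_{\pi }\big[ [M^{X}-M^{Y}]_{T}^{p/2}\big]^{1/p} = |\sigma -\hat{\sigma }|\sqrt{T}. \end{aligned}$$

For $T=1$ this upper bound is exactly the difference of the volatilities; a matching lower bound would come from the explicit formulae announced for Sect. 3.2, which we do not reproduce here.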

Below we provide a number of results with a similar flavour as Theorem 1.1. For example, we provide versions where the hedging error is controlled in terms of risk measures, and we show that a Lipschitz bound of the type (1.1) applies (with bigger constants) if the same trading strategy $H$ is applied in the model ℙ as well as in the model ℚ. Importantly, we establish that comparable results of Lipschitz-continuity apply to utility maximisation and utility indifference pricing.

We emphasise that familiar concepts such as the Lévy–Prokhorov metric or the usual Wasserstein distance do not appear suitable to derive results comparable to Theorem 1.1. For example, in the vicinity of financially meaningful models, there are models with arbitrarily high arbitrage even for bounded strategies; similar phenomena appear with respect to completeness/incompleteness. Instead, we introduce an adapted Wasserstein distance $\mathcal{AW}_{p}$ which takes the temporal structure of semimartingale models into account. These distances are conceptually closely related to the nested distance as pioneered by Pflug and Pichler [47, 48, 49]; see Acciaio et al. [1], Glanzer et al. [26], Bion-Nadal and Talay [18] for first articles which link such a type of distance to finance. We describe these contributions more closely in Sect. 2 below.

1.2 Notation and adapted Wasserstein distances

Throughout, we let

$$ \Omega := {\mathbb{R}}^{T}\qquad \text{or}\qquad \Omega := C([0,T]). $$

The first setting is referred to as the discrete-time case, and the second as the continuous-time case.¹ In the first case, we denote by $I=\{1,\dots ,T\}$ the time-index set, and in the second $I=[0,T]$. Throughout the article, we provide definitions and results without specifying which of the two cases we are referring to; this means that the definitions/results apply in both cases. Only occasionally do we consider one case specifically, and in such a situation, we state this explicitly.

We interpret $\Omega $ as the set of all possible evolutions (in time) of the one-dimensional asset price. Importantly, mutatis mutandis, all our results (except Propositions 3.3, 3.6 and Example 3.4) remain true for multidimensional asset price processes (corresponding to $\Omega =({\mathbb{R}}^{d})^{T}$ resp. $\Omega =C([0,T]; {\mathbb{R}}^{d})$). We restrict to the one-dimensional case to simplify notation.

The mappings $X,Y\colon \Omega \to \Omega $ denote the canonical processes (i.e., the identity map), and we make the convention that on $\Omega \times \Omega $, the process $X$ denotes the first coordinate and $Y$ the second one. The spaces $\Omega $ and $\Omega \times \Omega $ are endowed with the maximum norm and the corresponding Borel $\sigma $-field. In continuous time, the space $\Omega $ is endowed with the right-continuous filtration generated by $X$; in discrete time, we use the plain filtration generated by $X$. In any case, we denote this filtration by $\mathbb{F}=({\mathcal{F}}_{t})$ and endow $\Omega \times \Omega $ with the product filtration ${\mathbb{F}}\otimes {\mathbb{F}}$. Given a $\sigma $-field ${\mathcal{G}}$ and a probability ℙ on ${\mathcal{G}}$, we write ${\mathcal{G}}^{\mathbb{P}}$ for the ℙ-completion of ${\mathcal{G}}$. The set $\mathrm{Cpl}({\mathbb{P}}, {\mathbb{Q}})$ of couplings between probability measures ${\mathbb{P}}, {\mathbb{Q}}$ consists of all probability measures $\pi $ on $\Omega \times \Omega $ such that $X(\pi )={\mathbb{P}}$ and $Y(\pi )= {\mathbb{Q}}$. A Monge coupling is a coupling that is of the form $\pi = (\text{Id}, T)({\mathbb{P}})$ for some Borel mapping $T:\Omega \to \Omega $ that transports ℙ to ℚ, i.e., satisfies $T({\mathbb{P}})={\mathbb{Q}}$. Given a metric $d$ on $\Omega $ and $p\geq 1$, the $p$-Wasserstein distance of ${\mathbb{P}}, {\mathbb{Q}}$ is

$$\begin{aligned} \mathcal{W}_{p}({\mathbb{P}}, {\mathbb{Q}})=\inf \{{\mathbb{E}}_{\pi }[d(X,Y)^{p}]^{1/p}: \pi \in \mathrm{Cpl}({\mathbb{P}}, {\mathbb{Q}})\}. \end{aligned}$$

(1.2)

In many cases of practical interest, the infimum in (1.2) remains unchanged if one minimises only over Monge couplings; cf. [50].
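For finitely supported measures the infimum in (1.2) is a finite-dimensional linear programme. As a minimal sketch, assume (a simplifying assumption of ours) that ℙ and ℚ are each uniform on $n$ paths of equal length. The admissible couplings are then $1/n$ times doubly stochastic matrices, whose extreme points are permutation matrices (Birkhoff–von Neumann), so the infimum is attained at a Monge coupling and brute force over permutations is exact for very small $n$. The function names are our own, and $d$ is the maximum norm used above.

```python
from itertools import permutations


def sup_dist(x, y):
    """Maximum-norm distance between two discrete-time paths of equal length."""
    return max(abs(a - b) for a, b in zip(x, y))


def wasserstein_uniform(xs, ys, p=1.0):
    """W_p between the uniform measures on the paths xs and ys (same number of atoms).

    Brute force over Monge couplings (permutations); feasible only for small n.
    """
    n = len(xs)
    if n != len(ys):
        raise ValueError("both measures need the same number of atoms")
    best = min(
        sum(sup_dist(xs[i], ys[s[i]]) ** p for i in range(n)) / n
        for s in permutations(range(n))
    )
    return best ** (1.0 / p)
```

For instance, with the illustrative pair ℙ uniform on the paths $(0,1),(0,-1)$ and ℚ uniform on $(0.1,1),(-0.1,-1)$, `wasserstein_uniform` returns 0.1, attained by the map that matches paths with the same terminal value.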

Before defining the adapted Wasserstein distance between measures ℙ and ℚ on $\Omega $, let us hint why distances related to weak convergence are not suitable for the results we have in mind. Assume for example that we are interested in a utility maximisation problem in two periods and that Fig. 1 describes the laws ${\mathbb{P}}, {\mathbb{Q}}$ of two traded assets. Clearly, they are very close in the Wasserstein distance, as follows from considering the obvious Monge coupling induced by $T: \Omega \to \Omega , T({\mathbb{P}})= {\mathbb{Q}}$ depicted in Fig. 1. At the same time, the outcome of utility maximisation is certainly very different. Similarly, ℙ is a martingale measure while ℚ allows arbitrage. The clear reason for that is the different structure of information available at time 1.

To exhibit why the Wasserstein distance does not reflect this different structure of information, let us review the transport condition $T({\mathbb{P}})= {\mathbb{Q}}$. We rephrase it as

$$\begin{aligned} \big(T_{1}(X_{1}, X_{2}), T_{2}(X_{1}, X_{2})\big) \overset{(\mathrm{{d}})}{=} (Y_{1}, Y_{2}), \end{aligned}$$

(1.3)

where $\overset{(\mathrm{{d}})}{=}$ stands for equality in law. While this condition is of course perfectly natural in mass transport, (1.3) almost seems like cheating when viewed from a probabilistic perspective: the map $T_{1} $ should not be allowed to consider the future value $X_{2}$ in order to determine $Y_{1}$. To define an adapted version of the Wasserstein distance, the “process” $(T_{i})_{i=1,2}$ should be taken to be adapted in order to account for the different information structures of ℙ and ℚ.
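The point can be checked mechanically on a toy pair of two-period models. As an illustrative assumption (the laws shown in Fig. 1 are not reproduced in this text), take ℙ uniform on the paths $(0,1)$ and $(0,-1)$, so that $X$ is a martingale, and ℚ uniform on $(\varepsilon ,1)$ and $(-\varepsilon ,-1)$, so that $X_{1}$ reveals $X_{2}$. Between two uniform measures on two paths every Monge map is a bijection of the atoms, and it is adapted only if its first coordinate $T_{1}$ depends on $X_{1}$ alone. The sketch below, with hypothetical helper names, enumerates the bijections and keeps the adapted ones.

```python
from itertools import permutations


def monge_maps(xs, ys):
    """All bijections between the atoms xs and ys (uniform measures of equal size)."""
    for s in permutations(range(len(ys))):
        yield {xs[i]: ys[s[i]] for i in range(len(xs))}


def is_adapted(tmap):
    """True if the first coordinate of the image depends only on the first
    coordinate of the input, i.e. T_1(x_1, x_2) does not look at x_2."""
    first = {}
    for x, y in tmap.items():
        if first.setdefault(x[0], y[0]) != y[0]:
            return False
    return True


eps = 0.1
xs = [(0.0, 1.0), (0.0, -1.0)]    # martingale: X_1 carries no information
ys = [(eps, 1.0), (-eps, -1.0)]   # X_1 reveals X_2 (arbitrage)
maps = list(monge_maps(xs, ys))
adapted = [t for t in maps if is_adapted(t)]
print(len(maps), len(adapted))  # prints: 2 0
```

No adapted transport map exists in this example, although the non-adapted map that matches equal terminal values moves each path by only $\varepsilon $ in the maximum norm.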

Naturally, our formal definition of adapted Wasserstein distances will not refer to adapted Monge transports, but rather to couplings which are “adapted” in an appropriate sense. Following Lassalle [41], we call such couplings (bi-)causal. Since the definition below may appear a bit technical at first glance, the following may be reassuring: In the discrete-time setting and for measures ℙ absolutely continuous with respect to Lebesgue measure, the weak closure (in the sense of weak convergence of measures) of the set of adapted Monge couplings, i.e., $\pi = (\text{Id}, T)({\mathbb{P}})$ for $T$ adapted, is precisely the set of all causal couplings; see Lacker [38].

Definition 1.2

For a coupling $\pi $ of $\mathbb{P}, \mathbb{Q}\in \mathcal{P}(\Omega )$, let $\pi (d\omega ,d\eta )=\mathbb{P}(d\omega )\pi _{\omega }(d\eta )$ denote a regular disintegration with respect to ℙ. The set $\mathrm{Cpl}_{\mathrm{C}}({\mathbb{P}}, {\mathbb{Q}})$ of causal couplings consists of all $\pi \in \mathrm{Cpl}({\mathbb{P}}, {\mathbb{Q}})$ such that for all $t\in I$ and $A\in \mathcal{F}_{t}$,

$$ \omega \mapsto \pi _{\omega }(A) \text{ is }\mathcal{F}^{{\mathbb{P}}}_{t} \text{-measurable}. $$

The set of all bi-causal couplings $\mathrm{Cpl}_{\mathrm{BC}}({\mathbb{P}}, {\mathbb{Q}})$ consists of all $\pi \in \mathrm{Cpl}_{\mathrm{C}}({\mathbb{P}}, {\mathbb{Q}})$ such that also $S(\pi ) \in \mathrm{Cpl}_{\mathrm{C}}({\mathbb{Q}}, {\mathbb{P}})$, where $S:\Omega \times \Omega \to \Omega \times \Omega , S(\omega ,\eta ):= ( \eta ,\omega )$.

In discrete time, a coupling $\pi $ is causal if and only if

$$\begin{aligned} \pi \big( (Y_{1},\dots ,Y_{t})\in A \big| X\big)& = \pi \big( (Y_{1}, \dots ,Y_{t})\in A \big| X_{1},\dots ,X_{t} \big) \end{aligned}$$

ℙ-a.s. for every $t$ and Borel set $A\subseteq \mathbb{R}^{t}$, that is, at time $t$, given the past $(X_{1}, \ldots , X_{t}) $ of $X$, the conditional distribution of $(Y_{1},\ldots ,Y_{t})$ does not depend on the future $(X_{t+1}, \ldots , X_{T})$ of $X$.
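For measures on finitely many paths the characterisation above can be verified directly. A minimal sketch for $T=2$, using the illustrative pair ℙ uniform on $(0,1),(0,-1)$ and ℚ uniform on $(\varepsilon ,1),(-\varepsilon ,-1)$ (our choice, not necessarily the laws of Fig. 1): only $t=1$ needs checking, namely that the conditional law of $Y_{1}$ given the whole path $X$ coincides with its conditional law given $X_{1}$; bi-causality adds the same test with $X$ and $Y$ interchanged. Every coupling of these two measures has the form $\pi (i,i)=a$, $\pi (i,j)=1/2-a$ for $i\neq j$ with $a\in [0,1/2]$, so a grid in $a$ containing $a=1/4$ suffices. Costs are measured by the maximum-norm path distance, a nested-distance-style comparison rather than the semimartingale cost of (1.4). Exact rational arithmetic avoids tolerance issues; all names are our own.

```python
from fractions import Fraction


def _causal(pi, xs, ys):
    """Causality at t = 1 for two-period paths; pi[i][j] is the mass on (xs[i], ys[j]).

    Checks that the law of Y_1 given the whole path X equals its law given X_1 only.
    """
    y1_values = {y[0] for y in ys}
    for i, x in enumerate(xs):
        mass_x = sum(pi[i])
        if mass_x == 0:
            continue
        same_x1 = [k for k, z in enumerate(xs) if z[0] == x[0]]
        mass_x1 = sum(sum(pi[k]) for k in same_x1)
        for v in y1_values:
            given_path = sum(pi[i][j] for j, y in enumerate(ys) if y[0] == v) / mass_x
            given_x1 = sum(pi[k][j] for k in same_x1
                           for j, y in enumerate(ys) if y[0] == v) / mass_x1
            if given_path != given_x1:
                return False
    return True


def is_bicausal(pi, xs, ys):
    """Causal from X to Y and, after transposing the coupling, from Y to X."""
    transposed = [list(col) for col in zip(*pi)]
    return _causal(pi, xs, ys) and _causal(transposed, ys, xs)


def sup_dist(x, y):
    return max(abs(a - b) for a, b in zip(x, y))


eps = Fraction(1, 10)
xs = [(0, 1), (0, -1)]
ys = [(eps, 1), (-eps, -1)]
best_any, best_bc = None, None
for k in range(101):
    a = Fraction(k, 200)  # grid over a in [0, 1/2]
    pi = [[a, Fraction(1, 2) - a], [Fraction(1, 2) - a, a]]
    cost = sum(pi[i][j] * sup_dist(xs[i], ys[j]) for i in range(2) for j in range(2))
    best_any = cost if best_any is None else min(best_any, cost)
    if is_bicausal(pi, xs, ys):
        best_bc = cost if best_bc is None else min(best_bc, cost)
print(best_any, best_bc)  # prints: 1/10 21/20
```

The unconstrained minimum $1/10=\varepsilon $ is the Wasserstein-type value, while the only bi-causal coupling is the independent one, with cost $21/20=1+\varepsilon /2$; the adapted comparison therefore sees an order-one gap between the two models.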

Replacing couplings by bi-causal couplings in (1.2), one arrives at the nested distance as introduced by Pflug and Pichler [46, 47]. Since our goal is to compare also semimartingale models in continuous time, we work with an adapted Wasserstein distance that is defined slightly differently. (Notably, it is straightforward that the two distances are equivalent for probabilities on ${\mathbb{R}}^{N}$. We elaborate in Sect. 3.3 below why the definition in (1.4) is more appropriate for our purposes even in discrete time.)

In continuous time, we denote by $\mathcal{SM}(\Omega )$ the set of all probabilities ℙ on (the Borel $\sigma $-field of) $\Omega $ under which the canonical process $X$ is a (continuous) semimartingale. In discrete time, $\mathcal{SM}(\Omega )$ denotes the set of all Borel probabilities ℙ on $\Omega $ under which $X$ is integrable. In both cases, we can uniquely decompose $X=M+A$, with $A$ a finite variation predictable process starting at zero and $M$ a local martingale. Indeed, in the first case, $X$ is a special semimartingale and $M$ and $A$ can be chosen continuous as well, and in the second case, this is the Doob decomposition of an integrable adapted discrete-time process. For $p\in [1,\infty )$, we denote by $\mathcal{SM}_{p}(\Omega )$ the subset of $\mathcal{SM}(\Omega )$ for which

$$ \mathbb{E}_{\mathbb{P}}\big[[M]_{T}^{p/2} +|A|_{\text{1-var}}^{p}\big]< \infty , $$

where $[\cdot ]$ is the quadratic variation and $|\cdot |_{\text{1-var}}$ the first variation. Note also that by the BDG inequality, ${\mathbb{E}}_{\mathbb{P}}[\sup _{s\leq T} |M_{s}|] < \infty $ for $\mathbb{P} \in \mathcal{SM}_{p}(\Omega )$; hence $M$ is then a true martingale.
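In discrete time the Doob decomposition and the two quantities entering the definition of $\mathcal{SM}_{p}(\Omega )$ are elementary to evaluate along a single path. As a sketch, we prepend a deterministic starting value $X_{0}$ (a convention we assume; the text indexes time by $\{1,\dots ,T\}$) and set $A_{t}=\sum _{s\leq t}\mathbb{E}[X_{s}-X_{s-1}\,|\,\mathcal{F}_{s-1}]$, $M=X-A$, $[M]_{T}=\sum _{t}(M_{t}-M_{t-1})^{2}$ and $|A|_{\text{1-var}}=\sum _{t}|A_{t}-A_{t-1}|$. The conditional mean increment must be supplied as a function `drift(history)`, a hypothetical interface; averaging the returned quantities over simulated paths would estimate the expectation in the integrability condition.

```python
def doob_path(path, drift):
    """Doob decomposition X = M + A along one path, with A_0 = 0 and X_0 = path[0].

    drift(history) is a user-supplied (hypothetical) function returning
    E[X_t - X_{t-1} | X_0, ..., X_{t-1}] evaluated at history = path[:t].
    Returns (M, A, [M]_T, |A|_1-var).
    """
    A = [0.0]
    for t in range(1, len(path)):
        A.append(A[-1] + drift(path[:t]))  # predictable part accumulates conditional means
    M = [x - a for x, a in zip(path, A)]  # martingale part
    qv = sum((M[t] - M[t - 1]) ** 2 for t in range(1, len(M)))
    one_var = sum(abs(A[t] - A[t - 1]) for t in range(1, len(A)))
    return M, A, qv, one_var
```

For a walk with constant drift $0.5$ and the path $(0, 1.5, 1, 2.5)$ this gives $A=(0,0.5,1,1.5)$, $M=(0,1,0,1)$, $[M]_{T}=3$ and $|A|_{\text{1-var}}=1.5$.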

Definition 1.3

For $\mathbb{P},\mathbb{Q}\in \mathcal{SM}_{p}(\Omega )$, $p\geq 1$, we define the adapted Wasserstein distance as

$$\begin{aligned} &\mathcal{AW}_{p}(\mathbb{P},\mathbb{Q}) \\ &:=\inf \big\{ \mathbb{E}_{\pi }\big[ [M^{X}-M^{Y}]_{T}^{p/2} + |A^{X}-A^{Y}|_{ 1\text{-var}}^{p}\big]^{1/p} : \pi \in \mathrm{Cpl}_{ \mathrm{BC}}({\mathbb{P}},{\mathbb{Q}})\big\} , \end{aligned}$$

(1.4)

where $X=M^{X}+A^{X}, Y=M^{Y}+A^{Y}$ denote the semimartingale decompositions of $X$ and $Y$, respectively.
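Any single bi-causal coupling yields an upper bound on (1.4). As a sketch under stated assumptions: discrete time with $X_{0}=Y_{0}=0$ prepended, ℙ the law of a Gaussian random walk $X_{t}=X_{t-1}+\mu +\sigma \xi _{t}$ and ℚ that of $Y_{t}=Y_{t-1}+\hat{\mu }+\hat{\sigma }\xi _{t}$, coupled by driving both with the same i.i.d. standard normal $\xi _{t}$. For $\sigma ,\hat{\sigma }>0$ this synchronous coupling is bi-causal, since each path is an adapted function of the other with adapted inverse. Then $M^{X}-M^{Y}$ has increments $(\sigma -\hat{\sigma })\xi _{t}$ and $A^{X}-A^{Y}$ has increments $\mu -\hat{\mu }$, so for $p=2$ the expected cost is $(\sigma -\hat{\sigma })^{2}T+(\mu -\hat{\mu })^{2}T^{2}$. The Monte Carlo estimate below (names and parameters illustrative) should reproduce the square root of this value, which bounds $\mathcal{AW}_{2}$ from above.

```python
import math
import random


def aw2_upper_bound(mu, sigma, mu_hat, sigma_hat, T, n_paths=20000, seed=0):
    """Monte Carlo estimate of E_pi[[M^X - M^Y]_T + |A^X - A^Y|_{1-var}^2]^(1/2)
    under the synchronous coupling of two Gaussian random walks (p = 2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        qv = 0.0       # quadratic variation of M^X - M^Y
        one_var = 0.0  # first variation of A^X - A^Y
        for _ in range(T):
            xi = rng.gauss(0.0, 1.0)                # shared noise for both walks
            qv += ((sigma - sigma_hat) * xi) ** 2   # martingale increments differ by (sigma - sigma_hat) * xi
            one_var += abs(mu - mu_hat)             # predictable increments differ by mu - mu_hat
        total += qv + one_var ** 2
    return math.sqrt(total / n_paths)


est = aw2_upper_bound(mu=0.1, sigma=1.0, mu_hat=0.0, sigma_hat=0.5, T=10)
exact = math.sqrt((1.0 - 0.5) ** 2 * 10 + (0.1 - 0.0) ** 2 * 10 ** 2)
print(round(est, 3), round(exact, 3))
```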

It is shown in Lemma 3.1 that $\mathcal{AW}_{p}$ is well defined (i.e., that $X-Y$ is a semimartingale under every bi-causal coupling) and in Lemma 3.2 that $\mathcal{AW}_{p}$ in fact defines a metric.

Remark 1.4

In the continuous-time setup, the adapted Wasserstein distance can also be computed through

$$\begin{aligned} \mathcal{AW}_{p}(\mathbb{P},\mathbb{Q})=\inf \big\{ \mathbb{E}_{\pi }\big[ [X-Y]_{T}^{p/2} + \mbox{MV}_{T}[|X-Y|^{p} ]\big]^{1/p} : \pi \in \mathrm{Cpl}_{\mathrm{BC}}({\mathbb{P}},{\mathbb{Q}})\big\} . \end{aligned}$$

Here $\mbox{MV}$ denotes the mean variation, i.e.,

$$ \mbox{MV}_{T} [Z]= \sup _{\Delta }\sum _{t_{j}\in \Delta }| {\mathbb{E}}_{\pi }[Z_{t_{j+1}} - Z_{t_{j}}|{\mathcal{F}}_{t_{j}}]|, $$

where the supremum is taken over all finite partitions $\Delta $ of $[0,T]$.

In Sect. 3.2 below, we give explicit formulae for the adapted Wasserstein distance in the case of semimartingale measures described by simple SDEs.

1.3 Stability of superhedging

For the rest of this article, fix some $k\in \mathbb{R}_{+}$ and let $\mathcal{H}_{k}$ be the set of all predictable processes

$$ H\colon \Omega \times I\to [-k,k]. $$

For every $p\geq 1$, write $b_{p}$ for the “upper” Burkholder–Davis–Gundy (BDG) constant. In particular, it is known that $b_{1} \leq 6 $ and that $b_{2}=2$.

Our first main result concerns the stability of superhedging and constitutes a stronger version of Theorem 1.1 stated above.

Theorem 1.5

Let $\mathbb{P},\mathbb{Q}\in \mathcal{SM}_{1}(\Omega )$, $H \in \mathcal{H}_{k}$ and let $C\colon \Omega \to \mathbb{R}$ be Lipschitz with constant $L$. Then the hedging error under ℚ is bounded by the distance of ℙ and ℚ plus the hedging error under ℙ in the following sense: There exists $G\in \mathcal{H}_{k}$ such that

https://static-content.springer.com/image/art%3A10.1007%2Fs00780-020-00426-3/MediaObjects/780_2020_426_Equ5_HTML.png

(WHI)

Assume in addition that $H_{t}\colon \Omega \to \mathbb{R}$ is Lipschitz with constant $\tilde{L}$ for every $t\in I$. Then we can take $G=H$ and obtain

https://static-content.springer.com/image/art%3A10.1007%2Fs00780-020-00426-3/MediaObjects/780_2020_426_Equ6_HTML.png

(SHI)

where $\beta :=2\sqrt{2}\, b_{1} \tilde{L} \min \{ \mathcal{AW}_{2}( \mathbb{P},\delta _{0}),\mathcal{AW}_{2}(\mathbb{Q},\delta _{0})\}$.

Importantly, it is in general impossible to transfer a superhedge under ℙ into a superhedge under ℚ. This occurs already in a one-period framework and is not a by-product of our definition of the adapted Wasserstein distance; see Remark 5.2. Similar reasoning shows that one has to restrict to trading strategies bounded by $k$; see Remark 5.3.

It is worthwhile to compare the inequalities (WHI) and (SHI):

(S) In a certain sense, the “strong hedging inequality” (SHI) seems to be the more relevant assertion; after all, a trader does not know that the model ℚ (rather than the model ℙ) describes reality and hence she might (somewhat stubbornly) stick to the initial plan of hedging her risk according to the strategy $H$. The inequality (SHI) then allows quantifying the losses due to this model error.

(W) However, the “weak hedging inequality” (WHI) also has a particular merit. Suppose that a trader $W$ starts with the prior belief that the asset price evolves according to a Black–Scholes model with volatility $\sigma _{1}$, but soon after time 0 realises that a volatility $\sigma _{2} \neq \sigma _{1}$ yields a more adequate description of reality. If the witty trader $W$ makes an accurate guess about the correct model and updates her trading strategy accordingly, her losses can be controlled through the tighter bound in (WHI).
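The stubborn trader of (S) is easy to simulate. A minimal sketch under our own assumptions: ℙ and ℚ are the laws of $\sigma W$ and $\hat{\sigma }W$ on $[0,T]$ (Bachelier models without drift), $C(X)=X_{T}^{+}$ is the call with strike $0$ from Example 1.7 below, $m=\sigma \sqrt{T/(2\pi )}$ is its ℙ-price and $H_{t}=\Phi (X_{t}/(\sigma \sqrt{T-t}))$ is the ℙ-replicating delta, with $\Phi $ the standard normal distribution function. Since $H$ takes values in $[0,1]$, $k=1$ suffices. The price and delta are standard Bachelier formulas, not statements from the text. The code keeps $H$ under ℚ and estimates $\mathbb{E}_{\mathbb{Q}}[(C(X)-m-(H\cdot X)_{T})^{+}]$, where $(H\cdot X)_{T}=\int _{0}^{T}H_{t}\,dX_{t}$ is approximated by a left-point Euler sum.

```python
import math
import random


def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def hedging_error(sigma, sigma_hat, T=1.0, n_steps=100, n_paths=2000, seed=1):
    """Estimate E_Q[(C - m - (H.X)_T)^+] for C = X_T^+ when m and H are computed in the
    Bachelier model with volatility sigma, but paths follow volatility sigma_hat (model Q)."""
    rng = random.Random(seed)
    dt = T / n_steps
    m = sigma * math.sqrt(T / (2.0 * math.pi))  # P-price of the call with strike 0
    total = 0.0
    for _ in range(n_paths):
        x, gains = 0.0, 0.0
        for i in range(n_steps):
            tau = T - i * dt                               # time to maturity, > 0 on the grid
            h = norm_cdf(x / (sigma * math.sqrt(tau)))     # P-delta, values in [0, 1]
            dx = sigma_hat * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            gains += h * dx                                # predictable (left-point) integrand
            x += dx
        total += max(max(x, 0.0) - m - gains, 0.0)
    return total / n_paths
```

A short Itô computation (ours) explains what to expect: under ℚ one has $C(X)-m-(H\cdot X)_{T}=\frac{1}{2}(\hat{\sigma }^{2}-\sigma ^{2})\int _{0}^{T}\Gamma _{t}\,dt$ with $\Gamma _{t}\geq 0$ the ℙ-gamma, so for $\hat{\sigma }\geq \sigma $ the expected hedging error equals $\mathbb{E}_{\mathbb{Q}}[C(X)]-m=(\hat{\sigma }-\sigma )\sqrt{T/(2\pi )}$, about $0.399$ for $\sigma =1$, $\hat{\sigma }=2$, $T=1$. The simulated values should approach this up to discretisation and sampling error, and the dependence on $\hat{\sigma }-\sigma $ is linear.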

In Theorem 4.2, we provide a version of Theorem 1.5 where $(\cdot )^{+}$ is replaced by a convex, strictly increasing loss function $\ell \colon \mathbb{R}\to \mathbb{R}_{+}$.

Another way to gauge the effectiveness of an almost superhedge is by means of risk measures. We postpone the general formulation to Theorem 4.3 and first present a version that appeals to the average value at risk $\mathrm{AVaR}^{\mathbb{P}}_{\alpha }$. Recall that for a random variable $Z\colon \Omega \to \mathbb{R}$,

$$ \mathrm{AVaR}^{\mathbb{P}}_{\alpha }(Z):=\inf _{m\in \mathbb{R}} \mathbb{E}_{\mathbb{P}}[m+(Z-m)^{+}/\alpha ] $$

is the average value at risk at level $\alpha \in (0,1)$ under model ℙ. We then have

Theorem 1.6

Assume that $C\colon \Omega \to \mathbb{R}$ is Lipschitz with constant $L$. Then

https://static-content.springer.com/image/art%3A10.1007%2Fs00780-020-00426-3/MediaObjects/780_2020_426_Equj_HTML.png

for $r:=b_{1}(L+k)/\alpha $. If $H\in \mathcal{H}_{k}$ is such that $H_{t}\colon \Omega \to [-k,k]$ is Lipschitz with constant $\tilde{L}$ for every $t\in I$ and $\beta $ is the constant defined in Theorem 1.5, then

https://static-content.springer.com/image/art%3A10.1007%2Fs00780-020-00426-3/MediaObjects/780_2020_426_Equk_HTML.png

The interpretation of this result is similar to that of Theorem 1.5: As $\mathrm{AVaR}^{\mathbb{P}}_{\alpha }(\cdot )$ is translation invariant, one has

https://static-content.springer.com/image/art%3A10.1007%2Fs00780-020-00426-3/MediaObjects/780_2020_426_Equl_HTML.png

and the right-hand side constitutes a relaxed version of the superhedging price. Notably, the explicit calculations of the adapted Wasserstein distance given in Sect. 3.2 imply that Theorem 1.6 (and similarly Theorem 1.5) is sharp.
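For an empirical distribution the infimum in the definition of $\mathrm{AVaR}^{\mathbb{P}}_{\alpha }$ recalled before Theorem 1.6 can be computed exactly: $m\mapsto m+\frac{1}{\alpha n}\sum _{i}(z_{i}-m)^{+}$ is convex and piecewise linear with kinks only at the sample points $z_{i}$, so its minimum is attained at one of them. The brute-force sketch below (quadratic in the sample size; the function name is ours) uses this observation.

```python
def avar(samples, alpha):
    """Average value at risk at level alpha in (0, 1) of the empirical law of `samples`,
    computed as inf_m E[m + (Z - m)^+ / alpha]; the infimum is attained at a sample point."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    n = len(samples)

    def objective(m):
        return m + sum(max(z - m, 0.0) for z in samples) / (alpha * n)

    return min(objective(m) for m in samples)
```

For example, `avar([1, 2, 3, 4], 0.5)` returns 3.5, the mean of the worst half of the sample, and adding a constant to every sample point adds the same constant to the result, which is the translation invariance used above.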

Example 1.7

For hedging in a Brownian framework, consider a European call option $C(X)=(X_{T}-K)^{+}$, where for simplicity $K=0$. Moreover, let $\mathbb{P}^{\sigma }$ be Wiener measure with constant volatility $\sigma \geq 0$. Then for every $\sigma ,\hat{\sigma }\geq 0$, $k\geq 1$ and $\alpha \in (0,1)$, it holds that (we defer the proof of this fact to Sect. 4)

https://static-content.springer.com/image/art%3A10.1007%2Fs00780-020-00426-3/MediaObjects/780_2020_426_Equm_HTML.png

This shows that the estimate in Theorem 1.6 is tight (up to constants) in the sense that it is essentially impossible to improve on the probability metric $\mathcal{AW}_{1}$.

We make the important remark that Glanzer et al. [26] use the nested distance to control acceptability prices in discrete-time models: these prices depend in a Lipschitz fashion on the nested distance between the models. Specifically, in a discrete-time one-period framework, [26, Proposition 3] and Theorem 1.6 yield almost the same assertion; in that setup, the only difference is that [26, Proposition 3] does not specify a Lipschitz constant and does not assume uniform boundedness of the admissible hedging strategy. (However, the latter seems to be in conflict with our Remark 5.3 below.)

1.4 Stability of utility maximisation and utility indifference pricing

We move on to consider the continuity of utility maximisation. Let $U\colon \mathbb{R}\to \mathbb{R}$ be a utility function which is concave and increasing, and denote by $U'$ the left-continuous version of the derivative. We have

Theorem 1.8

Let $C\colon \Omega \to \mathbb{R}$ be Lipschitz-continuous and assume that there exists $c\geq 0$ such that $U'(x)\leq c(1+|x|^{p-1})$ for all $x$. Then for every $R\geq 0$, there exists a constant $K$ such that

https://static-content.springer.com/image/art%3A10.1007%2Fs00780-020-00426-3/MediaObjects/780_2020_426_Equn_HTML.png

for all $\mathbb{P},\mathbb{Q}\in \mathcal{SM}_{p}(\Omega )$ with $\mathcal{AW}_{p}(\mathbb{P},\delta _{0}),\mathcal{AW}_{p}(\mathbb{Q}, \delta _{0})\leq R$.

The failure of usual Wasserstein distances to guarantee stability of utility maximisation is illustrated in Remark 5.1.

A common way of quantifying the value of a claim is via utility indifference pricing:² Given a claim $C$, the utility indifference (bid) price $v$ is defined as a solution of the equation

https://static-content.springer.com/image/art%3A10.1007%2Fs00780-020-00426-3/MediaObjects/780_2020_426_Equo_HTML.png

Continuing in the spirit of the present paper, we are interested in the stability of ${\mathbb{P}\mapsto v(\mathbb{P})}$, where the latter denotes a utility indifference price associated to the model ℙ. If $U$ is strictly increasing, then $v$ is unique.
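The defining equation of $v$ is not reproduced in this version of the text. As an assumption, we use the common form in which $v$ solves $\sup _{H}\mathbb{E}_{\mathbb{P}}[U(C-v+(H\cdot X)_{T})]=\sup _{H}\mathbb{E}_{\mathbb{P}}[U((H\cdot X)_{T})]$ with zero initial capital and $H\in \mathcal{H}_{k}$; the paper's equation may differ in details. The sketch works in a one-period model with finitely many states, where $(H\cdot X)_{T}=h\,\Delta X$, and uses the utility $U(x)=x$ for $x\geq 0$, $U(x)=x-x^{2}/2$ for $x<0$ (our choice), which satisfies $0<U'(x)\leq 1+|x|$ as in Theorem 1.9 with $p=2$. All function names are hypothetical.

```python
def U(x):
    """Concave, strictly increasing utility with 0 < U'(x) <= 1 + |x| (so p = 2)."""
    return x if x >= 0 else x - 0.5 * x * x


def value(claim, dx, probs, k, iters=200):
    """max over h in [-k, k] of E[U(claim + h * dX)] in a one-period finite model;
    h -> E[U(...)] is concave, so ternary search locates the maximum."""
    def f(h):
        return sum(p * U(c + h * d) for c, d, p in zip(claim, dx, probs))

    lo, hi = -k, k
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1  # a maximiser lies in [m1, hi]
        else:
            hi = m2  # a maximiser lies in [lo, m2]
    return f(0.5 * (lo + hi))


def indifference_bid(claim, dx, probs, k, tol=1e-10):
    """Bisection for v with value(C - v) = value(0); v lies between min C and max C."""
    target = value([0.0] * len(claim), dx, probs, k)
    lo, hi = min(claim), max(claim)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if value([c - mid for c in claim], dx, probs, k) >= target:
            lo = mid  # C - mid is at least as good as no claim, so v >= mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

As a sanity check, in a martingale model a claim $C=c+h_{0}\Delta X$ with $|h_{0}|\leq k$ has indifference price $c$ by Jensen's inequality; for instance `indifference_bid([1.5, 0.5], [1.0, -1.0], [0.5, 0.5], k=1.0)` is approximately 1.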

Theorem 1.9

Let $C\colon \Omega \to \mathbb{R}$ be Lipschitz-continuous and assume that there exists $c\geq 0$ such that $0< U'(x)\leq c(1+|x|^{p-1})$ for all $x$. Then for every $R\geq 0$, there exists a constant $K$ such that

$$ | v(\mathbb{P})-v(\mathbb{Q}) | \leq K \mathcal{AW}_{p}(\mathbb{P}, \mathbb{Q}) $$

for all $\mathbb{P},\mathbb{Q}\in \mathcal{SM}_{p}(\Omega )$ with $\mathcal{AW}_{p}(\mathbb{P},\delta _{0}),\mathcal{AW}_{p}(\mathbb{Q}, \delta _{0})\leq R$.

1.5 Structure of the paper

In Sect. 2, we briefly review the literature related to this paper. In Sect. 3, we establish some basic properties of the adapted Wasserstein distance, discuss the choice of cost function and give some examples. Moreover, we derive a contraction principle (Theorem 3.10) which relates the adapted Wasserstein distance with a “weak” (in the sense of Gozlan et al. [28]) transport distance. This result forms the basis for the proofs of the results mentioned in the introduction, as well as certain extensions of these results; see Sect. 4. Finally, we conclude with some remarks in Sect. 5.

2 Literature

The articles closest in spirit to ours are [1, 18, 26]. Acciaio et al. [1] consider an object related to the adapted Wasserstein distance in continuous time in connection with utility maximisation, enlargement of filtrations and optimal stopping. Glanzer et al. [26] prove a deviation inequality for the so-called nested distance in a discrete-time framework,³ and consider acceptability pricing over an ambiguity set described through the nested distance. Bion-Nadal and Talay [18] study via PDE arguments a continuous-time optimisation problem which is related to the adapted Wasserstein distance.

The concept of causal couplings, and optimal transport over causal couplings, has been recently popularised by Lassalle [41], although precursors can be found in the works by Yamada and Watanabe [55] and Rüschendorf [52]. This notion is central to the recent articles by Acciaio et al. [1] and Backhoff-Veraguas et al. [10, 8, 9].

The idea of strengthening weak convergence of measures in order to account for a temporal evolution has some history. Indeed, several authors have independently introduced different approaches to address this challenge. The seminal unpublished work by Aldous [2] introduces the notion of extended weak convergence for the study of stability of optimal stopping problems. The principal idea is not to compare the laws of processes directly, but rather the laws of the corresponding prediction processes. Independently, Hellwig [29] introduces the information topology for the stability of equilibrium problems in economics. Roughly, two probability measures on a product $X_{1}\times \cdots \times X_{N}$ of finitely many spaces are considered to be close if for each $t \leq N$, the projections onto the first $t$ coordinates as well as the corresponding conditional (regular) disintegrations are close. Unrelated to these developments, Pflug and Pichler [46, 47, 48] have introduced nested distances for the stability of stochastic programming in discrete time. The nested distance is the obvious role model for the adapted Wasserstein distances considered in this article, and (as mentioned above) for a fixed number of time steps and $p \geq 1$, they are obviously equivalent. Yet another idea to account for the temporal evolution of processes would be to symmetrise the causal transport costs $\mathcal{W}_{c}({\mathbb{P}},{\mathbb{Q}})$ defined by Lassalle [41] by taking the maximum or sum of $\mathcal{W}^{2}_{c}({\mathbb{P}},{\mathbb{Q}})$ and $\mathcal{W}^{2}_{c}({\mathbb{Q}},{\mathbb{P}})$; this was pointed out by Soumik Pal.

In parallel work [6], the four authors of the present article investigate the relations between these concepts in detail. Remarkably, in (finite) discrete time, all of the concepts mentioned above (adapted Wasserstein distances, extended weak convergence, information topology, nested distances, symmetrised causal transport costs) define the same topology. As noted above, this “weak adapted topology” refines the usual weak topology (properly for $T\geq 2$; see also Remark 5.2). The articles [8, 6, 23] investigate basic properties of this topology; e.g., the weak adapted topology is Polish [8, Sect. 5], and sets are totally bounded with respect to the adapted Wasserstein distance/nested distance if and only if they are totally bounded with respect to the usual Wasserstein distance [6, Lemma 1.6]. For recent applications of these concepts to optimal transport and probabilistic variants thereof, we refer to Backhoff-Veraguas et al. [11, 12] and Wiesel [54].

In contrast, fundamental topological properties of the above-mentioned concepts in the continuous-time case seem to be much less understood and, as far as the authors are aware, pose an interesting challenge for future research. Specifically, it is not clear to us whether the topology associated to the adapted Wasserstein distance is Polish in the continuous-time case. In a similar vein, we expect that results analogous to those of the present article should apply in the case of càdlàg paths, but such an extension is beyond the scope of our current understanding of adapted Wasserstein distances.

The question of stability in mathematical finance has been studied from different perspectives over the years. Notably, starting with the articles of Lyons [42] and Avellaneda et al. [5], the area of robust finance has mainly focused on extremal models and hedging strategies which dominate the payoff for every model in a specified class. Following the publication of Hobson’s seminal article [32], connections with the Skorokhod embedding problem have been a driving force of the field; see the surveys of Hobson [34] and Obłój [44]. Recently, this has been complemented by techniques coming from (martingale) optimal transport; early papers which advance this viewpoint include Hobson [35], Beiglböck et al. [15, 16], Galichon et al. [25], Bouchard and Nutz [19], Dolinsky and Soner [22], Campi et al. [20], and Beiglböck and Siorpaes [14]. The literature on “local” misspecification of volatility in a sense more closely related to the present article appears more sparse. El Karoui et al. [24] establish in a stochastic volatility framework that if the misspecified volatility dominates the true volatility, then the misspecified price of call options dominates the real price; see also the elegant account of Hobson [33]. More recently, the question of pricing and hedging under uncertainty about the volatility of a reference local volatility model is studied by Herrmann et al. [31] (see also Herrmann and Muhle-Karbe [30]). Less plausible models are penalised through a mean-square distance to the volatility of the reference model, and the authors obtain explicit formulas for prices and hedging strategies in a limit for small uncertainty aversion. Becherer and Kentia [13] derive worst-case good-deal bounds under model ambiguity which concerns drift as well as volatility. Indeed, discussions with Dirk Becherer motivated us to consider also models with drift in our results on stability of superhedging. 
The behaviour of the superhedging price in a ball (with respect to various notions of distance) around a reference model is studied in depth by Obłój and Wiesel [45] for a $d$-dimensional asset and one time period.

A notable implication of our work is that it yields a coherent way to measure model uncertainty (in the sense of Cont’s influential article [21]): Fix a subset $M_{0}$ of the set $M$ of all consistent models, i.e., martingale measures which are consistent with benchmark instruments whose price can be observed on the market. Given $M_{0}$, the model uncertainty associated to a derivative $f$ can be gauged through

$$ \rho _{M_{0}}(f):=\sup \{ \mathbb{E}_{\mathbb{Q}} f :\mathbb{Q}\in M_{0} \} - \inf \{\mathbb{E}_{\mathbb{Q}} f : \mathbb{Q}\in M_{0} \}. $$

The worst-case approach typically pursued in robust finance then yields $\rho _{M_{0}}(f)$ for $M_{0}=M$, but it appears equally natural to take $M_{0}$ to be an infinitesimal ball around a reference model. This approach is being pursued by Bartl, Drapeau, Obłój, Wiesel and one of the present authors in a one-period framework. Our results indicate that an adapted Wasserstein distance provides a way to extend this to a multi-period setup, and we intend to pursue this further in future work.
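A toy evaluation of $\rho _{M_{0}}$, entirely our own construction: let $M_{0}$ consist of finitely many Brownian models $\sigma W$ on $[0,T]$ whose volatilities lie in a small hypothetical set around a reference value $\sigma _{0}$ (ignoring calibration to benchmark instruments), and let $f$ be the call with strike $0$ of Example 1.7, whose price under $\sigma W$ is $\sigma \sqrt{T/(2\pi )}$. The sketch computes the spread between the largest and the smallest price over $M_{0}$.

```python
import math


def call_price_k0(sigma, T):
    """E[(sigma * W_T)^+] = sigma * sqrt(T / (2 pi)) for the call with strike 0."""
    return sigma * math.sqrt(T / (2.0 * math.pi))


def model_uncertainty(price, models):
    """rho_{M_0}(f): supremum minus infimum of the price over the finite model set."""
    prices = [price(q) for q in models]
    return max(prices) - min(prices)


sigma0, r, T = 0.2, 0.01, 1.0
ball = [sigma0 - r, sigma0, sigma0 + r]  # hypothetical finite stand-in for a small ball
rho = model_uncertainty(lambda s: call_price_k0(s, T), ball)
```

Here $\rho _{M_{0}}(f)=2r\sqrt{T/(2\pi )}$, proportional to the width of the volatility range.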

On a different note, much work has been done regarding the convergence of discrete-time models to their continuous-time analogues. Due to the vastness of this literature, we refer the reader to the book by Prigent [51] for references. Finally, in more recent times and starting from the works of Kardaras and Žitković, the stability of utility maximisation has been studied in Kardaras and Žitković [37], Larsen [39], Larsen and Žitković [40], Mocha and Westray [43] and Weston [53], among others.

3 The adapted Wasserstein distance

3.1 Basic properties of $\mathcal{AW}_{p}$

The following lemma shows that $\mathcal{AW}_{p}$ is well defined.

Lemma 3.1

Let $\mathbb{P},\mathbb{Q}$ be integrable (semi)martingale measures for $X,Y\colon \Omega \to \Omega $, respectively, and let $\pi $ be a bi-causal coupling between ℙ and ℚ. Then the processes $X,Y,X-Y\colon \Omega \times \Omega \to \Omega $ are (semi)martingales with respect to $\pi $. Further, if $X=M+A$ denotes the semimartingale decomposition under ℙ, then up to evanescence, $M+A$ is the semimartingale decomposition of $X$ under $\pi $.

Proof

Let $X=M+A$ be the semimartingale decomposition under ℙ and consider $M$ and $A$ as processes on $\Omega \times \Omega $ via $M(\omega ,\eta ):=M(\omega )$ and $A(\omega ,\eta ):=A(\omega )$. Further let $\pi =\mathbb{P}(d\omega )\pi _{\omega }(d\eta )$ be a bi-causal coupling between ℙ and ℚ. To show that $X=M+A$ remains the semimartingale decomposition under $\pi $, it is enough to show that $M$ is a martingale under $\pi $. To that end, let $0\leq s\leq t$ and let $Z\colon \Omega \times \Omega \to \mathbb{R}$ be $\mathcal{F}_{s}\otimes \mathcal{F}_{s}$-measurable and bounded. (Recall that $\mathbb{F}=({\mathcal{F}}_{t})$ denotes the right-continuous filtration generated by $X$ and that we endow $\Omega \times \Omega $ with the filtration ${\mathbb{F}}\otimes {\mathbb{F}}$.) Then the random variable $Z'\colon \Omega \to \mathbb{R}$ defined by

$$ Z'(\omega ):=\int Z(\omega ,\eta )\,\pi _{\omega }(d\eta ) \qquad \text{is $\mathcal{F}^{\mathbb{P}}_{s}$-measurable,} $$

and it is clearly bounded. Indeed, if $Z(\omega ,\eta )=Z^{1}(\omega )Z^{2}(\eta )$ for bounded $\mathcal{F}_{s}$-measurable functions $Z^{1}$ and $Z^{2}$, then $Z'(\omega )=Z^{1}(\omega )\int Z^{2}(\eta )\,\pi _{\omega }(d\eta )$, which is $\mathcal{F}^{\mathbb{P}}_{s}$-measurable by the definition of bi-causality; the general statement then follows from a monotone class argument. Therefore

$$\begin{aligned} \mathbb{E}_{\pi }[(M_{t}-M_{s})Z] &=\int \big(M_{t}(\omega )-M_{s}( \omega )\big)\int Z(\omega ,\eta )\,\pi _{\omega }(d\eta )\,\mathbb{P}(d \omega ) \\ &=\mathbb{E}_{\mathbb{P}}[(M_{t}-M_{s})Z'] \\ &=0 \end{aligned}$$

by the martingale property of $M$ under ℙ. This shows that $M$ is a martingale under $\pi $ and therefore that $X=M+A$ is the semimartingale decomposition under $\pi $. □

Lemma 3.2

$\mathcal{AW}_{p}$ defines a metric on the set $\mathcal{SM}_{p}(\Omega )$.

We note that very similar arguments could be used to show that $\mathcal{AW}_{p}$ defines a metric for semimartingales with infinite time horizon on ℕ or $[0,\infty )$.

Proof of Lemma 3.2

It is clear that $\mathcal{AW}_{p}(\mathbb{P},\mathbb{Q})=\mathcal{AW}_{p}(\mathbb{Q}, \mathbb{P})\geq 0$ for all $\mathbb{P},\mathbb{Q}$ in $\mathcal{SM}_{p}(\Omega )$. Suppose that $\mathcal{AW}_{p}(\mathbb{P},\mathbb{Q})=0$. As $\|\cdot \|_{\infty }\leq |\cdot |_{1\text{-var}}$, it is immediate that if $\pi $ participates in the infimum defining $\mathcal{AW}_{p}(\mathbb{P},\mathbb{Q})$ and $X-Y=M+A$, then

$$ \mathbb{E}_{\pi }[ \|X-Y\|_{\infty }^{p} ] \leq 2^{p-1} \mathbb{E}_{\pi }\big[ \| M\|_{\infty }^{p} + |A|_{1\text{-var}}^{p}\big] \leq 2^{p-1} b_{p} \mathbb{E}_{\pi }\big[ [M]_{T}^{p/2} + |A|_{1\text{-var}}^{p} \big], $$

where $b_{p}$ denotes the upper constant in the BDG inequality, applied to the martingale $M$. Taking the infimum over $\pi $, the usual $p$-Wasserstein distance between ℙ and ℚ (defined with respect to the $\|\cdot \|_{\infty }$-norm) is therefore bounded above by $(2^{p-1}b_{p})^{1/p}\,\mathcal{AW}_{p}(\mathbb{P},\mathbb{Q})=0$, and so $\mathbb{P}=\mathbb{Q}$.

We now prove the triangle inequality. Let $\mathbb{P},\mathbb{Q},\mathbb{R}$ be given. We fix $\varepsilon >0$ and assume that $\pi $ is a bi-causal $\varepsilon $-optimal coupling for $\mathcal{AW}_{p}(\mathbb{P},\mathbb{Q})$ and $\tilde{\pi }$ is a bi-causal $\varepsilon $-optimal coupling for $\mathcal{AW}_{p}(\mathbb{Q},\mathbb{R})$. In the next few lines, $\omega $ always denotes the first coordinate of a point in $\Omega ^{3}$, $\eta $ the second and $\gamma $ the third. Let

$$ \pi (d\omega ,d\eta )=\pi _{\eta }(d\omega )\,\mathbb{Q}(d\eta ) \qquad \text{and}\qquad \tilde{\pi }(d\eta ,d\gamma )=\tilde{\pi }_{\eta }(d\gamma )\mathbb{Q}(d\eta ) $$

be disintegrations, and define $\Pi \in \mathcal{P}(\Omega ^{3})$ by

$$ \Pi (d\omega ,d\eta ,d\gamma )= \pi _{\eta }(d\omega )\,\tilde{\pi }_{\eta }(d\gamma )\,\mathbb{Q}(d\eta ). $$

If $\overline{\pi }(d\omega ,d\gamma ):=\int _{\Omega }\Pi (d\omega ,d \eta ,d\gamma )$ is the projection of $\Pi $ onto the first and third components, then the first and second marginals of $\overline{\pi }$ are clearly ℙ and $\mathbb{R}$, respectively. Moreover, a disintegration $\overline{\pi }=\overline{\pi }_{\omega }(d\gamma )\,\mathbb{P}(d \omega )$ is given by

$$ \overline{\pi }_{\omega }(d\gamma )= \int _{\Omega }\tilde{\pi }_{\eta }(d \gamma )\,\pi _{\omega }(d\eta ), $$

where, as indicated above, $\pi _{\omega }$ now denotes the disintegration of $\pi $ with respect to the first coordinate, that is, $\pi (d\omega ,d\eta )=\pi _{\omega }(d\eta )\,\mathbb{P}(d\omega )$. We claim that for every $A\in \mathcal{F}_{t}$, the mapping $\omega \mapsto \overline{\pi }_{\omega }(A)$ is $\mathcal{F}^{\mathbb{P}}_{t}$-measurable. Indeed, by bi-causality of $\tilde{\pi }$, the mapping $\eta \mapsto \tilde{\pi }_{\eta }(A)$ is $\mathcal{F}^{\mathbb{Q}}_{t}$-measurable. Thus there exist an $\mathcal{F}_{t}$-measurable function $f$ and a ℚ-almost surely zero function $N$ such that $\tilde{\pi }_{\eta }(A)=f(\eta )+N(\eta )$ for all $\eta \in \Omega $. Then $\overline{\pi }_{\omega }(A)=\int _{\Omega }f(\eta )\,\pi _{\omega }(d \eta ) + \int _{\Omega }N(\eta )\,\pi _{\omega }(d\eta )$ for all $\omega \in \Omega $. The first term is $\mathcal{F}^{\mathbb{P}}_{t}$-measurable (by bi-causality of $\pi $), and as $\pi $ is a coupling between ℙ and ℚ, one has $\int _{\Omega }N(\eta )\,\pi _{\omega }(d\eta )=0$ for ℙ-almost all $\omega \in \Omega $.

The argument for $\overline{\pi }=\overline{\pi }_{\gamma }(d\omega )\,\mathbb{R}(d \gamma )$ is similar and therefore $\overline{\pi }$ is a bi-causal coupling between ℙ and ℝ. Finally, it follows as in the proof of Lemma 3.1 that if $X=M^{X}+A^{X}$, $Y=M^{Y}+A^{Y}$ and $Z=M^{Z}+A^{Z}$ are the semimartingale decompositions under ℙ, ℚ, and ℝ, then they remain the semimartingale decompositions under $\Pi $ on $\Omega ^{3}$ endowed with the product filtration.
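The gluing step can be checked on finite spaces. The following Python sketch is a toy illustration: hypothetical finite sets stand in for path space and no filtration is modelled, so bi-causality is not tested. It builds $\Pi (d\omega ,d\eta ,d\gamma )=\pi _{\eta }(d\omega )\,\tilde{\pi }_{\eta }(d\gamma )\,\mathbb{Q}(d\eta )$ from two couplings stored as dictionaries and verifies that the projection $\overline{\pi }$ has marginals ℙ and $\mathbb{R}$.

```python
from fractions import Fraction as F


def glue(pi, pi_tilde, q):
    """Pi(w, e, g) = pi(w, e) * pi_tilde(e, g) / Q(e), i.e. pi_e(dw) pi~_e(dg) Q(de)."""
    Pi = {}
    for (w, e), a in pi.items():
        for (e2, g), b in pi_tilde.items():
            if e2 == e and q[e] > 0:
                Pi[(w, e, g)] = Pi.get((w, e, g), 0) + a * b / q[e]
    return Pi


def project_13(Pi):
    """Projection of Pi onto the first and third coordinates (the coupling pi-bar)."""
    out = {}
    for (w, _, g), m in Pi.items():
        out[(w, g)] = out.get((w, g), 0) + m
    return out


def marginal(coupling, idx):
    """Marginal of a coupling (a dict keyed by tuples) on coordinate idx."""
    out = {}
    for key, m in coupling.items():
        out[key[idx]] = out.get(key[idx], 0) + m
    return out


# Hypothetical toy data: P on {a, b}, Q on {u, v}, R on {x, y}.
P = {"a": F(1, 2), "b": F(1, 2)}
Q = {"u": F(1, 2), "v": F(1, 2)}
R = {"x": F(1, 3), "y": F(2, 3)}
pi = {("a", "u"): F(3, 8), ("a", "v"): F(1, 8), ("b", "u"): F(1, 8), ("b", "v"): F(3, 8)}
pi_tilde = {("u", "x"): F(1, 4), ("u", "y"): F(1, 4), ("v", "x"): F(1, 12), ("v", "y"): F(5, 12)}

Pi = glue(pi, pi_tilde, Q)
pi_bar = project_13(Pi)
```

Exact fractions are used so that the marginal identities can be compared with `==` rather than up to rounding.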

To finish the proof of the triangle inequality, we observe that

$$\begin{aligned} \mathcal{AW}_{p}(\mathbb{P},\mathbb{R}) &\leq \mathbb{E}_{ \overline{\pi }}\big[ [M^{X}-M^{Z} ]_{T}^{p/2} + |A^{X}-A^{Z}|_{ 1\text{-var}}^{p} \big]^{1/p} \\ &= \mathbb{E}_{\Pi }\big[ [(M^{X}-M^{Y})+(M^{Y}-M^{Z}) ]_{T}^{p/2} \\ &\phantom{=\mathbb{E}_{\Pi }\big[}+ |(A^{X}-A^{Y})+(A^{Y}-A^{Z})|_{1\text{-var}}^{p} \big]^{1/p}. \end{aligned}$$

The function $M\mapsto \mathbb{E}_{\Pi }[ [M ]_{T}^{p/2}]^{1/p}$ is known to be a norm on the space $\mathcal{M}_{p}(\Pi )$ of $\Pi $-martingales starting at zero whose supremum is $p$-integrable. Likewise, the function $A\mapsto \mathbb{E}_{\Pi }[ |A|_{1\text{-var}}^{p} ]^{1/p}$ is a norm on the space of finite variation processes with $p$-integrable variation. Hence

$$ (M,A)\mapsto \|(M,A)\|:= \mathbb{E}_{\Pi }\big[[M ]_{T}^{p/2}+ |A|_{ 1\text{-var}}^{p} \big]^{1/p} $$

is a norm on the product of these spaces. We conclude the proof for the triangle inequality with

$$\begin{aligned} \mathcal{AW}_{p}(\mathbb{P},\mathbb{R})& \leq \|(M^{X}-M^{Y},A^{X}-A^{Y})+(M^{Y}-M^{Z},A^{Y}-A^{Z}) \| \\ &\leq \|(M^{X}-M^{Y},A^{X}-A^{Y})\| + \|(M^{Y}-M^{Z},A^{Y}-A^{Z})\| \\ &= \mathbb{E}_{\pi }\big[ [M^{X}-M^{Y} ]_{T}^{p/2}+ |A^{X}-A^{Y}|_{ 1\text{-var}}^{p} \big]^{1/p} \\ &\phantom{=:}+ \mathbb{E}_{\tilde{\pi }} \big[ [M^{Y}-M^{Z} ]_{T}^{p/2}+ |A^{Y}-A^{Z}|_{ 1\text{-var}}^{p} \big]^{1/p} \\ &\leq 2\varepsilon + \mathcal{AW}_{p}(\mathbb{P},\mathbb{Q}) + \mathcal{AW}_{p}(\mathbb{Q},\mathbb{R}), \end{aligned}$$

as the semimartingale decomposition of $X-Y$ under $\pi $ is ${(M^{X}-M^{Y})+(A^{X}-A^{Y})}$, with an analogous expression for $Y-Z$ under $\tilde{\pi }$.

To conclude the proof, it remains to show that we have $\mathcal{AW}_{p}(\mathbb{P},\mathbb{Q})<\infty $ for all $\mathbb{P},\mathbb{Q}\in \mathcal{SM}_{p}(\Omega )$. But Lemma 3.1 gives $\mathcal{AW}_{p}(\mathbb{P},\delta _{0})= \mathbb{E}_{\mathbb{P}}[[M]_{T}^{p/2} + |A|_{1\text{-var}}^{p}]^{1/p}$, where $X=M+A$ is the semimartingale decomposition under ℙ. Therefore the triangle inequality implies that $\mathcal{AW}_{p}$ is real-valued on $\mathcal{SM}_{p}(\Omega )$. □

3.2 Examples and explicit calculations

We start with a simple result that yields a closed-form expression for the adapted Wasserstein distance in certain continuous-time situations.

Proposition 3.3

For $i\in \{1,2\}$, consider the SDEs with bounded progressive coefficients

$$ dX^{i}_{t}=\mu _{i}\big(t,(X^{i}_{s})_{s\leq t}\big)dt+\sigma _{i} \big(t,(X^{i}_{s})_{s\leq t}\big)dB^{i}_{t}. $$

(3.1)

Assume that each SDE admits a unique strong solution and denote by $\mathbb{P}^{\mu _{i},\sigma _{i}}$ the respective laws. Further assume that

$\mu _{1}$ is a function of time only (namely $\mu _{1}\colon [0,T]\to \mathbb{R}$);
$\sigma _{1},\sigma _{2}\geq 0$ and at least one of them is a function of time only.

Then the synchronous coupling (namely $\pi ^{*}=$ joint law of $(X^{1},X^{2})$, where $B^{1}=B^{2}$ in (3.1)) is optimal in the definition of $\mathcal{AW}_{p}(\mathbb{P}^{\mu _{1},\sigma _{1}},\mathbb{P}^{\mu _{2}, \sigma _{2}})$.

The discrete-time version of the above synchronous coupling is given by the Knothe–Rosenblatt rearrangement [10], and a variant of the previous result can also be obtained in the discrete-time framework.

Proof of Proposition 3.3

Let $\pi $ be a feasible coupling for $\mathcal{AW}_{p}(\mathbb{P}^{\mu _{1},\sigma _{1}},\mathbb{P}^{\mu _{2}, \sigma _{2}})$ with finite cost. For this proof, we denote the coordinate process on $\Omega \times \Omega $ by $(X^{1},X^{2})$. As before, let $X^{i}=A^{i}+M^{i}$ be the unique continuous semimartingale decomposition of $X^{i}$ under the $\mathbb{P}^{\mu _{i},\sigma _{i}}$-completion of its right-continuous filtration. Observe that $t \mapsto \frac{d}{dt}A^{1}_{t}$ is a.s. deterministic, by the assumption on $\mu _{1}$, and that the law of $t \mapsto \frac{d}{dt}A^{2}_{t}$ is independent of the coupling $\pi $. Both facts can be easily derived from the identity

$$ \frac{d}{dt}A^{i}_{t}= \lim _{\varepsilon \searrow 0} \frac{\mathbb{E}_{\pi }[ X^{i}_{t+\varepsilon } | \mathcal{F}^{X^{i}}_{t} ] -X^{i}_{t}}{\varepsilon }, $$

which by the Lebesgue differentiation theorem holds $dt\otimes d\pi $-a.e. As a consequence, the term $\mathbb{E}_{\pi }[|A^{1}-A^{2}|^{p}_{1\text{-var}}]$ is independent of the coupling $\pi $, and so we may ignore it and only focus on the term $\mathbb{E}_{\pi }[[M^{1}-M^{2}]_{T}^{p/2}]$.

By Doob’s martingale representation [36, Theorem 3.4.2], on a possibly enlarged filtered probability space $(\tilde{\Omega },\tilde{\mathcal{F}},\tilde{\pi })$, we may represent the martingale $(M^{1},M^{2})$ by

$$\begin{aligned} M^{i}=\int \sigma _{i1}dW + \int \sigma _{i2}d\hat{W}, \end{aligned}$$

where $W,\hat{W}$ are independent standard one-dimensional Brownian motions and $\sigma _{ik}$, $i,k\in \{1,2\}$, are real-valued processes, all adapted to the enlarged filtration. In the following, we omit the argument $(X^{i}_{s})_{s\leq t}$ from $\sigma _{i}$. Necessarily, we have

$$ \sigma _{i,t}^{2}=\frac{d}{dt}[M^{i}]_{t}=\sigma _{i1,t}^{2}+\sigma _{i2,t}^{2} \qquad \text{$dt\otimes d\tilde{\pi }$-a.e.} $$

By the Cauchy–Schwarz inequality, we deduce that almost surely,

$$ [M^{1},M^{2}]_{T} = \int _{0}^{T}(\sigma _{11}\sigma _{21}+\sigma _{12} \sigma _{22})dt \leq \int _{0}^{T} \sigma _{1}\sigma _{2} dt, $$

and since $[M^{1}-M^{2}]_{T}=[M^{1}]_{T}+[M^{2}]_{T}-2[M^{1},M^{2}]_{T}$, we accordingly get the lower bound

$$ \mathbb{E}_{\pi }\big[[M^{1}-M^{2}]_{T}^{p/2}\big] \geq \mathbb{E}_{ \pi }\bigg[\bigg( \int _{0}^{T} (\sigma _{1} -\sigma _{2})^{2} dt \bigg)^{p/2}\bigg]. $$

As at the beginning of the proof, the right-hand side does not depend on the coupling $\pi $, because one of $\sigma _{1},\sigma _{2}$ is a function of time only and the law of the other is fixed. To conclude, observe that for the synchronous coupling $\pi ^{*}$, equality holds in the Cauchy–Schwarz step and hence in the above inequality. □

As an easy consequence, we have

Example 3.4

For bounded Lipschitz functions $\mu _{1},\mu _{2},\sigma _{1},\sigma _{2}$, we denote by $\mathbb{P}^{\mu _{i},\sigma _{i}}$ the law of the diffusion

$$ dX^{i}_{t}=\mu _{i}(t,X^{i}_{t})dt+\sigma _{i}(t,X^{i}_{t})dB_{t}. $$

Assume that

$\mu _{i}$ is independent of the $x$-variable for some $i\in \{1,2\}$, and
$\sigma _{k}$ is independent of the $x$-variable for some $k\in \{1,2\}$.

For $j\in \{1,2\}\backslash \{i\}$ and $\ell \in \{1,2\}\backslash \{k\}$, we have

$$\begin{aligned} \mathcal{AW}_{p}(\mathbb{P}^{\mu _{1},\sigma _{1}},\mathbb{P}^{\mu _{2},\sigma _{2}})^{p} &= \mathbb{E}\bigg[ \bigg( \int _{0}^{T} \big(\sigma _{\ell }(t,X_{t}^{\ell })-\sigma _{k}(t)\big)^{2} dt\bigg)^{p/2} \bigg] \\ &\phantom{=:}+ \mathbb{E}\bigg[ \bigg(\int _{0}^{T} |\mu _{j}(t,X^{j}_{t})-\mu _{i}(t)|dt \bigg)^{p} \bigg]. \end{aligned}$$
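A Monte Carlo sketch of the formula in Example 3.4 may help. It treats the case $i=k=1$ (so $j=\ell =2$): $\mu _{1},\sigma _{1}$ depend on time only, while $\mu _{2},\sigma _{2}$ may depend on $(t,x)$, and both expectations are then over paths of $X^{2}$ alone. The function name, the Euler discretisation and the left-point Riemann sums are our own illustrative choices, so the output is only an approximation carrying discretisation and sampling error.

```python
import math
import random


def aw_p_example_3_4(mu1, sigma1, mu2, sigma2, x0=0.0, T=1.0, p=2.0,
                     n_steps=200, n_paths=2000, seed=0):
    """Monte Carlo estimate of AW_p(P^{mu1,sigma1}, P^{mu2,sigma2})^p via the
    formula of Example 3.4 in the case i = k = 1 (j = l = 2):
    mu1(t), sigma1(t) depend on time only; mu2(t, x), sigma2(t, x) may depend on x.
    X^2 is simulated with an Euler scheme; time integrals use left-point sums."""
    rng = random.Random(seed)
    dt = T / n_steps
    vol_total = drift_total = 0.0
    for _ in range(n_paths):
        x, vol_int, drift_int = x0, 0.0, 0.0
        for n in range(n_steps):
            t = n * dt
            vol_int += (sigma2(t, x) - sigma1(t)) ** 2 * dt   # int (sigma_2 - sigma_1)^2 dt
            drift_int += abs(mu2(t, x) - mu1(t)) * dt        # int |mu_2 - mu_1| dt
            x += mu2(t, x) * dt + sigma2(t, x) * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        vol_total += vol_int ** (p / 2)
        drift_total += drift_int ** p
    return (vol_total + drift_total) / n_paths
```

For instance, `aw_p_example_3_4(lambda t: 0.0, lambda t: 1.0, lambda t, x: math.sin(x), lambda t, x: 1.0 + 0.5 * math.cos(x))` estimates the distance between a Brownian motion and a diffusion with bounded Lipschitz state-dependent coefficients. With constant coefficients every path contributes the same value, and the estimate reduces to $((\sigma _{1}-\sigma _{2})^{2}T)^{p/2}+(|\mu _{1}-\mu _{2}|T)^{p}$.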

We now show that, in general, the synchronous coupling of Proposition 3.3 need not be optimal. As a consequence, we do not expect a closed-form expression for the adapted Wasserstein distance in general. A discrete-time version of this observation is discussed in [8, Sect. 7].

Example 3.5

Consider $d=1$, $T=2$ and for each $c\in \mathbb{R}$ introduce

$$ \mu ^{c}_{t}(\omega ):=c 1_{[1,2]}(t) \mathop{\mathrm{sign}}(\omega _{1}), \qquad \hat{\mu }^{c}_{t}(\omega ):=-\mu ^{c}_{t}(\omega ). $$

Let $B$ be a Brownian motion and $\sigma \in \mathbb{R}_{+}$, and introduce the couplings

$$\begin{aligned} \pi _{1}&:= \mathop{\mathrm{Law}} \bigg(\sigma B + \int \mu ^{c}_{t}(B)dt, \sigma B + \int \hat{\mu }^{c}_{t}(B)dt \bigg), \\ \pi _{2}&:= \mathop{\mathrm{Law}} \bigg(\sigma B + \int \mu ^{c}_{t}(B)dt, -\sigma B + \int \hat{\mu }^{c}_{t}(-B)dt \bigg). \end{aligned}$$

These couplings share the same marginals and each of them is bi-causal. Writing as before $X-Y=M+A$, it is easy to compute

$$\begin{aligned} \mathbb{E}_{\pi _{1}}\big[ [M]_{T}^{p/2} + |A|_{1\text{-var}}^{p} \big] &= (2c)^{p}, \\ \mathbb{E}_{\pi _{2}}\big[ [M]_{T}^{p/2} + |A|_{1\text{-var}}^{p} \big] &=(8\sigma ^{2})^{p/2}. \end{aligned}$$

We conclude that for each $p$ (and $c>0$), the “synchronous” coupling $\pi _{1}$ is not optimal between its marginals for the metric $\mathcal{AW}_{p}$ whenever $(8\sigma ^{2})^{p/2}<(2c)^{p}$, that is, whenever $\sqrt{2}\,\sigma <c$; there are thus plenty of such pairs $(c,\sigma )$.
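The two costs in Example 3.5 are explicit, so the comparison reduces to arithmetic. The sketch below (function names are hypothetical; it assumes $c>0$) evaluates both costs, decides whether $\pi _{2}$ beats the synchronous coupling $\pi _{1}$, and, as a sanity check of $[M]_{2}=8\sigma ^{2}$ under $\pi _{2}$, sums the squared increments of a simulated path of $2\sigma B$ on $[0,2]$.

```python
import math
import random


def coupling_costs(c, sigma, p):
    """Costs E[[M]_T^{p/2} + |A|_{1-var}^p] of pi_1 and pi_2 in Example 3.5 (T = 2, c > 0)."""
    # Under pi_1, X - Y = 2c (t - 1)^+ sign(B_1): M = 0 and |A|_{1-var} = 2c.
    cost_pi1 = (2 * c) ** p
    # Under pi_2, X - Y = 2 sigma B: A = 0 and [M]_2 = 4 sigma^2 * 2 = 8 sigma^2.
    cost_pi2 = (8 * sigma ** 2) ** (p / 2)
    return cost_pi1, cost_pi2


def synchronous_not_optimal(c, sigma, p):
    """True when pi_2 is strictly cheaper than the synchronous coupling pi_1."""
    cost_pi1, cost_pi2 = coupling_costs(c, sigma, p)
    return cost_pi2 < cost_pi1


def realized_qv_pi2(sigma, n_steps=10_000, seed=1):
    """Discretised quadratic variation of M = 2 sigma B on [0, 2] (should be close to 8 sigma^2)."""
    rng = random.Random(seed)
    dt = 2.0 / n_steps
    return sum((2 * sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)) ** 2 for _ in range(n_steps))
```

For $c=1$, $\sigma =1/2$ and $p=2$ the costs are $4$ and $2$, so $\pi _{1}$ is not optimal; for $\sigma =1$ they are $4$ and $8$, and this comparison no longer rules out $\pi _{1}$.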

To close this section, we estimate the distance between two geometric Brownian motions with different volatilities.

Proposition 3.6

For $i=1,2$, let $\mathbb{P}^{\sigma _{i}}$ denote the law of the solution to the SDE $dZ^{i}_{t}=\sigma _{i} Z^{i}_{t} dB_{t}^{i}$ with $Z^{i}_{0}=1$, where $B^{i}$ is a Brownian motion and $\sigma _{i}\in {\mathbb{R}}_{+}$. Letting $R\sim {\mathcal{N}}(0,T)$, we then have

$$ \mathcal{AW}_{2}(\mathbb{P}^{\sigma _{1}},\mathbb{P}^{\sigma _{2}} )^{2}= \mathbb{E}\bigg[\bigg(e^{\sigma _{1}R-\frac{\sigma _{1}^{2}T}{2}}-e^{ \sigma _{2} R-\frac{\sigma _{2}^{2}T}{2}} \bigg)^{2}\bigg]=e^{\sigma _{1}^{2}T}-2e^{ \sigma _{1}\sigma _{2}T} +e^{\sigma _{2}^{2}T}, $$

and for $p>1$,

$$ \mathcal{AW}_{p}(\mathbb{P}^{\sigma _{1}},\mathbb{P}^{\sigma _{2}} )^{p} \leq c_{p} \mathbb{E}\bigg[\bigg|e^{\sigma _{1}R- \frac{\sigma _{1}^{2}T}{2}}-e^{\sigma _{2}R- \frac{\sigma _{2}^{2}T}{2}} \bigg|^{p}\bigg], $$

where $c_{p}$ is the constant in the BDG inequality that controls the quadratic variation in terms of the terminal value.

Proof

We have

$$\begin{aligned} & \mathcal{AW}_{p}(\mathbb{P}^{\sigma _{1}},\mathbb{P}^{\sigma _{2}} )^{p} \\ &= \inf \big\{ \mathbb{E}_{\pi }\big[ [Z^{1}-Z^{2}]_{T}^{p/2}\big] : \pi \in \mathrm{Cpl}_{\mathrm{BC}}(\mathbb{P}^{\sigma _{1}},\mathbb{P}^{\sigma _{2}})\big\} \\ &\leq c_{p} \inf \big\{ \mathbb{E}_{\pi }\big[|Z_{T}^{1}-Z_{T}^{2}|^{p} \big] : \pi \in \mathrm{Cpl}_{\mathrm{BC}}(\mathbb{P}^{\sigma _{1}},\mathbb{P}^{\sigma _{2}}) \big\} \\ &= c_{p} \inf \bigg\{ \int \bigg|e^{\sigma _{1} r_{1} - \frac{\sigma _{1}^{2}T}{2}}-e^{\sigma _{2}r_{2}- \frac{\sigma _{2}^{2}T}{2}} \bigg|^{p}\, d\pi (r_{1},r_{2}) : \pi \in \mathrm{Cpl}(\gamma _{T},\gamma _{T})\bigg\} \\ &= c_{p} \mathbb{E}\bigg[\bigg|e^{\sigma _{1}R- \frac{\sigma _{1}^{2}T}{2}}-e^{\sigma _{2}R- \frac{\sigma _{2}^{2}T}{2}} \bigg|^{p}\bigg], \end{aligned}$$

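where $\gamma _{T}$ denotes a centered Gaussian with variance $T$. The last two equalities hold because $Z^{i}_{T}$ is an increasing function of $B^{i}_{T}\sim \gamma _{T}$: the comonotone coupling of the terminal values is optimal for the convex cost $|\cdot |^{p}$, and it is realised by the bi-causal synchronous coupling $B^{1}=B^{2}$. For $p=2$, the Itô isometry gives $\mathbb{E}_{\pi }[[Z^{1}-Z^{2}]_{T}]=\mathbb{E}_{\pi }[(Z^{1}_{T}-Z^{2}_{T})^{2}]$ for every bi-causal $\pi $, so the inequality holds with $c_{2}=1$ as an equality. □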
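The closed form for $p=2$ is easy to check numerically. The following sketch (function names and quadrature parameters are our own choices) compares $e^{\sigma _{1}^{2}T}-2e^{\sigma _{1}\sigma _{2}T}+e^{\sigma _{2}^{2}T}$ with a composite Simpson approximation of $\mathbb{E}[(e^{\sigma _{1}R-\sigma _{1}^{2}T/2}-e^{\sigma _{2}R-\sigma _{2}^{2}T/2})^{2}]$, $R\sim \mathcal{N}(0,T)$, with the integral truncated to $|R|\leq 12\sqrt{T}$.

```python
import math


def aw2_squared_closed_form(s1, s2, T):
    """AW_2(P^{s1}, P^{s2})^2 for two geometric Brownian motions (Proposition 3.6)."""
    return math.exp(s1 ** 2 * T) - 2 * math.exp(s1 * s2 * T) + math.exp(s2 ** 2 * T)


def aw2_squared_quadrature(s1, s2, T, n=4000, width=12.0):
    """Composite Simpson rule (n even) for E[(Z^1_T - Z^2_T)^2] with both
    terminal values driven by the same R ~ N(0, T) (synchronous coupling)."""
    sd = math.sqrt(T)
    a, b = -width * sd, width * sd
    h = (b - a) / n

    def integrand(r):
        z1 = math.exp(s1 * r - s1 ** 2 * T / 2)
        z2 = math.exp(s2 * r - s2 ** 2 * T / 2)
        density = math.exp(-r * r / (2 * T)) / math.sqrt(2 * math.pi * T)
        return (z1 - z2) ** 2 * density

    total = integrand(a) + integrand(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(a + i * h)
    return total * h / 3
```

The truncation width and the number of nodes are generous for moderate $\sigma _{i}\sqrt{T}$; for large volatilities the integrand concentrates far from the origin and both would need to grow.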

3.3 Choice of the “cost functional”

Recall from Definition 1.3 that the adapted Wasserstein distance is given by

$$\begin{aligned} \mathcal{AW}_{p}(\mathbb{P},\mathbb{Q}):=\inf \{ \Phi : \pi \in \mathrm{Cpl}_{\mathrm{BC}}({\mathbb{P}},{\mathbb{Q}}) \}, \end{aligned}$$

where the “cost functional”

$$\begin{aligned} \Phi =\mathbb{E}_{\pi }\big[ [M^{X}-M^{Y}]_{T}^{p/2} + |A^{X}-A^{Y}|_{ 1\text{-var}}^{p}\big]^{1/p} \end{aligned}$$

(3.2)

is defined using the semimartingale decompositions $X=M^{X}+A^{X}, Y=M^{Y}+A^{Y}$. The distinctive property of this “quadratic plus first variation” functional is that it exhibits the proper scaling to interpret the discrete-time case as an approximation to the continuous-time counterpart. To see this, consider $\Omega =C([0,1])$ and let $\mathbb{P}^{\sigma }$ be the law of $X$, where $X_{t}=\int _{0}^{t} \sigma _{s}\, dB_{s}$, $B$ is a Brownian motion and $\sigma \in C([0,1]), \sigma \geq 0$. For each $N$, denote by $\mathbb{P}_{N}^{\sigma }$ the law of a random walk on $\{0,1/N,2/N, \dots ,1\}$ with independent increments from $n/N$ to $(n+1)/N$ distributed according to $\mathcal{N}(0,\sigma _{n/N}^{2}/N)$. Then one can compute that for $0\leq \sigma , \sigma '\in C([0,1])$,

$$\begin{aligned} \mathcal{AW}_{2}(\mathbb{P}^{\sigma }_{N},{\mathbb{P}}^{\sigma '}_{N}) &= \bigg( \sum _{n=0}^{N-1} \frac{1}{N}|\sigma _{n/N}-\sigma _{n/N}'|^{2} \bigg)^{1/2} \\ &\longrightarrow \bigg( \int _{0}^{1} |\sigma _{t}-\sigma _{t}'|^{2} \,dt\bigg)^{1/2} =\mathcal{AW}_{2}(\mathbb{P}^{\sigma },\mathbb{P}^{ \sigma '} ). \end{aligned}$$

For comparison, consider the consequences of replacing the “cost functional” $\Phi $ in (3.2) with $\tilde{\Phi }= {\mathbb{E}}_{\pi }[\sum _{i=0}^{N} (X_{i}-Y_{i})^{2}]^{1/2}$, which corresponds to a quadratic nested distance (in the terminology of Pflug and Pichler [47]). While $\widetilde{\mathcal{AW}}_{2}$ and ${\mathcal{AW}}_{2}$ are equivalent metrics for each fixed $N$, $\widetilde{\mathcal{AW}}_{2}$ does not exhibit the appropriate scaling for large $N$: a straightforward computation shows that $\widetilde{\mathcal{AW}}_{2}(\mathbb{P}_{N}^{\sigma },\mathbb{P}_{N}^{ \sigma '})\to \infty $ as $N\to \infty $ whenever $\sigma \neq \sigma '$. In consequence, bounds on the hedging error in terms of $\widetilde{\mathcal{AW}}_{2}(\mathbb{P}_{N}^{\sigma },\mathbb{P}_{N}^{ \sigma '})$ become progressively weaker as $N\to \infty $; in particular, they do not allow a meaningful continuous-time limit.
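The scaling contrast can be checked numerically. The sketch below (our own illustration; the function names and the choice of $\sigma ,\sigma '$ are hypothetical) evaluates the Riemann sum for $\mathcal{AW}_{2}(\mathbb{P}^{\sigma }_{N},\mathbb{P}^{\sigma '}_{N})$ from the display above, together with the value of $\tilde{\Phi }$ under the synchronous coupling, in which both walks start at $0$ and share the same standard Gaussian increments. That coupling is bi-causal, so the second number is only an upper bound for $\widetilde{\mathcal{AW}}_{2}$: it shows how the cost of this particular coupling grows, not the divergence claimed in the text.

```python
import math


def aw2_discrete(sigma, sigma_prime, N):
    """AW_2 between the Gaussian random walks P_N^sigma and P_N^sigma' (Riemann sum above)."""
    return math.sqrt(sum((sigma(n / N) - sigma_prime(n / N)) ** 2 / N for n in range(N)))


def nested_cost_synchronous(sigma, sigma_prime, N):
    """Phi~ = E[sum_{i=0}^N (X_i - Y_i)^2]^{1/2} under the synchronous coupling,
    where X_i - Y_i = sum_{n<i} (sigma_{n/N} - sigma'_{n/N}) xi_n / sqrt(N)."""
    total, running_var = 0.0, 0.0
    for i in range(N + 1):
        total += running_var          # E[(X_i - Y_i)^2] = sum_{n<i} (sigma_n - sigma'_n)^2 / N
        if i < N:
            running_var += (sigma(i / N) - sigma_prime(i / N)) ** 2 / N
    return math.sqrt(total)
```

For constant $\sigma \equiv 1$ and $\sigma '\equiv 2$, the first function returns $1$ for every $N$, while the second equals $\sqrt{(N+1)/2}$ and grows like $\sqrt{N}$.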

When restricting solely to martingale measures ${\mathbb{P}}, {\mathbb{Q}}$, a sensible alternative to (3.2) would be to consider the maximum norm, i.e., $\Phi '= {\mathbb{E}}_{\pi }[\sup _{t}|X_{t}-Y_{t}|^{p}]^{1/p}$. In fact, by the BDG inequalities, this is essentially equivalent to our choice in (3.2). However, when considering semimartingales, this cost is too coarse. For example, let $(\omega _{n})$ be a sequence in $\Omega $ which converges to zero in the maximum norm, but for which the first variation tends to infinity. Then $\mathbb{P}_{n}:=\delta _{\omega _{n}}$ converges to $\mathbb{P}:=\delta _{0}$ (when the adapted distance is defined only with the maximum norm as cost), but none of our optimisation problems converge (take a strategy $H\in \mathcal{H}_{k}$ for which

https://static-content.springer.com/image/art%3A10.1007%2Fs00780-020-00426-3/MediaObjects/780_2020_426_IEq424_HTML.gif

almost surely).

3.4 Stochastic integrals and a contraction principle

We present here the two technical results which underlie the proofs of the main theorems in the article. The first one is

Lemma 3.7

Let$\mathbb{P},\mathbb{Q}\in \mathcal{SM}_{1}(\Omega )$, $H\in \mathcal{H}_{k}$and$\pi $be a bi-causal coupling between ℙ and ℚ. Then there exists a process$G\in \mathcal{H}_{k}$such that$G_{t}(Y)=\mathbb{E}_{\pi }[H_{t}(X)|Y]$for every$t$, $\pi $-almost surely. Moreover, we have

$$ (G(Y)\cdot Y)_{T}=\mathbb{E}_{\pi }\big[(H(X)\cdot Y)_{T}\,\big|\,Y\big] $$

$\pi $-almost surely.

Proof

In discrete time, we can always write $H=\sum _{t=1}^{N} H_{t} 1_{\{t\}}$ for Borel functions $H_{t} \colon \mathbb{R}^{t-1}\to [-k,k]$. Let $\pi =\pi _{\eta }(d\omega )\,\mathbb{Q}(d\eta )$ be a disintegration and define

$$ G'_{t}(\eta ):=\int H_{t}(\omega )\pi _{\eta }(d\omega ) $$

for every $t$ and $\eta \in \Omega $. By the definition of a bi-causal coupling, $G'_{t}$ is $\mathcal{F}^{\mathbb{Q}}_{t-1}$-measurable. It remains to pick functions $G_{t}$ which are $\mathcal{F}_{t-1}$-measurable such that $G_{t}=G'_{t}$ ℚ-almost surely. Since $\mathbb{E}_{\pi }[H_{t}(X)|Y]=G_{t}(Y)$ $\pi $-almost surely, it is clear that

$$ \sum _{t=1}^{N}G_{t}(Y)(Y_{t}-Y_{t-1})=\mathbb{E}_{\pi }\bigg[\sum _{t=1}^{N}H_{t}(X)(Y_{t}-Y_{t-1})\,\bigg|\,Y\bigg] $$

$\pi $-almost surely.
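In discrete time, the construction of $G$ is a finite computation. The sketch below is a hypothetical two-period example, not taken from the text: ℙ moves by $\pm 1$ and ℚ by $\pm 2$ with fair coins, and $\pi $ is the equal mixture of the synchronous coupling (same increment signs) and the independent coupling. Both components are bi-causal, and we rely on the fact that bi-causality with fixed marginals is described by linear constraints, so the mixture is bi-causal too. With $H_{1}=1$ and $H_{2}=\mathrm{sign}(x_{1})$, the code computes $G_{t}(\eta )=\int H_{t}(\omega )\,\pi _{\eta }(d\omega )$ with exact rational arithmetic, checks that $G_{2}$ depends on $\eta $ only through $\eta _{1}$, and checks the identity of the last display.

```python
from fractions import Fraction
from itertools import product

STEPS = [(s1, s2) for s1 in (1, -1) for s2 in (1, -1)]


def path(steps, scale):
    """Two-period path (x_1, x_2), started at x_0 = 0, with increments scale * steps."""
    return (scale * steps[0], scale * (steps[0] + steps[1]))


def mixed_coupling():
    """Hypothetical coupling: 1/2 synchronous (same signs) + 1/2 independent."""
    pi = {}
    for s in STEPS:
        key = (path(s, 1), path(s, 2))
        pi[key] = pi.get(key, 0) + Fraction(1, 8)
    for s, r in product(STEPS, STEPS):
        key = (path(s, 1), path(r, 2))
        pi[key] = pi.get(key, 0) + Fraction(1, 32)
    return pi


def H(t, x):
    """Strategy bounded by k = 1: H_1 = 1 and H_2 = sign(x_1)."""
    return 1 if t == 1 else (1 if x[0] > 0 else -1)


def increments(y):
    """(Y_1 - Y_0, Y_2 - Y_1) with Y_0 = 0."""
    return (y[0], y[1] - y[0])


def projected_strategy(pi):
    """G_t(eta) = sum_omega H_t(omega) pi(omega, eta) / Q(eta)."""
    q, num = {}, {}
    for (x, y), m in pi.items():
        q[y] = q.get(y, 0) + m
        for t in (1, 2):
            num[(t, y)] = num.get((t, y), 0) + m * H(t, x)
    return {key: val / q[key[1]] for key, val in num.items()}, q


def g2_depends_only_on_y1(G):
    """Predictability check: G_2(eta) must be a function of eta_1 alone."""
    seen = {}
    return all(seen.setdefault(y[0], val) == val for (t, y), val in G.items() if t == 2)


def integral_identity_holds(pi):
    """Check sum_t G_t(Y) dY_t = E_pi[sum_t H_t(X) dY_t | Y] for every Y-path."""
    G, q = projected_strategy(pi)
    cond = {}
    for (x, y), m in pi.items():
        dy = increments(y)
        cond[y] = cond.get(y, 0) + m * sum(H(t, x) * dy[t - 1] for t in (1, 2))
    return all(
        sum(G[(t, y)] * increments(y)[t - 1] for t in (1, 2)) == cond[y] / q[y]
        for y in q
    )
```

Here $G_{2}=\pm \tfrac{1}{2}$ according to the sign of $\eta _{1}$: the synchronous half of the coupling transmits the sign of $x_{1}$, while the independent half averages it out.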

In continuous time, we take $G$ to be the predictable projection of $H$ under the reference measure $\pi $, with respect to the $\pi $-completion of the filtration $\{\emptyset ,\Omega \}\otimes \mathbb{F}^{Y}$. By [1, Lemma C.1], the result is $\pi $-indistinguishable from a predictable process under the ℚ-completion of the filtration $\mathbb{F}^{Y}$. The $t$-by-$t$, $\pi $-almost sure equality $G_{t}(Y)= \mathbb{E}_{\pi }[H_{t}(X)|Y]$ is then a consequence of the definition of the predictable projection. The $\pi $-almost sure equality

$(G(Y)\cdot Y)_{T}=\mathbb{E}_{\pi }[(H(X)\cdot Y)_{T}\,|\,Y]$

is established in Lemma 3.8 below, assuming that $\mathbb{E}_{\mathbb{Q}}[[Y ]_{T}]<\infty $. The general case follows by localisation. □

Lemma 3.8

In the continuous-time context of Lemma 3.7, assume further that $\mathbb{E}_{\mathbb{Q}}[[Y ]_{T}]<\infty $. Then

$$ (G(Y)\cdot Y)_{T}=\mathbb{E}_{\pi }\big[(H(X)\cdot Y)_{T}\,\big|\,Y\big] $$

$\pi $-almost surely.

Proof

The statement is true if instead of the stochastic integrals, we consider the integrals with respect to the finite variation part of $Y$ (either by properties of Riemann–Stieltjes integrals, or directly from the definition of the predictable projection). For this reason, we may now assume that $Y$ is itself a martingale.

We first take for granted the following result: If $h$ is bounded and predictable in the filtration of $(X,Y)$ and if $g$ denotes its predictable projection in the filtration of $Y$ under the measure $\pi $, then

$$\begin{aligned} \mathbb{E}_{\pi }\bigg[\int _{0}^{T} |g_{t}|^{2}d[Y]_{t} \bigg]\leq \mathbb{E}_{\pi }\bigg[\int _{0}^{T} |h_{t}|^{2}d[Y]_{t} \bigg]. \end{aligned}$$

(3.3)

We know that there exists a sequence $(H^{n})$ of predictable simple processes such that

$$ \lim _{n\to \infty }\mathbb{E}_{\pi }\bigg[\int _{0}^{T} |H_{t}-H^{n}_{t}|^{2}d[Y]_{t} \bigg]=0. $$

By the Itô isometry, the stochastic integrals $(H^{n}\cdot Y)_{T}$ converge in $L^{2}(\pi )$ to $(H\cdot Y)_{T}$. Denoting by $G^{n}$ the predictable projection of $H^{n}$ with respect to the $Y$-filtration, we deduce from (3.3) that

$$ \lim _{n\to \infty }\mathbb{E}_{\pi }\bigg[\int _{0}^{T} |G_{t}-G^{n}_{t}|^{2}d[Y]_{t} \bigg]=0; $$

so again by the Itô isometry, $(G^{n}\cdot Y)_{T}$ converges in $L^{2}(\pi )$ to $(G\cdot Y)_{T}$. The $\pi $-almost sure equality $(G^{n}\cdot Y)_{T}=\mathbb{E}_{\pi }[(H^{n}\cdot Y)_{T}\,|\,Y]$ follows easily by the bi-causality of the coupling $\pi $, and by taking $L^{2}$-limits, the desired conclusion is obtained.

To finish the proof, we must establish (3.3). First we observe that

$$\begin{aligned} \mathbb{E}_{\pi }\bigg[\int _{0}^{T} |g_{t}|^{2}d[Y]_{t} \bigg]^{1/2}&= \sup _{\substack{f \text{ is $Y$-predictable,}\\ \|f\|\leq 1} } \mathbb{E}_{\pi }\bigg[\int _{0}^{T} f_{t}g_{t}d[Y]_{t} \bigg] \\ &=\sup _{\substack{f \text{ is $Y$-predictable,}\\ \|f\|\leq 1} } \mathbb{E}_{\pi }\bigg[\int _{0}^{T} f_{t}h_{t}d[Y]_{t} \bigg], \end{aligned}$$

where the second equality follows from the defining property of the predictable projection, and $\|f\|^{2}:=\mathbb{E}_{\pi }[\int _{0}^{T} |f_{t}|^{2}d[Y]_{t} ]$. Since every $Y$-predictable process is $(X,Y)$-predictable, (3.3) is then a consequence of the equality

$$ \mathbb{E}_{\pi }\bigg[\int _{0}^{T} |h_{t}|^{2}d[Y]_{t} \bigg]^{1/2}= \sup _{\substack{f \text{ is $(X,Y)$-predictable,}\\ \|f\|\leq 1} } \mathbb{E}_{\pi }\bigg[\int _{0}^{T} f_{t}h_{t}d[Y]_{t} \bigg]. $$

□
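A discrete analogue of (3.3) is the conditional Jensen inequality: if $g=\mathbb{E}_{\pi }[h\,|\,y]$ and the weight $w(y)\geq 0$ depends on $y$ only (as $d[Y]$ does), then $\mathbb{E}_{\pi }[g^{2}w]\leq \mathbb{E}_{\pi }[h^{2}w]$. The sketch below uses a toy finite joint law of our own choosing; it conditions on the whole of $y$ and ignores the filtration structure that the continuous-time statement requires. It evaluates both sides with exact fractions.

```python
from fractions import Fraction


def projection_contraction(pi, h, w):
    """Discrete analogue of (3.3). pi maps (x, y) to probabilities, h(x, y) plays the
    role of the integrand, g(y) = E_pi[h | y] that of its projection, and the
    y-measurable weight w(y) >= 0 that of d[Y]. Returns (E[g^2 w], E[h^2 w])."""
    q, num = {}, {}
    for (x, y), m in pi.items():
        q[y] = q.get(y, 0) + m
        num[y] = num.get(y, 0) + m * h(x, y)
    g = {y: num[y] / q[y] for y in q}
    lhs = sum(m * g[y] ** 2 * w(y) for (x, y), m in pi.items())
    rhs = sum(m * h(x, y) ** 2 * w(y) for (x, y), m in pi.items())
    return lhs, rhs


# Hypothetical joint law, integrand h(x, y) = x - y and weight w(y) = y + 1.
pi = {(0, 0): Fraction(1, 4), (1, 0): Fraction(1, 4), (0, 1): Fraction(1, 8), (1, 1): Fraction(3, 8)}
lhs, rhs = projection_contraction(pi, lambda x, y: x - y, lambda y: y + 1)
```

In this example the two sides are $3/16$ and $1/2$.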

Our next crucial technical result is given in Theorem 3.10 below. But first we need some preparation.

Lemma 3.9

Let $p\geq 1$. Let $\mathbb{P},\mathbb{Q}\in \mathcal{SM}_{p}(\Omega )$, let $\pi $ be a bi-causal coupling between ℙ and ℚ, let $H\in \mathcal{H}_{k}$ and write $X-Y=M+A$ for the semimartingale decomposition under $\pi $. Then we have

https://static-content.springer.com/image/art%3A10.1007%2Fs00780-020-00426-3/MediaObjects/780_2020_426_Equau_HTML.png

where $b_{p}$ is the upper constant in the BDG inequality. If further $H_{t}\colon \Omega \to \mathbb{R}$ is $\tilde{L}$-Lipschitz-continuous for every $t$, then we have

https://static-content.springer.com/image/art%3A10.1007%2Fs00780-020-00426-3/MediaObjects/780_2020_426_Equav_HTML.png

where $\alpha =2^{3p-2}\tilde{L}^{p}b_{p} b_{2p}^{1/2}\min \{\mathcal{AW}_{2p}( \mathbb{P},\delta _{0})^{p},\mathcal{AW}_{2p}(\mathbb{Q},\delta _{0})^{p} \}$.

Proof

The elementary inequality $(x+y)^{p}\leq 2^{p-1} x^{p}+ 2^{p-1} y^{p}$ for $x,y\geq 0$ together with the BDG inequality and the fact that $\|\cdot \|_{\infty }\leq |\cdot |_{1\text{-var}}$ implies

$$\begin{aligned} \mathbb{E}_{\pi }[\|X-Y\|_{\infty }^{p}] &\leq 2^{p-1} \mathbb{E}_{\pi }\big[\|M\|_{\infty }^{p}\big] + 2^{p-1}\mathbb{E}_{\pi }\big[|A|_{ 1\text{-var}}^{p}\big] \\ &\leq 2^{p-1} b_{p} \mathbb{E}_{\pi }\big[[M ]_{T}^{p/2}+|A|_{ 1\text{-var}}^{p}\big]. \end{aligned}$$

This proves the first part. The same arguments imply

https://static-content.springer.com/image/art%3A10.1007%2Fs00780-020-00426-3/MediaObjects/780_2020_426_Equax_HTML.png

from which the second part follows. To prove the third claim, write

https://static-content.springer.com/image/art%3A10.1007%2Fs00780-020-00426-3/MediaObjects/780_2020_426_Equay_HTML.png

The second term is at most $2^{p-1}2^{p-1}k^{p} b_{p} \mathbb{E}_{\pi }[[M]_{T}^{p/2}+|A|_{ 1\text{-var}}^{p}]$ by the second part. It remains to estimate

https://static-content.springer.com/image/art%3A10.1007%2Fs00780-020-00426-3/MediaObjects/780_2020_426_IEq502_HTML.gif

. Write $X=N+B$ for the semimartingale decomposition of $X$ under ℙ. By Lemma 3.1, the semimartingale decomposition under $\pi $ is still $X=N+B$. Moreover, the BDG inequality, the Lipschitz-continuity of $H$ and Hölder’s inequality imply that

https://static-content.springer.com/image/art%3A10.1007%2Fs00780-020-00426-3/MediaObjects/780_2020_426_Equaz_HTML.png

It now follows from the first part that

$$ \mathbb{E}_{\pi }\big[\|X-Y\|_{\infty }^{2p}\big]^{1/2} \leq (2^{2p-1} b_{2p})^{1/2} \mathbb{E}_{\pi }\big[[M ]_{T}^{p}+|A|_{1\text{-var}}^{2p}\big]^{1/2}, $$

and by Lemma 3.1, we have

$$ \mathbb{E}_{\pi }\big[([N]_{T}^{p/2} + |B|_{1\text{-var}}^{p})^{2} \big]^{1/2}\leq 2^{1/2}\mathcal{AW}_{2p}(\mathbb{P},\delta _{0})^{p}. $$

Putting all estimates together and interchanging the roles of $X$ and $Y$ (which accounts for the minimum in $\alpha $) yields the claim. □

Denote by $\mathcal{P}_{p}(\mathbb{R})$ the set of all Borel probability measures $\mu $ on ℝ such that $\int |x|^{p}\,\mu (dx)<\infty $. Moreover, let $d_{p}(\mu ,\nu )$ be the usual $p$-Wasserstein distance, and let $d_{p}^{w}$ be the weak $p$-Wasserstein cost, that is,

$$\begin{aligned} d_{p}(\mu ,\nu )&:=\inf \bigg\{ \bigg(\int |x-y|^{p} \gamma (dx,dy) \bigg)^{1/p} \!: \gamma \text{ is a coupling of }\mu \text{ and }\nu \bigg\} , \\ d_{p}^{w}(\mu ,\nu )&:=\inf \bigg\{ \bigg(\int \bigg|x-\int y\, \gamma ^{x}(dy)\bigg|^{p}\mu (dx)\bigg)^{1/p} \!: \gamma \text{ is a coupling of }\mu \text{ and }\nu \bigg\} . \end{aligned}$$

Here $\gamma =\mu (dx)\gamma ^{x}(dy)$ denotes the disintegration. Note that $d_{p}^{w}$ is not symmetric and that, as a consequence of Jensen’s inequality, we always have $d_{p}^{w}\leq d_{p}$. Problems akin to $d_{p}^{w}(\mu ,\nu )$ go under the name of “weak optimal transport” and were recently introduced by Gozlan et al. [28]; see also Alfonsi et al. [3], Alibert et al. [4], Backhoff-Veraguas et al. [11, 9] and Gozlan et al. [27]. We have

Theorem 3.10

Let $\mathbb{P},\mathbb{Q}\in \mathcal{SM}_{p}(\Omega )$, let $\pi $ be a bi-causal coupling between ℙ and ℚ, let $C\colon \Omega \to \mathbb{R}$ be Lipschitz with constant $L$, and let $H\in \mathcal{H}_{k}$. Further denote by $X-Y=M+A$ the semimartingale decomposition under $\pi $ and let $G\in \mathcal{H}_{k}$ be such that

https://static-content.springer.com/image/art%3A10.1007%2Fs00780-020-00426-3/MediaObjects/780_2020_426_IEq529_HTML.gif

$\pi $-almost surely. Then

https://static-content.springer.com/image/art%3A10.1007%2Fs00780-020-00426-3/MediaObjects/780_2020_426_Equ10_HTML.png

(3.4)

Now assume in addition that $H_{t}\colon \Omega \to \mathbb{R}$ is $\tilde{L}$-Lipschitz-continuous for every $t$. Then

https://static-content.springer.com/image/art%3A10.1007%2Fs00780-020-00426-3/MediaObjects/780_2020_426_Equbd_HTML.png

where $\alpha $ is the constant from Lemma 3.9.

Proof

We start by proving the first claim. Let $\pi $ be as stated, and define

https://static-content.springer.com/image/art%3A10.1007%2Fs00780-020-00426-3/MediaObjects/780_2020_426_Eqube_HTML.png

Now let $\gamma :=(b(Y),a(X))(\pi )$ so that $\gamma $ is trivially a coupling between $b(Y)(\mathbb{Q})$ and $a(X)(\mathbb{P})$. Therefore

$$\begin{aligned} d_{p}^{w}\big( b(Y)(\mathbb{Q}) , a(X) ( \mathbb{P} ) \big) &\leq \mathbb{E}_{\pi }\big[ \big| b(Y) - \mathbb{E}_{\pi }[ a(X)|b(Y)] \big|^{p} \,\big]^{1/p}. \end{aligned}$$

By assumption, it holds that

https://static-content.springer.com/image/art%3A10.1007%2Fs00780-020-00426-3/MediaObjects/780_2020_426_Equbg_HTML.png

Thus by using the tower property and Jensen’s inequality, it follows that

https://static-content.springer.com/image/art%3A10.1007%2Fs00780-020-00426-3/MediaObjects/780_2020_426_Equbh_HTML.png

The claim now follows from the first and second estimates in Lemma 3.9.

If $H$ is additionally Lipschitz, let

https://static-content.springer.com/image/art%3A10.1007%2Fs00780-020-00426-3/MediaObjects/780_2020_426_Equbi_HTML.png

as well as $\gamma :=(e(Y),d(Y))(\pi )$. Then similarly as before,

https://static-content.springer.com/image/art%3A10.1007%2Fs00780-020-00426-3/MediaObjects/780_2020_426_Equbj_HTML.png

and the claim follows from the first and third estimates in Lemma 3.9. □

Remark 3.11

An evident question is whether an estimate for the usual Wasserstein distance holds true without the (Lipschitz-)continuity assumption on $H$, i.e., whether (3.4) holds for $d_{p}$ instead of $d_{p}^{w}$. The following example shows that this is not true. In a two-period discrete-time model $(T=2)$, let

$$ \mathbb{P}:=\delta _{0}\otimes \big((\delta _{1}+\delta _{-1})/2\big), \qquad \mathbb{P}_{\varepsilon }:=\big((\delta _{\varepsilon }+\delta _{- \varepsilon } )/2\big)\otimes \big((\delta _{1}+\delta _{-1})/2\big) $$

so that $\mathcal{AW}_{p}(\mathbb{P}_{\varepsilon },\mathbb{P})\to 0$ as $\varepsilon \to 0$ for every $p$. Now define $H_{1}:=0$ and $H_{2}:=1_{(0,\infty )}-1_{(-\infty ,0)}$. Under any bi-causal coupling between $\mathbb{P}_{\varepsilon }$ and ℙ, the projection $G$ of $H$ onto $Y$ satisfies $G_{1}=0$ and $G_{2}=0$. In particular,

https://static-content.springer.com/image/art%3A10.1007%2Fs00780-020-00426-3/MediaObjects/780_2020_426_IEq556_HTML.gif

ℙ-almost surely. However, for every $\varepsilon >0$, one has

https://static-content.springer.com/image/art%3A10.1007%2Fs00780-020-00426-3/MediaObjects/780_2020_426_IEq558_HTML.gif

which implies that the respective laws cannot converge.

4 Proofs of the results stated in the introduction and extensions

Thanks to the work done in the previous section, the proofs boil down to two steps. In the first step, one forgets about the space $\Omega $ and only studies the continuity of the problem at hand with respect to $d_{p}$ or $d_{p}^{w}$ when image measures on ℝ are plugged in; e.g. in utility maximisation, this means studying the continuity of $\mu \mapsto \int U(x)\,\mu (dx)$. In the second step, one combines this continuity with the contraction result in Theorem 3.10.

4.1 Proof of Theorem 1.5

We need the following elementary estimate.

Lemma 4.1

Let $\mu ,\nu \in \mathcal{P}_{1}(\mathbb{R})$ and let $f\colon \mathbb{R}\to \mathbb{R}$ be convex and Lipschitz. Then

$$\begin{aligned} \int f(x)\mu (dx)-\int f(y)\,\nu (dy)\leq L\, d^{w}_{1}(\mu ,\nu ), \end{aligned}$$

(4.1)

where $L$ is the Lipschitz constant of $f$.

Proof

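Let $\gamma $ be a coupling of $\mu $ and $\nu $, disintegrated as $\gamma (dx,dy)=\mu (dx)\,\gamma ^{x}(dy)$. Applying Jensen’s inequality to the convex function $f$ in the third line, we obtain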

$$\begin{aligned} \int f(x)\,\mu (dx)-\int f(y)\,\nu (dy) &=\int \big(f(x)-f(y)\big)\, \gamma (dx,dy) \\ &=\int \bigg(f(x) -\int f(y)\,\gamma ^{x}(dy)\bigg)\,\mu (dx) \\ &\leq \int \bigg( f(x) - f\Big(\int y \,\gamma ^{x}(dy)\Big)\bigg) \,\mu (dx) \\ &\leq L\int \bigg| x- \int y\,\gamma ^{x}(dy)\bigg| \mu (dx). \end{aligned}$$

As $\gamma $ was arbitrary, taking the infimum over all couplings yields (4.1). □

In fact, there is equality in the previous lemma if one takes the supremum on the left-hand side of (4.1) over all $L$-Lipschitz convex functions; this is shown in Gozlan et al. [28, Proposition 3.2].
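The proof of Lemma 4.1 bounds $\int f\,d\mu -\int f\,d\nu $ by $L$ times the barycentric cost $\int |x-\int y\,\gamma ^{x}(dy)|\,\mu (dx)$ of an arbitrary coupling $\gamma $. The sketch below computes that cost for finitely supported measures; the helper names are hypothetical. When one marginal is a Dirac mass the coupling is unique, so the cost equals $d_{1}^{w}$ exactly, and the example shows that $d_{1}^{w}$ is not symmetric even though the usual $W_{1}$ distance is $1$ in both directions.

```python
def barycentric_cost(xs, mus, ys, gamma):
    """sum_i mu_i * |x_i - mean(gamma^{x_i})| for a discrete coupling matrix gamma.

    gamma[i][j] is the mass sent from x_i to y_j, so row i sums to mus[i] and
    gamma^{x_i}(y_j) = gamma[i][j] / mus[i] is the conditional law given x_i.
    """
    total = 0.0
    for x, mu, row in zip(xs, mus, gamma):
        barycentre = sum(y * g for y, g in zip(ys, row)) / mu
        total += mu * abs(x - barycentre)
    return total


def integral_gap(f, xs, mus, ys, nus):
    """int f dmu - int f dnu for two finitely supported measures."""
    return sum(m * f(x) for x, m in zip(xs, mus)) - sum(n * f(y) for y, n in zip(ys, nus))


# First argument (delta_{-1} + delta_1) / 2, second argument delta_0: unique coupling.
print(barycentric_cost([-1.0, 1.0], [0.5, 0.5], [0.0], [[0.5], [0.5]]))
# Roles swapped: first argument delta_0, second (delta_{-1} + delta_1) / 2.
print(barycentric_cost([0.0], [1.0], [-1.0, 1.0], [[0.5, 0.5]]))
# Left-hand side of (4.1) for the convex 1-Lipschitz function f = |x|.
print(integral_gap(abs, [-1.0, 1.0], [0.5, 0.5], [0.0], [1.0]))
```

In the swapped case the bound is $0$, consistent with Jensen’s inequality $f(0)\leq (f(-1)+f(1))/2$ for convex $f$.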

We now turn to the

Proof of Theorem 1.5

For $n>0$, let $\pi $ be a bi-causal coupling which attains the infimum in the definition of $\mathcal{AW}_{1}(\mathbb{P},\mathbb{Q})$ up to $1/n$. By Lemma 3.7, there is $G^{n}\in \mathcal{H}_{k}$ such that

https://static-content.springer.com/image/art%3A10.1007%2Fs00780-020-00426-3/MediaObjects/780_2020_426_IEq577_HTML.gif

$\pi $-almost surely. Define

https://static-content.springer.com/image/art%3A10.1007%2Fs00780-020-00426-3/MediaObjects/780_2020_426_Equbm_HTML.png

(Note that $\mu ^{n},\nu \in \mathcal{P}_{1}(\mathbb{R})$ as $\mathbb{P},\mathbb{Q}\in \mathcal{SM}_{1}(\Omega )$.) By Lemma 4.1, we have

https://static-content.springer.com/image/art%3A10.1007%2Fs00780-020-00426-3/MediaObjects/780_2020_426_Equbn_HTML.png

From Theorem 3.10, we obtain

https://static-content.springer.com/image/art%3A10.1007%2Fs00780-020-00426-3/MediaObjects/780_2020_426_Equ12_HTML.png

(4.2)

Assume first that $\mathbb{E}_{\mathbb{Q}}[[Y]_{T}]<\infty $ and denote by $A$ the finite variation process associated to $Y$. Then, as the $G^{n}$ are bounded by $k$ uniformly in $n$, there exist a predictable process $G$ and a sequence of forward-convex combinations of $(G^{n})$ which converge in $L^{2}(d\mathbb{Q}\otimes d([Y]+A))$ to $G$. This, (4.2) and the convexity of $(\cdot )^{+}$ lead to the desired conclusion. The general case follows by a simple but notationally heavy localisation argument.

In the case that $G=H$ with $H$ Lipschitz, the proof follows analogously from the second part of Theorem 3.10. □

4.2 Proof of Theorem 1.6

First, note that for all $\mathbb{P},\mathbb{P}'$ and random variables $Z,Z'$, it follows as in Lemma 4.1 that

$$ \mathrm{AVaR}_{\alpha }^{\mathbb{P}}(Z)-\mathrm{AVaR}_{\alpha }^{\mathbb{P}'}(Z') \leq d_{1}^{w}\big(Z(\mathbb{P}),Z'(\mathbb{P}')\big)/\alpha . $$

Indeed, if $\gamma $ is a coupling from $\mu :=Z(\mathbb{P})$ to $\nu :=Z'(\mathbb{P}')$, then

$$\begin{aligned} &\mathrm{AVaR}_{\alpha }^{\mathbb{P}}(Z)-\mathrm{AVaR}_{\alpha }^{ \mathbb{P}'}(Z') \\ &= \inf _{m}\int \bigg( \frac{1}{\alpha }(x-m)^{+}+m \bigg)\mu (dx) - \inf _{m} \int \bigg( \frac{1}{\alpha }(y-m)^{+}+m \bigg)\nu (dy) \\ &\leq \sup _{m} \frac{1}{\alpha }\int \big((x-m)^{+}-(y-m)^{+} \big) \gamma (dx,dy) \\ &\leq \sup _{m} \frac{1}{\alpha }\int \bigg((x-m)^{+}-\Big(\int y\, \gamma ^{x}(dy)-m\Big)^{+} \bigg)\mu (dx) \\ &\leq \frac{1}{\alpha }\int \bigg|x- \int y \, \gamma ^{x}(dy) \bigg| \mu (dx), \end{aligned}$$

so that minimising over $\gamma $ yields the claim.
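The estimate can be checked on empirical laws. The sketch below assumes the convention $\mathrm{AVaR}_{\alpha }(Z)=\inf _{m}\mathbb{E}[(Z-m)^{+}/\alpha +m]$ with $0<\alpha \leq 1$, which is the one consistent with the derivation above; the sample sizes and the function name `avar` are hypothetical. Pairing the $i$th samples of $Z$ and $Z'$ is one particular coupling, so its average distance is an upper bound for $d_{1}^{w}$, and the computed difference of AVaRs must stay below that bound divided by $\alpha $.

```python
import random


def avar(samples, alpha):
    """AVaR_alpha of an empirical law: inf_m ( E[(Z - m)^+] / alpha + m ), 0 < alpha <= 1.

    The objective is convex and piecewise linear in m with kinks at the sample
    points, so the infimum is attained at one of them.
    """
    n = len(samples)

    def objective(m):
        return sum(max(z - m, 0.0) for z in samples) / (n * alpha) + m

    return min(objective(m) for m in samples)


rng = random.Random(0)
alpha = 0.05
z = [rng.gauss(0.0, 1.0) for _ in range(200)]
z_perturbed = [x + 0.1 * rng.gauss(0.0, 1.0) for x in z]

# Pairing z[i] with z_perturbed[i] is one coupling; its cost bounds d_1^w from above.
coupling_cost = sum(abs(a - b) for a, b in zip(z, z_perturbed)) / len(z)
print(avar(z, alpha) - avar(z_perturbed, alpha), coupling_cost / alpha)
```

Minimising over the sample points rather than over all of ℝ is exact here because the objective is piecewise linear with slope $1-1/\alpha \leq 0$ to the left of the smallest sample and slope $1$ to the right of the largest.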

The rest of the proof follows the same line of argument as the proof of Theorem 1.5. Fix $\mathbb{P},\mathbb{Q}\in \mathcal{SM}_{1}(\Omega )$. Assume (only for notational simplicity) that there exists a bi-causal coupling $\pi $ which attains the infimum in the definition of $\mathcal{AW}_{1}(\mathbb{P},\mathbb{Q})$, and that there exists $H^{\ast }\in \mathcal{H}_{k}$ such that

https://static-content.springer.com/image/art%3A10.1007%2Fs00780-020-00426-3/MediaObjects/780_2020_426_Equbq_HTML.png

By Lemma 3.7, there is $G^{\ast }\in \mathcal{H}_{k}$ such that

https://static-content.springer.com/image/art%3A10.1007%2Fs00780-020-00426-3/MediaObjects/780_2020_426_IEq605_HTML.gif

$\pi $-almost surely. Therefore

https://static-content.springer.com/image/art%3A10.1007%2Fs00780-020-00426-3/MediaObjects/780_2020_426_Equbr_HTML.png

where the last inequality is due to Theorem 3.10. Interchanging the roles of ℙ and ℚ yields the desired conclusion. The proof of the second estimate follows analogously. □

4.3 Proof of Example 1.7

First note that $\mathrm{AVaR}^{\mathbb{P}}_{\alpha }(Z)\geq \mathbb{E}_{\mathbb{P}}[Z]$ for every integrable random variable $Z$. Indeed, this follows by integrating the pointwise inequality

$$ x=(x+m)-m\leq (x+m)^{+}/\alpha -m \qquad \text{for all } m\in \mathbb{R}, $$

which holds since $\alpha \leq 1$, and then taking the infimum over $m$.

Therefore, as the Brownian stochastic integral has expectation zero, we conclude that

https://static-content.springer.com/image/art%3A10.1007%2Fs00780-020-00426-3/MediaObjects/780_2020_426_IEq609_HTML.gif

. On the other hand, define

$$ f(t,x):=\int C(x+y) \, \mathcal{N}\big(0,\sigma ^{2}(T-t)\big)(dy) \qquad \text{for } (t,x)\in [0,T]\times \mathbb{R}, $$

where $\mathcal{N}(0,\sigma ^{2}(T-t))$ stands for the normal distribution with mean 0 and variance $\sigma ^{2}(T-t)$. Then $C(X)=f(T,X_{T})$ and $\mathbb{E}_{\mathbb{P}}[f(t,X_{t})|\mathcal{F}_{s}]= f(s,X_{s})$ for every $0\leq s\leq t\leq T$. Thus, by Itô’s formula, and since the martingale property forces the finite variation part to vanish, $f(T,X_{T})=f(0,0)+ (H^{\ast }(X)\cdot X)_{T}$ for the predictable trading strategy $H^{\ast }_{t}:=\partial _{x} f(t,X_{t})$. As further $|H^{\ast }_{t}|\leq 1$ for every $t$ and $f(0,0)=\sigma /\sqrt{2\pi }$, one has

https://static-content.springer.com/image/art%3A10.1007%2Fs00780-020-00426-3/MediaObjects/780_2020_426_Equbu_HTML.png

The proof now follows from the explicit formula for the adapted Wasserstein distance derived in Example 3.4 and the fact that $\mathbb{E}_{\mathbb{P}}[C(X)]=\sigma /\sqrt{2\pi }$. □
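For the call payoff the function $f$ has a closed form. The sketch below assumes $C(x)=x^{+}$ applied to $X_{T}$, $X=\sigma B$ and $T=1$ (the normalisation under which $f(0,0)=\sigma /\sqrt{2\pi }$); these choices and the function names are assumptions, not taken from Example 1.7 itself. With $s=\sigma \sqrt{T-t}$, the Gaussian expectation gives the Bachelier-type formula $f(t,x)=x\Phi (x/s)+s\varphi (x/s)$ and the hedge $\partial _{x}f(t,x)=\Phi (x/s)\in [0,1]$, which is why $|H^{\ast }_{t}|\leq 1$.

```python
import math


def norm_cdf(u):
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def norm_pdf(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def f(t, x, sigma, T=1.0):
    """E[max(x + sigma * (B_T - B_t), 0)] = x Phi(x/s) + s phi(x/s), s = sigma sqrt(T - t)."""
    s = sigma * math.sqrt(T - t)
    if s == 0.0:
        return max(x, 0.0)
    return x * norm_cdf(x / s) + s * norm_pdf(x / s)


def delta(t, x, sigma, T=1.0):
    """H*_t = d/dx f(t, x) = Phi(x / s), which lies in [0, 1]."""
    s = sigma * math.sqrt(T - t)
    if s == 0.0:
        return 1.0 if x > 0 else 0.0
    return norm_cdf(x / s)


sigma = 0.3
print(f(0.0, 0.0, sigma), sigma / math.sqrt(2.0 * math.pi))
print(delta(0.5, 0.2, sigma))
```

The derivative formula follows because the terms $x\varphi (x/s)/s$ and $s\varphi '(x/s)/s=-x\varphi (x/s)/s$ cancel.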

4.4 Proof of Theorem 1.8

Recall that we have $U'(x)\leq c(1+|x|^{p-1})$ for all $x\in \mathbb{R}$ and some constant $c$. Let $\mathbb{P}, \mathbb{Q}\in \mathcal{SM}_{p}(\Omega )$ be arbitrary and assume (only for notational simplicity) that there is $H^{\ast }\in \mathcal{H}_{k}$ such that

https://static-content.springer.com/image/art%3A10.1007%2Fs00780-020-00426-3/MediaObjects/780_2020_426_Equbv_HTML.png

and that there is a bi-causal coupling $\pi $ between ℙ and ℚ which is optimal for $\mathcal{AW}_{p}(\mathbb{P},\mathbb{Q})$. Due to Lemma 3.7, there exists $G^{\ast }\in \mathcal{H}_{k}$ with the property that

https://static-content.springer.com/image/art%3A10.1007%2Fs00780-020-00426-3/MediaObjects/780_2020_426_IEq629_HTML.gif

$\pi $-almost surely. Let

https://static-content.springer.com/image/art%3A10.1007%2Fs00780-020-00426-3/MediaObjects/780_2020_426_Equbw_HTML.png

and let $\gamma $ be an (almost) optimal coupling for $d_{p}^{w}(\mu ,\nu )$. As $U$ is concave and increasing, we have $U(y)-U(x)\leq U'(\min \{x,y\})|x-y|$. Using Jensen’s inequality for the concave function $U$ gives

https://static-content.springer.com/image/art%3A10.1007%2Fs00780-020-00426-3/MediaObjects/780_2020_426_Equbx_HTML.png

where we used Hölder’s inequality in the last line and $q$ denotes the conjugate exponent of $p$, i.e., $1/p+1/q=1$. As $q(p-1)=p$, the growth assumption on $U'$ implies that $|U'(\min \{x,y\})|^{q}\leq c(1+|x|^{p}+|y|^{p})$ for some (new) constant $c$. Then, by Jensen’s inequality and Lemma 3.9, we have

$$\begin{aligned} &\int \bigg|U'\bigg(\min \bigg\{ x,\int y\,\gamma ^{x}(dy)\bigg\} \bigg)\bigg|^{q} \,\mu (dx) \\ &\leq c\bigg( 1+ \int |x|^{p} \,\mu (dx)+ \int \bigg|\int y\,\gamma ^{x}(dy) \,\bigg|^{p}\,\mu (dx)\bigg) \\ &\leq c\bigg( 1+ \int |x|^{p} \,\mu (dx)+ \int |y|^{p}\,\nu (dy) \bigg) \\ &\leq \tilde{c} \big(1+ \mathcal{AW}_{p}(\mathbb{Q},\delta _{0})^{p} + \mathcal{AW}_{p}(\mathbb{P},\delta _{0})^{p} \big) \leq e \end{aligned}$$

for $e:=\tilde{c}(1+2R^{p})$. Exchanging the roles of ℙ and ℚ and using Theorem 3.10 completes the proof. □
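The elementary inequality $U(y)-U(x)\leq U'(\min \{x,y\})|x-y|$ used in this proof can be checked on a grid. The sketch assumes the hypothetical utility $U(x)=x-\sqrt{1+x^{2}}$, which is concave and strictly increasing with $0<U'<2$, so the growth bound on $U'$ holds for every $p\geq 1$; neither the utility nor the grid comes from Theorem 1.8.

```python
import math


def U(x):
    # hypothetical concave, strictly increasing utility with bounded derivative
    return x - math.sqrt(1.0 + x * x)


def U_prime(x):
    return 1.0 - x / math.sqrt(1.0 + x * x)


def max_violation(grid):
    """Largest value of (U(y) - U(x)) - U'(min(x, y)) |x - y| over grid pairs; should be <= 0."""
    worst = -math.inf
    for x in grid:
        for y in grid:
            worst = max(worst, (U(y) - U(x)) - U_prime(min(x, y)) * abs(x - y))
    return worst


grid = [i / 4.0 for i in range(-40, 41)]
print(max_violation(grid))
```

For $y\geq x$ the inequality is the tangent-line bound of a concave function at $x$; for $y<x$ the left-hand side is negative while the right-hand side is nonnegative.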

4.5 Proof of Theorem 1.9

First, we claim that $v(\mathbb{P})$ is uniformly bounded over all ℙ which satisfy $\mathcal{AW}_{p}(\mathbb{P}, \delta _{0})\leq R$. Indeed, using the growth assumption on $U$, the fact that $U$ is strictly increasing and the Burkholder–Davis–Gundy (BDG) inequality to control the $p$th moment of

https://static-content.springer.com/image/art%3A10.1007%2Fs00780-020-00426-3/MediaObjects/780_2020_426_IEq649_HTML.gif

, it follows that there exist $a,A\in \mathbb{R}$ such that

https://static-content.springer.com/image/art%3A10.1007%2Fs00780-020-00426-3/MediaObjects/780_2020_426_Equ13_HTML.png

(4.3)

for all ℙ with $\mathcal{AW}_{p}(\mathbb{P},\delta _{0})\leq R$. Now assume that there exists a sequence $(\mathbb{P}_{n})$ with $\mathcal{AW}_{p}(\mathbb{P}_{n},\delta _{0})\leq R$, but $v(\mathbb{P}_{n})\to \infty $. Then using the BDG inequality once more, it follows that

https://static-content.springer.com/image/art%3A10.1007%2Fs00780-020-00426-3/MediaObjects/780_2020_426_Equbz_HTML.png

which contradicts (4.3). The case $v(\mathbb{P}_{n})\to -\infty $ is excluded analogously.

At this point, using the definition of $v(\mathbb{P})$, a twofold application of Theorem 1.8 yields

https://static-content.springer.com/image/art%3A10.1007%2Fs00780-020-00426-3/MediaObjects/780_2020_426_Equca_HTML.png

Indeed, a direct application of the theorem gives a constant $K$ which depends on $v(\mathbb{P})$, but an inspection of its proof shows that $K$ depends on $v(\mathbb{P})$ only through an upper bound on $|v(\mathbb{P})|$. By the first step, such a bound holds uniformly over ℙ with $\mathcal{AW}_{p}(\mathbb{P},\delta _{0})\leq R$.

Now let $\varepsilon >0$ and $H\in \mathcal{H}_{k}$ be arbitrary and set

https://static-content.springer.com/image/art%3A10.1007%2Fs00780-020-00426-3/MediaObjects/780_2020_426_IEq664_HTML.gif

. Then it follows that there is some constant $c>0$ (depending on $R\geq \mathcal{AW}_{p}(\mathbb{Q},\delta _{0})$ and $U$) such that

$$ \mathbb{E}_{\mathbb{Q}}[U(Y+\varepsilon )] = \mathbb{E}_{\mathbb{Q}}[U(Y)] +\mathbb{E}_{\mathbb{Q}}\bigg[ \int _{Y}^{Y+\varepsilon } U'(z)\,dz \bigg] \geq \mathbb{E}_{\mathbb{Q}}[U(Y)] +\varepsilon c. $$

Indeed, this would follow directly if $Y$ were bounded by a fixed constant, but readily extends to the present setting as $\mathbb{E}_{\mathbb{Q}}[|Y|^{p}]\leq C$ for some constant $C>0$, independent of $H$ and ℚ, as long as $\mathcal{AW}_{p}(\mathbb{Q},\delta _{0})\leq R$. In a similar manner, we also obtain $\mathbb{E}_{\mathbb{Q}}[U(Y-\varepsilon )]\leq \mathbb{E}_{\mathbb{Q}}[U(Y)] -\varepsilon c$. Putting everything together yields

https://static-content.springer.com/image/art%3A10.1007%2Fs00780-020-00426-3/MediaObjects/780_2020_426_Equcc_HTML.png

for some $\varepsilon < \tilde{C} \mathcal{AW}_{p}(\mathbb{P},\mathbb{Q})$, where $\tilde{C}$ is a new constant emerging from $K$ and $c$. Thus $|v(\mathbb{Q})-v(\mathbb{P})|\leq \varepsilon \leq \tilde{C} \mathcal{AW}_{p}(\mathbb{P},\mathbb{Q})$ which completes the proof. □
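The step from bounded $Y$ to the moment bound $\mathbb{E}_{\mathbb{Q}}[|Y|^{p}]\leq C$ can be made explicit as follows, assuming in addition $0<\varepsilon \leq 1$. A concave, strictly increasing $U$ has a positive, nonincreasing derivative $U'$. By Markov’s inequality, $\mathbb{Q}[|Y|>M]\leq C/M^{p}$, so choosing $M$ with $C/M^{p}\leq 1/2$ gives

$$ \mathbb{E}_{\mathbb{Q}}\bigg[\int _{Y}^{Y+\varepsilon }U'(z)\,dz\bigg] \geq \varepsilon \,\mathbb{E}_{\mathbb{Q}}[U'(Y+\varepsilon )] \geq \varepsilon \,U'(M+1)\,\mathbb{Q}[|Y|\leq M] \geq \varepsilon \,\frac{U'(M+1)}{2}. $$

Hence one may take $c:=U'(M+1)/2$, which depends only on $C$ (and thus on $R$) and on $U$. The same argument applied to $\int _{Y-\varepsilon }^{Y}U'(z)\,dz\geq \varepsilon \,U'(Y)$ gives the estimate for $\mathbb{E}_{\mathbb{Q}}[U(Y-\varepsilon )]$.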

4.6 Two generalisations

The following two results can be proved using almost the same arguments as in the proofs of Theorems 1.6 and 1.8. In particular, the proofs boil down to establishing convergence for image measures with respect to $d_{p}$ and give no new insight into adapted Wasserstein distances, so we omit them.

Proposition 4.2

Let $\ell \colon \mathbb{R}\to \mathbb{R}_{+}$ be a convex and strictly increasing function and let $\delta >0$. Assume that $p\geq 1$ is such that $\ell '(x)\leq c(1+|x|^{p-1})$ for some constant $c$. Then for every Lipschitz-continuous function $C\colon \Omega \to \mathbb{R}$, the function

https://static-content.springer.com/image/art%3A10.1007%2Fs00780-020-00426-3/MediaObjects/780_2020_426_Equcd_HTML.png

is continuous on $(\mathcal{SM}_{p}(\Omega ),\mathcal{AW}_{p})$.

Let $\rho $ be a law-invariant risk measure which we directly view as a functional from $\mathcal{P}_{p}(\mathbb{R})$ to ℝ. For $\mathbb{P}\in \mathcal{SM}_{p}(\Omega )$ and a random variable $Z\colon \Omega \to \mathbb{R}$ such that $Z( \mathbb{P})\in \mathcal{P}_{p}(\mathbb{R})$, we write $\rho ^{\mathbb{P}}(Z)=\rho (Z(\mathbb{P}))$. A typical example of a law-invariant risk measure which satisfies $\rho (\mu )-\rho (\nu )\leq L d^{w}(\mu ,\nu )$ for some constant $L$ depending on the $p$th moment of $\mu $ and $\nu $ is the optimised certainty equivalent, introduced to the mathematical finance community in Ben-Tal and Teboulle [17]. For a convex, increasing function $\ell \colon \mathbb{R}\to \mathbb{R}$ which is bounded from below and satisfies $\ell (x)/x\to \infty $ as $x\to \infty $, the optimised certainty equivalent is defined via

$$ \rho ^{\mathbb{P}}(Z) :=\inf _{m\in \mathbb{R}} \big( \mathbb{E}_{\mathbb{P}}[\ell (Z-m)]+m \big) =\inf _{m\in \mathbb{R}} \bigg( \int \ell (x-m)\,\big(Z( \mathbb{P})\big)(dx)+m \bigg). $$

If $\ell '(x)\leq c(1+|x|^{p-1})$, then the infimum over $m$ can be restricted to a compact set depending on the $p$th moments. Due to cash additivity of $\rho $, the following proposition has the same interpretation as Theorem 1.6.
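The optimised certainty equivalent of an empirical law can be computed by a one-dimensional convex minimisation over $m$ restricted to a bounded interval, in line with the compactness remark above. The sketch uses the hypothetical loss $\ell (x)=e^{x}-1$, which is convex, increasing, bounded from below and superlinear, and for which the OCE has the closed form $\log \mathbb{E}[e^{Z}]$ (the entropic risk measure). This loss does not satisfy the polynomial growth bound on $\ell '$ and serves only to check the definition; the search interval and names are assumptions.

```python
import math


def oce(samples, loss, lo, hi, iters=200):
    """inf_m ( E[loss(Z - m)] + m ) for an empirical law, by ternary search on [lo, hi].

    The objective is convex in m, so the search converges to the infimum provided
    a minimiser lies in [lo, hi].
    """
    n = len(samples)

    def objective(m):
        return sum(loss(z - m) for z in samples) / n + m

    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) <= objective(m2):
            hi = m2
        else:
            lo = m1
    return objective(0.5 * (lo + hi))


def entropic_loss(x):
    # convex, increasing, bounded below by -1, with loss(x) / x -> infinity
    return math.exp(x) - 1.0


z = [-1.0, 0.0, 0.5, 2.0]
print(oce(z, entropic_loss, min(z) - 1.0, max(z) + 1.0))
print(math.log(sum(math.exp(v) for v in z) / len(z)))
```

The first-order condition $\mathbb{E}[e^{Z-m}]=1$ gives the minimiser $m^{\ast }=\log \mathbb{E}[e^{Z}]$, which lies between the smallest and largest sample, so the chosen interval contains it.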

Proposition 4.3

Assume that $\rho \colon \mathcal{P}_{p}(\mathbb{R})\to \mathbb{R}$ satisfies $\rho (\mu )-\rho (\nu )\leq L d^{w}(\mu ,\nu )$ for some constant $L$ depending on the $p$th moments of $\mu $ and $\nu $. Then for every Lipschitz function $C\colon \Omega \to \mathbb{R}$, the mapping

https://static-content.springer.com/image/art%3A10.1007%2Fs00780-020-00426-3/MediaObjects/780_2020_426_Equcf_HTML.png

is locally Lipschitz-continuous on $(\mathcal{SM}_{p}(\Omega ), \mathcal{AW}_{p})$.

Finally, let us point out that although it is not a convex risk measure, Value-at-Risk ($\mathrm{VaR}$) would be another natural candidate to study continuity. However, as $\mathrm{VaR}$ is not continuous with respect to weak convergence, continuity of the mapping

https://static-content.springer.com/image/art%3A10.1007%2Fs00780-020-00426-3/MediaObjects/780_2020_426_IEq715_HTML.gif

fails already in a one-period model.
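The failure of continuity of VaR under weak convergence can be seen with two-point laws. The sketch assumes the convention $\mathrm{VaR}_{\alpha }(Z)=\inf \{m:\mathbb{P}[Z>m]\leq \alpha \}$ (losses counted positively), which may differ from the paper’s normalisation; the function name is hypothetical. The laws $\mu _{n}=(1/2-1/n)\delta _{0}+(1/2+1/n)\delta _{1}$ converge to $\mu =(\delta _{0}+\delta _{1})/2$ even in $W_{1}$ (at distance $1/n$), yet at level $\alpha =1/2$ the VaR stays at $1$ while the limit has VaR $0$.

```python
from fractions import Fraction


def var_upper(atoms, alpha):
    """VaR_alpha(Z) = inf{ m : P[Z > m] <= alpha } for a finitely supported law.

    atoms is a list of (value, probability) pairs. The tail probability is a
    right-continuous step function with jumps at the atoms, so the infimum is an atom.
    """
    values = sorted(v for v, _ in atoms)
    for m in values:
        if sum(p for v, p in atoms if v > m) <= alpha:
            return m
    return values[-1]


half = Fraction(1, 2)
print(var_upper([(0, half), (1, half)], half))
for n in (10, 100, 1000):
    print(n, var_upper([(0, half - Fraction(1, n)), (1, half + Fraction(1, n))], half))
```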

5 Final remarks

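The first two remarks illustrate situations where continuity fails: with respect to the usual Wasserstein distance (Remark 5.1), and for the superhedging problem even with respect to the adapted distance (Remark 5.2).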

Remark 5.1

We note that convergence in the usual Wasserstein distance is not sufficient to obtain continuity in any of the problems we study in this paper. Consider a two-period market with

$$\begin{aligned} \mathbb{P}_{n}&=\frac{1}{4} \big( \delta _{(1/n,1)} +\delta _{(1/n,0)} + \delta _{(-1/n,0)} + \delta _{(-1/n,-1)} \big), \\ \mathbb{P}&=\frac{1}{4} \big( \delta _{(0,1)} +2\delta _{(0,0)} + \delta _{(0,-1)} \big). \end{aligned}$$

Then ℙ and each $\mathbb{P}_{n}$ satisfy the classical no-arbitrage condition, unlike the situation described in Fig. 1. While $(\mathbb{P}_{n})$ converges to ℙ in the usual Wasserstein distance, one can verify that convergence in the nested distance does not hold. For example, in utility maximisation of the trivial claim $C\equiv 0$, we have

https://static-content.springer.com/image/art%3A10.1007%2Fs00780-020-00426-3/MediaObjects/780_2020_426_Equch_HTML.png

by Jensen’s inequality (as $X$ is a martingale under ℙ). For $\mathbb{P}_{n}$, taking the strategy $H^{\ast }$ consisting of $H^{\ast }_{0}=0$ and $H^{\ast }_{1}(x)=k\mathop{\mathrm{sign}}(x)$, one gets

https://static-content.springer.com/image/art%3A10.1007%2Fs00780-020-00426-3/MediaObjects/780_2020_426_Equci_HTML.png

showing the lack of continuity.
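The example can be reproduced by enumerating the atoms. The sketch assumes $X_{0}=0$, $k=1$ and the hypothetical concave utility $U(x)=1-e^{-x}$; the function names are also hypothetical. The coupling that moves each atom $(\pm 1/n,x_{2})$ of $\mathbb{P}_{n}$ to $(0,x_{2})$ has cost $1/n$, so the usual Wasserstein distance tends to $0$, while the strategy $H^{\ast }_{1}=k\,\mathrm{sign}(x_{1})$ yields an expected utility under $\mathbb{P}_{n}$ that stays bounded away from $U(0)=0$, the value it yields under ℙ.

```python
import math
from fractions import Fraction


def sign(x):
    return (x > 0) - (x < 0)


def atoms_p_n(n):
    """Atoms ((x_1, x_2), probability) of P_n from Remark 5.1."""
    q, e = Fraction(1, 4), Fraction(1, n)
    return [((e, 1), q), ((e, 0), q), ((-e, 0), q), ((-e, -1), q)]


def atoms_p():
    """Atoms of the limit P; note X_1 = 0 almost surely."""
    q = Fraction(1, 4)
    return [((0, 1), q), ((0, 0), 2 * q), ((0, -1), q)]


def utility(x):
    # hypothetical concave, increasing utility
    return 1.0 - math.exp(-x)


def expected_utility(atoms, k):
    """E[U((H*.X)_2)] for H*_0 = 0, H*_1 = k sign(x_1), X_0 = 0 and the claim C = 0."""
    total = 0.0
    for (x1, x2), prob in atoms:
        gain = k * sign(x1) * (x2 - x1)
        total += float(prob) * utility(float(gain))
    return total


def coupling_cost(n):
    """Cost of moving each atom (+-1/n, x_2) of P_n to (0, x_2); bounds W_1(P_n, P)."""
    return float(sum(prob * abs(x1) for (x1, _), prob in atoms_p_n(n)))


for n in (10, 100, 1000):
    print(n, coupling_cost(n), expected_utility(atoms_p_n(n), k=1), expected_utility(atoms_p(), k=1))
```

Under $\mathbb{P}_{n}$ the gain equals $k(1-1/n)$ or $-k/n$ with probability $1/2$ each, which explains why the advantage does not vanish as $n\to \infty $.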

Remark 5.2

As explained in the introduction, the objective in Theorem 1.5 can be seen as a relaxed version of the superhedging problem. This relaxation is not a technical simplification but is necessary to obtain continuity without further assumptions. Indeed, the problem of superhedging,

$$ \inf \big\{ m\in \mathbb{R} : \text{there is } H\in \mathcal{H}_{k} \text{ such that } m+(H\cdot X)_{T}\geq C(X)\ \mathbb{P}\text{-almost surely} \big\}, $$

is not continuous in ℙ with respect to the adapted distance for any $k\in [0,\infty ]$. In fact, this already happens in one period, where the adapted and the usual Wasserstein distances coincide. Consider a sequence $(\mathbb{P}_{n})$ of measures with full support which converge weakly to a measure ℙ. Then the superhedging price with respect to $\mathbb{P}_{n}$ equals the concave envelope of $C$, while the superhedging price with respect to ℙ equals the concave envelope of $C$ restricted to the support of ℙ. For a recent paper on this problem in one period, see the work of Obłój and Wiesel [45].
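The jump of the superhedging price can be illustrated with the concave envelope. The sketch assumes one period, $X_{0}=0$ and unconstrained strategies ($k=\infty $), so that, as stated above, the price is the value at $X_{0}$ of the concave envelope of $C$ restricted to the support; the bounded payoff $C(x)=\min \{|x|,1\}$, the finite grid standing in for a full support (e.g. for $\mathcal{N}(0,1/n)$ converging weakly to $\delta _{0}$) and the function names are hypothetical.

```python
def concave_envelope_at(points, payoff, x0):
    """Value at x0 of the concave envelope of payoff restricted to a finite set.

    In one dimension the maximum over convex combinations sum_i w_i x_i = x0 is
    attained with at most two support points, so scanning bracketing pairs suffices.
    """
    best = float("-inf")
    for a in points:
        for b in points:
            if not a <= x0 <= b:
                continue
            if a == b:
                value = payoff(a)
            else:
                w = (b - x0) / (b - a)  # weight on a, so that w * a + (1 - w) * b = x0
                value = w * payoff(a) + (1.0 - w) * payoff(b)
            best = max(best, value)
    return best


def payoff(x):
    # hypothetical bounded claim C(x) = min(|x|, 1)
    return min(abs(x), 1.0)


support_limit = [0.0]                           # support of X_1 under the limit delta_0
support_full = [i / 2.0 for i in range(-4, 5)]  # finite stand-in for a full support
print(concave_envelope_at(support_limit, payoff, 0.0))
print(concave_envelope_at(support_full, payoff, 0.0))
```

Any full-support approximation yields price $1$, whereas the limit yields $C(0)=0$, so the superhedging price is discontinuous even though the measures converge weakly.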

The final remark illustrates that uniformly bounded strategies are necessary.

Remark 5.3

As in Remark 5.2, the restriction to trading strategies in $\mathcal{H}_{k}$ (i.e., uniformly bounded strategies) is also not merely a technical simplification. For example, in a one-period framework, the measures $\mathbb{P}^{\varepsilon }:=(1-\varepsilon )\delta _{(0,\varepsilon )}+ \varepsilon \delta _{(0,-\varepsilon )}$ converge to $\mathbb{P}:=\delta _{(0,0)}$ in every (adapted) Wasserstein distance. However, we have for small $\varepsilon >0$ that

https://static-content.springer.com/image/art%3A10.1007%2Fs00780-020-00426-3/MediaObjects/780_2020_426_Equck_HTML.png

where $\mathcal{H}_{\infty }:=\bigcup _{k\in \mathbb{N}}\mathcal{H}_{k}$ is the set of all bounded trading strategies.
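A concrete utility makes the role of unbounded positions visible. The sketch assumes $X_{0}=0$, the trivial claim $C\equiv 0$ and the hypothetical utility $U(x)=1-e^{-x}$; it is an illustration, not a reconstruction of the displayed comparison. Under $\mathbb{P}^{\varepsilon }$ the first-order condition gives the optimal position $h^{\ast }=\log ((1-\varepsilon )/\varepsilon )/(2\varepsilon )$, which is finite for each $\varepsilon $ but blows up as $\varepsilon \to 0$, and the optimal value $1-2\sqrt{\varepsilon (1-\varepsilon )}$ tends to $\sup U=1$, whereas under ℙ every strategy yields $U(0)=0$. Restricting to $|h|\leq k$ keeps the value close to $0$.

```python
import math


def expected_utility(h, eps):
    """E[U(h X_1)] under P^eps with the hypothetical U(x) = 1 - exp(-x) and X_0 = 0."""
    return 1.0 - (1.0 - eps) * math.exp(-h * eps) - eps * math.exp(h * eps)


def optimal_h(eps):
    """Root of the first-order condition (1 - eps) e^{-h eps} = eps e^{h eps}."""
    return math.log((1.0 - eps) / eps) / (2.0 * eps)


def value_unbounded(eps):
    """Closed-form supremum over all positions h: 1 - 2 sqrt(eps (1 - eps))."""
    return 1.0 - 2.0 * math.sqrt(eps * (1.0 - eps))


def value_bounded(eps, k, points=2001):
    """Grid approximation of the supremum over |h| <= k (the objective is concave in h)."""
    hs = [-k + 2.0 * k * i / (points - 1) for i in range(points)]
    return max(expected_utility(h, eps) for h in hs)


for eps in (1e-1, 1e-2, 1e-4):
    print(eps, optimal_h(eps), value_unbounded(eps), value_bounded(eps, k=1.0))
```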

Acknowledgements

All authors are grateful to the anonymous referees whose insightful comments had a significant impact on this article.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Indeed, the arguments in the discrete and the continuous case use the same set of ideas, but the presentation is significantly less technical in the discrete case, which was an important reason to include the discrete case in the paper.

We are grateful to an anonymous referee for pointing out that we could include the stability of utility indifference pricing with respect to the adapted Wasserstein distance.

Note added in revision: improved convergence rates have recently been obtained in Backhoff-Veraguas et al. [7] for a related sample-based estimator. Together with the results of the present article, this gives statistical consistency for an empirical version of the financial problems considered.

1. Acciaio, B., Backhoff-Veraguas, J., Zalashko, A.: Causal optimal transport and its links to enlargement of filtrations and continuous-time stochastic optimization. Stoch. Process. Appl. 130, 2918–2953 (2020)

2. Aldous, D.J.: Weak Convergence and General Theory of Processes. Unpublished monograph; Department of Statistics, University of California, Berkeley (1981). Available online at https://www.stat.berkeley.edu/~aldous/Papers/weak-gtp.pdf

3. Alfonsi, A., Corbetta, J., Jourdain, B.: Sampling of one-dimensional probability measures in the convex order and computation of robust option price bounds. Int. J. Theor. Appl. Finance 22, 1950002-1–1950002-41 (2019)

4. Alibert, J.J., Bouchitte, G., Champion, T.: A new class of cost for optimal transport planning. Eur. J. Appl. Math. 30, 1229–1263 (2019)

5. Avellaneda, M., Levy, A., Parás, A.: Pricing and hedging derivative securities in markets with uncertain volatilities. Appl. Math. Finance 2, 73–88 (1995)

6. Backhoff-Veraguas, J., Bartl, D., Beiglböck, M., Eder, M.: All adapted topologies are equal. Preprint (2019). Available online at https://arxiv.org/abs/1905.00368

7. Backhoff-Veraguas, J., Bartl, D., Beiglböck, M., Wiesel, J.: Estimating processes in adapted Wasserstein distance. Preprint (2020). Available online at https://arxiv.org/abs/2002.07261

8. Backhoff Veraguas, J., Beiglböck, M., Eder, M., Pichler, A.: Fundamental properties of process distances. Stoch. Process. Appl. (2020, forthcoming). https://doi.org/10.1016/j.spa.2020.03.017

9. Backhoff-Veraguas, J., Beiglböck, M., Huesmann, M., Källblad, S.: Martingale Benamou–Brenier: a probabilistic perspective. Ann. Probab. (2018, forthcoming). Available online at https://www.e-publications.org/ims/submission/AOP/user/submissionFile/39384?confirm=ec3bf61d

10. Backhoff-Veraguas, J., Beiglböck, M., Lin, Y., Zalashko, A.: Causal transport in discrete time and applications. SIAM J. Optim. 27, 2528–2562 (2017)

11. Backhoff-Veraguas, J., Beiglböck, M., Pammer, G.: Existence, duality, and cyclical monotonicity for weak transport costs. Calc. Var. Partial Differ. Equ. 58, 203-1–203-28 (2019)

12. Backhoff-Veraguas, J., Pammer, G.: Stability of martingale optimal transport and weak optimal transport. Preprint (2019). Available online at https://arxiv.org/abs/1904.04171

13. Becherer, D., Kentia, K.: Good deal hedging and valuation under combined uncertainty about drift and volatility. Probab. Uncertain. Quant. Risk 2, 13-1–13-40 (2017)

14. Beiglböck, M., Cox, A., Huesmann, M.: The geometry of multi-marginal Skorokhod embedding. Probab. Theory Relat. Fields 176, 1045–1096 (2020)

15. Beiglböck, M., Henry-Labordère, P., Penkner, F.: Model-independent bounds for option prices: a mass transport approach. Finance Stoch. 17, 477–501 (2013)

16. Beiglböck, M., Juillet, N.: On a problem of optimal transport under marginal martingale constraints. Ann. Probab. 44, 42–106 (2016)

17. Ben-Tal, A., Teboulle, M.: An old-new concept of convex risk measures: the optimized certainty equivalent. Math. Finance 17, 449–476 (2007)

18. Bion-Nadal, J., Talay, D.: On a Wasserstein-type distance between solutions to stochastic differential equations. Ann. Appl. Probab. 29, 1609–1639 (2019)

19. Bouchard, B., Nutz, M.: Arbitrage and duality in nondominated discrete-time models. Ann. Appl. Probab. 25, 823–859 (2015)

20. Campi, L., Laachir, I., Martini, C.: Change of numeraire in the two-marginals martingale transport problem. Finance Stoch. 21, 471–486 (2017)

21. Cont, R.: Model uncertainty and its impact on the pricing of derivative instruments. Math. Finance 16, 519–547 (2006)

22. Dolinsky, Y., Soner, H.M.: Martingale optimal transport and robust hedging in continuous time. Probab. Theory Relat. Fields 160, 391–427 (2014)

23. Eder, M.: Compactness in adapted weak topologies. Preprint (2019). Available online at https://arxiv.org/abs/1905.00856

24. El Karoui, N., Jeanblanc-Picqué, M., Shreve, S.: Robustness of the Black and Scholes formula. Math. Finance 8, 93–126 (1998)

25. Galichon, A., Henry-Labordère, P., Touzi, N.: A stochastic control approach to no-arbitrage bounds given marginals, with an application to lookback options. Ann. Appl. Probab. 24, 312–336 (2014)

26. Glanzer, M., Pflug, G.C., Pichler, A.: Incorporating statistical model error into the calculation of acceptability prices of contingent claims. Math. Program. 174, 499–524 (2019)

27. Gozlan, N., Roberto, C., Samson, P.M., Shu, Y., Tetali, P.: Characterization of a class of weak transport-entropy inequalities on the line. Ann. Inst. Henri Poincaré Probab. Stat. 54, 1667–1693 (2018)

28. Gozlan, N., Roberto, C., Samson, P.M., Tetali, P.: Kantorovich duality for general transport costs and applications. J. Funct. Anal. 273, 3327–3405 (2017)

29. Hellwig, M.F.: Sequential decisions under uncertainty and the maximum theorem. J. Math. Econ. 25, 443–464 (1996)

30. Herrmann, S., Muhle-Karbe, J.: Model uncertainty, recalibration, and the emergence of delta–vega hedging. Finance Stoch. 21, 873–930 (2017)

31. Herrmann, S., Muhle-Karbe, J., Seifried, F.T.: Hedging with small uncertainty aversion. Finance Stoch. 21, 1–64 (2017)

32. Hobson, D.: Robust hedging of the lookback option. Finance Stoch. 2, 329–347 (1998)

33. Hobson, D.: Volatility misspecification, option pricing and superreplication via coupling. Ann. Appl. Probab. 8, 193–205 (1998)

34. Hobson, D.: The Skorokhod embedding problem and model-independent bounds for option prices. In: Carmona, R., et al. (eds.) Paris–Princeton Lectures on Mathematical Finance 2010. Lecture Notes in Math., vol. 2003, pp. 267–318. Springer, Berlin (2011)

35. Hobson, D., Neuberger, A.: Robust bounds for forward start options. Math. Finance 22, 31–56 (2012)

36. Karatzas, I., Shreve, S.: Brownian Motion and Stochastic Calculus. Springer, Berlin (1988)

37. Kardaras, C., Žitković, G.: Stability of the utility maximization problem with random endowment in incomplete markets. Math. Finance 21, 313–333 (2011)

38. Lacker, D.: Dense sets of joint distributions appearing in filtration enlargements, stochastic control, and causal optimal transport. Preprint (2018). Available online at https://arxiv.org/abs/1805.03185

39. Larsen, K.: Continuity of utility-maximization with respect to preferences. Math. Finance 19, 237–250 (2009)

40. Larsen, K., Žitković, G.: Stability of utility-maximization in incomplete markets. Stoch. Process. Appl. 117, 1642–1662 (2007)

41. Lassalle, R.: Causal transference plans and their Monge–Kantorovich problems. Stoch. Anal. Appl. 36, 452–484 (2018)

42. Lyons, T.J.: Uncertain volatility and the risk-free synthesis of derivatives. Appl. Math. Finance 2, 117–133 (1995)

43. Mocha, M., Westray, N.: The stability of the constrained utility maximization problem: a BSDE approach. SIAM J. Financ. Math. 4, 117–150 (2013)

44. Obłój, J.: The Skorokhod embedding problem and its offspring. Probab. Surv. 1, 321–390 (2004)

45. Obłój, J., Wiesel, J.: Statistical estimation of superhedging prices. Preprint (2018). Available online at https://arxiv.org/abs/1807.04211

46. Pflug, G.C.: Version-independence and nested distributions in multistage stochastic optimization. SIAM J. Optim. 20, 1406–1420 (2009)

47. Pflug, G.C., Pichler, A.: A distance for multistage stochastic optimization models. SIAM J. Optim. 22, 1–23 (2012)

48. Pflug, G.C., Pichler, A.: Multistage Stochastic Optimization. Springer Series in Operations Research and Financial Engineering. Springer, Berlin (2014)

49. Pflug, G.C., Pichler, A.: From empirical observations to tree models for stochastic optimization: convergence properties. SIAM J. Optim. 26, 1715–1740 (2016)

50. Pratelli, A.: On the equality between Monge’s infimum and Kantorovich’s minimum in optimal mass transportation. Ann. Inst. Henri Poincaré Probab. Stat. 43, 1–13 (2007)

51. Prigent, J.L.: Weak Convergence of Financial Markets. Springer, Berlin (2003)

52. Rüschendorf, L.: The Wasserstein distance and approximation theorems. Z. Wahrscheinlichkeitstheor. Verw. Geb. 70, 117–129 (1985)

53. Weston, K.: Stability of utility maximisation in nonequivalent markets. Finance Stoch. 20, 511–541 (2016)

54. Wiesel, J.: Continuity of the martingale optimal transport problem on the real line. Preprint (2019). Available online at https://arxiv.org/abs/1905.04574

55. Yamada, T., Watanabe, S.: On the uniqueness of solutions of stochastic differential equations. J. Math. Kyoto Univ. 11, 155–167 (1971)

Title: Adapted Wasserstein distances and stability in mathematical finance
Authors: Julio Backhoff-Veraguas, Daniel Bartl, Mathias Beiglböck, Manu Eder
Publication date: 04-06-2020
Publisher: Springer Berlin Heidelberg
Published in: Finance and Stochastics / Issue 3/2020
Print ISSN: 0949-2984
Electronic ISSN: 1432-1122
DOI: https://doi.org/10.1007/s00780-020-00426-3

