
Open Access 11-11-2024

Quantitative Convergence of a Discretization of Dynamic Optimal Transport Using the Dual Formulation

Authors: Sadashige Ishida, Hugo Lavenant

Published in: Foundations of Computational Mathematics


Abstract

We present a discretization of the dynamic optimal transport problem for which we can obtain the convergence rate for the value of the transport cost to its continuous value when the temporal and spatial stepsizes vanish. This convergence result does not require any regularity assumption on the measures, though experiments suggest that the rate is not sharp. Via an analysis of the duality gap we also obtain the convergence rates for the gradient of the optimal potentials and the velocity field under mild regularity assumptions. To obtain such rates, we discretize the dual formulation of the dynamic optimal transport problem and use the mature literature related to the error due to discretizing the Hamilton–Jacobi equation.
Notes
Communicated by Lénaïc Chizat.


1 Introduction

The dynamic optimal transport problem and its discretization
In this work, we are interested in the dynamic optimal transport problem. Given two probability measures \(\mu , \nu \) over a spatial domain \(\Omega \), it reads
$$\begin{aligned} \inf _{(\rho _t, v_t)} \int _0^1 \int _\Omega L(v_t) \, \textrm{d}\rho _t \, \textrm{d}t, \end{aligned}$$
(1.1)
where \(L : \mathbb {R}^d \rightarrow \mathbb {R}\) is a given convex function and the infimum is taken over all pairs \((\rho _t, v_t)\) of a time-dependent probability distribution and velocity field which are solutions of
$$\begin{aligned} {\left\{ \begin{array}{ll} \partial _t \rho _t + \nabla \cdot (\rho _t v_t) = 0, \\ \rho _0 = \mu , \quad \rho _1 = \nu , \end{array}\right. } \end{aligned}$$
(1.2)
that is, of the continuity equation with temporal boundary conditions \(\mu \) and \(\nu \). Here \(\rho _t, v_t\) are indexed by a temporal variable \(t \in [0,1]\) and \(\nabla \cdot \) stands for the divergence with respect to the spatial variable. The interpretation is that \((\rho _t)\) must join \(\mu \) and \(\nu \) while being transported by the flow of \((v_t)\), at a minimal cost. Originally introduced by Benamou and Brenier in [5] for numerical purposes, as it is linked to the (static) optimal transport problem with cost \(c(x,y) = L(y-x)\), this formulation turned out to be very fruitful. From a theoretical point of view, it is a robust formulation which makes it possible to extend and generalize the optimal transport problem: it is used for optimal transport on graphs [36], for unbalanced optimal transport [19, 30, 35], and for optimal transport of matrix-valued measures [16], to cite a few extensions. From the numerical point of view, in addition to being one of the first methods proposed to solve the optimal transport problem in dimension more than one, it can be adapted to a great variety of related problems: Wasserstein gradient flows [7, 17], mean field games [6, 8], and trajectory inference [48], to mention a few.
The usual road to solve (1.1) with the constraints (1.2) is to first rewrite it as a convex problem by using the momentum \(m_t = \rho _t v_t\) as an unknown rather than \(v_t\). In this case both \(m_t\) and \(\rho _t\) are measures, and \(v_t\) is recovered as the Radon-Nikodym derivative \(\textrm{d}m_t / \textrm{d}\rho _t\) of \(m_t\) with respect to \(\rho _t\). This leads to
$$\begin{aligned} \inf _{(\rho _t, m_t)} \int _0^1 \int _\Omega L \left( \frac{\textrm{d}m_t}{\textrm{d}\rho _t} \right) \, \textrm{d}\rho _t \, \textrm{d}t, \quad \text {with constraints} \quad {\left\{ \begin{array}{ll} \partial _t \rho _t + \nabla \cdot m_t = 0, \\ \rho _0 = \mu , \quad \rho _1 = \nu . \end{array}\right. } \end{aligned}$$
(1.3)
The constraint \(\partial _t \rho _t + \nabla \cdot m_t = 0\) is now linear, and the functional to be minimized is convex, as \((x,y) \mapsto L(x/y) y\) is jointly convex in (x, y) when extended to \(+ \infty \) for \(y \leqslant 0\), except at (0, 0) where it is set to 0. Then one proposes a finite dimensional version of the problem (1.3), where both the time and space variables are discretized, and solves the resulting finite dimensional convex problem with standard methods from non-smooth convex optimization. We refer to [17, 32, 38, 39, 42] for instantiations of this approach for plain optimal transport. This comes with two challenges from the viewpoint of numerical analysis:
1. Guarantee (quantitatively) the convergence of the convex optimization solver used to solve the discretized problem.
2. Guarantee (quantitatively) the convergence of the value and solutions of the discretized problem to the original one (1.3) when the temporal and spatial stepsizes vanish.
The first point is a question of convex optimization which is quite well understood, and we will not address it here, referring instead to [29, 42]. Our main concern is rather the second question. The difficulty lies in the roughness of the functional to optimize: \((x,y) \mapsto L(x/y) y\) is only lower semi-continuous, being discontinuous at (0, 0). Moreover, a priori the data \(\mu , \nu \) could be arbitrary probability distributions (even Dirac masses), and there is no regularizing effect, so that \(\rho _t\) could be a Dirac mass for every \(t \in [0,1]\). Thus one really has to face the discontinuity of the functional to optimize. Nevertheless, numerical examples in the aforementioned works suggest that convergence holds even when the measures \(\mu , \nu \) are quite rough.
In [31], building on the previous works [17, 24, 28], the second author gave an answer to the second question by providing sufficient conditions for any discretization of the dynamic optimal transport problem to converge to the original problem when the temporal and spatial stepsizes vanish. This framework was later used in [39] and extended to matrix-valued optimal transport in [33, 34]. However, the arguments were inspired by the theory of \(\Gamma \)-convergence and were not quantitative, that is, they did not come with a convergence rate.
The goal of this article is to provide quantitative convergence rates of a discretized version of (1.3) to the original problem, as the temporal and spatial stepsizes vanish. Few results are already available: the works [17] and [38] show that their respective methods reach first order of convergence. Specifically, for h the (common) temporal and spatial stepsize, the error in the value of problem (1.3), that is, in the transport cost, is of order h. The article [38] also extends this convergence to the solutions of the problem via an analysis of the duality gap. However, in both cases this rate is only available when the data \(\mu , \nu \) have smooth densities bounded from below, and moreover the solution \(\rho \) to the optimal transport problem needs to have a smooth density bounded from below. The latter assumption (positivity of the solution \(\rho \) everywhere) is very strong, and is not directly implied by \(\mu , \nu \) being smooth and bounded from below [47]. Moreover, as we said above, in practice the discretizations behave well even when \(\mu , \nu \) or \(\rho \) are not bounded from below.
In contrast, for the discretization we propose, we prove a convergence rate of order \(\sqrt{h}\) for the transport cost without any regularity assumption on \(\mu , \nu \) (Theorem 4.1). Let us already emphasize that this result comes with two important limitations. First, we are not able to show that the rate improves if \(\mu , \nu \) have smooth densities, and the numerical experiments we conducted suggest that our rate is not sharp. To prove better convergence rates for more regular inputs, we would need a refinement of the previous result [23], on which we largely rely (see Remark 5.6).
The second point concerns the efficiency of the numerical computation. Our discretization is restricted to periodic boundary conditions, although we can treat the transport between \(\mu ,\nu \) on a bounded domain without any theoretical limitation by taking the enclosing periodic domain large enough. A full extension to bounded domains, which would be more efficient numerically, may be possible but is out of the scope of the present paper (see Remark 4.4).
Intuition of our proposed discretization
Our strategy to obtain such rates is the following. Rather than solving (1.3), we look at the dual problem, which is known to be
$$\begin{aligned} \sup _{\varphi } \int _\Omega \varphi (1,\cdot ) \, \textrm{d}\nu - \int _\Omega \varphi (0, \cdot ) \, \textrm{d}\mu \quad \text {with constraint} \quad \partial _t \varphi + H(\nabla \varphi ) \leqslant 0, \end{aligned}$$
(1.4)
where \(\varphi = \varphi (t,x)\) for \(t \in [0,1]\) and \(x \in \Omega \) is a time-space dependent scalar function which is a subsolution of the Hamilton–Jacobi equation \(\partial _t \varphi + H(\nabla \varphi ) = 0\), where H is the Legendre transform of L.
Finite difference discretizations of the Hamilton–Jacobi equation have been studied in the context of viscosity solutions since the seminal work of Crandall and Lions [23], and a convergence rate of \(\sqrt{h}\) is proved under fairly general assumptions. This holds specifically for the initial value problem
$$\begin{aligned} {\left\{ \begin{array}{ll} \partial _t \varphi + H(\nabla \varphi ) = 0, \\ \varphi (0,\cdot ) \text { given}, \end{array}\right. } \end{aligned}$$
for the Hamilton–Jacobi equation, importantly under an a priori Lipschitz bound on \(\varphi (0, \cdot )\). In the context of optimal transport, an optimal solution \(\varphi \) to the dual problem (1.4) is actually known to be Lipschitz in space, with Lipschitz constant depending on the diameter of \(\Omega \) but independent of \(\mu , \nu \) (see Proposition 2.2). In our analysis, we also need to ensure that an optimal solution to the discretized version of (1.4) is Lipschitz. As this cannot be guaranteed a priori, we add Lipschitz continuity at \(t=0\) as an additional constraint in our discretized problem; see Definition 3.2 and in particular the constraint (3.4).
A brief comment on rates for other numerical methods
Dynamic optimal transport is not the only way to solve the optimal transport problem; we refer to [43] and references therein for a comprehensive introduction. For the linear programming formulation and its entropic regularization, the standard setting is to assume that the measures are approximated via i.i.d. samples, and rates should be understood in a statistical sense (that is, written in probability or in expectation as a function of the sample size) rather than in a numerical analysis one (that is, written as a deterministic function of the stepsizes). We refer to [40] and [37] as well as references therein for results in this direction. In semi-discrete optimal transport, that is, when only one of the two measures is discrete, rates have been investigated both in a statistical setting [2] and in the more standard setting of numerical analysis, related to the stability of the Monge-Ampère equation [12]. As far as PDE methods are concerned, in addition to dynamic optimal transport, one could also solve the Monge-Ampère equation to compute a transport map [11]. Convergence as the discretization is refined has been established in the context of viscosity solutions [10]; see also [9, 13]. Note that these convergence results concern a different method, so a comparison with the present work is difficult. We only point out that they are not quantitative and typically assume some regularity of the input measures \(\mu , \nu \) (like absolute continuity) in order to write the Monge-Ampère equation in the first place.
Organization
We first properly define the dynamic optimal transport problem and present the Hamilton–Jacobi equation together with its finite difference discretization: this is a well-understood theory that we summarize in Sect. 2. We move to our proposed discretization in Sect. 3. Our main result, the quantitative convergence rate of the optimal transport cost, is presented in Sect. 4. With a standard analysis of the duality gap, we prove in Sect. 5, under an additional regularity assumption, quantitative convergence of some variables (the velocity field v and \(\nabla \varphi \)) from their discrete to their continuous counterparts. We numerically illustrate our results on simple one-dimensional test cases in Sect. 6: this shows that our rates are likely not sharp.

2 Settings and Preliminaries

In this section, we present our setting and review the previous works that are going to be ingredients of our discretization of dynamic optimal transport.
Assumptions
We assume the following in the rest of the article.
  • We restrict the spatial domain \(\Omega := \mathbb {R}^d/ (D\mathbb {Z}^d) \) to be the d-dimensional torus of side length D, whose diameter is
    $$\begin{aligned} \operatorname {diam}(\Omega ) = \frac{\sqrt{d}}{2} D. \end{aligned}$$
  • The measures \(\mu , \nu \) are Borel probability measures on \(\Omega \) and we do not make any assumption such as absolute continuity unless otherwise stated.
  • We take \(L : \mathbb {R}^d \rightarrow [0, + \infty )\), the “Lagrangian”, which is a non-negative, strictly convex and superlinear function.
As for the first point about the domain, we could assume \(D=1\) or \(\operatorname {diam}(\Omega )=1\) without loss of generality. However, we prefer to keep a general D to emphasize how some constants depend on \(\operatorname {diam}(\Omega )\). The last condition, on the Lagrangian, includes the most common choice \(L(v) = |v|^p/p\) for \(p \in (1, + \infty )\).

2.1 Dynamic (and Static) Optimal Transport

We briefly recall some ingredients of the dynamic optimal transport problem. We refer to the textbooks [43, 45, 49] for additional details.
We directly move to the convex formulation already mentioned in (1.3), whose terms we now define. The infimum runs over all pairs \((\rho _t, m_t)_{t \in [0,1]}\) of probability distributions \(\rho _t \in \mathcal {P}(\Omega )\) and vector-valued measures \(m_t \in \mathcal {M}(\Omega )^d\) such that \(t \mapsto \rho _t\) is continuous for the weak topology with \(\rho _0 = \mu \) and \(\rho _1 = \nu \). The continuity equation
$$\begin{aligned}&\partial _t \rho _t + \nabla \cdot m_t=0, \end{aligned}$$
is meant in the sense of distributions over the space \([0,1] \times \Omega \). For the functional to be optimized, we need a bit more notation. The so-called Benamou-Brenier functional is defined for measures \(\rho \in \mathcal {M}(\Omega )\) and \(m \in \mathcal {M}(\Omega )^d\) by
$$\begin{aligned} \mathcal {B}(\rho ,m):=\sup _{(a,b)} \int _\Omega a \ \textrm{d}\rho + \int _\Omega b \cdot \textrm{d}m \end{aligned}$$
where the pair \((a,b)\) runs through \(C_b(\Omega ; K)\), the space of continuous bounded functions valued in the closed convex set \(K\) given by
$$\begin{aligned} K:=\left\{ (s,w) \in \mathbb {R}\times \mathbb {R}^d \text { such that } s+H(w)\leqslant 0\right\} . \end{aligned}$$
Here the function \(H : \mathbb {R}^d\rightarrow \mathbb {R}\) is called the Hamiltonian, given as the Legendre transform of \(L\) by \(H(w)=\sup _v \langle v,w \rangle - L(v)\). With a slight variation of [45, Proposition 7.7], it can be proved that if \(\mathcal {B}(\rho ,m) < + \infty \) then \(\rho \) is a non-negative measure and the measure m is absolutely continuous with respect to \(\rho \), and in this case
$$\begin{aligned} \mathcal {B}(\rho ,m) = \int _\Omega L(v) \, \textrm{d}\rho , \end{aligned}$$
where \(v \in L^1(\Omega ,\rho )^d\) is the Radon-Nikodym density of m with respect to \(\rho \). With these notations the problem (1.3) now reads
$$\begin{aligned} \mathcal {K}(\mu ,\nu ) = \inf _{(\rho _t,m_t)} \int _0^1 \mathcal {B}(\rho _t,m_t) \, \textrm{d}t \quad \text {such that} \quad {\left\{ \begin{array}{ll} \partial _t \rho _t + \nabla \cdot m_t = 0 \text { weakly}, \\ \rho _0 = \mu , \quad \rho _1 = \nu . \end{array}\right. } \end{aligned}$$
(2.1)
The existence of a minimizer to the problem can be shown by the direct method of calculus of variations (see e.g. [19, Sect. 2] for a proof in a more general context), or alternatively by building a minimizer from an optimal solution of the static primal problem (2.4) introduced below, as in (2.5). We do not include the proof as it is not our main concern.
Theorem 2.1
(Existence of a solution in the primal problem) Under our assumptions the infimum in (2.1) is attained.
The question of uniqueness is more subtle, and one would go to the static problem introduced below to analyze it. The outcome is that the minimizer may not be unique, but it is unique if at least one of the two measures \(\mu , \nu \) is absolutely continuous with respect to the Lebesgue measure [45, Theorem 1.25].
Being a convex optimization problem under constraint, the dynamic optimal transport problem has a dual form. It reads
$$\begin{aligned} \mathcal {K}(\mu ,\nu )=\sup _{\varphi } \int _\Omega \varphi (1,\cdot ) \, \textrm{d}\nu - \int _\Omega \varphi (0,\cdot ) \, \textrm{d}\mu \quad \text {such that} \quad \partial _t \varphi +H(\nabla \varphi )\leqslant 0 \end{aligned}$$
(2.2)
where \(\varphi \) runs over Lipschitz functions defined over \([0,1] \times \Omega \), and the Hamilton–Jacobi constraint \(\partial _t \varphi +H(\nabla \varphi )\leqslant 0\) means that \(\varphi \) is a viscosity subsolution of \(\partial _t \varphi +H(\nabla \varphi )= 0\), as we explain below in Sect. 2.2. Here \(\varphi \) should be interpreted as a Lagrange multiplier for the continuity equation. The key result is that not only does a solution to the dual problem exist, but it has some Lipschitz regularity.
As L is convex, it is Lipschitz on bounded domains, and we denote by \(\operatorname {Lip}(L, B_r)\) its Lipschitz constant on the ball \(B_r\) centered at 0 and of radius r. On the other hand, \(\operatorname {Lip}(\psi )\) for a real-valued function \(\psi \) stands for its Lipschitz constant on its whole domain of definition.
Proposition 2.2
(Existence of a solution in the dual problem) Under our assumptions there exists an optimal potential \({\overline{\varphi }}\) in the dual dynamic transport problem satisfying
$$\begin{aligned} \operatorname {Lip}({\overline{\varphi }}(t,\cdot )) \leqslant \operatorname {Lip}(L, B_{\operatorname {diam}(\Omega )}), \quad \forall t \in [0,1]. \end{aligned}$$
(2.3)
We delay the proof of this result until the next section as we need additional preliminary results, including the static formulation.
The dynamic formulation of optimal transport is to be contrasted with its static one, which we introduce for the sake of completeness, and with which the reader may be more familiar. The cost function \(c : \Omega \times \Omega \rightarrow [0, + \infty )\) associated to our Lagrangian L is
$$\begin{aligned} c(x,y) := \inf _{k \in D \mathbb {Z}^d} L(y-x-k) \end{aligned}$$
as we are on the torus. The transport cost (2.1) is actually equal to:
$$\begin{aligned} \mathcal {K}(\mu ,\nu ):=\inf _{\pi \in \operatorname {ADM}(\mu ,\nu )}\int _{\Omega \times \Omega }c(x,y) \, \textrm{d}\pi (x,y) \end{aligned}$$
(2.4)
where \(\operatorname {ADM}(\mu ,\nu )\) is the set of probability measures on \(\Omega \times \Omega \) whose marginals are \(\mu \) and \(\nu \). Given a solution \(\pi \) to the static optimal transport problem, one can construct a solution to the dynamic optimal transport problem as follows: for any \(t \in [0,1]\), writing \(\Gamma _t : (x,y) \in \Omega \times \Omega \mapsto (1-t)x + ty \in \Omega \) (where the addition is understood modulo \(D \mathbb {Z}^d\)), one sets for any Borel set A,
$$\begin{aligned} \rho _t(A) = \pi (\Gamma _t^{-1}(A)), \qquad m_t(A) = \int _{\Gamma _t^{-1}(A)} (y-x) \, \textrm{d}\pi (x,y). \end{aligned}$$
(2.5)
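To make the construction (2.5) concrete, here is a minimal sketch (our own illustration, not code from the paper) for \(d=1\) and a coupling \(\pi \) supported on finitely many pairs: each atom travels at constant speed along a representative of the segment joining its endpoints, where we pick the shift \(k \in D\mathbb {Z}\) minimizing the displacement, which is the optimal choice for a radial Lagrangian such as \(L(v)=|v|^2/2\).

```python
import numpy as np

# Sketch of the displacement interpolation (2.5) on the torus R/(D Z), d = 1.
# The coupling pi is supported on pairs (x_i, y_i) with weights w_i (all
# values below are illustrative).
D = 1.0
xs = np.array([0.10, 0.20])   # initial positions (atoms of mu)
ys = np.array([0.85, 0.90])   # final positions (atoms of nu)
ws = np.array([0.5, 0.5])     # weights of the coupling pi

def interpolate(t):
    """Atoms of rho_t = (Gamma_t)_# pi and of the momentum m_t."""
    disp = ys - xs
    disp -= D * np.round(disp / D)   # shift k in D*Z minimizing |y - x - k|
    pos = (xs + t * disp) % D        # Gamma_t(x, y) = (1 - t) x + t y mod D
    return pos, ws, ws * disp        # locations, masses of rho_t, atoms of m_t

pos, mass, momentum = interpolate(0.25)
```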
The intuition is that, once their initial and final positions are chosen via the coupling \(\pi \), each particle travels at constant speed along a straight line. Finally, we introduce the dual formulation of the static optimal transport problem. It reads
$$\begin{aligned} \mathcal {K}(\mu ,\nu )=\sup _{\psi } \int _\Omega \psi ^c \, \textrm{d}\nu + \int _\Omega \psi \, \textrm{d}\mu \end{aligned}$$
(2.6)
where \(\psi ^c\) is the c-transform of \(\psi \) given by \(\psi ^c(y):=\inf _x c(x,y) - \psi (x)\), and \(\psi \) runs through functions on \({\Omega }\) such that \(\psi =\phi ^c\) for some \(\phi \in C(\Omega )\). When a function \({\overline{\psi }}\) attains the maximum, it is called a Kantorovich potential. The link between the static dual problem and the dynamic dual problem will be made clear in the next section.

2.2 Viscosity Solutions to the Hamilton–Jacobi Equation

We briefly review the notion of viscosity solutions, as it plays a central role in this article and highlights PDE aspects of optimal transport. Hamilton–Jacobi equations often do not admit a classical solution, but naive notions of weak solution, such as continuous functions differentiable Lebesgue-almost everywhere, can be too weak, in the sense that infinitely many such solutions may exist. To restore uniqueness while retaining existence, Evans [25] and Crandall and Lions [22] independently introduced the so-called viscosity solutions. For a Hamilton–Jacobi equation of the form
$$\begin{aligned} \partial _t \varphi + H(\nabla \varphi )=0 \end{aligned}$$
on a domain \([0,1]\times \Omega \), a Lipschitz function \(\varphi \) is said to be a viscosity subsolution (resp. supersolution) if, for any test function \(f\in C^1([0,1] \times \Omega )\), any local maximum \(x_0\) of \(\varphi -f\) (resp. local minimum) satisfies
$$\begin{aligned}&\partial _t f (x_0)+H\left( \nabla f(x_0)\right) \leqslant 0 \quad \text {(resp. } \partial _t f (x_0)+H\left( \nabla f(x_0)\right) \geqslant 0 \text {)}. \end{aligned}$$
The inequality constraint of the dynamic dual optimal transport (2.2) means that a competitor \(\varphi \) is required to be a viscosity subsolution.
When \(\varphi \) is both a viscosity subsolution and a viscosity supersolution, it is called a viscosity solution. This definition indeed guarantees the existence of a unique solution. We present here the statement of [21] summarizing the results of [22]: even though it is phrased in \(\mathbb {R}^d\), it can be adapted at no cost to \(\Omega = \mathbb {R}^d / (D \mathbb {Z}^d)\) by simply identifying a function on \(\Omega \) with a D-periodic function on \(\mathbb {R}^d\).
Theorem 2.3
(Existence and uniqueness of viscosity solution) Let \(H : \mathbb {R}^d\rightarrow \mathbb {R}\) be continuous and \(\varphi _0 : \Omega \rightarrow \mathbb {R}\) be Lipschitz. Then there exists exactly one viscosity solution to the initial value problem,
$$\begin{aligned} {\left\{ \begin{array}{ll} \partial _t \varphi + H(\nabla \varphi )=0 & \text {in } [0,\infty )\times \Omega , \\ \varphi (0,\cdot )=\varphi _0 & \text {in } \Omega . \end{array}\right. } \end{aligned}$$
Moreover, the Lipschitz constant does not increase in time i.e.
$$\begin{aligned} \operatorname {Lip}(\varphi (t',\cdot )) \leqslant \operatorname {Lip}(\varphi (t,\cdot )) \quad \text {for } t'>t. \end{aligned}$$
Actually, in this simple setting where the Hamiltonian does not depend on the spatial variable, the unique viscosity solution is characterized by the Hopf-Lax formula [26, Theorem 3 in Chapter 10.3]: with the assumptions and notations of the theorem, it is equal to
$$\begin{aligned} \varphi (t,x) :=\inf _y \; \varphi _0(y)+tL\left( \frac{x-y}{t}\right) . \end{aligned}$$
(2.7)
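For concreteness, the Hopf-Lax formula can be evaluated directly on a grid. The following minimal sketch (ours, assuming \(d=1\) and \(L(v)=|v|^2/2\)) approximates the infimum in (2.7) by a minimum over grid points and a few periodic copies.

```python
import numpy as np

# Direct evaluation of the Hopf-Lax formula (2.7) on a periodic 1-D grid,
# for L(v) = v^2/2 and t > 0.  On the torus, the infimum over y also runs
# over the periodic copies y + k*D; a few values of k suffice numerically.
D, n = 1.0, 200
x = np.linspace(0.0, D, n, endpoint=False)
phi0 = np.cos(2 * np.pi * x / D)               # a Lipschitz initial datum

def hopf_lax(t, copies=2):
    ks = D * np.arange(-copies, copies + 1)
    # diff[a, b, c] = x_a - y_b - k_c: displacement from candidate y to x
    diff = x[:, None, None] - x[None, :, None] - ks[None, None, :]
    vals = phi0[None, :, None] + t * 0.5 * (diff / t) ** 2
    return vals.min(axis=(1, 2))               # phi(t, .) on the grid

phi_half = hopf_lax(0.5)   # the viscosity solution at time t = 1/2
```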
The Hopf-Lax formula is a key ingredient to bridge the dual formulations of optimal transport in its static (2.6) and dynamic (2.2) form. This is also what enables us to prove Proposition 2.2.
Proof of Proposition 2.2
We can choose an optimal Kantorovich potential \({\overline{\psi }}\) for the static dual problem (2.6) so that it is Lipschitz continuous with \(\operatorname {Lip}({\overline{\psi }}) \leqslant \sup _y \operatorname {Lip}(c(\cdot ,y)) \leqslant \operatorname {Lip}(L, B_{\operatorname {diam}(\Omega )}) \) [45, Sect. 1.2.]. Then we define \({\overline{\varphi }}\) as the unique viscosity solution of the Hamilton–Jacobi equation with initial data \(-{\overline{\psi }}\). Observe by (2.7) that \({\overline{\varphi }}(1,\cdot )={\overline{\psi }}^c \) and hence
$$\begin{aligned} \int _\Omega {\overline{\varphi }}(1,\cdot ) \, \textrm{d}\nu - \int _\Omega {\overline{\varphi }}(0,\cdot ) \, \textrm{d}\mu = \int _\Omega {\overline{\psi }}^c \, \textrm{d}\nu + \int _\Omega {\overline{\psi }}\, \textrm{d}\mu = \mathcal {K}(\mu ,\nu ). \end{aligned}$$
Therefore, \({\overline{\varphi }}\) is not merely an admissible competitor but an optimal potential for the dynamic dual problem (2.2). Finally, Theorem 2.3 asserts that \(\operatorname {Lip}({\overline{\varphi }}(t,\cdot )) \leqslant \operatorname {Lip}({\overline{\varphi }}(0,\cdot )) = \operatorname {Lip}({\overline{\psi }}) \) for \(t\in [0,1]\). \(\square \)

2.3 Discretization of the Hamilton–Jacobi Equation

As we mentioned in the introduction, our key idea is to discretize not the primal formulation (2.1) but rather the dual formulation (2.2). Indeed discretization of the Hamilton–Jacobi equations is a widely studied topic, in particular since the seminal work of Crandall and Lions [23].
We adapt Crandall and Lions's original setting, the domain \([0,\infty ) \times \mathbb {R}^d\), to \(Q=[0,1]\times (\mathbb {R}^d / (D \mathbb {Z}^d))\). Their results remain valid on \(Q\) without additional conditions, as functions on \(\Omega = \mathbb {R}^d / (D \mathbb {Z}^d)\) can simply be extended to \(\mathbb {R}^d\) by periodicity.
The time domain \([0,1]\) is discretized by uniformly sampling points: we define \(T_D :=\{0, \Delta t, \ldots , (N_T-1)\Delta t,1\} \) with some positive integer \(N_T\) and \(\Delta t =1/N_T\). The space domain \(\Omega =\mathbb {R}^d / (D \mathbb {Z}^d)\) is discretized in the same way, that is we write \(\Omega _D :=\{(j_1 \Delta x, \ldots , j_d \Delta x)\}\) with indices \((j_1,\ldots , j_d)\in (\mathbb {Z}/N_X \mathbb {Z})^d :=\{0,1,\ldots ,N_X-1\}^d\) with some positive integer \(N_X\) and \(\Delta x=D/N_X\). In the sequel, we write \(j:=(j_1,\ldots , j_d)\) and \(j\Delta x:=(j_1 \Delta x, \ldots , j_d \Delta x)\) for simplicity. Hence the discretization of \(Q\) is given by \(Q_D:=T_D \times \Omega _D\).
Each continuous function \(\varphi \) is approximated by a discrete function \(\Phi : Q_D\rightarrow \mathbb {R}\) given as the evaluation of \(\varphi \) at the grid points. For the value at \((i\Delta t,j\Delta x)\), we write \(\Phi ^i_j\) or \(\Phi ^{i}_{j_1,\ldots ,j_d}\), but we will occasionally omit subscripts \(i\) and \(j\) when we mean the collection \(\{\Phi ^i_j\}^i_j\) or we do not need to specify \(i\) or \(j\). We denote by \(\mathbb {R}^Z\) the collection of functions from a set \(Z\subseteq Q_D\) to \(\mathbb {R}\), and define
$$\begin{aligned} \Vert \Psi \Vert _{L^\infty (Z)}:=\max _{z\in Z} |\Psi (z)|. \end{aligned}$$
for function \(\Psi \in \mathbb {R}^Z\).
The discretization of the Hamilton–Jacobi equation \(\partial _t \varphi +H(\nabla \varphi )=0\) is given as follows. The initial state of the discrete function, \(\Phi ^0\), is given on the grid points \(\{0\}\times \Omega _D\subset Q_D\), and it is updated in time as
$$\begin{aligned} \Phi ^{i+1} =\mathcal {S}( \Phi ^{i} ), \end{aligned}$$
by a map \(\mathcal {S}: \mathbb {R}^{\Omega _D} \rightarrow \mathbb {R}^{\Omega _D}\) called a scheme. We will deal with a certain class of schemes following a standard finite difference setting as in [23]. A scheme \(\mathcal {S}\) is said to be of difference form if there exists a function \(\mathcal {G}:\mathbb {R}^{2d}\rightarrow \mathbb {R}\) such that
$$\begin{aligned} \mathcal {S}_j( \Psi ) = \Psi _{j} - \Delta t \mathcal {G}\left( \frac{ {\Delta _{-,j}}\Psi }{\Delta x}, \frac{ {\Delta _{+,j}}\Psi }{\Delta x}\right) , \end{aligned}$$
for each \(j\) where \({\Delta _{-,j}}:=(\Delta _{-,j}^1,\ldots ,\Delta _{-,j}^d)\) takes the spatially backward difference
$$\begin{aligned} {\Delta _{-,j}^k}\Psi :=\Psi _{j_1,\ldots , j_{k}, \ldots , j_d }- \Psi _{j_1,\ldots , j_{k}-1, \ldots , j_d }, \end{aligned}$$
for each k, and \({\Delta _{+,j}}:=(\Delta _{+,j}^1,\ldots ,\Delta _{+,j}^d)\) takes the forward difference
$$\begin{aligned} {\Delta _{+,j}^k}\Psi :=\Psi _{j_1,\ldots , j_k+1, \ldots , j_d }- \Psi _{j_1,\ldots , j_k, \ldots , j_d }, \end{aligned}$$
for each \(k\in \{1,\ldots ,d\}\). Of course the expression \(j_k \pm 1\) is understood modulo \(N_X\) as we have periodic boundary conditions. We will omit the subscript \(j\) for \(\Delta _{\pm }\) when there is no confusion; a small code sketch of these operators follows the list below. The two important properties that a scheme can have are the following:
  • Consistency. A scheme \(\mathcal {S}\) of difference form is consistent with \(H\) if \(\mathcal {G}\) satisfies \(\mathcal {G}(a,a)=H(a)\) for any \(a\in \mathbb {R}^d\). Equivalently, \(\mathcal {S}\) gives the exact solution on grid points for any time-space affine function.
  • Monotonicity. Let us define a subset of discrete space functions
    $$\begin{aligned} \mathcal {C}_R :=\left\{ \Psi \in \mathbb {R}^{\Omega _D} \text { such that } \forall k,j \, \left| \frac{\Delta ^k_{+,j} \Psi }{\Delta x} \right| \leqslant R \right\} \end{aligned}$$
    for a fixed \(R>0\). The space \(\mathcal {C}_R\) can be interpreted as the discrete counterpart of the functions which are R-Lipschitz in each coordinate. A scheme is monotone on \([-R,R]\) if for each grid point j the restriction of \(\mathcal {S}_j\) to \(\mathcal {C}_R\) is non-decreasing with respect to any of its variables.
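As a small illustration (our own sketch, not the authors' code), the periodic difference operators and membership in \(\mathcal {C}_R\) take only a few lines in practice; np.roll implements the indexing modulo \(N_X\).

```python
import numpy as np

# Periodic difference operators Delta_+ and Delta_-, and the test Psi in C_R.
D, NX = 1.0, 64
dx = D / NX

def delta_plus(psi, axis=0):
    return np.roll(psi, -1, axis=axis) - psi   # Psi_{j+1} - Psi_j  (mod N_X)

def delta_minus(psi, axis=0):
    return psi - np.roll(psi, 1, axis=axis)    # Psi_j - Psi_{j-1}  (mod N_X)

def in_C_R(psi, R):
    # Discrete analogue of "R-Lipschitz in each coordinate"
    return all(np.abs(delta_plus(psi, axis=k) / dx).max() <= R
               for k in range(psi.ndim))

psi = np.sin(2 * np.pi * np.arange(NX) * dx / D)
assert in_C_R(psi, R=2 * np.pi / D)            # |psi'| is at most 2*pi/D
```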
The striking result is that these two properties guarantee not only convergence of the discrete solutions to the continuous ones, but also a quantitative rate of convergence. Crandall and Lions showed a quantitative estimate for the approximation error of such discrete solutions on a family of time-space discretizations of the domain \(Q\) [23, Theorem 1]. In their work, the ratio \(\zeta :=\Delta t/ \Delta x\) is fixed within a family of discretizations. Throughout this article, we follow their setting, and for each discretization \(Q_D\) we denote its resolution by \(h:=\Delta t\).
Theorem 2.4
(Convergence of discrete Hamilton–Jacobi equation) Let \(H : \mathbb {R}^d\rightarrow \mathbb {R}\) be continuous and \(\varphi \) be the unique viscosity solution to the initial value problem,
$$\begin{aligned} {\left\{ \begin{array}{ll} \partial _t \varphi +H(\nabla \varphi )=0 \quad \text {in } Q=[0,1]\times \Omega ,\\ \varphi (0,\cdot )=\varphi _0, \end{array}\right. } \end{aligned}$$
with initial data \(\varphi _0\) Lipschitz in each coordinate with Lipschitz constant \(R>0\). Let \(Q_D=T_D \times \Omega _D\) be a discretization of \(Q\) with resolution \(\Delta t = h\) and ratio \(\zeta = \Delta t/ \Delta x\) fixed. Define a discrete solution \(\Phi \in \mathbb {R}^{Q_D}\) by
$$\begin{aligned} {\left\{ \begin{array}{ll} \Phi ^{i+1}=\mathcal {S}(\Phi ^i), & i\in \{0,\ldots ,N_T-1\},\\ \Phi ^0_j:=\varphi _0(j \Delta x), & \forall j, \end{array}\right. } \end{aligned}$$
with a scheme \(\mathcal {S}\) of difference form which is consistent with \(H\) and monotone on \([-R-\delta ,R+\delta ]\) for some \(\delta > 0\).
Then we have the estimate,
$$\begin{aligned} \Vert \Phi - \varphi \Vert _{L^\infty (Q_D)}\leqslant C \sqrt{h} \end{aligned}$$
with a constant \(C\) depending only on the scheme \(\mathcal {S}\), \(\Vert \varphi _0\Vert _{L^\infty (\Omega )}\), \(R+\delta \), and \(H\).
This theorem will be the main result we will use to get our quantitative convergence rates for the optimal transport problem.
Remark 2.5
The theorem is usually stated with \(\delta = 1\), that is, the scheme should be monotone on \([-R-1,R+1]\), but a close inspection of the proof in [23] reveals that any \(\delta > 0\) is possible.
Remark 2.6
(Lipschitz in each coordinate and why we restrict to a periodic setting) We have made another modification compared to the classical statement of the theorem. We assume that \(\varphi _0\) is R-Lipschitz in each coordinate instead of simply R-Lipschitz, that is, for every k and \((x_1, \ldots , x_{k-1}, x_{k+1}, \ldots ,x_d) \in \mathbb {R}^{d-1} / (D \mathbb {Z}^{d-1})\) the function
$$\begin{aligned} x \mapsto \varphi _0(x_1, \ldots x_{k-1}, x ,x_{k+1}, \ldots ,x_d) \end{aligned}$$
is R-Lipschitz. An R-Lipschitz function is R-Lipschitz in each coordinate, and a function which is R-Lipschitz in each coordinate is \(\sqrt{d} R\)-Lipschitz in the classical sense. Functions which are R-Lipschitz in each coordinate are the perfect analogue of the space \(\mathcal {C}_R\) of discrete functions, and this will be useful in the proof of our Theorem 4.1.
An important step in the proof of the theorem is the propagation of the regularity for the discrete solution: one can prove that if \(\Phi ^0 \in \mathcal {C}_R\) then
$$\begin{aligned} \Phi ^i \in \mathcal {C}_R, \quad \forall i. \end{aligned}$$
(2.8)
It is obtained by combining two simple arguments. The first one is that, as the scheme commutes with the addition of constant functions and is monotone, it is non-expansive in \(L^\infty \):
$$\begin{aligned} \Vert \mathcal {S}(\Psi ) - \mathcal {S}(\Psi ')\Vert _{L^\infty (\Omega _D)} \leqslant \Vert \Psi - \Psi '\Vert _{L^\infty (\Omega _D)} \end{aligned}$$
for any pair of \(\Psi \) and \(\Psi '\) in \(\mathcal {C}_R\) [23, Proposition 3.1]. The second one is to apply the non-expansiveness to \(\Psi '\), a shifted version of \(\Psi \). As a shift in space commutes with the discrete differential operators \(\Delta ^k_+, \Delta ^k_-\) for fixed k, we obtain
$$\begin{aligned} \Vert \Delta _+^k\mathcal {S}(\Psi )\Vert _{L^\infty (\Omega _D)}\leqslant \Vert \Delta _+^k\Psi \Vert _{L^\infty (\Omega _D)}, \end{aligned}$$
and from there an immediate induction gives (2.8). Note that the same argument at the continuous level gives that, if \(\varphi _0\) is R-Lipschitz in each coordinate, then so is \(\varphi (t,\cdot )\) for any \(t \geqslant 0\).
Importantly, this regularizing effect of the scheme no longer works if the spatial domain is neither the torus nor the whole Euclidean space: the first estimate stays valid, but the second argument, about commutativity, breaks down on the boundary. For this reason, our work is phrased on a periodic domain rather than on a bounded domain: we will not rely on the estimate (2.8) in itself, but this estimate is necessary in order for Theorem 2.4 to be true.

2.3.1 Vanishing Viscosity Scheme

One example of a monotone and consistent scheme presented in [23] is the so-called vanishing viscosity scheme, a discrete analogue of the vanishing viscosity method for the continuous Hamilton–Jacobi equation. For a discrete space function \(\Psi \in \mathbb {R}^{\Omega _D}\), it is given by
$$\begin{aligned} \mathcal {S}(\Psi )=\Psi -\Delta t \left\{ H(\nabla _{D} \Psi ) - \varepsilon \Delta _D \Psi \right\} , \end{aligned}$$
(2.9)
with some \(\varepsilon >0\). Here \(\nabla _D, \Delta _D\) are respectively the discrete centered gradient and the discrete Laplacian given by,
$$\begin{aligned} \nabla _{D,j}\Psi :=\frac{ \Delta _{-,j}\Psi + {\Delta _{+,j}\Psi }}{2\Delta x}, \qquad \Delta _{D,j}\Psi :=\sum _k \frac{{\Delta _{+,j}^k}\Psi -{\Delta _{-,j}^k}\Psi }{(\Delta x)^2} \end{aligned}$$
at each location \(j\). With the (forward) discrete time derivative
$$\begin{aligned} \partial _{\Delta t} ^i\Phi :=\frac{\Phi ^{i+1}-\Phi ^{i}}{\Delta t}, \end{aligned}$$
the scheme (2.9) reads
$$\begin{aligned} \partial _{\Delta t} \Phi + H(\nabla _D \Phi ) - \varepsilon \Delta _D \Phi = 0, \end{aligned}$$
for a discrete time-space function \(\Phi \in \mathbb {R}^{Q_D}\), which is a discrete analogue of the Hamilton–Jacobi equation with the viscosity term.
Consistency is immediate: for \(\Psi \in \mathbb {R}^{\Omega _D}\) satisfying \(\Delta _{+} \Psi / \Delta x=\Delta _{-}\Psi / \Delta x =(a_1,\ldots ,a_d)\), we have \(H(\nabla _{D}\Psi )=H(a_1,\ldots ,a_d)\) and \( \Delta _{D} \Psi =0 \). Monotonicity cannot be achieved by the Hamiltonian term alone, but it can be ensured with the help of the viscosity term. A simple computation reveals that the scheme is monotone on \(\mathcal {C}_R\) when \(\varepsilon \) satisfies
$$\begin{aligned} \frac{\operatorname {Lip}(H,B_R)}{2} \leqslant \frac{\varepsilon }{\Delta x} \leqslant \frac{\Delta x}{2d\Delta t} \end{aligned}$$
(2.10)
where \(\operatorname {Lip}(H,B_R)\) is the Lipschitz constant of H on the centered ball of radius R. Recalling that the ratio \(\zeta =\Delta t/\Delta x\) is fixed within a family of discretizations of \(Q\), we need to choose \(\zeta \leqslant (d \operatorname {Lip}(H,B_R))^{-1}\) for the interval in (2.10) to be non-empty, and in this case \(\varepsilon \) goes to zero at rate \(h = \Delta t \propto \Delta x\): the viscosity \(\varepsilon \) vanishes together with the stepsize.
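Concretely, the time marching of the vanishing viscosity scheme can be sketched as follows (our own illustration, assuming \(d=1\) and \(H(w)=|w|^2/2\), with \(\varepsilon \) chosen strictly inside the window (2.10)).

```python
import numpy as np

# Explicit time stepping of the vanishing viscosity scheme (2.9), d = 1,
# H(w) = w^2/2 so that Lip(H, B_r) = r.
D, NX, NT = 1.0, 64, 64
dx, dt = D / NX, 1.0 / NT      # here the ratio zeta = dt/dx equals 1
R = 0.5                        # gradient bound on the initial datum
eps = 0.3 * dx                 # monotone up to gradients 0.6 = R + delta,
                               # and eps/dx = 0.3 <= dx/(2*d*dt) = 0.5

def scheme_step(psi):
    dplus = (np.roll(psi, -1) - psi) / dx      # forward difference / dx
    dminus = (psi - np.roll(psi, 1)) / dx      # backward difference / dx
    grad = 0.5 * (dplus + dminus)              # centered gradient  nabla_D
    lap = (dplus - dminus) / dx                # discrete Laplacian Delta_D
    return psi - dt * (0.5 * grad ** 2 - eps * lap)

x = np.arange(NX) * dx
phi = 0.05 * np.cos(2 * np.pi * x / D)         # in C_R: Lip <= 0.1*pi < R
for _ in range(NT):                            # march from t = 0 to t = 1
    phi = scheme_step(phi)
```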
Several alternatives can be found in the literature, such as the upwind scheme also in [23] (for \(d=1\)), higher order finite difference schemes [41], and also discontinuous Galerkin methods [18]. We, however, focus on the vanishing viscosity scheme in this article for the following reasons: it is simple; we can efficiently solve the discrete dynamic optimal transport problem we build on it; and the function \({\overline{\varphi }}\) is a priori not expected to be better than Lipschitz uniformly over the space-time domain, so using higher order schemes seems less helpful as far as the theoretical analysis is concerned.

3 Discrete Dynamic Optimal Transport

In this section, we introduce a discrete formulation of dynamic optimal transport. We first do so using a general monotone and consistent scheme, and then specifically with the vanishing viscosity scheme explained in the previous section.

3.1 Discrete Problem for a General Scheme

We discretize the dynamic dual problem (2.2) on \(Q= [0,1] \times \Omega \). For the discretization of the domain, we use \(Q_D= T_D \times \Omega _D\) defined in Sect. 2.3.
Regarding the discretization of probability measures, we follow a standard approach. For a given \(\mu \in \mathcal {P}(\Omega )\), we define its discretization \(\Pi \mu \) in such a way that \(\Pi \mu \overset{*}{\rightharpoonup }\mu \) as \(\Delta x \rightarrow 0\). A simple choice is the projection onto Dirac measures on the grid points, given by
$$\begin{aligned} \Pi \mu = \sum _j \mu ({B_{j\Delta x}})\delta _{j\Delta x}, \end{aligned}$$
(3.1)
where \(B_{j\Delta x}\) is the \(d-\)dimensional half-open box centered at \(j\Delta x\) with edge length \(\Delta x\) given by,
$$\begin{aligned} B_{j\Delta x}=\big [(j_1-1/2) \Delta x, (j_1+1/2)\Delta x\big ) \times \cdots \times \big [(j_d-1/2) \Delta x, (j_d+1/2)\Delta x\big ). \end{aligned}$$
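As an illustration (our own sketch), the weights of \(\Pi \mu \) are obtained by binning the mass of \(\mu \) into the boxes \(B_{j\Delta x}\); for an empirical measure in \(d=1\) this amounts to a rounding and a weighted count, with indices understood modulo \(N_X\).

```python
import numpy as np

# Weights of Pi(mu) as in (3.1) when mu is an empirical measure, d = 1.
# x lies in B_{j dx} = [(j - 1/2) dx, (j + 1/2) dx) iff floor(x/dx + 1/2) = j.
D, NX = 1.0, 16
dx = D / NX
rng = np.random.default_rng(0)
samples = rng.uniform(0.0, D, size=1000)           # mu: empirical measure

j = np.floor(samples / dx + 0.5).astype(int) % NX  # nearest grid index
weights = np.bincount(j, minlength=NX) / len(samples)
# Pi(mu) = sum_j weights[j] * delta_{j * dx}; the weights sum to 1
```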
Remark 3.1
(Choice of discrete measure) In this article, we stick to this specific discretization of measures, but it is not the unique choice. For example, piecewise uniform measures on the boxes \(\{B_{j\Delta x}\}_j\) are also a reasonable option. All our results easily extend to this discretization of measures as soon as Lemma 4.2 is valid.
Clamped discrete gradient With the above settings, we are almost ready to introduce our discrete optimal transport problem. We finally impose a constraint on discrete functions, as follows. We will later guarantee the convergence of the discrete transport cost to the continuous one. For this result, we will need Theorem 2.4, which requires a scheme monotone on \([-R-\delta ,R+\delta ]\) with \(\delta > 0\) and an initial condition \(\Phi ^0\in \mathcal {C}_R\). However, the boundedness of \(\Delta _{+} \Phi ^0 / \Delta x\) is not a priori guaranteed in our upcoming formulation of discrete optimal transport; in contrast, in the continuous setting the gradient of an optimal potential can be chosen bounded by \( \operatorname {Lip}(L,B_{\operatorname {diam}(\Omega )}) \), as stated in Proposition 2.2. To cope with this problem, we explicitly constrain
$$\begin{aligned} \forall j,k, \quad \left| \frac{ \Delta ^k_{+,j} \Phi ^0}{\Delta x} \right| \leqslant R \end{aligned}$$
for a parameter \(R \ge \operatorname {Lip}(L,B_{\operatorname {diam}(\Omega )})\). Said differently, we impose \(\Phi ^0 \in \mathcal {C}_{R}\).
Definition 3.2
(Discrete dynamic optimal transport) Let \(\mu ,\nu \in \mathcal {P}(\Omega )\) and choose a parameter \( R \ge \operatorname {Lip}(L,B_{\operatorname {diam}(\Omega )})\). Let us assume a time-space discretization as defined so far, and let \(\mathcal {S}\) be a scheme of difference form that is monotone on \([-R-\delta ,R+\delta ]\) for some \(\delta > 0\) and consistent with the Hamiltonian \(H\) of the continuous optimal transport problem. We say that the discrete optimal transport cost between \(\mu \) and \(\nu \) is
$$\begin{aligned} \mathcal {K}_D(\mu ,\nu ):=\max _{\Phi } \int _\Omega \Phi ^{N_T} \, \textrm{d}\Pi \nu - \int _\Omega \Phi ^{0} \, \textrm{d}\Pi \mu \end{aligned}$$
(3.2)
where \(\Phi \in \mathbb {R}^{Q_D}\) runs over the discrete functions satisfying,
$$\begin{aligned}&\Phi ^{i+1} -\mathcal {S}(\Phi ^i ) \leqslant 0,&\forall i \in \{0,\ldots ,N_T-1\}, \end{aligned}$$
(3.3)
$$\begin{aligned}&\left| \frac{\Delta ^k_{+,j} \Phi ^0}{\Delta x}\right| \leqslant R&\forall j \in \{ 0, \ldots , N_X-1 \}^d, \, \forall k \in \{ 1, \ldots , d \}. \end{aligned}$$
(3.4)
If a discrete function \({\overline{\Phi }}\) attains \(\mathcal {K}_D(\mu ,\nu )\), we call \({\overline{\Phi }}\) an optimal potential for the discrete problem.
This definition clearly mimics the continuous dual problem (2.2). At this level of generality, this is not necessarily a concave maximization problem: it depends on the precise choice of the scheme \(\mathcal {S}\). We emphasize that our convergence result for the transport cost (Theorem 4.1) holds even in the absence of concavity.
Remark 3.3
(Breaking of the symmetry between \(\mu \) and \(\nu \)) This discrete optimal transport is in general not symmetric in \(\mu \) and \(\nu \), that is, \(\mathcal {K}_D(\mu ,\nu ) \ne \mathcal {K}_D(\nu ,\mu )\), as can be seen both from the Hamilton–Jacobi constraint (3.3) and from the constraint (3.4) on the initial potential. This is in contrast with other discretizations like [32, 42], which are symmetric in \(\mu \) and \(\nu \).
Remark 3.4
The threshold R for clamping in Definition 3.2 is allowed to be greater than \(\operatorname {Lip}(L,B_{\operatorname {diam}(\Omega )})\) as long as the scheme \(\mathcal {S}\) remains monotone. But choosing a larger R shrinks the range of parameters for which the scheme is monotone (see (2.10)), and makes the convergence a bit slower in the sense that the multiplicative constant in front of \(\sqrt{h}\) in Theorem 4.1 increases with R. Thus choosing a larger R would make sense only if one does not have exact access to \(\operatorname {Lip}(L,B_{\operatorname {diam}(\Omega )})\).

3.2 Discrete Problem with Vanishing Viscosity

We have introduced a discretization of dynamic optimal transport; notice that it is a discrete counterpart of the dynamic dual formulation. In order to obtain quantities such as optimal measures and optimal velocities, we need a primal formulation as well. This will not be needed for our main convergence result about the transport cost (Theorem 4.1), but it will be necessary for the convergence of optimizers (Theorem 5.1) to make sense.
We focus on the vanishing viscosity scheme (2.9) introduced in Sect. 2.3.1, as it is a simple and practical example of a scheme that makes the discrete optimal transport a convex problem. With this scheme the constraint (3.3) reads
$$\begin{aligned} \partial _{\Delta t}\Phi +H(\nabla _D \Phi ) - \varepsilon \Delta _D\Phi \leqslant 0, \end{aligned}$$
which is a convex constraint.
Remark 3.5
(Range of admissible parameters) Let us focus on the case where L (and thus H) is radial. Note that the scheme must be monotone on \([-R-\delta , R+\delta ]\) with \(\delta > 0\) and \(R :=\operatorname {Lip}(L,B_{\operatorname {diam}(\Omega )})\). But as \(\nabla L\) and \(\nabla H\) are inverse to each other, the constraint (2.10) reads
$$\begin{aligned} \frac{\operatorname {diam}(\Omega )}{2} < \frac{\varepsilon }{\Delta x} \leqslant \frac{\Delta x}{2 d\Delta t}. \end{aligned}$$
Interestingly it does not depend on L.
To obtain the primal formulation, we begin by writing the dual problem (Definition 3.2) more concretely for the vanishing viscosity scheme. To make the exposition simple, we introduce some notation. We fix \(\mu ,\nu \in \mathcal {P}(\Omega )\) in the rest of the section and define a functional \(F_D : \mathbb {R}^{Q_D}\rightarrow \mathbb {R}\) by
$$\begin{aligned} F_D \Phi = \int _\Omega \Phi ^{N_T} \, \textrm{d}\Pi \nu - \int _\Omega \Phi ^0 \, \textrm{d}\Pi \mu , \end{aligned}$$
(3.5)
and set a constant \(R \ge \operatorname {Lip}(L,B_{\operatorname {diam}(\Omega )})\). Next, we concatenate the discrete differential operators as follows. We define \(T_D':=\{0,\Delta t,\ldots , (N_T-1)\Delta t \}\) by dropping the last time \(N_T\Delta t=1\) from \(T_D\), and define a subset of the entire discrete space \(Q_D\) by \(Q_D':=T_D' \times \Omega _D\). We then define an operator \(A=(A_t,A_x,A_R) : \mathbb {R}^{Q_D}\rightarrow \mathbb {R}^{Q'_D +d\times Q'_D+ d\times \Omega _D}\) by
$$\begin{aligned}&A_t \Phi =(\partial _{\Delta t}^0\Phi -\varepsilon \Delta _D \Phi ^0, \ldots , \partial _{\Delta t}^{N_T-1}\Phi - \varepsilon \Delta _D \Phi ^{N_T-1})\in \mathbb {R}^{Q_D'},\\&A_x\Phi =(\nabla _D \Phi ^0, \ldots , \nabla _D \Phi ^{N_T-1}) \in \mathbb {R}^{d\times Q_D'},\\&A_R \Phi =(\Delta _{+}\Phi ^0/\Delta x ) \in \mathbb {R}^{d\times \Omega _D}. \end{aligned}$$
We rewrite the problem with these settings.
Definition 3.6
(Dual problem with vanishing viscosity) Let \(\mu ,\nu \in \mathcal {P}(\Omega )\), \(H\) be the Hamiltonian of the continuous problem, and \(\mathcal {S}\) be the vanishing viscosity scheme. The dual formulation of discrete transport problem is defined as
$$\begin{aligned}&\sup _\Phi F_D\Phi , \end{aligned}$$
(3.6)
with constraints
$$\begin{aligned}&A_t\Phi +H(A_x \Phi )\leqslant 0, \end{aligned}$$
(3.7)
$$\begin{aligned}&|A_R \Phi | \leqslant R. \end{aligned}$$
(3.8)
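For concreteness, here is a minimal sketch (ours, not the authors' solver) of the problem (3.6)-(3.8) for \(d=1\) and \(H(w)=|w|^2/2\), so that \(R = \operatorname {Lip}(L,B_{\operatorname {diam}(\Omega )}) = \operatorname {diam}(\Omega ) = D/2\), written with the convex modelling library cvxpy; the constraint (3.7) is then convex and the problem can be handed to a generic conic solver.

```python
import numpy as np
import scipy.sparse as sp
import cvxpy as cp

# Dual problem (3.6)-(3.8) for d = 1 and H(w) = w^2/2 (so L(v) = v^2/2).
NT, NX, D = 16, 16, 1.0
dt, dx = 1.0 / NT, D / NX
R = D / 2                       # Lip(L, B_diam) = diam(Omega) for this L
eps = 0.3 * dx                  # strictly inside the window (2.10)

I = sp.identity(NX, format="csr")
S = sp.csr_matrix(np.roll(np.eye(NX), 1, axis=1))   # (S psi)_j = psi_{j+1}
Dp, Dm = (S - I) / dx, (I - S.T) / dx               # forward/backward diff.
Grad, Lap = (Dp + Dm) / 2, (Dp - Dm) / dx           # nabla_D and Delta_D

mu = np.ones(NX) / NX                               # Pi(mu): uniform weights
nu = np.roll(np.eye(NX)[0], NX // 3)                # Pi(nu): a single Dirac

Phi = cp.Variable((NT + 1, NX))
cons = [cp.abs(Dp @ Phi[0]) <= R]                   # clamp constraint (3.8)
for i in range(NT):                                 # constraint (3.7)
    cons.append((Phi[i + 1] - Phi[i]) / dt
                + 0.5 * cp.square(Grad @ Phi[i]) - eps * (Lap @ Phi[i]) <= 0)
prob = cp.Problem(cp.Maximize(nu @ Phi[NT] - mu @ Phi[0]), cons)
prob.solve()
print("K_D(mu, nu) =", prob.value)
```

Even with a Dirac mass as target, the problem above is well posed, in line with the absence of regularity assumptions in Theorem 4.1.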
To derive the expression of the dual of this problem, which we call the primal problem, we use the method of Lagrange multipliers. We introduce a new variable \(\Sigma \) which at optimality coincides with \(A \Phi \), and take a Lagrange multiplier \(\Lambda \) to enforce the constraint \(A \Phi = \Sigma \). Specifically, the problem reads as the saddle point problem
$$\begin{aligned} \mathcal {K}_D(\mu ,\nu ) = \sup _{\Phi , \Sigma } \inf _{\Lambda } \; \mathcal {L}(\Phi , \Sigma , \Lambda ), \end{aligned}$$
(3.9)
with the functional
$$\begin{aligned} \mathcal {L}(\Phi , \Sigma , \Lambda )= F_D \Phi - I_{\Sigma _t+H(\Sigma _x)\leqslant 0} - I_{|\Sigma _R|\leqslant R} - \Lambda \cdot (A\Phi -\Sigma ). \end{aligned}$$
The newly introduced symbols are defined as follows. We set the variable \(\Sigma :=(\Sigma _t, \Sigma _x, \Sigma _R)\in \mathbb {R}^{Q'_D +d\times Q'_D+ d\times \Omega _D}\). Each indicator function \(I\) takes the value \(0\) if the constraint is satisfied, and \(+ \infty \) otherwise. The variables \(\Lambda :=(\Lambda _\rho , \Lambda _m, \Lambda _\eta )\in \mathbb {R}^{Q'_D +d\times Q'_D+ d\times \Omega _D} \) are the Lagrange multipliers. Finally, by “\(\cdot \)” we denote the standard Euclidean product between vectors in \(\mathbb {R}^N\), with a dimension N which should be clear from context. It should be interpreted as the integral of a discrete scalar or vector field against a discrete scalar or vector-valued measure respectively, which is just the summation of element-wise products.
Now let us consider the formal exchange of the infimum and the supremum. By a direct computation, we have
$$\begin{aligned}&\inf _{\Lambda } \sup _{\Phi ,\Sigma } \; \mathcal {L}(\Phi , \Sigma , \Lambda ) = \inf _{\Lambda } \sup _{\Phi ,\Sigma } \; (F_D-A^\top \Lambda ) \cdot \Phi - I_{\Sigma _t+H(\Sigma _x)\leqslant 0} - I_{|\Sigma _R|\leqslant R} + \Lambda \cdot \Sigma \\&\quad =\inf _\Lambda \sup _{\Phi , \Sigma _t, \Sigma _x} \; (F_D-A^\top \Lambda ) \cdot \Phi - I_{\Sigma _t+H(\Sigma _x)\leqslant 0} \\ &\qquad + \Lambda _\rho \cdot \Sigma _t + \Lambda _m \cdot \Sigma _x + R \cdot |\Lambda _\eta | \end{aligned}$$
where in the last line we have performed the maximization over \(\Sigma _R\) by taking \(\Sigma _R = R \Lambda _\eta / |\Lambda _\eta |\) elementwise. If \(\Lambda _\rho \) is not element-wise non-negative, then letting \(\Sigma _t \rightarrow - \infty \) would yield \(+ \infty \). But once we know \(\Lambda _\rho \geqslant 0\), we see that the supremum in \(\Sigma \) is attained on the boundary of the convex constraint, namely where \(\Sigma _t=-H(\Sigma _x)\). When we take the supremum in \(\Sigma _x\) we see that \(\Lambda _\rho \) should be strictly positive if \(\Lambda _m\) is non-zero, and that in this case
$$\begin{aligned} \sup _{\Sigma _x} \; \Lambda _m \cdot \Sigma _x- \Lambda _\rho \cdot H(\Sigma _x) = L\left( \frac{\Lambda _m}{\Lambda _\rho }\right) \cdot \Lambda _\rho \end{aligned}$$
Note that in this case we also have at optimality
$$\begin{aligned} \frac{\Lambda _m}{\Lambda _\rho } = \nabla H (\Sigma _x)=\nabla H (\nabla _D \Phi ) \end{aligned}$$
(3.10)
for element-wise non-zero \(\Lambda _\rho \). Thus the problem boils down to
$$\begin{aligned} \inf _{\Lambda } \sup _{\Phi ,\Sigma } \; \mathcal {L}(\Phi , \Sigma , \Lambda ) = \inf _\Lambda L\left( \frac{\Lambda _m}{\Lambda _\rho }\right) \cdot \Lambda _\rho + R \cdot |\Lambda _\eta | + \sup _{\Phi } \; \left\{ (F_D-A^\top \Lambda ) \cdot \Phi \right\} \end{aligned}$$
Writing \(R \cdot |\Lambda _\eta | = R \Vert \Lambda _\eta \Vert _{L^1(\Omega _D)}\) and interpreting \(\Phi \) as a Lagrange multiplier we obtain the following minimization problem.
Definition 3.7
(Primal problem with vanishing viscosity) Let \(\mu ,\nu \in \mathcal {P}(\Omega )\), \(L\) be the Lagrangian of the continuous problem, and \(\mathcal {S}\) be the vanishing viscosity scheme. The primal formulation of discrete transport problem is defined as
$$\begin{aligned} \inf _\Lambda \; L\left( \frac{\Lambda _m}{\Lambda _\rho }\right) \cdot \Lambda _\rho +R \Vert \Lambda _\eta \Vert _{L^1(\Omega _D)} \end{aligned}$$
(3.11)
where \(\Lambda \) runs through discrete functions satisfying the discrete continuity equation,
$$\begin{aligned} A^\top \Lambda =F_D \end{aligned}$$
(3.12)
together with the element-wise constraints \(\Lambda _\rho \geqslant 0\) and \(\Lambda _m = 0\) as soon as \(\Lambda _\rho = 0\). When \({\overline{\Lambda }}=({\overline{\Lambda }}_\rho , {\overline{\Lambda }}_m,{\overline{\Lambda }}_\eta )\) attains the minimum, we call \({\overline{\Lambda }}_\rho \) an optimal measure and \({\overline{\Lambda }}_m\) an optimal momentum.
Remark 3.8
Note that (3.12) is a discrete version of the continuity equation with a viscosity term. The temporal boundary conditions are encoded in the equation, as expanding (3.12) yields
$$\begin{aligned} {\left\{ \begin{array}{ll} \partial _{-,\Delta t}\Lambda _\rho - \nabla _D ^\top \Lambda _m + \varepsilon \Delta _D \Lambda _\rho = 0, \\ \displaystyle {\Lambda _\rho ^{-1} = \Pi \mu - \frac{\Delta _{-}\Lambda _\eta }{\Delta x}} , \quad \Lambda _\rho ^{N_T-1} = \Pi \nu . \end{array}\right. } \end{aligned}$$
Here \(\nabla _D^\top \) is the adjoint of the discrete gradient \(\nabla _D\) with respect to the standard Euclidean product, hence \(-\nabla _D^\top \) is a discrete analogue of divergence. The operator \(\partial _{-,\Delta t}\) is the backward discrete time derivative given as \(\partial ^i_{-,\Delta t} \Lambda _\rho := (\Lambda _\rho ^{i} - \Lambda _\rho ^{i-1})/\Delta t\) for \(i \in \{ 0,1, \ldots , N_T -1 \}\) provided that \(\Lambda ^{-1}_\rho \) is defined as in the second line above.
Thus it is tempting to think of the primal problem (Definition 3.7) as the discretization of
$$\begin{aligned} \inf _{\rho , m} \int ^1_0 \int _\Omega L\left( \frac{\textrm{d}m}{\textrm{d}\rho }\right) \, \textrm{d}\rho \textrm{d}t \quad \text {such that} \quad {\left\{ \begin{array}{ll} \partial _t \rho + \nabla \cdot m = - \varepsilon \Delta \rho \\ \rho _0 =\mu , \quad \rho _1=\nu . \end{array}\right. } \end{aligned}$$
Said differently: from the beginning, the parameter \(\varepsilon \) was interpreted in the dual problem as a regularization parameter, as it introduces a diffusive term \(\varepsilon \Delta {\overline{\varphi }}\) in the Hamilton–Jacobi equation. The computation we made shows that, in the primal problem, the same parameter \(\varepsilon \) also adds diffusion, but this time in the continuity equation, which becomes \(\partial _t \rho + \nabla \cdot m = - \varepsilon \Delta \rho \). In particular, the case \(L(v)=|v|^2/2\) amounts to entropic regularized optimal transport [27].
However, this analogy is not perfect: first, because of the additional variable \(\Lambda _\eta \) used to constrain the discrete gradient in the dual formulation, but also because \(\varepsilon \) depends on \(\Delta x\) and vanishes as \(\Delta x\) goes to 0. Nevertheless, this analogy explains why our discrete solutions are slightly smoother than the continuous ones (less and less so as the stepsize decreases), something that we observe numerically in Sect. 6.
We conclude this section by stating that our exchange of the infimum and the supremum was formal, but can be made rigorous.
Proposition 3.9
(Strong duality of discrete optimal transport) There is no duality gap between the dual problem (Definition 3.6) and the primal problem (Definition 3.7) i.e. we have
$$\begin{aligned} F_D {\overline{\Phi }} = L\left( \frac{{\overline{\Lambda }}_m}{{\overline{\Lambda }}_\rho }\right) \cdot {\overline{\Lambda }}_\rho + R \left\| {\overline{\Lambda }}_\eta \right\| _{L^1(\Omega _D)}, \end{aligned}$$
for a maximizer \({\overline{\Phi }}\) of the dual problem and a minimizer \({\overline{\Lambda }}\) of the primal problem.
Proof
Slater’s condition states that the strong duality holds if the feasible region has non-empty interior [15, Sects. 5.2.3 & 5.3.2]. In our case, it suffices to show the existence of a strictly admissible competitor of the dual problem i.e. \(\Phi \in \mathbb {R}^{Q_D}\) satisfying,
$$\begin{aligned} A_t\Phi +H(A_x \Phi )< 0, \qquad |A_R \Phi | < R. \end{aligned}$$
It is attained for instance by defining \(\Phi ^i_j:=- i\Delta t \left( H(0)+\epsilon \right) \) with some \(\epsilon >0\): then \(A_t\Phi +H(A_x \Phi )=-\epsilon <0\) and \(A_R\Phi =0\). \(\square \)

4 Convergence of the Optimal Transport Cost

We now state and prove our main result on the quantitative convergence of the optimal transport cost. This result is valid for a general monotone and consistent scheme.
Theorem 4.1
(Convergence of transport cost) Let \(\mu ,\nu \in \mathcal {P}(\Omega )\) and let \(\mathcal {K}(\mu ,\nu )\) and \(\mathcal {K}_D(\mu ,\nu )\) be the continuous and the discrete optimal transport cost; see respectively (2.2) and Definition 3.2. Then there is a constant \(C\) depending only on \(\Omega \), \(L\) and \(R\) such that
$$\begin{aligned} | \mathcal {K}(\mu ,\nu ) - \mathcal {K}_D(\mu ,\nu ) | \leqslant C \sqrt{h}. \end{aligned}$$
To make the exposition of the proof simple, let us introduce the following notation: for a function \(\varphi : Q\rightarrow \mathbb {R}\) we define the functionals F and \(F_D\) via
$$\begin{aligned} F \varphi :=\int _{\Omega } \varphi (1,\cdot ) \, \textrm{d}\nu - \int _{\Omega } \varphi (0,\cdot ) \, \textrm{d}\mu , \qquad F_D \varphi :=\int _{\Omega } \varphi (1,\cdot ) \, \textrm{d}\Pi \nu - \int _{\Omega } \varphi (0,\cdot ) \, \textrm{d}\Pi \mu . \end{aligned}$$
Our goal is to control the gap between \(F{\overline{\varphi }}\) and \(F_D{\overline{\Phi }}\) for the continuous and discrete optimal potentials \({\overline{\varphi }}\) and \({\overline{\Phi }}\) respectively. However, the relation between \({\overline{\varphi }}\) and \({\overline{\Phi }}\) is unclear as they are solutions to two different problems. To circumvent this issue, we replace them with functions that can be easily compared with each other. For this purpose, we exploit not only a solution to the discrete Hamilton–Jacobi equation, but also a viscosity solution to the continuous Hamilton–Jacobi equation. Before proving the theorem, we review elementary results about the error caused by discretizing the measures.
Lemma 4.2
(Gap between continuous and discrete measures) Let \(\varphi :Q\rightarrow \mathbb {R}\) be a Lipschitz function. Then we have a control,
$$\begin{aligned} |F\varphi -F_D\varphi |\leqslant \frac{\sqrt{d}}{2} \Delta x \left( \operatorname {Lip}(\varphi (0,\cdot )) + \operatorname {Lip}(\varphi (1,\cdot )) \right) . \end{aligned}$$
Proof
First note that the Wasserstein-1 distances \( W_1(\Pi \mu ,\mu )\) and \( W_1(\Pi \nu ,\nu )\) are bounded by \(\sqrt{d} \Delta x/2\), since discretization moves each unit of mass by at most \(\sqrt{d} \Delta x/2\). The claim follows from the fact that for any Lipschitz function \(f:\Omega \rightarrow \mathbb {R}\) and \(\rho _1, \rho _2\in \mathcal {P}(\Omega )\) we have
$$\begin{aligned} \int _\Omega f \, \textrm{d}(\rho _1-\rho _2)\leqslant \operatorname {Lip}(f) W_1(\rho _1,\rho _2). \end{aligned}$$
\(\square \)
Lemma 4.3
(Gap between discrete integrals of functions) The functional \(F_D\) is \(2\)-Lipschitz with respect to the \(L^\infty (Q_D)\) norm.
Proof
The claim follows from the definition as \(\Pi \mu \), \(\Pi \nu \) are probability measures. \(\square \)
Proof of Theorem 4.1
Let \({\overline{\varphi }}\) and \({\overline{\Phi }}\) be respectively solutions to the continuous and the discrete problem. Note that we can take \({\overline{\varphi }}\) to satisfy the Lipschitz bound (2.3) thanks to Proposition 2.2. On the other hand \({\overline{\Phi }}\) satisfies a similar bound (3.4) by design.
We first show \( F_D{\overline{\Phi }} \leqslant F{\overline{\varphi }} + C \sqrt{h}\). Let us take the solution to the discrete initial value problem,
$$\begin{aligned} {\left\{ \begin{array}{ll} {\widetilde{\Phi }}^{i+1}=\mathcal {S}({\widetilde{\Phi }}^i), \\ {\widetilde{\Phi }}^{0}= {\overline{\Phi }}^0. \end{array}\right. } \end{aligned}$$
Note that \({\overline{\Phi }}\) and \({\widetilde{\Phi }}\) have the same initial data, and that \({\overline{\Phi }}\leqslant {\widetilde{\Phi }}\) due to the inequality constraint (3.3) and the monotonicity of the scheme, hence \(F_D {\overline{\Phi }} \leqslant F_D {\widetilde{\Phi }}\). To estimate \(F_D {\widetilde{\Phi }}\), let \({\widetilde{\varphi }}\) be the unique viscosity solution to the continuous initial value problem
$$\begin{aligned} {\left\{ \begin{array}{ll} \partial _t {\widetilde{\varphi }} + H(\nabla {\widetilde{\varphi }})=0, \\ {\widetilde{\varphi }} (0,\cdot ) = \operatorname {LI}\left( {\overline{\Phi }}^0\right) , \end{array}\right. } \end{aligned}$$
where \( \operatorname {LI}\) gives the piecewise linear interpolation of a discrete function. When \(d=1\), it is given for a discrete space function \(\Psi \) by
$$\begin{aligned} \operatorname {LI}(\Psi )(x) :=\frac{(j+1)\Delta x-x}{\Delta x}\Psi _j + \frac{x-j\Delta x}{\Delta x} \Psi _{j+1}, \quad x\in [j\Delta x, (j+1)\Delta x]. \end{aligned}$$
For higher dimensions, it is given by bilinear interpolation, trilinear interpolation, and so on. As \({\overline{\Phi }}^0 \in \mathcal {C}_R\) we can check that \({\widetilde{\varphi }}(0,\cdot )\) is R-Lipschitz in each coordinate and moreover \(\operatorname {Lip}({\widetilde{\varphi }}(0,\cdot )) \leqslant \sqrt{d} R\). Thus in particular \(\operatorname {Lip}({\widetilde{\varphi }}(t,\cdot )) \leqslant \sqrt{d} R\) for any \(t \in [0,1]\) (see Theorem 2.3). As \({\widetilde{\varphi }}\) is an admissible competitor, we have \(F {\widetilde{\varphi }} \leqslant F{\overline{\varphi }}\). Thus we see
$$\begin{aligned} F_D{\overline{\Phi }} \leqslant F_D{\widetilde{\Phi }} \leqslant F_D{\widetilde{\Phi }} + F{\widetilde{\varphi }} - F{\widetilde{\varphi }} \leqslant F{\overline{\varphi }} + (F_D{\widetilde{\Phi }} - F_D{\widetilde{\varphi }}) + (F_D{\widetilde{\varphi }} - F{\widetilde{\varphi }}). \end{aligned}$$
For the term \(F_D{\widetilde{\Phi }} - F_D{\widetilde{\varphi }}\) we use Lemma 4.3 followed by Theorem 2.4, as the constraint (3.4) guarantees that \({\widetilde{\varphi }}(0,\cdot )\) is R-Lipschitz in each coordinate:
$$\begin{aligned} F_D{\widetilde{\Phi }}- F_D{\widetilde{\varphi }} \leqslant 2\Vert {\widetilde{\Phi }}-{\widetilde{\varphi }}\Vert _{L^\infty (Q_D)} \leqslant C\sqrt{h}. \end{aligned}$$
On the other hand, it follows from Lemma 4.2 and the regularization effect of the continuous Hamilton–Jacobi equation (Theorem 2.3) that
$$\begin{aligned} F_D{\widetilde{\varphi }}- F{\widetilde{\varphi }} \leqslant \frac{\sqrt{d}\Delta x}{2}\left( \operatorname {Lip}({\widetilde{\varphi }}(1,\cdot )) + \operatorname {Lip}({\widetilde{\varphi }}(0,\cdot )) \right) \leqslant \sqrt{d} \Delta x \operatorname {Lip}({\widetilde{\varphi }}(0,\cdot )) \leqslant C h, \end{aligned}$$
hence we obtain the claimed inequality, as C can be taken independent of \(\mu , \nu \) thanks to (3.4).
For the other inequality \( F{\overline{\varphi }}\leqslant F_D{\overline{\Phi }} + C \sqrt{h}\), we start from an optimal potential \({\overline{\varphi }}\) to the continuous dual transport problem which satisfies the estimate (2.3) and is a viscosity solution to the continuous Hamilton–Jacobi equation. To transform it into a discrete competitor we consider the discrete system
$$\begin{aligned} {\left\{ \begin{array}{ll} {\widetilde{\Phi }}^{i+1}=\mathcal {S}({\widetilde{\Phi }}^i), \\ {\widetilde{\Phi }}^{0}_j= {\overline{\varphi }}(0,j\Delta x), \end{array}\right. } \end{aligned}$$
using the point sample of the continuous potential \({\overline{\varphi }}\) as initial data. Then its solution \({\widetilde{\Phi }}\) is an admissible competitor for the discrete transport problem: this is because the discrete constraint (3.4) is automatically satisfied thanks to (2.3). Thus we have \(F_D{\widetilde{\Phi }}\leqslant F_D{\overline{\Phi }} \) that we use in
$$\begin{aligned} F{\overline{\varphi }} = F{\overline{\varphi }} - F_D {\overline{\varphi }} + F_D {\overline{\varphi }} + F_D {\widetilde{\Phi }} - F_D {\widetilde{\Phi }} \leqslant F_D{\overline{\Phi }} + (F{\overline{\varphi }} - F_D {\overline{\varphi }}) + (F_D {\overline{\varphi }} - F_D {\widetilde{\Phi }}). \end{aligned}$$
In a similar way, we have \(F{\overline{\varphi }} - F_D {\overline{\varphi }} \leqslant Ch\) thanks to Lemma 4.2 and the bound (2.3). On the other hand \(F_D {\overline{\varphi }} - F_D {\widetilde{\Phi }} \leqslant C \sqrt{h}\): for this estimate we rely again on Lemma 4.3 followed by Theorem 2.4. \(\square \)
Remark 4.4
(Optimal transport on a bounded domain) We built a discrete formulation on the spatial domain \(\Omega = \mathbb {R}^d / (D \mathbb {Z}^d)\) to avoid boundary conditions, and we prove our convergence results in this setting. At least from a theoretical point of view, given measures \(\mu , \nu \) with compact support, it is possible to embed them in \(\mathbb {R}^d / (D \mathbb {Z}^d)\) with D large enough such that no mass crosses the “periodic boundary”, and thus the optimal transport on the torus between \(\mu \) and \(\nu \) coincides with the optimal transport on \(\mathbb {R}^d\). However from a numerical point of view this is problematic, as most of the domain \(\Omega \) would be empty, and thus most of the nodes of the discretized domain \(\Omega _D\) do not record any motion of mass. This leads to a lot of wasted computational resources.
One way to address this issue may be to use a discretization of the Hamilton–Jacobi equation on a bounded domain. We should then impose Neumann boundary conditions \(\partial \varphi / \partial n = 0\), where n is the outward normal to the domain. We refer to [44] and [1] for discussions of possible discretizations. However, at this point the challenge would be to adapt the proof of Theorem 4.1. As we discussed in Remark 2.6, the main problem is that the regularizing effect (2.8) is no longer automatically available for such schemes, so that even clamping the value of the gradient at the initial time is not enough.
We leave for future work the proposal of a discretization of dynamic optimal transport via its dual formulation which would also handle the case of convex domains of \(\mathbb {R}^d\).
Remark 4.5
By looking into the proof of Theorem 4.1, we see that a quantitative convergence of the transport cost in our discrete problem is provable if, for the chosen scheme \(\mathcal {S}\), a) solutions of the discrete Hamilton–Jacobi equation converge to continuous viscosity solutions at a known rate, and b) the discrete solutions have controlled Lipschitz constants at times \(t= 0\) and \(t=1\). This leaves room for other schemes. The challenge, though, would be to design an optimization method to solve the resulting discrete problem: certain types of upwind schemes, for instance, define \(\mathcal {S}\) piecewise, making it challenging to optimize under constraints such as \(\Phi ^{i+1} -\mathcal {S}(\Phi ^i ) \le 0\). We stick to the vanishing viscosity scheme because of the simplicity of its implementation.

5 Convergence of the Optimizers

We can also obtain a quantitative convergence of the solutions to the discrete problems using the convergence of the transport cost proved in the previous section. Note that if the functionals to optimize were uniformly strictly convex, this would be a direct consequence of the convergence of the value, that is, of the transport cost. This is not the case here: the functional in the dual problem (2.2) is only linear in \(\varphi \)! Moreover, without further assumptions on \(\mu , \nu \) there is not necessarily a unique solution to the continuous problem, so one cannot hope to prove convergence of the optimizers in general.
Thus in this section we impose additional assumptions that are typically satisfied when the measures \(\mu , \nu \) have some smoothness. Then, by a (now classical) study of the duality gap between the primal and dual problem, it is possible to recover some convergence of the optimizers.
Assumption on the scheme We state our result specifically for the discrete problem with the vanishing viscosity scheme introduced in Sect. 3.2. We are not aware of the exact conditions needed to extend the results to a general scheme, as the argument also requires the primal formulation of the discrete problem.
Assumption on the Lagrangian and the Hamiltonian We suppose that the Lagrangian \(L\) and the Hamiltonian \(H\) satisfy for any \(v,w\in \mathbb {R}^d\),
$$\begin{aligned} L(v)+H(w)\geqslant v\cdot w + |f_L(v)-f_H(w)|^2, \end{aligned}$$
(5.1)
for some functions \(f_L,f_H : \mathbb {R}^d\rightarrow \mathbb {R}^d\). This is a strengthening of Young's inequality \( L(v)+H(w)\geqslant v\cdot w\), and we refer to [46] for a discussion of this assumption, including the link with Bregman divergences. We recall two examples of such \(L\) and \(H\) from [46, Lemma 3.3 and Lemma 3.2]. If \(L\) is uniformly convex, i.e. \(D^2L \geqslant \lambda I\) for some \(\lambda >0\), we have
$$\begin{aligned} L(v)+H(w)\geqslant v\cdot w + \frac{\lambda }{2}|v- \nabla H(w)|^2. \end{aligned}$$
On the other hand, for \(L(v)=|v|^p/p\) and \(H(w)=|w|^q/q\) with some \(p,q> 1\) such that \(1/p+1/q=1\), we have
$$\begin{aligned} L(v)+H(w)\geqslant v\cdot w + \frac{1}{2\max \{p,q\}}|v^{p/2}-w^{q/2}|^2, \end{aligned}$$
where the power of a vector is defined as \(v^p= v|v|^{p-1}\). Note that in any case, since equality holds in Young's inequality when \(v = \nabla H(w)\), we must necessarily have for any \(w \in \mathbb {R}^d\):
$$\begin{aligned} f_H(w) = f_L(\nabla H(w)). \end{aligned}$$
(5.2)
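To make the assumption concrete, here is a minimal numerical sanity check of (5.1) in the uniformly convex case \(L(v)=|v|^2/2\), \(H(w)=|w|^2/2\), so that \(\lambda =1\), \(\nabla H(w)=w\), and one may take \(f_L(v)=v/\sqrt{2}\), \(f_H(w)=w/\sqrt{2}\); in this quadratic case (5.1) actually holds with equality. This sketch is an illustration only and not part of the theory.

import numpy as np

rng = np.random.default_rng(0)
v = rng.normal(size=(1000, 3))  # random velocities in R^3
w = rng.normal(size=(1000, 3))  # random dual variables in R^3

L_v = 0.5 * np.sum(v ** 2, axis=1)             # L(v) = |v|^2 / 2
H_w = 0.5 * np.sum(w ** 2, axis=1)             # H(w) = |w|^2 / 2
young_gap = L_v + H_w - np.sum(v * w, axis=1)  # L(v) + H(w) - v.w
bregman = 0.5 * np.sum((v - w) ** 2, axis=1)   # (lambda/2)|v - grad H(w)|^2, lambda = 1

assert np.allclose(young_gap, bregman)  # equality in the quadratic case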
Assumption on the optimal potential Finally, we assume that an optimal potential \({\overline{\varphi }}\) of the continuous dual problem is of class \(C^{1,1}\) on \([0,1] \times \Omega \). That is, \({\overline{\varphi }}\) is differentiable in time and space and its derivatives are Lipschitz in time and space. If the measures \(\mu , \nu \) live in \(\mathbb {R}^d\), by Caffarelli's regularity theory a sufficient condition for this is that they are supported on uniformly convex \(C^2\) domains \(X_\mu , X_\nu \) and have Lebesgue densities \(f_\mu , f_\nu \) that are Hölder continuous and bounded from below by a strictly positive constant on \(\overline{X_\mu }, \overline{X_\nu }\) respectively [49, Theorem 4.14]. On the periodic domain \(\Omega \), this is still the case if \(X_\mu , X_\nu \) are small enough and close enough to each other that the situation reduces to the case \(\Omega =\mathbb {R}^d\) and the optimal velocity of mass \({\overline{v}}=\nabla H(\nabla {\overline{\varphi }})\) has no discontinuity on the support of the measures. Alternatively, on the torus a sufficient condition for \({\overline{\varphi }}\) to be of class \(C^{1,1}\) is that \(f_\mu , f_\nu \) are nowhere vanishing and Hölder continuous [20, Theorem 1], [3, Theorem 2.2 (iii)].
Such a condition on \(\mu ,\nu \) is still weaker than the one required for similar results in the recent work [38], which has to assume that the optimal density \({\overline{\rho }}\) solving the continuous primal problem is uniformly bounded from below in the entire domain Q.
Statement of the result Now we state our convergence result for optimizers. We quantify the result in terms of the discrete \(L^2\) norm \(\Vert \cdot \Vert _{L^2_{{\overline{\Lambda }}_\rho }(Q'_D)}\) on \(Q'_D\) given by
$$\begin{aligned} \Vert M \Vert ^2_{L^2_{{\overline{\Lambda }}_\rho }(Q_D')}= \sum _{z\in Q_D'} |M_z|^2 {{\overline{\Lambda }}_{\rho ,z}} \end{aligned}$$
for \(M\in \mathbb {R}^{d\times Q_D'}\): it is a norm weighted by \({\overline{\Lambda }}_\rho \), which is a solution to the discrete primal problem (see Definition 3.7). Finally, for a continuous function \(\varphi \) defined on our domain Q, we denote by \(\Pi \varphi \) the function defined on \(Q_D\) obtained by pointwise evaluation at the grid points, that is, \((\Pi \varphi )^i_j=\varphi (i\Delta t, j\Delta x)\).
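Concretely, computing this weighted norm is an elementary operation; a minimal NumPy sketch (the array layout, with grid points flattened into one axis, is an illustrative assumption):

import numpy as np

def weighted_l2_sq(M, lam_rho):
    # Squared norm of M weighted by lam_rho: sum over grid points z of |M_z|^2 * lam_rho_z
    # M: (n_points, d) array of vectors, lam_rho: (n_points,) nonnegative weights
    return float(np.sum(np.sum(M ** 2, axis=1) * lam_rho))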
Theorem 5.1
(Norm convergence of optimizers) Suppose that the Lagrangian \(L\) and the Hamiltonian \(H\) of the continuous problem satisfy the inequality (5.1) with some functions \(f_L\) and \(f_H\). Let \({\overline{\varphi }}\) and \({\overline{\Phi }}\) be optimal potentials for the continuous and the discrete dual problems, and let \({\overline{\Lambda }}=({\overline{\Lambda }}_\rho ,{\overline{\Lambda }}_m,{\overline{\Lambda }}_\eta )\) be an optimizer for the discrete primal problem. Assume that \({\overline{\varphi }}\in C^{1,1}(Q)\). Then we have the estimate
$$\begin{aligned} \left\| f_H(\nabla _D {\overline{\Phi }})- f_H(\nabla _D \Pi {\overline{\varphi }})\right\| ^2_{L^2_{{\overline{\Lambda }}_\rho }(Q'_D)}\leqslant C\sqrt{h}, \end{aligned}$$
(5.3)
where the constant \(C\) depends only on \(\operatorname {diam}(\Omega )\), \(L\) and \(R\). If \(f_H\) is Lipschitz on bounded sets we also have an estimate
$$\begin{aligned} \left\| f_L({\overline{V}})- f_L({\overline{v}})\right\| ^2_{L^2_{{\overline{\Lambda }}_\rho }(Q'_D)}\leqslant C\sqrt{h}, \end{aligned}$$
(5.4)
where \({\overline{v}} = \nabla H(\nabla {\overline{\varphi }})\) is the optimal velocity for an optimal pair \(({\overline{\rho }},{\overline{v}})\) of the continuous problem and the optimal discrete velocity \({\overline{V}}:={{\overline{\Lambda }}_m}/{{\overline{\Lambda }}_\rho }\) defined on \(\operatorname {support}({\overline{\Lambda }}_\rho )\).
Corollary 5.2
The estimates in Theorem 5.1 hold for the Lagrangians \(L(v)=|v|^p/p\) with \(p\leqslant 2\), and in this case they read, with \(1/p + 1/q = 1\),
$$\begin{aligned} \left\| \left( \nabla _D {\overline{\Phi }} \right) ^{q/2}- \left( \nabla _D \Pi {\overline{\varphi }} \right) ^{q/2} \right\| ^2_{L^2_{{\overline{\Lambda }}_\rho }(Q'_D)}\leqslant C\sqrt{h}, \qquad \left\| {\overline{V}} ^{p/2}- {\overline{v}}^{p/2} \right\| ^2_{L^2_{{\overline{\Lambda }}_\rho }(Q'_D)}\leqslant C\sqrt{h}. \end{aligned}$$
In particular for \(p=2\), it simplifies to
$$\begin{aligned} \left\| \nabla _D {\overline{\Phi }}- \nabla _D \Pi {\overline{\varphi }} \right\| ^2_{L^2_{{\overline{\Lambda }}_\rho }(Q'_D)}\leqslant C\sqrt{h}, \qquad \left\| {\overline{V}}- {\overline{v}} \right\| ^2_{L^2_{{\overline{\Lambda }}_\rho }(Q'_D)}\leqslant C\sqrt{h}. \end{aligned}$$
Note that a quantity such as \(\left\| {\overline{V}} ^{p/2}- {\overline{v}}^{p/2} \right\| ^2_{L^2_{{\overline{\Lambda }}_\rho }(Q'_D)}\) is not a discrete \(L^p\) norm, it is a discrete \(L^2\) norm between (vectorial) powers of quantities of interest.
Proof
The map \(f_H(w)=w^{q/2} / \sqrt{2 q}\) for \(q = p/(p-1) \geqslant 2\) is Lipschitz on bounded sets, while we can take \(f_L(v) = v^{p/2} / \sqrt{2 q}\), and the factor \(\sqrt{2q}\) can be absorbed in the constant C. The second claim follows from \(f_L(v)=v/2\) and \(f_H(w)=w/2\) for \(p=2\). \(\square \)
To prove the theorem we follow an argument already present in the literature. It consists in quantifying the duality gap between an optimal primal solution and a (non-optimal) dual competitor: see [8, 46] and references therein for this argument applied to get regularity estimates at the continuous level, and [38] where this method is used to analyze discretization of dynamic optimal transport. We first prove some primal-dual relationship at the continuous level, then estimate the duality gap at the discrete level, and finally proceed with the proof.
Proposition 5.3
(Primal-dual relation at optimality in the continuous problem) Let \(L, H, f_L, f_H\) be as in Theorem 5.1. Let \(({\overline{\rho }},{\overline{v}})\) be an optimizer for the continuous primal problem and \({\overline{\varphi }}\) an optimizer for the dual problem. Then, at least \({\overline{\rho }}\)-a.e.,
$$\begin{aligned} {\overline{v}} = \nabla H ( \nabla {\overline{\varphi }} ), \end{aligned}$$
(5.5)
and in particular combining with (5.2) we obtain
$$\begin{aligned} f_L\left( {\overline{v}}\right) = f_H\left( \nabla {\overline{\varphi }} \right) . \end{aligned}$$
(5.6)
When \({\overline{\varphi }}\) is \(C^{1,1}\), the relation (5.5) is actually a way to define \({\overline{v}}\) unambiguously outside of the support of \({\overline{\rho }}\). The analogue for the discrete problem would be (3.10), which we derived formally when doing the inf-sup exchange. For the proof one could use arguments in the style of [8, Lemma 4.1] and even quantify a duality gap, but we would have to be careful with the pairing between the continuity equation in the weak sense and the Hamilton–Jacobi equation in the viscosity sense. Instead we exploit the precise structure of the optimizers coming from the static problem, at the price of a proof that is more obscure for the non-expert reader.
Proof
Recall that \(- {\overline{\varphi }}(0, \cdot )\) is an optimal Kantorovich potential in the static dual problem (2.6); see the proof of Proposition 2.2. Thus the optimal transport map T between \(\mu \) and \(\nu \) is given by \(T(x) = x + \nabla H(\nabla {\overline{\varphi }}(0,x))\) at least \(\mu \)-a.e. (see e.g. [45, Sect. 1.3]). Let us write \(T_t = (1-t)\textrm{Id} + t T\), which is the optimal transport map between \(\mu \) and \({\overline{\rho }}_t\) and is invertible [45, Lemma 5.29]. Proposition 5.30 in [45] yields \({\overline{v}}_t = (T - \textrm{Id}) \circ T_t^{-1}\).
On the other hand, as \({\overline{\varphi }}\) is smooth and solves the Hamilton–Jacobi equation, we can use the method of characteristics [26, Sect. 3.3]. The characteristics are given by \(t \mapsto T_t(x)\), and we know that \(\nabla {\overline{\varphi }}\) is constant along characteristics. This gives \(\nabla {\overline{\varphi }}(t,T_t(x)) = \nabla {\overline{\varphi }}(0,x)\). Composing with \(\nabla H\) and using the definition of T we indeed obtain \(\nabla H(\nabla {\overline{\varphi }})(t,T_t(x)) = \nabla H(\nabla {\overline{\varphi }})(0,x) = T(x) - x = {\overline{v}}_t(T_t(x))\). Eventually we compose on the right with \(T_t^{-1}\) to get the final result. \(\square \)
In the next lemma, we show the key estimate: the quantification of the duality gap for the discrete problem.
Lemma 5.4
(Quantification of the duality gap in the discrete problem) Let \(L, H\), \(f_L, f_H\), \({\overline{\Phi }}\), \({\overline{\Lambda }}=({\overline{\Lambda }}_\rho ,{\overline{\Lambda }}_m,{\overline{\Lambda }}_\eta )\) and \({\overline{V}}= {\overline{\Lambda }}_m /{\overline{\Lambda }}_\rho \) be as in Theorem 5.1. Then for any admissible discrete competitor \(\Phi \) we have
$$\begin{aligned} \left\| f_H\left( \nabla _D{\overline{\Phi }}\right) - f_H(\nabla _D \Phi )\right\| ^2_{L^2_{{\overline{\Lambda }}_\rho }(Q'_D)} \leqslant F_D{\overline{\Phi }} - F_D\Phi . \end{aligned}$$
(5.7)
Proof
We will actually prove
$$\begin{aligned} \left\| f_L\left( {\overline{V}}\right) - f_H(\nabla _D \Phi )\right\| ^2_{L^2_{{\overline{\Lambda }}_\rho }(Q'_D)} \leqslant F_D{\overline{\Phi }} - F_D\Phi , \end{aligned}$$
(5.8)
and by setting \(\Phi ={\overline{\Phi }}\) we obtain \(f_L\left( {\overline{V}}\right) = f_H\left( \nabla _D {\overline{\Phi }} \right) \) on \(\operatorname {support}\left( {\overline{\Lambda }}_\rho \right) \) (which we already derived by combining (3.10) and (5.2)), thus the stated inequality.
Thanks to the absence of duality gap at optimality (see Proposition 3.9), we obtain with the discrete continuity equation \(F_D=A^\top \Lambda \) (3.12),
$$\begin{aligned} F_D{\overline{\Phi }}-F_D\Phi&= L\left( \frac{{\overline{\Lambda }}_m}{{\overline{\Lambda }}_\rho }\right) \cdot {\overline{\Lambda }}_\rho + R \left\| {\overline{\Lambda }}_\eta \right\| _{L^1(\Omega _D)} - F_D\Phi \\&= L\left( \frac{{\overline{\Lambda }}_m}{{\overline{\Lambda }}_\rho }\right) \cdot {\overline{\Lambda }}_\rho + R \left\| {\overline{\Lambda }}_\eta \right\| _{L^1(\Omega _D)} - (A^\top {\overline{\Lambda }}) \cdot \Phi \\&= L\left( {\overline{V}}\right) \cdot {\overline{\Lambda }}_\rho + R \left\| {\overline{\Lambda }}_\eta \right\| _{L^1(\Omega _D)} - {\overline{\Lambda }}_\rho \cdot (A_t \Phi ) \\ &\quad - {\overline{\Lambda }}_m \cdot (A_x \Phi ) - {\overline{\Lambda }}_\eta \cdot (A_R \Phi ) \\&\geqslant L\left( {\overline{V}}\right) \cdot {\overline{\Lambda }}_\rho - {\overline{\Lambda }}_\rho \cdot (A_t \Phi ) - {\overline{\Lambda }}_m \cdot (A_x \Phi ) \\&= {\overline{\Lambda }}_\rho \cdot \left\{ L\left( {\overline{V}}\right) - {\overline{V}}\cdot A_x \Phi - A_t \Phi \right\} \\&\geqslant {\overline{\Lambda }}_\rho \cdot \left\{ -A_t \Phi - H(A_x \Phi ) + \left| f_L\left( {\overline{V}}\right) - f_H \left( A_x \Phi \right) \right| ^2 \right\} \\&\geqslant \left\| f_L\left( {\overline{V}}\right) - f_H(\nabla _D \Phi )\right\| ^2_{L^2_{{\overline{\Lambda }}_\rho }(Q'_D)}, \end{aligned}$$
where we have used in the last inequality the constraint (3.7) coming from the admissibility of \(\Phi \). \(\square \)
We next quantify the violation of the discrete constraint incurred when discretizing a continuous admissible competitor. This is where we crucially rely on the \(C^{1,1}\) regularity of \(\varphi \).
Lemma 5.5
(Violation of the Hamilton–Jacobi constraint due to discretization) Let \({\varphi }\) be an admissible potential for the continuous dual problem and \(\Pi {\varphi }\) be its discretization. Assume that \({\varphi }\in C^{1,1}({Q})\). Then the maximum violation of the discrete Hamilton–Jacobi inequality by \(\Pi {\varphi }\) is controlled as,
$$\begin{aligned} \Vert \left[ \partial _{\Delta t} \Pi {\varphi } + H (\nabla _D \Pi {\varphi })-\varepsilon \Delta _D \Pi {\varphi }\right] ^+\Vert _{L^\infty (Q'_D)} \leqslant Ch, \end{aligned}$$
with a constant \(C\) depending only on H and \(\Vert D^2 {\varphi } \Vert _{L^\infty (Q)}\). Here \([\cdot ]^+\) denotes the positive part and the parameter \(\varepsilon \) satisfies the monotonicity condition (2.10) for the vanishing viscosity scheme.
Proof
Using the admissibility of \({\varphi }\) and a standard argument by the mean value theorem, we have
$$\begin{aligned}&\Vert \left[ \partial _{\Delta t} \Pi {\varphi } + H (\nabla _D \Pi {\varphi })-\varepsilon \Delta _D \Pi {\varphi }\right] ^+\Vert \\&\quad \leqslant \Vert \partial _{\Delta t} \Pi \varphi - \partial _t \varphi \Vert + \Vert H (\nabla _D \Pi \varphi ) - H(\nabla \varphi )\Vert + \varepsilon \Vert \Delta _D \Pi \varphi \Vert + \Vert \left[ \partial _t \varphi + H(\nabla \varphi ) \right] ^+ \Vert \\&\quad \leqslant \Delta t\Vert \partial _{tt} \varphi \Vert + \left( \Delta x \operatorname {Lip}(H,B_R)+\varepsilon \right) \sum _{k,l}\Vert \partial _{kl} \varphi \Vert , \end{aligned}$$
where we denoted \(\Vert \cdot \Vert _{L^\infty (Q'_D)}\) by \(\Vert \cdot \Vert \) for simplicity. Finally, the condition (2.10) ensures that \(\varepsilon \) scales like \(\Delta x\), so that together with the assumption \(\varphi \in C^{1,1}({Q})\), we get the stated inequality. \(\square \)
Proof of Theorem 5.1
Let \(\delta _{{\overline{\varphi }}}:=\Vert \left[ \partial _{\Delta t} \Pi {\overline{\varphi }} + H (\nabla _D \Pi {\overline{\varphi }})-\varepsilon \Delta _D \Pi {\overline{\varphi }}\right] ^+\Vert _{L^\infty (Q_D')}\) be the maximum violation of the discrete Hamilton–Jacobi inequality by \(\Pi {\overline{\varphi }}\). We define a continuous potential \({\widetilde{\varphi }}\) by
$$\begin{aligned} {\widetilde{\varphi }}(t,x):={\overline{\varphi }}(t,x) - t \delta _{{\overline{\varphi }}} \end{aligned}$$
so that it is an admissible competitor for the continuous problem and \(\Pi {\widetilde{\varphi }}\) is an admissible competitor for the discrete problem by construction. This allows us to apply Lemma 5.4 to obtain
$$\begin{aligned} \left\| f_H(\nabla _D{\overline{\Phi }})- f_H(\nabla _D \Pi {\overline{\varphi }})\right\| ^2_{L^2_{{\overline{\Lambda }}_\rho }(Q'_D)} =\left\| f_H(\nabla _D{\overline{\Phi }})- f_H(\nabla _D \Pi {\widetilde{\varphi }})\right\| ^2_{L^2_{{\overline{\Lambda }}_\rho }(Q'_D)} \\ \leqslant F_D{\overline{\Phi }} - F_D\Pi {\widetilde{\varphi }} =F_D{\overline{\Phi }} - F_D{\overline{\varphi }}+\delta _{{\overline{\varphi }}}. \end{aligned}$$
As \(F_D{\overline{\varphi }} \geqslant F {\overline{\varphi }} - Ch\) by Lemma 4.2, and \(F {\overline{\varphi }} \geqslant F_D{\overline{\Phi }} - C \sqrt{h}\) thanks to Theorem 4.1, we indeed obtain
$$\begin{aligned} \left\| f_H(\nabla _D{\overline{\Phi }})- f_H(\nabla _D \Pi {\overline{\varphi }})\right\| ^2_{L^2_{{\overline{\Lambda }}_\rho }(Q'_D)} \leqslant C h + C \sqrt{h} + \delta _{{\overline{\varphi }}}. \end{aligned}$$
Then we use Lemma 5.5 to bound \(\delta _{{\overline{\varphi }}}\) and get (5.3).
The estimate (5.4) is a simple consequence. We have
$$\begin{aligned} \Vert \nabla {\overline{\varphi }}-\nabla _D \Pi {\overline{\varphi }}\Vert _{L^\infty (Q_D')}\leqslant \Delta x \Vert D^2 {\overline{\varphi }}\Vert _{L^\infty (Q)} \leqslant C' h, \end{aligned}$$
with some constant \(C'\geqslant 0\), by a standard argument using the mean value theorem. With Proposition 5.3 and the regularity assumption on \(f_H\) we have the inequality
$$\begin{aligned} \left\| f_L\left( {\overline{V}}\right) - f_L\left( {\overline{v}}\right) \right\| _{L^2_{{\overline{\Lambda }}_\rho }(Q_D')}&= \Vert f_H(\nabla _D {\overline{\Phi }}) -f_H(\nabla {\overline{\varphi }}) \Vert _{L^2_{{\overline{\Lambda }}_\rho }(Q'_D)} \\&\leqslant \left\| f_H(\nabla _D {\overline{\Phi }})- f_H(\nabla _D \Pi {\overline{\varphi }})\right\| _{L^2_{{\overline{\Lambda }}_\rho }(Q'_D)}+ C''h, \end{aligned}$$
with some \(C'' \geqslant 0\) depending on the Lipschitz constant of \(f_H\) on \(B_{\operatorname {Lip}(L,\operatorname {diam}(\Omega ))}\). Applying the estimate (5.3) concludes the proof. \(\square \)
Remark 5.6
(Comments on the rate \(\sqrt{h}\)) We see from the proofs of Theorems 4.1 and 5.1 that the lowest powers of convergence come from Theorem 2.4 (Theorem 1 in [23]). This means that the convergence rate \(\sqrt{h}\) of the solutions of the discrete Hamilton–Jacobi equation to the continuous one determines the convergence rate of both the cost and the optimizers of the optimal transport problem. In our numerical experiments in Sect. 6, we found no example where the convergence rate is exactly \(\sqrt{h}\). This suggests that the convergence for any pair of (possibly very rough) input measures is better than our theoretical rate \(\sqrt{h}\). While the rate \(\sqrt{h}\) for a discrete Hamilton–Jacobi equation (Theorem 2.4) is sharp because viscosity solutions may develop shocks in finite time, it is known that solutions \({\overline{\varphi }}\) of the dual problem can only have shocks exactly at time \(t=0\) or \(t=1\) (see the proof of [49, Theorem 5.51]). It is however still unclear to us whether this property can be leveraged to improve the rate of convergence. Discretizations of the Hamilton–Jacobi equation with higher order of convergence for smooth inputs can be found e.g. in [18, 50], but the convergence there is in \(L^2\), not in \(L^\infty \) as in Theorem 2.4. Note, however, that a different discretization of the Hamilton–Jacobi equation with higher rates of convergence would not automatically improve our result: one would have to make sure that the steps in the proofs of Theorems 4.1 and 5.1 carry through, and also that the constraint (3.3) remains easy to handle numerically.

6 Numerical Illustrations

We numerically examine our discrete optimal transport problems and convergence results.

6.1 Settings

We solve the discrete problem with the vanishing viscosity scheme introduced in Sect. 3.2. For the cost function, we choose the quadratic Lagrangian \(L(v)=|v|^2/2\), following many works on the numerical computation of optimal transport, although our theory is not limited to this specific cost function.
As for the domain, we discretize the \(1+1\) dimensional time-space domain \(Q=[0,1]\times [-\frac{1}{2}, \frac{1}{2}]\) with the identification \((t,-\frac{1}{2})\sim (t,\frac{1}{2})\). We build a family of discrete domains by subdividing \(Q\) into grid points with \(N_T\in \{16\times 2^{n} \mid n=0,1,\dots ,5\}\) subdivisions in time and \(N_X=N_T\) in space, so that the resolution \(h=\Delta t=\Delta x\) of each discrete domain lies in \(\{\frac{1}{16}, \ldots , \frac{1}{16\times 2^5}\}\).
For each input probability measure, we compute its discretization given as (3.1) that we evaluate with explicit expressions or via numerical integration.
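As an illustration, the spatial grid and the discretization of an input density could be set up as follows. This is a sketch under our reading of (3.1) as assigning to each grid point the mass of the surrounding cell; the helper discretize, the value of N, and the cell convention are illustrative assumptions, not the paper's actual code.

import numpy as np
from scipy.integrate import quad

N = 64                           # number of spatial subdivisions
dx = 1.0 / N
left = -0.5 + np.arange(N) * dx  # left endpoints of the cells of [-1/2, 1/2)

def discretize(density):
    # Mass of each cell, computed by numerical integration of the density,
    # renormalized to yield a discrete probability vector
    masses = np.array([quad(density, a, a + dx)[0] for a in left])
    return masses / masses.sum()

Pi_mu = discretize(lambda x: 1.0 + 0.5 * np.cos(2 * np.pi * x))  # a smooth density on [-1/2, 1/2]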
Implementation of our discretization
For each test case, we fix a pair of probability measures \(\mu \) and \(\nu \) and numerically solve the transport problem on each discrete domain. This can be done with any standard convex optimization framework. We obtain an optimal potential \({\overline{\Phi }}\) by solving the dual problem (Definition 3.6) and a minimizer \({\overline{\Lambda }}=({\overline{\Lambda }}_\rho , {\overline{\Lambda }}_m,{\overline{\Lambda }}_\eta )\) by solving the primal problem (Definition 3.7). As an example implementation, we solved the problems using the alternating direction method of multipliers (ADMM) [14]. As this method directly finds a saddle point of (3.9), the output of the algorithm contains both \({\overline{\Phi }}\) and \({\overline{\Lambda }}\). We ran ADMM until the \(L^2\) norms of both the primal and dual residuals became smaller than \(10^{-5}\); such stopping criteria are discussed in the literature, e.g. [15]. Similarly to observations made for other discretizations of optimal transport such as [32], when the mesh size h decreases in our discretization, the number of ADMM iterations needed is unaffected while the time per iteration increases, so the total computational time increases. For interested readers, we share the source code in Jupyter (Python 3); see https://github.com/sdsgisd/DynamicOTwithDualFormulation.
Comparison with standard finite differences
As a point of comparison, we also solve the same test cases with the finite difference discretization proposed by Papadakis, Peyré, and Oudet [42], even though it comes with no convergence guarantee. To that end, we used the Julia code originally written by the second author to handle regularized unbalanced optimal transport [4]. In this code the resulting convex optimization problem is solved with proximal splitting, which makes an exact comparison with our ADMM implementation difficult. The Julia code is shared in the same repository as above.
Monitoring of the errors
We consider the following gaps between continuous and discrete quantities:
$$\begin{aligned}&\epsilon _\mathcal {K}:=|\mathcal {K}(\mu ,\nu )-\mathcal {K}_D(\mu ,\nu )|,\\&\epsilon _{\varphi }:=\left\| \nabla _D \Pi {\overline{\varphi }}- \nabla _D {\overline{\Phi }} \right\| ^2_{L^2_{{\overline{\Lambda }}_\rho }(Q'_D)},\\&\epsilon _{v}:=\left\| {\overline{v}}-{\overline{V}} \right\| ^2_{L^2_{{\overline{\Lambda }}_\rho }(Q'_D)},\\&\epsilon _{\rho }:=\left\| \Pi {\overline{\rho }}_t-{\overline{\Lambda }}_\rho \right\| _{L^1(Q'_D) }. \end{aligned}$$
Here \({\overline{\rho }}, {\overline{v}}, {\overline{\varphi }}\) are the optimal measure, velocity, and potential for the continuous problem. The error \(\epsilon _\mathcal {K}\) is for the transport cost as in Theorem 4.1 and the error \( \epsilon _v\) is for the optimal velocities in Corollary 5.2 (a special case of Theorem 5.1). While we do not have a proven estimate between the continuous and discrete optimal measures, we also compute the error \(\epsilon _\rho \) for reference.
We plotted these errors in Fig. 3, in which we also estimated the rates \(\alpha \) such that \(\epsilon \sim h^\alpha \) by standard linear regression in the log-log domain. We omitted the plots for the error \(\epsilon _\varphi \): the explicit expression for \({{\overline{\varphi }}}\) is unavailable in Test case 1, and \(\epsilon _\varphi \) overlapped almost perfectly with \(\epsilon _v\) in Test cases 2 and 3, as \({{\overline{V}}}=\nabla _D {\overline{\Phi }}\) and the difference between \(\epsilon _v\) and \(\epsilon _{\varphi }\) originates from the discrepancy between \(\nabla _D \Pi {\overline{\varphi }}\) and \({\overline{v}} = \nabla {\overline{\varphi }}\), which vanishes except at a few exceptional grid points in these test cases.
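For reference, the rate estimation amounts to a least-squares fit of \(\log \epsilon \) against \(\log h\); a minimal sketch is below (the measured errors err are inputs coming from the experiments and are not reproduced here):

import numpy as np

def fit_rate(h, err):
    # Slope alpha of the least-squares fit log(err) ~ alpha * log(h) + log(C)
    alpha, _ = np.polyfit(np.log(h), np.log(err), 1)
    return alpha

h = 1.0 / (16 * 2 ** np.arange(6))  # the six resolutions of Sect. 6.1
# alpha = fit_rate(h, err)          # err: one measured gap per resolution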

6.2 Test Cases

We test three simple examples with different levels of regularity. In the second and the third test cases, the measures \(\mu , \nu \) do not have full support, so they do not satisfy the assumptions of previous works establishing quantitative rates.
Test case 1 Our first example is a pair of smooth densities bounded from below, which is of the highest regularity among our test cases. We set
$$\begin{aligned} \frac{\textrm{d}\mu }{\textrm{d}x}(x)=1+\frac{1}{2} \cos (2\pi w x), \qquad \frac{\textrm{d}\nu }{\textrm{d}x}(x)=1, \end{aligned}$$
with a parameter \(w\in \mathbb {Z}\setminus \{0\}\), which we set to \(w=1\) (Fig. 1). The optimizers and the transport cost are
$$\begin{aligned} \frac{\textrm{d}{{\overline{\rho }}}}{\textrm{d}x}(t,x) =\frac{1+\frac{1}{2}\cos (2\pi w T_{t}^{-1}(x))}{1+\frac{t}{2}\cos (2\pi w T_{t}^{-1}(x))}, \quad {\overline{v}}(t,x)=\frac{1}{4\pi w}\sin (2\pi w T_t^{-1}(x)), \quad \mathcal {K}(\mu ,\nu )=\frac{1}{64\pi ^2 w^2}, \end{aligned}$$
where the inverse of the time-t optimal transport map \(T_t(y)=y+t\sin (2\pi w y)/(4\pi w)\) can be computed by the Newton-Raphson method, for instance. Note that the expression for \({\overline{\rho }}\) was obtained through the change of variables formula \(\textrm{d}{{\overline{\rho }}}_t/\textrm{d}x (T_t(y)) = (\textrm{d}\mu / \textrm{d}x)(y)\, (T_t'(y))^{-1}\). The input measures have absolutely continuous densities bounded from below, which is the condition required for the quantitative convergence in the previous work [38]. We observed that all the errors decrease linearly in the stepsize h or faster, as shown in Fig. 3, thus faster than our upper bound.
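For instance, a vectorized Newton-Raphson inversion of \(T_t\) could look as follows; this is a sketch, where the fixed iteration count is an implementation choice. A fixed number of iterations suffices here since \(T_t'(y) = 1 + \frac{t}{2}\cos (2\pi w y) \geqslant 1/2\) for \(t\in [0,1]\), so \(T_t\) is strictly increasing.

import numpy as np

w = 1.0

def T(t, y):
    # Time-t optimal transport map of Test case 1
    return y + t * np.sin(2 * np.pi * w * y) / (4 * np.pi * w)

def T_inv(t, x, n_iter=30):
    # Solve T_t(y) = x by Newton's method, vectorized over x
    y = np.array(x, dtype=float)
    for _ in range(n_iter):
        y -= (T(t, y) - x) / (1.0 + 0.5 * t * np.cos(2 * np.pi * w * y))
    return y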
Test case 2 We next test the transport between triangular densities. We set \(\mu \) and \(\nu \) by,
$$\begin{aligned} \frac{\textrm{d}\mu }{\textrm{d}x}(x)=\frac{1}{w^2}\max (w-|x|,0), \qquad \frac{\textrm{d}\nu }{\textrm{d}x}(x)=\frac{1}{4w^2}\max \left( 2w-|x|,0\right) , \end{aligned}$$
with a parameter \(0<w<1/4\), which we set to \(w=0.2\) (Fig. 1). The optimizers and the transport cost are
$$\begin{aligned} \frac{\textrm{d}{\overline{\rho }}}{\textrm{d}x}(t,x)=\frac{1}{(1+t)^2w^2}\max \left( (1+t)w-|x|,0\right) , \quad {\overline{\varphi }}(t,x)=\frac{x^2}{2(1+t)}, \\ {\overline{v}}(t,x)=\frac{x}{1+t}, \quad \mathcal {K}(\mu ,\nu )=\frac{w^2}{12}. \end{aligned}$$
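One can verify directly that \({\overline{\varphi }}\) solves the Hamilton–Jacobi equation associated with \(H(w)=|w|^2/2\):
$$\begin{aligned} \partial _t {\overline{\varphi }}(t,x) + \frac{1}{2}\left| \partial _x {\overline{\varphi }}(t,x)\right| ^2 = -\frac{x^2}{2(1+t)^2} + \frac{1}{2}\left( \frac{x}{1+t}\right) ^2 = 0, \end{aligned}$$
and indeed \({\overline{v}} = \nabla H(\nabla {\overline{\varphi }}) = \partial _x {\overline{\varphi }}\) recovers the stated velocity.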
The measure \({\overline{\rho }}\) expands and flattens over time, as shown in Fig. 2. Even though \(\mu \) and \(\nu \) are not supported on the whole space, \({\overline{\varphi }}\) is of class \(C^{1,1}\), so both Theorem 4.1 and Theorem 5.1 apply. All the errors empirically decrease linearly or faster, as shown in Fig. 3.
Test case 3 Our last example is the breaking of a unimodal density toward another one, where the potential is not of class \(C^{1,1}\). We set \(\mu \) and \(\nu \) as characteristic functions,
$$\begin{aligned} \frac{\textrm{d}\mu }{\textrm{d}x}(x)=\frac{1}{2w} \mathbb {1}_{|x|\leqslant w}, \qquad \frac{\textrm{d}\nu }{\textrm{d}x}(x)=\frac{1}{2w} \mathbb {1}_{1/2-|x|\leqslant w}, \end{aligned}$$
with a parameter \(0<w<1/2\), which we set to \(w=0.05\) (Fig. 1). The optimizers and the transport cost are
$$\begin{aligned} \frac{\textrm{d}{\overline{\rho }}}{\textrm{d}x}(t,x) =\frac{1}{2w}\mathbb {1}_{0\leqslant |x|-ts\leqslant w}, \quad {\overline{\varphi }}(t,x)=|x|s-\frac{s^2t}{2}, \quad {\overline{v}}(t,x)=s \operatorname {sign}(x), \quad \mathcal {K}(\mu ,\nu )=\frac{s^2}{2}, \end{aligned}$$
where all the mass travels the same distance \(s=\frac{1}{2}-w\). Note that the mass travels in two directions, as seen in Fig. 2, and the velocity field \({\overline{v}}\) is discontinuous at \(x=0\). We are in the situation where Theorem 4.1 applies but the assumptions of Theorem 5.1 are not satisfied. Though our rates are again not sharp, the convergence of the transport cost seems to be somewhat worse than in the first case, between \(\sqrt{h}\) and \(h\), as can be seen in Fig. 3.
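Since \(|{\overline{v}}| = s\) everywhere on the support of \({\overline{\rho }}_t\), the value of the cost follows from a one-line computation:
$$\begin{aligned} \mathcal {K}(\mu ,\nu ) = \int _0^1 \int _\Omega \frac{|{\overline{v}}|^2}{2} \, \textrm{d}{\overline{\rho }}_t \, \textrm{d}t = \int _0^1 \frac{s^2}{2} \, \textrm{d}t = \frac{s^2}{2}. \end{aligned}$$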
Remark 6.1
As we discussed in Sect. 3.2, the viscosity coefficient \(\varepsilon \) vanishes as the resolution \(h\rightarrow 0\). In our examples, we see that for moderately small values of h (equivalently \(\varepsilon \)) there is a numerical smoothing whose effect becomes weaker as \(h\) decreases, as shown in Fig. 2.
Remark 6.2
In our test cases, we observed that our constraint (3.4), which here reads \(| \Delta _+ \Phi ^0/{\Delta x}| \leqslant R:=\operatorname {Lip}(L,B_{\operatorname {diam}(\Omega )})=\operatorname {diam}(\Omega )=1/2\), was active, i.e. \(\Vert {\overline{\Lambda }}_\eta \Vert _{L^1(\Omega _D)}>0\). That is, the discrete dual optimal transport problem has no regularizing effect on the potential, and we really need to encode (3.4) as a constraint. We can also run numerical computations with a larger R (including \(\infty \)), for which the constraint may become inactive. As mentioned in Remark 3.4, the scheme may not be monotone when R is too large. In that case, we lose the theoretical guarantee of convergence, as the control between a continuous and a discrete solution of the Hamilton–Jacobi equation (Theorem 2.4) is missing. We have, however, observed no case so far where the cost or the optimizers fail to converge, although we lose control of their Lipschitz constants, which may be quite large but seem independent of the stepsize. Said otherwise, the method behaves well even where the theory breaks down, something that happens quite often in the numerical study of dynamic optimal transport.
Remark 6.3
The finite difference discretization of Papadakis, Peyré, and Oudet [42] performed better in all the examples. We speculate that this is because a) it uses staggered grids, i.e. vector and scalar values are stored at different locations, which discretizes vector quantities more accurately than the collocated grids our discretization relies on, b) the discretization in [42] is symmetric in both time and space while ours is asymmetric in time, as mentioned in Remark 3.1, and c) ours regularizes the solution of the Hamilton–Jacobi equation with viscosity.
Note, however, that convergence of the cost or the optimizers in [42] is not guaranteed. Indeed, though it would be possible to write the dual of their discretization, and it would look like a discretization of the Hamilton–Jacobi equation, it is one for which no convergence is guaranteed, due to the lack of properties such as monotonicity.

Acknowledgements

The authors would like to thank Chris Wojtan for his continuous support and several interesting discussions. Part of this research was performed during two visits: one of SI to the BIDSA research center at Bocconi University, and one of HL to the Institute of Science and Technology Austria. Both host institutions are warmly acknowledged for the hospitality. HL is partially supported by the MUR-Prin 2022-202244A7YL “Gradient Flows and Non-Smooth Geometric Structures with Applications to Optimization and Machine Learning”, funded by the European Union - Next Generation EU. SI is supported in part by ERC Consolidator Grant 101045083 “CoDiNA” funded by the European Research Council.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Literature
1.
go back to reference Rémi Abgrall. Numerical discretization of boundary conditions for first order hamilton–jacobi equations. SIAM Journal on Numerical Analysis, 41(6):2233–2261, 2003.MathSciNetCrossRef Rémi Abgrall. Numerical discretization of boundary conditions for first order hamilton–jacobi equations. SIAM Journal on Numerical Analysis, 41(6):2233–2261, 2003.MathSciNetCrossRef
2.
go back to reference Jason M Altschuler, Jonathan Niles-Weed, and Austin J Stromme. 2022 Asymptotics for semidiscrete entropic optimal transport. SIAM Journal on Mathematical Analysis, 54(2):1718–1741MathSciNetCrossRef Jason M Altschuler, Jonathan Niles-Weed, and Austin J Stromme. 2022 Asymptotics for semidiscrete entropic optimal transport. SIAM Journal on Mathematical Analysis, 54(2):1718–1741MathSciNetCrossRef
3.
go back to reference Luigi Ambrosio, Maria Colombo, Guido Philippis, and Alessio Figalli. Existence of Eulerian Solutions to the Semigeostrophic Equations in Physical Space: The \(2\)-Dimensional Periodic Case. Communications in Partial Differential Equations, 37, 11 2011.MathSciNet Luigi Ambrosio, Maria Colombo, Guido Philippis, and Alessio Figalli. Existence of Eulerian Solutions to the Semigeostrophic Equations in Physical Space: The \(2\)-Dimensional Periodic Case. Communications in Partial Differential Equations, 37, 11 2011.MathSciNet
4.
go back to reference Aymeric Baradat and Hugo Lavenant. Regularized unbalanced optimal transport as entropy minimization with respect to branching brownian motion. Astérisque, 2023. Aymeric Baradat and Hugo Lavenant. Regularized unbalanced optimal transport as entropy minimization with respect to branching brownian motion. Astérisque, 2023.
5.
go back to reference Jean-David Benamou and Yann Brenier. A computational fluid mechanics solution to the Monge-Kantorovich mass transfer problem. Numerische Mathematik, 84:375–393, 2000.MathSciNetCrossRef Jean-David Benamou and Yann Brenier. A computational fluid mechanics solution to the Monge-Kantorovich mass transfer problem. Numerische Mathematik, 84:375–393, 2000.MathSciNetCrossRef
6.
go back to reference Jean-David Benamou and Guillaume Carlier. Augmented Lagrangian methods for transport optimization, mean field games and degenerate elliptic equations. Journal of Optimization Theory and Applications, 167:1–26, 2015.MathSciNetCrossRef Jean-David Benamou and Guillaume Carlier. Augmented Lagrangian methods for transport optimization, mean field games and degenerate elliptic equations. Journal of Optimization Theory and Applications, 167:1–26, 2015.MathSciNetCrossRef
7.
go back to reference Jean-David Benamou, Guillaume Carlier, and Maxime Laborde. An augmented Lagrangian approach to Wasserstein gradient flows and applications. ESAIM: Proceedings and surveys, 54:1–17, 2016. Jean-David Benamou, Guillaume Carlier, and Maxime Laborde. An augmented Lagrangian approach to Wasserstein gradient flows and applications. ESAIM: Proceedings and surveys, 54:1–17, 2016.
8.
go back to reference Jean-David Benamou, Guillaume Carlier, and Filippo Santambrogio. Variational mean field games. Active Particles, Volume 1: Advances in Theory, Models, and Applications, pages 141–171, 2017. Jean-David Benamou, Guillaume Carlier, and Filippo Santambrogio. Variational mean field games. Active Particles, Volume 1: Advances in Theory, Models, and Applications, pages 141–171, 2017.
9.
go back to reference Jean-David Benamou and Vincent Duval. Minimal convex extensions and finite difference discretisation of the quadratic Monge–Kantorovich problem. European Journal of Applied Mathematics, 30(6):1041–1078, 2019.MathSciNetCrossRef Jean-David Benamou and Vincent Duval. Minimal convex extensions and finite difference discretisation of the quadratic Monge–Kantorovich problem. European Journal of Applied Mathematics, 30(6):1041–1078, 2019.MathSciNetCrossRef
10.
go back to reference Jean-David Benamou, Brittany D Froese, and Adam M Oberman. A viscosity solution approach to the Monge-Ampère formulation of the optimal transportation problem. arXiv preprint arXiv:1208.4873, 2012. Jean-David Benamou, Brittany D Froese, and Adam M Oberman. A viscosity solution approach to the Monge-Ampère formulation of the optimal transportation problem. arXiv preprint arXiv:​1208.​4873, 2012.
11.
go back to reference Jean-David Benamou, Brittany D Froese, and Adam M Oberman. 2014 Numerical solution of the optimal transportation problem using the Monge–Ampère equation. Journal of Computational Physics, 260:107–126MathSciNetCrossRef Jean-David Benamou, Brittany D Froese, and Adam M Oberman. 2014 Numerical solution of the optimal transportation problem using the Monge–Ampère equation. Journal of Computational Physics, 260:107–126MathSciNetCrossRef
12.
go back to reference Robert J. Berman. Convergence Rates for Discretized Monge-Ampère Equations and Quantitative Stability of Optimal Transport. Foundations of Computational Mathematics, 21(4):1099-1140,2021.MathSciNetCrossRef Robert J. Berman. Convergence Rates for Discretized Monge-Ampère Equations and Quantitative Stability of Optimal Transport. Foundations of Computational Mathematics, 21(4):1099-1140,2021.MathSciNetCrossRef
13.
go back to reference Guillaume Bonnet and Jean-Marie Mirebeau. Monotone discretization of the Monge–Ampère equation of optimal transport. ESAIM: Mathematical Modelling and Numerical Analysis, 56(3):815–865, 2022. Guillaume Bonnet and Jean-Marie Mirebeau. Monotone discretization of the Monge–Ampère equation of optimal transport. ESAIM: Mathematical Modelling and Numerical Analysis, 56(3):815–865, 2022.
14.
go back to reference Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers. Foundations and Trends in Machine Learning, 3(1):1–122, 2011.CrossRef Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers. Foundations and Trends in Machine Learning, 3(1):1–122, 2011.CrossRef
15.
go back to reference Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, March 2004. Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, March 2004.
16.
go back to reference Yann Brenier and Dmitry Vorotnikov. On optimal transport of matrix-valued measures. SIAM Journal on Mathematical Analysis, 52(3):2849–2873, 2020.MathSciNetCrossRef Yann Brenier and Dmitry Vorotnikov. On optimal transport of matrix-valued measures. SIAM Journal on Mathematical Analysis, 52(3):2849–2873, 2020.MathSciNetCrossRef
17.
go back to reference José A Carrillo, Katy Craig, Li Wang, and Chaozhen Wei 2022 Primal dual methods for Wasserstein gradient flows. Foundations of Computational Mathematics, 22:389–443MathSciNetCrossRef José A Carrillo, Katy Craig, Li Wang, and Chaozhen Wei 2022 Primal dual methods for Wasserstein gradient flows. Foundations of Computational Mathematics, 22:389–443MathSciNetCrossRef
18.
go back to reference Yingda Cheng and Chi-Wang Shu. A discontinuous Galerkin finite element method for directly solving the Hamilton–Jacobi equations. Journal of Computational Physics, 223(1):398–415, 2007.MathSciNetCrossRef Yingda Cheng and Chi-Wang Shu. A discontinuous Galerkin finite element method for directly solving the Hamilton–Jacobi equations. Journal of Computational Physics, 223(1):398–415, 2007.MathSciNetCrossRef
19.
go back to reference Lénaïc Chizat, Gabriel Peyré, Bernhard Schmitzer, and François-Xavier Vialard. Unbalanced optimal transport: Dynamic and Kantorovich formulations. Journal of Functional Analysis, 274(11):3090–3123, 2018.MathSciNetCrossRef Lénaïc Chizat, Gabriel Peyré, Bernhard Schmitzer, and François-Xavier Vialard. Unbalanced optimal transport: Dynamic and Kantorovich formulations. Journal of Functional Analysis, 274(11):3090–3123, 2018.MathSciNetCrossRef
20.
go back to reference Dario Cordero-Erausquin. Sur le transport de mesures périodiques. Comptes Rendus de l’Académie des Sciences - Series I - Mathematics, 329(3):199–202, 1999.MathSciNet Dario Cordero-Erausquin. Sur le transport de mesures périodiques. Comptes Rendus de l’Académie des Sciences - Series I - Mathematics, 329(3):199–202, 1999.MathSciNet
21.
go back to reference Michael G. Crandall, Lawrence C Evans, and Pierre-Louis Lions. 1984 Some Properties of Viscosity Solutions of Hamilton-Jacobi Equations. Transactions of the American Mathematical Society, 282(2):487–502,MathSciNetCrossRef Michael G. Crandall, Lawrence C Evans, and Pierre-Louis Lions. 1984 Some Properties of Viscosity Solutions of Hamilton-Jacobi Equations. Transactions of the American Mathematical Society, 282(2):487–502,MathSciNetCrossRef
22.
go back to reference Michael G. Crandall and Pierre-Louis Lions. Viscosity Solutions of Hamilton-Jacobi Equations. Transactions of the American Mathematical Society, 277(1):1–42, 1983.MathSciNetCrossRef Michael G. Crandall and Pierre-Louis Lions. Viscosity Solutions of Hamilton-Jacobi Equations. Transactions of the American Mathematical Society, 277(1):1–42, 1983.MathSciNetCrossRef
23.
go back to reference Michael G. Crandall and Pierre-Louis Lions. Two Approximations of Solutions of Hamilton-Jacobi Equations. Mathematics of Computation, 43(167):1–19, 1984.MathSciNetCrossRef Michael G. Crandall and Pierre-Louis Lions. Two Approximations of Solutions of Hamilton-Jacobi Equations. Mathematics of Computation, 43(167):1–19, 1984.MathSciNetCrossRef
24.
go back to reference Matthias Erbar, Martin Rumpf, Bernhard Schmitzer, and Stefan Simon. Computation of optimal transport on discrete metric measure spaces. Numerische Mathematik, 144(1):157–200, 2020.MathSciNetCrossRef Matthias Erbar, Martin Rumpf, Bernhard Schmitzer, and Stefan Simon. Computation of optimal transport on discrete metric measure spaces. Numerische Mathematik, 144(1):157–200, 2020.MathSciNetCrossRef
25.
go back to reference Lawrence C. Evans. On solving certain nonlinear partial differential equations by accretive operator methods. Israel Journal of Mathematics, 36:225–247, 1980.MathSciNetCrossRef Lawrence C. Evans. On solving certain nonlinear partial differential equations by accretive operator methods. Israel Journal of Mathematics, 36:225–247, 1980.MathSciNetCrossRef
26.
go back to reference Lawrence C. Evans. Partial differential equations. American Mathematical Society, Providence, R.I., 2010. Lawrence C. Evans. Partial differential equations. American Mathematical Society, Providence, R.I., 2010.
27.
go back to reference Ivan Gentil, Christian Léonard, and Luigi Ripani. About the analogy between optimal transport and minimal entropy. Annales de la Faculté des Sciences de Toulouse. Mathématiques. Série 6, 3:569–600, 2017. Ivan Gentil, Christian Léonard, and Luigi Ripani. About the analogy between optimal transport and minimal entropy. Annales de la Faculté des Sciences de Toulouse. Mathématiques. Série 6, 3:569–600, 2017.
28.
go back to reference Peter Gladbach, Eva Kopfer, and Jan Maas. Scaling limits of discrete optimal transport. SIAM Journal on Mathematical Analysis, 52(3):2759–2802, 2020.MathSciNetCrossRef Peter Gladbach, Eva Kopfer, and Jan Maas. Scaling limits of discrete optimal transport. SIAM Journal on Mathematical Analysis, 52(3):2759–2802, 2020.MathSciNetCrossRef
29.
go back to reference Romain Hug, Emmanuel Maitre, and Nicolas Papadakis. On the convergence of augmented Lagrangian method for optimal transport between nonnegative densities. Journal of Mathematical Analysis and Applications, 485(2):123811, 2020.MathSciNetCrossRef Romain Hug, Emmanuel Maitre, and Nicolas Papadakis. On the convergence of augmented Lagrangian method for optimal transport between nonnegative densities. Journal of Mathematical Analysis and Applications, 485(2):123811, 2020.MathSciNetCrossRef
30. Stanislav Kondratyev, Léonard Monsaingeon, and Dmitry Vorotnikov. A new optimal transport distance on the space of finite Radon measures. Advances in Differential Equations, 21(11/12):1117–1164, 2016.
31. Hugo Lavenant. Unconditional convergence for discretizations of dynamical optimal transport. Mathematics of Computation, 90(328):739–786, 2021.
32. Hugo Lavenant, Sebastian Claici, Edward Chien, and Justin Solomon. Dynamical optimal transport on discrete surfaces. ACM Transactions on Graphics, 37(6), 2018.
33. Bowen Li and Jun Zou. On a general matrix valued unbalanced optimal transport and its fully discretization: dynamic formulation and convergence framework. arXiv preprint arXiv:2011.05845, 2020.
34.
35. Matthias Liero, Alexander Mielke, and Giuseppe Savaré. Optimal entropy-transport problems and a new Hellinger–Kantorovich distance between positive measures. Inventiones Mathematicae, 211(3):969–1117, 2018.
36. Jan Maas. Gradient flows of the entropy for finite Markov chains. Journal of Functional Analysis, 261(8):2250–2292, 2011.
37. Gonzalo Mena and Jonathan Niles-Weed. Statistical bounds for entropic optimal transport: sample complexity and the central limit theorem. Advances in Neural Information Processing Systems, 32, 2019.
38. Andrea Natale and Gabriele Todeschi. Computation of optimal transport with finite volumes. ESAIM: Mathematical Modelling and Numerical Analysis, 55(5):1847–1871, 2021.
39. Andrea Natale and Gabriele Todeschi. A mixed finite element discretization of dynamical optimal transport. Journal of Scientific Computing, 91(2):38, 2022.
40. Jonathan Niles-Weed and Francis Bach. Sharp asymptotic and finite-sample rates of convergence of empirical measures in Wasserstein distance. Bernoulli, 25(4A):2620–2648, 2019.
41. Stanley Osher and Chi-Wang Shu. High-order essentially nonoscillatory schemes for Hamilton–Jacobi equations. SIAM Journal on Numerical Analysis, 28(4):907–922, 1991.
42. Nicolas Papadakis, Gabriel Peyré, and Edouard Oudet. Optimal transport with proximal splitting. SIAM Journal on Imaging Sciences, 7(1):212–238, 2014.
43. Gabriel Peyré and Marco Cuturi. Computational optimal transport: with applications to data science. Foundations and Trends® in Machine Learning, 11(5-6):355–607, 2019.
44. Elisabeth Rouy. Numerical approximation of viscosity solutions of first-order Hamilton–Jacobi equations with Neumann type boundary conditions. Mathematical Models and Methods in Applied Sciences, 2:357–374, 1992.
45. Filippo Santambrogio. Optimal Transport for Applied Mathematicians: Calculus of Variations, PDEs, and Modeling. Progress in Nonlinear Differential Equations and Their Applications. Springer International Publishing, 2015.
46. Filippo Santambrogio. Regularity via duality in calculus of variations and degenerate elliptic PDEs. Journal of Mathematical Analysis and Applications, 457(2):1649–1674, 2018. Special Issue on Convex Analysis and Optimization: New Trends in Theory and Applications.
47. Filippo Santambrogio and Xu-Jia Wang. Convexity of the support of the displacement interpolation: counterexamples. Applied Mathematics Letters, 58:152–158, 2016.
48. Bernhard Schmitzer, Klaus P. Schäfers, and Benedikt Wirth. Dynamic cell imaging in PET with optimal transport regularization. IEEE Transactions on Medical Imaging, 39(5):1626–1635, 2019.
49. Cédric Villani. Topics in Optimal Transportation. Graduate Studies in Mathematics. American Mathematical Society, 2003.
50. Tao Xiong, Chi-Wang Shu, and Mengping Zhang. A priori error estimates for semi-discrete discontinuous Galerkin methods solving nonlinear Hamilton–Jacobi equations with smooth solutions. International Journal of Numerical Analysis & Modeling, 10(1), 2013.
Metadata
Title: Quantitative Convergence of a Discretization of Dynamic Optimal Transport Using the Dual Formulation
Authors: Sadashige Ishida, Hugo Lavenant
Publication date: 11-11-2024
Publisher: Springer US
Published in: Foundations of Computational Mathematics
Print ISSN: 1615-3375
Electronic ISSN: 1615-3383
DOI: https://doi.org/10.1007/s10208-024-09686-3
