1 Introduction
Bisimulation is a concept that captures behavioural equivalence of states in a variety of types of transition systems. It has been widely studied in a discrete-time setting where the notion of a step is fundamental. An important and especially useful further notion is that of bisimulation metric which quantifies “how similar two states are”.
Most of the existing theoretical work is on discrete time, but a growing share of what computer science enables is in real time: robotics, self-driving cars, online machine learning, etc. A common solution is to discretize time; however, it is well known that this can lead to errors that are hopefully small but that may accumulate over time and lead to vastly different outcomes. For that reason, it is important to have a continuous-time way of quantifying the error made.
Bisimulation [21, 23, 25] is a fundamental concept in the theory of transition systems capturing a strong notion of behavioural equivalence. The extension to probabilistic systems is due to Larsen and Skou [20]; henceforth we will simply say “bisimulation” instead of “probabilistic bisimulation”. Bisimulation has been studied for discrete-time systems where transitions happen as steps, both on discrete [20] and continuous state spaces [3, 12, 13]. In all these types of systems, a crucial ingredient of the definition of bisimulation is the ability to talk about the next step. This notion of bisimulation is characterized by a modal logic [20] even when the state space is continuous [12].
Some work had previously been done on what are called continuous-time systems, see for example [2], but even in so-called continuous-time Markov chains there is a discrete notion of time step; it is only the real-valued duration associated with each state that leads people to call such systems continuous-time. They are often called “jump processes” in the mathematical literature (see, for example, [24, 28]), a phrase that better captures the true nature of such processes. Metrics and equivalences for such processes were studied by Gupta et al. [17, 18].
The processes we consider have continuous state spaces and are governed by a continuous-time evolution; a paradigmatic example is Brownian motion. When approximating such processes by discrete-time processes, entirely new phenomena and difficulties manifest themselves. For example, even the basic properties of the trajectories of Brownian motion are vastly more complicated than their counterparts for a random walk. Basic concepts like “the time at which a process exits a given subset of the state space” become intricate to define. Notions like “matching transition steps” are no longer applicable, as the notion of “step” does not make sense.
In [7‐9], we proposed different notions of behavioural equivalences on continuous-time processes. We showed that there were several possible extensions of the notion of bisimulation to continuous time and that the continuous-time notions needed to involve trajectories in order to be meaningful. There were significant mathematical challenges in even proving that an equivalence relation existed. For example, obstacles occurred in establishing measurability of various functions and sets, due to the inability to countably generate the relevant \(\sigma \)-algebras. Those papers left completely open the question of defining a suitable pseudometric analogue, a concept that would be more useful in practice than an equivalence relation.
Previous work on discrete-time Markov processes by Desharnais et al. [14, 15] extended the modal logic characterizing bisimulation to a real-valued logic that allows one not only to state whether two states are “behaviourally equivalent” but, more interestingly, to quantify how similarly they behave. This shifts the notion from a qualitative one (an equivalence) to a quantitative one (a pseudometric).
Other work, also on discrete-time Markov processes, by van Breugel et al. [26] introduced a slightly different real-valued logic and compared the corresponding pseudometric to another pseudometric obtained as a terminal coalgebra of a carefully crafted functor. We also mention in this connection the work by Ferns et al. on Markov Decision Processes and the connection between bisimulation and optimal value functions [16].
In this work, we are looking to extend the notion of bisimulation metric to a behavioural pseudometric on continuous-time processes. Very broadly speaking, we are following a familiar path from equivalences to logics to metrics. However, it is necessary for us to redevelop the framework and the mathematical techniques from scratch. Indeed, a very important aspect of discrete time is the fact that the process is a jump process, “hopping” from state to state. This limitation also applies to continuous-time Markov chains. In our case, we want to cover processes that evolve continuously through time. A standard example would be Brownian motion or other diffusion processes (often described by stochastic differential equations). As one will see throughout this work, there are new mathematical challenges that need to be overcome. This means that the similarity between the pre-existing work on discrete time and our generalization to continuous time holds only at the highest level of abstraction.
Outline of the paper: The first two sections after the introduction are background. We start by recalling some mathematical notions in Section 2 and introduce the continuous-time processes that we will be studying in Section 3. In Section 4, we introduce a functional \(\mathcal {F}\) and define a pseudometric \(\overline{\delta }\) using this functional. We also show that the pseudometric \(\overline{\delta }\) is a fixpoint of \(\mathcal {F}\). In Section 5, we show that this pseudometric is characterized by a real-valued logic. We further emphasize the novelty of this work with respect to discrete time and summarize the obstacles that we had to overcome in Section 6. We provide some examples in Section 7. Finally, we discuss the limitations of our approach and how it relates to previous work in Section 8.
2 Mathematical background
We assume the reader is familiar with basic measure theory and topology. Nevertheless, we provide a brief review of the relevant notions and theorems. Let us start by clarifying notation for integrals: given a measurable space X equipped with a measure \(\mu \) and a measurable function \(f: X \rightarrow \mathbb {R}\), we write \(\int f~ \textrm{d}\mu \) or \(\int f(x) ~ \mu (\textrm{d}x)\) interchangeably. The second notation will be especially useful when considering a Markov kernel \(P_t\) for some \(t \ge 0\) and \(x \in X\): \(\int f(y) ~P_t(x, \textrm{d}y) = \int f ~ \textrm{d}P_t(x)\).
2.1 Couplings
Definition 1
Let \((X, \varSigma _X, P)\) and \((Y, \varSigma _Y, Q)\) be two probability spaces. Then a coupling \(\gamma \) of P and Q is a probability distribution on \((X \times Y, \varSigma _X \otimes \varSigma _Y)\) such that for every \(B_X \in \varSigma _X\), \(\gamma (B_X \times Y) = P(B_X)\) and for every \(B_Y \in \varSigma _Y\), \(\gamma (X \times B_Y) = Q(B_Y)\) (P, Q are called the marginals of \(\gamma \)). We write \(\varGamma (P, Q)\) for the set of couplings of P and Q.
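To make the marginal conditions of Definition 1 concrete, here is a small numerical sketch, not part of the formal development, on finite state spaces: the product (independent) coupling is always an element of \(\varGamma (P, Q)\). NumPy is assumed; the variable names are ours.

```python
import numpy as np

# Two discrete probability distributions: P on X = {0, 1, 2}, Q on Y = {0, 1}.
P = np.array([0.2, 0.5, 0.3])
Q = np.array([0.6, 0.4])

# The product coupling gamma(x, y) = P(x) * Q(y) is always a valid coupling.
gamma = np.outer(P, Q)

# Marginal conditions from Definition 1: summing out one coordinate must
# recover P and Q respectively.
assert np.allclose(gamma.sum(axis=1), P)
assert np.allclose(gamma.sum(axis=0), Q)
```

The product coupling is only one element of \(\varGamma (P, Q)\); optimal transport searches over all of them.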
Lemma 1
Given two probability measures P and Q on Polish spaces X and Y respectively, the set of couplings \(\varGamma (P, Q)\) is compact under the topology of weak convergence.
2.2 Optimal transport theory
Much of this work is based on optimal transport theory. This whole subsection is based on [27], adapted to our framework.
Consider a Polish space \(\mathcal {X}\) and a lower semi-continuous cost function \(c: \mathcal {X}\times \mathcal {X}\rightarrow [0,1]\), meaning that for every \(x_0, y_0 \in \mathcal {X}\), \(\liminf _{(x, y) \rightarrow (x_0, y_0)} c(x, y) \ge c(x_0, y_0)\). Assume also that c satisfies \(c(x,x) = 0\) for every \(x \in \mathcal {X}\).
For every two probability distributions \(\mu \) and \(\nu \) on \(\mathcal {X}\), we write \(W(c)(\mu , \nu )\) for the optimal transport cost from \(\mu \) to \(\nu \). Adapting Theorem 5.10(iii) of [27] to our framework, we get the following statement of the Kantorovich duality:
$$W(c)(\mu , \nu ) = \min _{\gamma \in \varGamma (\mu , \nu )} \int c ~ \textrm{d}\gamma = \max _{h \in \mathcal {H}(c)} \left| \int h ~ \textrm{d}\mu - \int h ~\textrm{d}\nu \right| $$
where \( \mathcal {H}(c) = \{ h: \mathcal {X}\rightarrow [0,1] ~|~ \forall x,y ~~ |h(x) - h(y)| \le c(x,y) \}\).
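On a finite space, the minimum on the left-hand side of the duality is a small linear program. The sketch below is only an illustration under the assumption of finite supports, using SciPy's `linprog`; the function name `transport_cost` is ours. With the discrete cost (0 on the diagonal, 1 off it), the optimal transport cost recovers the total variation distance.

```python
import numpy as np
from scipy.optimize import linprog

def transport_cost(cost, mu, nu):
    """min over couplings gamma of sum_{x,y} cost[x,y] * gamma[x,y],
    solved as a linear program (finite state spaces only)."""
    n, m = cost.shape
    A_eq, b_eq = [], []
    # Row-sum constraints: the first marginal of gamma must be mu.
    for i in range(n):
        row = np.zeros((n, m)); row[i, :] = 1.0
        A_eq.append(row.ravel()); b_eq.append(mu[i])
    # Column-sum constraints: the second marginal of gamma must be nu.
    for j in range(m):
        col = np.zeros((n, m)); col[:, j] = 1.0
        A_eq.append(col.ravel()); b_eq.append(nu[j])
    res = linprog(cost.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=(0, None), method="highs")
    return res.fun

# Discrete cost: 0 on the diagonal, 1 off it.
c01 = 1.0 - np.eye(3)
mu = np.array([0.5, 0.5, 0.0])
nu = np.array([0.5, 0.0, 0.5])
w = transport_cost(c01, mu, nu)
assert abs(w - 0.5) < 1e-6  # total variation distance of mu and nu is 0.5
```

In the continuous-state setting of the paper, the minimum is still attained (the cost is lower semi-continuous and the coupling set is compact), but of course no finite linear program computes it.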
Lemma 2
If the cost function c is a 1-bounded pseudometric on \(\mathcal {X}\), then W(c) is a 1-bounded pseudometric on the space of probability distributions on \(\mathcal {X}\).
We will later need the following technical lemma. Theorem 5.20 of [27] states that a sequence \(W(c_k)(P_k, Q_k)\) converges to W(c)(P, Q) if \(c_k\) uniformly converges to c and \(P_k\) and \(Q_k\) converge weakly to P and Q respectively. Uniform convergence in the cost function may be too strong a condition for us, but the following lemma is enough for what we need.
Lemma 3
Consider a Polish space \(\mathcal {X}\) and a cost function \(c: \mathcal {X}\times \mathcal {X}\rightarrow [0,1]\) such that there exists an increasing (\(c_{k+1} \ge c_k\) for every k) sequence of continuous cost functions \(c_k: \mathcal {X}\times \mathcal {X}\rightarrow [0,1]\) that converges to c pointwise. Then, given two probability distributions P and Q on \(\mathcal {X}\),
$$\begin{aligned} \lim _{k \rightarrow \infty } W(c_k)(P,Q) = W(c)(P, Q). \end{aligned}$$
3 Background on continuous-time Markov processes
This work focuses on continuous-time processes that are honest (no loss of mass over time) and that satisfy additional regularity conditions. In order to define what we mean by continuous-time Markov processes, we first define Feller-Dynkin processes. Much of this material is adapted from [24] and we use their notation. Another useful source is [4].
Let E be a locally compact Hausdorff space with a countable base. We equip E with its Borel \(\sigma \)-algebra \(\mathcal {B}(E)\), denoted \(\mathcal {E}\). These topological hypotheses imply that E is \(\sigma \)-compact and Polish (see Corollary IX.57 in [5]). We write \(\varDelta \) for the 1-bounded metric that generates the topology making E Polish.
Definition 2
A semigroup of operators on a Banach space X is a family of linear continuous (bounded) operators \(\mathcal {P}_t: X \rightarrow X\) indexed by \(t\in \mathbb {R}_{\ge 0}\) such that
$$\begin{aligned} \forall s,t \ge 0, \mathcal {P}_s \circ \mathcal {P}_t = \mathcal {P}_{s+t} \qquad \text {(semigroup property)} \end{aligned}$$
and
$$\begin{aligned} \mathcal {P}_0 = I \qquad \text {(the identity)}. \end{aligned}$$
Definition 3
For X a Banach space, we say that a semigroup \(\mathcal {P}_t:X\rightarrow X\) is strongly continuous if
$$\begin{aligned} \forall x\in X, \lim _{t\downarrow 0}\Vert \mathcal {P}_t x - x \Vert = 0. \end{aligned}$$
What the semigroup property expresses is that we do not need to understand the past (what happens before time t) in order to compute the future (what happens after some additional time s, so at time \(t+s\)) as long as we know the present (at time t).
We say that a continuous real-valued function f on E “vanishes at infinity” if for every \(\varepsilon > 0\) there is a compact subset \(K \subseteq E\) such that for every \(x\in E\setminus K\), we have \(|f(x)| \le \varepsilon \). To give an intuition, if E is the real line, this means that \(\lim _{x \rightarrow \pm \infty } f(x) = 0\). The space \(C_0(E)\) of continuous real-valued functions that vanish at infinity is a Banach space with the \(\sup \) norm.
Definition 4
A Feller-Dynkin (FD) semigroup is a strongly continuous semigroup \((\hat{P}_t)_{t \ge 0}\) of linear operators on \(C_0(E)\) satisfying the additional condition:
$$\begin{aligned} \forall t \ge 0 ~~~ \forall f \in C_0(E) \text {, if }~~ 0 \le f \le 1 \text {, then }~~ 0 \le \hat{P}_t f \le 1 \end{aligned}$$
The Riesz representation theorem can be found as Theorem II.80.3 of [24]. From it, we can derive the following important proposition, which relates FD-semigroups to Markov kernels and allows one to see the connection with familiar probabilistic transition systems.
Proposition 1
Given an FD-semigroup \((\hat{P}_t)_{t \ge 0}\) on \(C_0(E)\), it is possible to define a unique family of sub-Markov kernels \((P_t)_{t \ge 0} : E \times \mathcal {E} \rightarrow [0,1]\) such that for all \(t \ge 0\) and \(f \in C_0(E)\),
$$\begin{aligned} \hat{P}_t f(x) = \int f(y) P_t(x, \textrm{d}y). \end{aligned}$$
Given a time t and a state x, we will often write \(P_t(x)\) for the measure \(P_t(x, \cdot )\) on E. Note that since E is Polish, \(P_t(x)\) is tight.
Definition 5
A process described by the FD-semigroup \((\hat{P}_t)_{t \ge 0}\) is honest if for every \(x \in E\) and every time \(t \ge 0\), \(P_t(x, E) = 1\).
Worded differently, a process is honest if there is no loss of mass over time. A standard example of an honest process is Brownian motion.
3.1 Observables
So far, we have defined Feller-Dynkin processes. In order to bring these processes more in line with the kind of transition systems that have hitherto been studied in the computer science literature, we also equip the state space E with an additional continuous function \(obs: E \rightarrow [0,1]\). One should think of it as the interface between the process and the user (or an external observer): external observers cannot see the exact state the process is in at a given time, but they can see the associated observable. A real-life example is the depth of a diver: while divers do not know their precise location underwater, their watch does tell them the depth at which they are.
Note that the condition on the observable is a major difference from our previous work [7, 8] since we used a countable set of atomic propositions AP and obs was a discrete function \(E \rightarrow 2^{AP}\).
Definition 6
In this study, a continuous-time Markov process (abbreviated CTMP) is an honest FD-semigroup on \(C_0(E)\), equipped with a continuous function \(obs: E \rightarrow [0,1]\), that satisfies the following additional property: if a sequence \((x_n)_{n \in \mathbb {N}}\) converges to x in E, then for every t, the sequence of measures \((P_t(x_n))_{n \in \mathbb {N}}\) converges weakly to the measure \(P_t(x)\).
Remark 1
Some of these properties could be relaxed. For instance, in some cases, a non-honest process can be made into a CTMP by adding a state \(\partial \). Another hypothesis that could be relaxed is the one on obs, by imposing stronger conditions on the FD-process.
4 Generalizing to continuous-time through a functional
We define a behavioural pseudometric on our CTMPs via a functional \(\mathcal {F}\) on the lattice of 1-bounded pseudometrics. As we will see, unlike in the discrete-time case, it is not possible to apply the Banach fixpoint theorem and obtain a fixpoint metric a priori: instead, we need to construct a candidate and then show that it is a fixpoint of our functional. More specifically, the idea is to apply our functional iteratively to a metric and then consider the supremum of the resulting sequence of pseudometrics. Doing so first requires restricting the scope of our functional \(\mathcal {F}\).
4.1 Lattices
At the core of this construction is the definition of a functional on the lattice of 1-bounded pseudometrics.
Let \(\mathcal {M}\) be the lattice of 1-bounded pseudometrics on the state space E equipped with the order \(\le \) defined by: \(m_1 \le m_2\) if and only if for every pair (x, y), \(m_1(x,y) \le m_2(x,y)\). We define a sublattice \(\mathcal {P}\) of \(\mathcal {M}\) by restricting to pseudometrics that are lower semi-continuous (with respect to the original topology \(\mathcal {O}\) on E generated by the metric \(\varDelta \) making the space E Polish). We will further need the sublattice \(\mathcal {C}\) of pseudometrics \(m \in \mathcal {M}\) such that the topology generated by m on E is a subtopology of the original topology \(\mathcal {O}\), i.e. m is a continuous function \(E \times E \rightarrow [0,1]\).
We have the following inclusion: \(\mathcal {C}\subset \mathcal {P}\subset \mathcal {M}\).
Remark 2
One has to be careful here. The topology \(\mathcal {O}\) on E is generated by the 1-bounded metric \(\varDelta \), and hence \(\varDelta \) is in \(\mathcal {C}\). However, we can define many pseudometrics that are not related to \(\mathcal {O}\). As an example, the discrete pseudometric on the real line is not related to the usual topology on \(\mathbb {R}\).
4.2 Defining our functional
Throughout the rest of the paper, \((P_t)_{t \ge 0}\) is the family of Markov kernels associated with a CTMP. Given a discount factor \(0< c < 1\), we define the functional \(\mathcal {F}_c : \mathcal {P}\rightarrow \mathcal {M}\) as follows: for every pseudometric \(m \in \mathcal {P}\) and every two states x, y, \(\mathcal {F}_c(m)(x,y)\) compares the distributions \(P_t(x)\) and \(P_t(y)\) for all t through transport theory and takes the supremum:
$$\begin{aligned} \mathcal {F}_c(m)(x,y) = \sup _{t \ge 0} c^t W(m) (P_t(x), P_t(y)). \end{aligned}$$
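On a finite state space, one can approximate \(\mathcal {F}_c(m)(x,y)\) by evaluating the discounted transport cost on a time grid. The following sketch uses a toy two-state chain of our own devising with kernels \(P_t = e^{tQ}\); on two states, the transport cost has the closed form used in the code. It is only an illustration: the actual definition takes a supremum over all real \(t \ge 0\).

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical 2-state CTMP: rate matrix Qrate generates kernels P_t = exp(t*Qrate).
Qrate = np.array([[-1.0,  1.0],
                  [ 2.0, -2.0]])

def P(t):
    return expm(t * Qrate)

def F(m01, x, y, c=0.5, t_grid=np.linspace(0.0, 20.0, 2001)):
    """Grid approximation of F_c(m)(x, y) = sup_t c^t W(m)(P_t(x), P_t(y)).
    On two states with m(0,1) = m01, the optimal transport cost W(m)(P, Q)
    reduces to m01 * |P({0}) - Q({0})|."""
    vals = [(c ** t) * m01 * abs(P(t)[x, 0] - P(t)[y, 0]) for t in t_grid]
    return max(vals)

# The sup is attained at t = 0 here: P_0(0) and P_0(1) are point masses at
# distance m01, and discounting makes later times contribute less.
assert abs(F(1.0, 0, 1) - 1.0) < 1e-9
```

Note how the grid approximation already illustrates Lemma 4 below: the \(t = 0\) term alone gives \(\mathcal {F}_c(m)(x,y) \ge m(x,y)\).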
There are several remarks to make on this definition. First, we can only define \(\mathcal {F}_c (m)\) if m is lower semi-continuous, since we are using optimal transport theory; this is why the domain of \(\mathcal {F}_c\) is only \(\mathcal {P}\).
Additionally, even if m is lower semi-continuous, \(\mathcal {F}_c(m)\) may not be measurable, which means that the range of \(\mathcal {F}_c\) is not the lattice \(\mathcal {P}\). Lemma 2 at least ensures that \(\mathcal {F}_c(m)\) is indeed in \(\mathcal {M}\), as a supremum of pseudometrics. This subtlety was not present in the work on continuous-time Markov chains in [17, 18].
Second, we will use the Kantorovich duality throughout this work. It only holds for probability measures, and that is why we restrict this work to honest processes.
As a direct consequence of the definition of \(\mathcal {F}_c\), we have that \(\mathcal {F}_c\) is monotone: if \(m_1 \le m_2\) in \(\mathcal {P}\), then \(\mathcal {F}_c(m_1) \le \mathcal {F}_c(m_2)\). It is also easy to prove the following result.
Lemma 4
For every pseudometric m in \(\mathcal {P}\), discount factor \(0 < c < 1\) and pair of states x, y,
$$\begin{aligned} m(x,y) \le \mathcal {F}_c(m)(x,y). \end{aligned}$$
4.3 When restricted to continuous pseudometrics
We wish to iteratively apply \(\mathcal {F}_c\) in order to construct a fixpoint (in a similar fashion to the proof of the Knaster-Tarski theorem). While \(\mathcal {F}_c(m)\) is a pseudometric (for \(m \in \mathcal {P}\)), there is no reason for it to be in \(\mathcal {P}\). This means that we cannot hastily apply \(\mathcal {F}_c\) iteratively to just any pseudometric in order to obtain a fixpoint.
However, if m is a pseudometric which is continuous wrt the original topology, then so is \(\mathcal {F}_c(m)\).
Lemma 5
Consider a pseudometric \(m \in \mathcal {C}\). Then the topology generated by \(\mathcal {F}_c(m)\) is a subtopology of the original topology \(\mathcal {O}\) for any discount factor \(0< c< 1\).
This is where we need the discount factor \(c<1\): it enables us to maintain continuity by allowing us to bound the time interval we consider. Indeed, given \(T> 0\), for any time \(t \ge T\) and any \(x, y \in E\), we know that \(c^t W(m) (P_t(x), P_t(y)) \le ~c^T\).
Proof
Since c and m are fixed throughout the proof, we will omit them and, for instance, write \(\mathcal {F}(x,y)\) instead of \(\mathcal {F}_c(m)(x,y)\) and \(W(P_t(x), P_t(y))\) instead of \(W(m)(P_t(x), P_t(y))\). We will also write \(\varPhi (t,x,y) = c^t W(P_t(x), P_t(y))\), i.e. \(\mathcal {F}(x,y) = \sup _t \varPhi (t,x,y)\).
It is enough to show that for a fixed state x, the map \(y \mapsto \mathcal {F}(x,y)\) is continuous.
Pick \(\epsilon >0\) and a sequence of states \((y_n)_{n \in \mathbb {N}}\) converging to y. We want to show that there exists M such that for all \(n \ge M\),
$$\begin{aligned} |\mathcal {F}(x, y) - \mathcal {F}(x, y_n)| & \le \epsilon . \end{aligned}$$
(1)
Pick t such that \(\mathcal {F}(x,y) = \sup _s \varPhi (s, x,y) \le \varPhi (t, x, y) + \epsilon / 4\), i.e.
$$\begin{aligned} | \varPhi (t, x, y) - \mathcal {F}(x,y)| \le \epsilon / 4. \end{aligned}$$
(2)
Recall that \(P_t(y_n)\) converges weakly to \(P_t(y)\), hence we can apply Theorem 5.20 of [27] and get:
$$\begin{aligned} \lim _{n \rightarrow \infty } W(P_t(x), P_t(y_n)) = W(P_t(x), P_t(y)). \end{aligned}$$
This means that there exists \(N'\) such that for all \(n \ge N'\),
$$\begin{aligned} |W(P_t(x), P_t(y_n)) - W(P_t(x), P_t(y))| \le \epsilon /4. \end{aligned}$$
This further implies that for all \(n \ge N'\),
$$\begin{aligned} |\varPhi (t, x, y_n) - \varPhi (t, x, y)| & \le c^t \epsilon /4 \le \epsilon / 4. \end{aligned}$$
(3)
In order to show (1), it is enough to show that there exists N such that for every \(n \ge N\),
$$\begin{aligned} |\varPhi (t, x, y_n) - \mathcal {F}(x, y_n)| & \le \epsilon / 2. \end{aligned}$$
(4)
Indeed, in that case, for all \(n \ge \max \{N, N'\}\), using Equations (2) and (3),
$$\begin{aligned} &|\mathcal {F}(x, y) - \mathcal {F}(x, y_n)|\\ & \le |\mathcal {F}(x, y) - \varPhi (t, x, y)| + |\varPhi (t, x, y) - \varPhi (t, x, y_n)| + |\varPhi (t, x, y_n) - \mathcal {F}(x, y_n)| \\ & \le \epsilon / 4 + \epsilon / 4 + \epsilon / 2 = \epsilon . \end{aligned}$$
So let us show (4). Assume it is not the case: for all N, there exists \(n \ge N\) such that \( |\varPhi (t, x, y_n) - \mathcal {F}(x, y_n)| > \epsilon / 2\), i.e.
$$\begin{aligned} \varPhi (t, x, y_n) + \epsilon / 2 < \mathcal {F}(x, y_n). \end{aligned}$$
Define the sequence \((N_k)_{k \in \mathbb {N}}\) by: \(N_{-1} = -1\) and, if \(N_k\) is defined, let \(N_{k+1}\) be the smallest \(n \ge N_k + 1\) such that \(\varPhi (t, x, y_n) + \epsilon / 2 < \mathcal {F}(x, y_n)\). In particular, for every \(k \in \mathbb {N}\), \(\mathcal {F}(x, y_{N_k}) > \epsilon / 2\). There exists T such that for every \(s \ge T\), \(c^s < \epsilon /2\). We thus have that
$$\begin{aligned} \forall k \in \mathbb {N} \quad \mathcal {F}(x, y_{N_k}) = \sup _{ 0 \le s \le T} \varPhi (s, x, y_{N_k}). \end{aligned}$$
Therefore for every \(k \in \mathbb {N}\), there exists \(s_k \in [0, T]\) such that
$$\begin{aligned} \mathcal {F}(x, y_{N_k}) \le \varPhi (s_k, x, y_{N_k}) + \epsilon / 8. \end{aligned}$$
(5)
We get a sequence \((s_k)_{k \in \mathbb {N}} \subset [0,T]\), and there is thus a subsequence \((t_k)_{k \in \mathbb {N}}\) converging to some \(t' \in [0, T]\). There is a corresponding subsequence \((z_k)_{k \in \mathbb {N}}\) of the original sequence \((y_{N_k})_{k \in \mathbb {N}}\). Since \(\lim _{n \rightarrow \infty } y_n = y\), \(\lim _{k \rightarrow \infty } z_k = y\).
We constructed the sequence \((N_k)_{k \in \mathbb {N}}\) such that \(\varPhi (t, x, y_{N_k}) + \epsilon / 2 < \mathcal {F}(x, y_{N_k})\). Hence by Equation (5),
$$\begin{aligned} \varPhi (t, x, z_k) + \epsilon / 2 < \mathcal {F}(x, y_{N_k}) \le \varPhi (t_k, x, z_k) + \epsilon / 8, \end{aligned}$$
which means that by taking the limit \(k \rightarrow \infty \),
$$\begin{aligned} \varPhi (t, x, y) + \epsilon / 2 &\le \varPhi (t', x, y) + \epsilon / 8. \end{aligned}$$
(6)
At the start of this proof, we picked t such that \(\mathcal {F}(x,y) = \sup _s \varPhi (s, x,y) \le \varPhi (t, x, y) + \epsilon / 4\), which means that
$$\begin{aligned} \varPhi (t', x, y) &\le \varPhi (t, x,y) + \epsilon / 4 . \end{aligned}$$
(7)
Equations (6) and (7) are incompatible, which concludes the proof. \(\square \)
4.4 Defining our family of pseudometrics
We are now finally able to apply our functional \(\mathcal {F}_c\) iteratively to continuous pseudometrics and thus construct an increasing sequence of pseudometrics and its limit.
Since obs is a continuous function \(E \rightarrow [0,1]\) and by Lemma 5, we can define, for each \(0 < c< 1\), the following sequence of pseudometrics in \(\mathcal {C}\):
$$\begin{aligned} \delta ^c_0(x,y) & = |obs(x) - obs(y)|,\\ \delta ^c_{n+1} &= \mathcal {F}_c(\delta _n^c). \end{aligned}$$
By Lemma 4, for every two states x and y, \(\delta ^c_{n+1}(x,y) \ge \delta ^c_n(x,y)\). Define the pseudometric \(\overline{\delta }^c = \sup _n \delta ^c_n\) (which is also a limit since the sequence is non-decreasing).
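To see the iteration at work, here is a numerical sketch on a hypothetical three-state chain where two states share the same observable but differ in their dynamics: \(\delta ^c_0(0,1) = 0\), yet one application of the functional already yields \(\delta ^c_1(0,1) > 0\). The closed form for the transport cost below uses the fact that \(\delta ^c_0\) only separates \(\{0,1\}\) from \(\{2\}\); everything else (state space, rates, grid) is our own toy choice.

```python
import numpy as np
from scipy.linalg import expm

# Toy 3-state CTMP: state 0 jumps to state 2 at rate 1; states 1 and 2 are
# absorbing.  obs separates state 2 from states 0 and 1.
Qrate = np.array([[-1.0, 0.0, 1.0],
                  [ 0.0, 0.0, 0.0],
                  [ 0.0, 0.0, 0.0]])
obs = np.array([0.0, 0.0, 1.0])
c = 0.5

def delta1_01(t_grid=np.linspace(0.0, 30.0, 3001)):
    """delta_1(0, 1) = sup_t c^t W(delta_0)(P_t(0), P_t(1)).  Since delta_0
    only separates {0, 1} from {2}, the optimal transport cost is just the
    difference in mass assigned to state 2."""
    vals = []
    for t in t_grid:
        Pt = expm(t * Qrate)
        vals.append((c ** t) * abs(Pt[0, 2] - Pt[1, 2]))
    return max(vals)

d0 = abs(obs[0] - obs[1])     # delta_0(0, 1) = 0: same observable
d1 = delta1_01()
assert d0 == 0.0
assert d1 > 0.3               # the iteration strictly increases: d1 > d0
```

The supremum here is attained at an intermediate time: for small t the two kernels have not yet diverged, while for large t the discount \(c^t\) dominates.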
Since it is the supremum of continuous functions, the pseudometric \(\overline{\delta }^c\) is lower semi-continuous and is thus in the lattice \(\mathcal {P}\) for any \(0 < c < 1\).
Remark 3
Note that the lattice \(\mathcal {C}\) is not complete, which means that, although the metrics \(\delta ^c_n\) all belong to \(\mathcal {C}\), \(\overline{\delta }^c\) need not be in \(\mathcal {C}\). For that reason, we cannot directly use the Knaster-Tarski theorem in this work.
4.5 Fixpoint
Even though we were not able to define the metric \(\overline{\delta }^c\) as a fixpoint directly, it is in fact a fixpoint.
Theorem 1
The pseudometric \(\overline{\delta }^c\) is a fixpoint for \(\mathcal {F}_c\).
The proof relies on Sion's minimax theorem applied to the function
$$\begin{aligned} \varXi : \varGamma (P_t(x), P_t(y)) \times Y & \rightarrow [0,1]\\ (\gamma , m) & \mapsto \int m ~ \textrm{d}\gamma , \end{aligned}$$
where Y is the set of linear combinations of pseudometrics \(\sum _{n \in \mathbb {N}} a_n \delta _n\) such that for every n, \(a_n \ge 0\) (with finitely many non-zero) and \(\sum _{n \in \mathbb {N}} a_n = 1\).
Lemma 6
Consider a discount factor \(0 < c < 1\) and a pseudometric m in \(\mathcal {P}\) such that
1. m is a fixpoint of \(\mathcal {F}_c\), and
2. for every two states x and y, \(m(x,y) \ge |obs(x) - obs(y)|\).
Then \(m \ge \overline{\delta }^c\).
The proof is done by showing that \(m \ge \delta _n^c\) by induction on n.
Combining Lemma 6 and Theorem 1, we obtain the following characterization of \(\overline{\delta }^c\).
Theorem 2
The pseudometric \(\overline{\delta }^c\) is the least fixpoint of \(\mathcal {F}_c\) that is greater than the pseudometric \((x, y) \mapsto |obs(x) - obs(y)|\).
5 Corresponding real-valued logic
Similarly to what happens in the discrete-time setting, the behavioural pseudometric \(\overline{\delta }^c\) can be characterized by a real-valued logic. This logic should be thought of as a set of tests performed on the process, for instance “what is the expected value of obs after letting the process evolve for time t?”, and it generates a pseudometric on the state space by measuring how differently the process performs on those tests when started from different states.
5.1 The logic
Definition of the logic: The logic, denoted \(\varLambda \), is defined inductively by
$$\begin{aligned} f \in \varLambda & := q~|~ obs ~|~ \min \{ f_1, f_2 \} ~|~ 1-f ~|~ f \ominus q ~|~ \langle t \rangle f \end{aligned}$$
for all \(f_1, f_2, f \in \varLambda \), \(q \in [0,1] \cap \mathbb {Q}\) and \(t \in \mathbb {Q}_{ \ge 0}\).
This logic closely resembles the ones introduced for discrete-time systems by Desharnais et al. [14, 15] and by van Breugel et al. [26]. The key difference is the term \(\langle t \rangle f\) which deals with continuous-time.
Interpretation of the logic: We fix a discount factor \(0 < c< 1\). The expressions in \(\varLambda \) are interpreted inductively as functions \(E \rightarrow [0,1]\) as follows, for a state \(x \in E\):
$$\begin{aligned} q(x) & = q, \\ obs(x) & = obs(x), \\ (\min \{ f_1, f_2 \})(x) & = \min \{ f_1(x), f_2(x) \}, \\ (1-f)(x) & = 1 - f(x), \\ (f \ominus q)(x) & = \max \{ 0, f(x) - q \}, \\ (\langle t \rangle f)(x) &= c^t ~ \int f(y) ~ P_t(x, \textrm{d}y) = c^t~ \left( \hat{P}_t f\right) (x). \end{aligned}$$
Whenever we want to emphasize that the expressions are interpreted for a discount factor \(0<c<1\), we will write \(\varLambda _c\).
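As a small illustration of the interpretation, here is a sketch on a hypothetical finite-state chain, where an expression becomes a vector of values indexed by states; the helper names `diamond` and `ominus` are ours.

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical 3-state chain: state 0 jumps to state 2 at rate 1; states 1
# and 2 are absorbing.  The real setting has a continuous state space.
Qrate = np.array([[-1.0, 0.0, 1.0],
                  [ 0.0, 0.0, 0.0],
                  [ 0.0, 0.0, 0.0]])
obs = np.array([0.0, 0.0, 1.0])
c = 0.5

def diamond(t, f):
    """<t> f: discounted expected value of f after letting the process run
    for time t, i.e. c^t * (P_t f)."""
    return (c ** t) * expm(t * Qrate) @ f

def ominus(f, q):
    """f (-) q: truncated subtraction, so values stay in [0, 1]."""
    return np.maximum(0.0, f - q)

f = ominus(diamond(1.0, obs), 0.25)   # the formula  (<1> obs) (-) 1/4
# From state 1 (absorbing, obs = 0) the value is 0; from state 2 (absorbing,
# obs = 1) it is c^1 * 1 - 1/4 = 0.25.
assert f[1] == 0.0
assert abs(f[2] - 0.25) < 1e-9
```

The remaining constructors (constants, \(\min \), \(1-f\)) are pointwise operations on such vectors.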
Remark 4
Let us clarify the difference between an expression in \(\varLambda \) and its interpretation. Expressions can be thought of as notation, such as \(+\), \({}^2\), \(\times \), etc. They carry little meaning by themselves, but they can be interpreted over a given set, for instance \(\mathbb {R}\) or \(C_0(E)\) (the continuous functions \(E \rightarrow \mathbb {R}\) that vanish at infinity). By combining notations, one can write expressions that can then be interpreted on a given set.
From \(\varLambda \), we can also define the expression \(f \oplus q = 1 - ((1 - f) \ominus q)\) which is interpreted as a function \(E \rightarrow [0,1]\) as \((f \oplus q)(x) = \min \{ 1, f(x) + q \}\).
5.2 Definition of the pseudometric
The pseudometric we derive from the logic \(\varLambda \) corresponds to how different the test results are when the process starts from x compared to the case when it starts from y.
Given a fixed discount factor \(0< c<1\), we can define the pseudometric \(\lambda ^c\):
$$\begin{aligned} \lambda ^c (x,y) = \sup _{f \in \varLambda _c} (f (x) - f(y)) = \sup _{f \in \varLambda _c} |f(x) - f(y)|. \end{aligned}$$
The latter equality holds since for every \(f \in \varLambda _c\), \(\varLambda _c\) also contains \(1-f\).
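Enumerating a finite fragment of \(\varLambda _c\) already gives a lower bound on \(\lambda ^c\). A crude numerical sketch on a hypothetical three-state chain (the chain, the chosen formulas, and the name `lam_lower` are all our own illustration):

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical 3-state chain: state 0 jumps to state 2 at rate 1; states 1
# and 2 are absorbing.
Qrate = np.array([[-1.0, 0.0, 1.0],
                  [ 0.0, 0.0, 0.0],
                  [ 0.0, 0.0, 0.0]])
obs = np.array([0.0, 0.0, 1.0])
c = 0.5

def diamond(t, f):
    # <t> f interpreted on the finite chain: c^t * (P_t f).
    return (c ** t) * expm(t * Qrate) @ f

# A finite fragment of Lambda_c: obs itself plus <t> obs for a few rational t.
formulas = [obs] + [diamond(t, obs) for t in (0.5, 1.0, 2.0)]

def lam_lower(x, y):
    """Lower bound on lambda^c(x, y): sup of |f(x) - f(y)| over the fragment."""
    return max(abs(f[x] - f[y]) for f in formulas)

# States 0 and 1 have the same observable, but <t> obs separates them, so
# this lower bound on lambda^c(0, 1) is already strictly positive.
assert lam_lower(0, 1) > 0.25
```

This mirrors the countable enumeration \(\varLambda = \{f_0, f_1, \ldots \}\) and the metrics \(m_k\) used in the proof of Theorem 3.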
5.3 Comparison to the fixpoint metric
This real-valued logic \(\varLambda _c\) is especially interesting as the corresponding pseudometric \(\lambda ^c\) matches the fixpoint pseudometric \(\overline{\delta }^c\) of the functional \(\mathcal {F}_c\) that we defined in Section 4.4. In order to show that \(\lambda ^c = \overline{\delta }^c\), we establish the inequalities in both directions. One direction is proven by induction on the structure of the terms of our logic \(\varLambda _c\):
Lemma 7
For every f in \(\varLambda _c\), there exists n such that for every x, y, \( f(x) - f(y) \le \delta ^c_n(x,y)\).
Since \(\lambda ^c (x,y) = \sup _{f \in \varLambda _c} ( f(x) - f(y) )\), we get the following corollary as an immediate consequence of Lemma 7.
Corollary 1
For every discount factor \(0< c< 1\), \(\overline{\delta }^c \ge \lambda ^c\).
We now aim to prove the reverse inequality. We will use the following result (Lemma A.7.2) from [1], which is also used in [26] for discrete-time processes.
Lemma 8
Let X be a compact Hausdorff space. Let A be a subset of the set of continuous functions \(X \rightarrow \mathbb {R}\) such that if \(f, g \in A\), then \(\max \{ f,g\}\) and \(\min \{f,g\}\) are also in A. Consider a function h that can be approximated at each pair of points by functions in A, meaning that
$$\begin{aligned} \forall x, y \in X~ \forall \epsilon > 0~ \exists g \in A ~ |h(x) - g(x)| \le \epsilon \text { and } |h(y) - g(y)| \le \epsilon . \end{aligned}$$
Then h can be approximated by functions in A, meaning that \(\forall \epsilon > 0 ~ \exists g \in A ~ \forall x\in X~ |h(x) - g(x)| \le \epsilon \).
In order to use this lemma, we need the following lemmas:
Lemma 9
For every \(f \in \varLambda _c\), the function \(x \mapsto f(x)\) is continuous.
Proof
This is done by induction on the structure of \(\varLambda _c\). The only case that is not straightforward is when \(f = \langle t \rangle g\). By the induction hypothesis, g is continuous. Since the map \(x \mapsto P_t(x)\) is continuous (with respect to the weak topology), \(f = c^t \hat{P}_t g\) is continuous.
Lemma 10
Consider a continuous function \(h: E \rightarrow [0,1]\) and two states \(z, z'\) such that there exists f in the logic \(\varLambda _c\) such that \(|h(z) - h(z')| \le |f(z) - f(z')|\). Then for every \(\epsilon > 0\), there exists \(g \in \varLambda _c\) such that \(|h(z) - g(z)| \le 2 \epsilon \) and \(|h(z') - g(z')| \le 2 \epsilon \).
Proof
WLOG \(h(z) \ge h(z')\) and \(f(z) \ge f(z')\) (otherwise consider \(1-f\) instead of f). Pick \(p,q,r \in \mathbb {Q} \cap [0,1]\) such that
$$\begin{aligned} p & \in [f(z') - \epsilon , f(z')],\\ q & \in [h(z) - h(z') - \epsilon , h(z) - h(z')],\\ r & \in [h(z'), h(z') + \epsilon ]. \end{aligned}$$
One can then show that \(g = ( \min \{ f \ominus p, q \} ) \oplus r\) satisfies the desired property.
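The construction can be sanity-checked on concrete numbers. A minimal sketch, assuming \(f \ominus p = \max \{f - p, 0\}\) and \(f \oplus r = \min \{f + r, 1\}\) (truncated subtraction and addition):

```python
def g_val(f, p, q, r):
    # g = (min{ f (-) p, q }) (+) r with truncated operations
    return min(min(max(f - p, 0.0), q) + r, 1.0)

# concrete values with h(z) >= h(z'), f(z) >= f(z') and
# |h(z) - h(z')| <= |f(z) - f(z')|
h_z, h_zp = 0.8, 0.3
f_z, f_zp = 0.9, 0.2
eps = 0.05
p, q, r = 0.15, 0.45, 0.3  # rationals picked in the prescribed intervals

assert abs(h_z - g_val(f_z, p, q, r)) <= 2 * eps
assert abs(h_zp - g_val(f_zp, p, q, r)) <= 2 * eps
```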
Corollary 2
Consider a continuous function \(h: E \rightarrow [0,1]\) such that for any two states \(z, z'\) there exists f in the logic \(\varLambda _c\) such that \(|h(z) - h(z')| \le |f(z) - f(z')|\). Then for every compact set K in E, there exists a sequence \((g_n)_{n \in \mathbb {N}}\) in \(\varLambda _c\) that approximates h on K.
Proof
We have proven in Lemma 10 that such a function h can be approximated at pairs of states by functions in \(\varLambda _c\). Now recall that all the functions in \(\varLambda _c\) are continuous (Lemma 9).
We can thus apply Lemma 8 to the compact set K, and we get that the function h can be approximated by functions in \(\varLambda _c\) on K.
Theorem 3
The pseudometric \(\lambda ^c\) is a fixpoint of \(\mathcal {F}_c\).
Proof
We omit the index c throughout this proof. We already have \(\lambda \le \mathcal {F}(\lambda )\) (cf. Lemma 4), so we only need to prove the reverse inequality.
There are countably many expressions in \(\varLambda \) so we can number them: \(\varLambda = \{ f_0, f_1, ...\}\). Write \(m_k (x,y) = \max _{j \le k} |f_j(x) - f_j(y)|\). Since all the \(f_j\) are continuous (Lemma 9), the map \(m_k\) is also continuous. Furthermore, \(m_k \le m_{k+1}\) and for every two states x and y, \(\lim _{k \rightarrow \infty } m_k(x,y) = \lambda (x,y)\).
Using Lemma 3, we know that for every pair of states x, y and every time t,
$$\begin{aligned} \sup _k W(m_k)(P_t(x), P_t(y)) = W(\lambda )(P_t(x), P_t(y)). \end{aligned}$$
This implies that
$$\begin{aligned} \mathcal {F}(\lambda ) (x, y) &= \sup _{t \ge 0} c^t W(\lambda )(P_t(x), P_t(y)) = \sup _{k} \sup _{t \ge 0} c^t W(m_k)(P_t(x), P_t(y)). \end{aligned}$$
It is therefore enough to show that for every k, every time \(t \in \mathbb {Q}_{\ge 0}\) and every pair of states x, y, \(c^t W(m_k)(P_t(x), P_t(y)) \le \lambda (x,y)\) (the restriction to rational times is harmless by continuity of the semigroup). There exists \(h \in \mathcal {H}(m_k)\) such that \(W(m_k)(P_t(x), P_t(y)) = \left| \int h~ \textrm{d}P_t(x) - \int h~ \textrm{d}P_t(y)\right| .\)
Since \(P_t(x)\) and \(P_t(y)\) are tight, for every \(\epsilon > 0\), there exists a compact set \(K \subset E\) such that \(P_t(x, E \setminus K) \le \epsilon / 4\) and \(P_t(y, E \setminus K) \le \epsilon / 4\).
By Corollary 2, there exists \((g_n)_{n \in \mathbb {N}}\) in \(\varLambda \) that approximates h on K. In particular, for n large enough,
$$\begin{aligned} \left| \int _K g_n ~ \textrm{d}P_t(x) - \int _K h ~ \textrm{d}P_t(x) \right| \le \epsilon /4, \end{aligned}$$
and similarly for \(P_t(y)\). We get that for n large enough,
$$\begin{aligned} & \left| \int _E g_n ~ \textrm{d}P_t(x) - \int _E h ~ \textrm{d}P_t(x) \right| \\ & \le \left| \int _K g_n ~ \textrm{d}P_t(x) - \int _K h ~ \textrm{d}P_t(x) \right| + \int _{E \setminus K} |g_n- h| ~ \textrm{d}P_t(x) \le \epsilon / 2, \end{aligned}$$
and similarly for \(P_t(y)\). We can thus conclude that
$$\begin{aligned} & c^t W(m_k) (P_t(x), P_t(y)) = c^t \left| \int _E h ~ \textrm{d}P_t(x) - \int _E h ~ \textrm{d}P_t(y) \right| \\ & \le c^t \left| \int _E g_n ~ \textrm{d}P_t(x) - \int _E h ~ \textrm{d}P_t(x) \right| + c^t \left| \int _E g_n ~ \textrm{d}P_t(y) - \int _E h ~ \textrm{d}P_t(y) \right| \\ & \qquad + \left| (\langle t \rangle g_n)(x) - ( \langle t \rangle g_n)(y)\right| \\ & \le \epsilon + \lambda (x, y). \end{aligned}$$
Since \(\epsilon \) is arbitrary, \(c^t W(m_k) (P_t(x), P_t(y)) \le \lambda (x, y)\). \(\square \)
As a consequence of Theorem 3 and using Lemma 6, we get that \(\overline{\delta }^c \le \lambda ^c\). Then with Corollary 1, we finally get:
Theorem 4
The two pseudometrics are equal: \(\overline{\delta }^c = \lambda ^c\).
6 Obstacles in continuous time
As we have pointed out throughout this work, although the overall outline is similar to the one employed in the discrete case, we are forced to develop new strategies to overcome technical challenges arising from the continuous setting, where the topological properties of the time and state spaces become crucial.
For example, a key obstacle in this work is that the fixpoint pseudometric cannot be obtained directly from the Banach fixed-point theorem. There is no notion of step, so we need to consider all times in \(\mathbb {R}_{\ge 0}\). In discrete time, the counterpart of our functional \(\mathcal {F}_c\) is c-Lipschitz; in our setting, that corresponds to considering only the single factor \(c^1\). Since we must consider all times and \(\sup _{t \ge 0} c^t = 1\), there is no constant \(k < 1\) such that \(\mathcal {F}_c\) is k-Lipschitz. To overcome this, we construct a candidate pseudometric by brute force (we first define a sequence of pseudometrics \(\delta _n^c\) and then define our candidate \(\overline{\delta }^c\) as their supremum) and then prove that it is indeed the greatest fixpoint.
In addition, the scope of the functional \(\mathcal {F}_c\) also requires careful treatment: for measurability reasons, we have to restrict to the lattice of pseudometrics that generate a subtopology of the pre-existing one. We cannot apply the Kleene fixed-point theorem either, since this lattice is not complete. This whole approach to a fixpoint pseudometric differs substantially from the long-established methodology.
Furthermore, the proofs of some of our key results (e.g., Theorem 3 and Lemma 5) rely on compactness arguments to establish convergence in the time and/or state variables. Such arguments are generally not needed in the discrete setting.
Although we do occasionally restrict the time variable to rational values thanks to the continuity of the FD-semigroups, this is very different from working with discrete-time models: rational time stamps cannot be ordered so as to represent a notion of "next step" in a continuous-time setting. We are therefore still dealing with genuinely continuous-time dynamics, and the problem cannot be reduced to a discrete-time one.
7 Examples
7.1 A toy example
Let us consider a process defined on \(\{ 0, x, y, z, \partial \}\). Let us first give an intuition for what we are trying to model. In the states x, y, z, the process is trying to learn a value. In the state 0, the correct value has been learnt, but in the state \(\partial \), an incorrect value has been learnt. From the three "learning" states x, y and z, the process follows very different learning strategies:
- from x, the process exponentially decays to the correct value represented by the state 0,
- from y, the process is not even attempting to learn the correct value and thus remains in a learning state, and
- from z, the process slowly learns but it may either learn the correct value (0) or an incorrect one (\(\partial \)).
The word “learning” is here used only to give colour to the example.
Formally, this process is described by the time-indexed kernels (where \(\lambda \ge 0\))
$$\begin{aligned} \begin{array}{llr} P_t (x, \{0\}) = 1 - e^{- \lambda t} & P_t(z, \{ 0\}) = \frac{1}{2}(1 - e^{- \lambda t}) & P_t(0, \{ 0\}) = 1\\ P_t (x, \{x\}) = e^{- \lambda t} & P_t(z, \{ \partial \}) = \frac{1}{2}(1 - e^{- \lambda t}) & P_t(y, \{ y\}) = 1\\ & P_t(z, \{ z\}) =e^{- \lambda t} & P_t(\partial , \{ \partial \}) = 1 \end{array} \end{aligned}$$
and by the observable function \(obs(x) = obs(y) = obs(z) = r \in (0,1)\), \(obs(\partial ) = 0\) and \(obs(0) = 1\).
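As a sanity check, these kernels form a semigroup (\(P_{s+t} = P_s P_t\)). A minimal sketch in Python, assuming \(\lambda = 1\) and the state order \((0, x, y, z, \partial )\):

```python
import numpy as np

def P(t, lam=1.0):
    # transition matrix at time t, states ordered (0, x, y, z, bot)
    e = np.exp(-lam * t)
    return np.array([
        [1.0,           0.0, 0.0, 0.0, 0.0          ],  # 0 is absorbing
        [1.0 - e,       e,   0.0, 0.0, 0.0          ],  # x decays to 0
        [0.0,           0.0, 1.0, 0.0, 0.0          ],  # y never moves
        [(1.0 - e) / 2, 0.0, 0.0, e,   (1.0 - e) / 2],  # z splits between 0 and bot
        [0.0,           0.0, 0.0, 0.0, 1.0          ],  # bot is absorbing
    ])

s, t = 0.3, 1.1
assert np.allclose(P(s + t), P(s) @ P(t))    # Chapman-Kolmogorov / semigroup
assert np.allclose(P(2.0).sum(axis=1), 1.0)  # each row is a distribution
```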
We pick the discount factor \(c = e^{-\lambda }\) to simplify notation. We can then compute the distance \(\overline{\delta }\) (we omit the superscript c in the notation).
We will only detail the computations for \(\overline{\delta }(0, z)\) in the main body of this paper. First of all, note that \(\delta _0(0, z) = 1 - r\) and that
$$\begin{aligned} \delta _{n+1} (0, z) & = \sup _{t \ge 0} e^{- \lambda t} \left( e^{- \lambda t} \delta _n(0, z) + \frac{1}{2} (1 - e^{- \lambda t}) \delta _n(0, \partial ) \right) \\ & = \sup _{0 < \theta \le 1} \theta \left( \theta \delta _n(0, z) + \frac{1}{2} (1 - \theta ) \right) \\ & = \sup _{0 < \theta \le 1} \theta \left[ \frac{1}{2} + \theta \left( \delta _n(0, z) - \frac{1}{2} \right) \right] \end{aligned}$$
where we substituted \(\theta = e^{- \lambda t} \in (0, 1]\) and used \(\delta _n(0, \partial ) = 1\) (both 0 and \(\partial \) are absorbing, with obs-values 1 and 0). We are thus studying the function \(\phi : \theta \mapsto \theta \left[ \frac{1}{2} + \theta \left( \delta _n(0, z) - \frac{1}{2} \right) \right] \). Its derivative \(\phi '\) vanishes at \(\theta _0 = \frac{1}{2(1 - 2 \delta _n (0, z))}\). There are three distinct cases:
1. If \(0 < \delta _n(0, z) \le \frac{1}{4}\), then \(0 < \theta _0 \le 1\) and \(\sup _{0 < \theta \le 1} \phi (\theta )\) is attained at \(\theta _0\), so that \(\delta _{n+1} (0, z) = \frac{1}{8(1 - 2 \delta _n (0, z))} \le \frac{1}{4}\). In particular, if \(1 - r \le \frac{1}{4}\), the iterates increase to \(\frac{1}{4}\) and hence \(\overline{\delta }(0, z) = \frac{1}{4}\).
2. If \(\frac{1}{4} \le \delta _n(0, z) \le \frac{1}{2}\), then \(\theta _0 \ge 1\) and \(\phi \) is increasing on \((- \infty , \theta _0]\). In that case, \(\sup _{0 < \theta \le 1} \phi (\theta )\) is attained at 1 and therefore \(\delta _{n+1}(0, z) = \delta _n(0, z)\).
3. If \(\frac{1}{2} \le \delta _n(0, z)\), then \(\theta _0 \le 0\) and \(\phi \) is increasing on \([\theta _0, + \infty )\). In that case, \(\sup _{0 < \theta \le 1} \phi (\theta )\) is attained at 1 and therefore \(\delta _{n+1}(0, z) = \delta _n(0, z)\).
We therefore have that
$$ \overline{\delta }(0, z) = {\left\{ \begin{array}{ll} \frac{1}{4} \qquad \text {if } r \ge \frac{3}{4}\\ 1 - r \qquad \text {otherwise.} \end{array}\right. } $$
We summarize the other cases in the following table. Note that the computation of \(\overline{\delta }(x, z)\) is too involved and we therefore only provide an interval.
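The case analysis above can be checked numerically by iterating the recursion for \(\delta _n(0,z)\) over a grid of values of \(\theta \); a rough sketch (grid search, not a proof):

```python
import numpy as np

def bar_delta_0z(r, iters=500):
    # iterate delta_{n+1}(0,z) = sup_theta theta*(theta*delta_n + (1-theta)/2),
    # using delta_n(0, bot) = 1, starting from delta_0(0,z) = 1 - r;
    # bar_delta(0,z) is the supremum of the iterates
    thetas = np.linspace(1e-6, 1.0, 20001)
    d = 1.0 - r
    best = d
    for _ in range(iters):
        d = np.max(thetas * (thetas * d + 0.5 * (1.0 - thetas)))
        best = max(best, d)
    return best

assert abs(bar_delta_0z(0.5) - 0.5) < 1e-9    # r < 3/4: bar_delta(0,z) = 1 - r
assert abs(bar_delta_0z(0.9) - 0.25) < 0.01   # r >= 3/4: bar_delta(0,z) = 1/4
```

The second case converges slowly because \(\delta = \frac{1}{4}\) is a double root of the fixpoint equation, hence the generous tolerance.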
| | x | y | z | \(\partial \) | 0 |
|---|---|---|---|---|---|
| x | 0 | \(\frac{1 - r}{2}\) | \(\in \left[ \frac{1}{8}, \frac{1}{4} \right] \) | \({\left\{ \begin{array}{ll} r ~~ \text {if } r \ge \frac{1}{2}\\ \frac{1}{2} ~~ \text {otherwise} \end{array}\right. }\) | \(1 - r\) |
| y | \(\frac{1 - r}{2}\) | 0 | \(\frac{1}{4}\) | \(r\) | \(1 - r\) |
| z | \(\in \left[ \frac{1}{8}, \frac{1}{4} \right] \) | \(\frac{1}{4}\) | 0 | \({\left\{ \begin{array}{ll} r ~~ \text {if } r \ge \frac{1}{4}\\ \frac{1}{4} ~~ \text {otherwise} \end{array}\right. }\) | \({\left\{ \begin{array}{ll} \frac{1}{4} ~~ \text {if } r \ge \frac{3}{4}\\ 1 - r ~~ \text {otherwise} \end{array}\right. }\) |
| \(\partial \) | \({\left\{ \begin{array}{ll} r ~~ \text {if } r \ge \frac{1}{2}\\ \frac{1}{2} ~~ \text {otherwise} \end{array}\right. }\) | \(r\) | \({\left\{ \begin{array}{ll} r ~~ \text {if } r \ge \frac{1}{4}\\ \frac{1}{4} ~~ \text {otherwise} \end{array}\right. }\) | 0 | 1 |
| 0 | \(1 - r\) | \(1 - r\) | \({\left\{ \begin{array}{ll} \frac{1}{4} ~~ \text {if } r \ge \frac{3}{4}\\ 1 - r ~~ \text {otherwise} \end{array}\right. }\) | 1 | 0 |
Note that even though the process behaves vastly differently from x than from y, we have \(\overline{\delta }(x, 0) = \overline{\delta }(y, 0) \), despite the fact that \(\hat{P}_t obs(x) > \hat{P}_t obs(y)\) for every \(t > 0\). Note, however, that x and y have different distances to other states.
This phenomenon also occurs in the discrete-time setting and for continuous-time Markov chains. A continuous-time Markov chain is a continuous-time process, but its evolution still occurs in steps; such chains can be described as jump processes over continuous time. They have been studied in [18] by considering the whole trace starting from a single state.
It is possible to adapt our work to traces (called trajectories in [24]) by adding some additional regularity conditions on the processes; this can be found in [10]. However, this comes at the cost of added complexity: for instance, for Brownian motion the kernels \(P_t\) are well known and easy to describe with a density function, but that is not the case for the probability measures on trajectories.
7.2 Two examples based on Brownian motion
The previous example is very simple, yet it already highlights some of the difficulties of computing our metric. It may then be tempting to think that our approach cannot yield any result when applied to real-world processes. The next examples show that we can still obtain meaningful results for real-life processes such as Brownian motion.
Brownian motion is a stochastic process describing the irregular motion of a particle being buffeted by invisible molecules. We refer the reader to [19] for a detailed introduction. One might recall that standard Brownian motion is a stochastic process \((B_t)_{t \ge 0}\) defined on a probability space \(\varOmega \) with the properties that \(B_0 = 0\) almost surely and, for \(0\le s < t\), \(B_t - B_s\) is independent of \(\mathcal {F}_s\) and is normally distributed with mean 0 and variance \(t-s\).
For this very special process, one can start at any point: there is an overall translation symmetry which makes calculations more tractable. We denote by \((B_t^x)_{t \ge 0}\) the standard Brownian motion on the real line starting from x.
First example: We can then define the hitting time of 0 or 1 of our standard Brownian motion \((B_t^x)_{t \ge 0}\): \(\tau :=\inf \{t\ge 0:B_{t}^{x}=0\text { or }B_{t}^{x}=1\}\). The intuition is that this is the first time that the process hits either 0 or 1. It is a random time \(\varOmega \rightarrow \mathbb {R}_{\ge 0}\).
Now consider the stochastic process \((B_{t\wedge \tau }^{x})_{t \ge 0}\): the intuition is that this process behaves like Brownian motion until it reaches one of the “boundaries” (either 0 or 1) where it becomes “stuck”.
We now restrict the state space to the interval [0, 1]. For every \(x\in [0,1]\) and every \(t\ge 0\), let \(P_{t}\left( x,\cdot \right) \) be the distribution of \(B_{t\wedge \tau }^{x}\). In other words, \(P_{t}\left( x,\cdot \right) \) is the distribution of Brownian motion started from x, run until it hits either 0 or 1, and trapped upon hitting a boundary. We equip the state space [0, 1] with obs defined as \(obs(x) = x\). Then \(obs\left( B_{t\wedge \tau }^{x}\right) =B_{t\wedge \tau }^{x}\) and hence for every \(x\in \left( 0,1\right) \)
$$\begin{aligned} \delta _{1}\left( 0,x\right) & =\sup _{t\ge 0}c^{t}W\left( \delta _{0}\right) \left( P_{t}\left( 0\right) ,P_{t}\left( x\right) \right) =\sup _{t\ge 0}c^{t}\mathbb {E}[B_{t\wedge \tau }^{x}]\\ & =\sup _{t\ge 0}c^{t}\cdot x\text { (because }B_{t\wedge \tau }^{x}\text { is a martingale)}\\ & =x, \end{aligned}$$
where the first equality follows from the fact that \(P_{t}\left( 0\right) \) is the Dirac distribution \(\mathfrak {d}_{0}\) at 0 and the second equality comes from the fact that any coupling \(\gamma \in \varGamma \left( \mathfrak {d}_{0},P_{t}\left( x\right) \right) \) reduces to the marginal \(P_{t}\left( x\right) \). Since \(\delta _{0}\left( 0,x\right) = \delta _{1}\left( 0,x\right) \), we then have \(\bar{\delta }\left( 0,x\right) =x\).
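The martingale identity \(\mathbb {E}[B_{t\wedge \tau }^{x}] = x\) used above can be checked by simulation. A rough Monte Carlo sketch (Euler discretization, illustrative only):

```python
import numpy as np

def stopped_bm_mean(x, t=1.0, n_paths=20000, n_steps=400, seed=0):
    # simulate Brownian paths from x and freeze each path at its first
    # exit from (0, 1); the discretized process is still a martingale,
    # so the empirical mean of B_{t ^ tau} should be close to x
    rng = np.random.default_rng(seed)
    dt = t / n_steps
    pos = np.full(n_paths, float(x))
    alive = np.ones(n_paths, dtype=bool)
    for _ in range(n_steps):
        pos[alive] += rng.normal(0.0, np.sqrt(dt), alive.sum())
        alive &= (pos > 0.0) & (pos < 1.0)  # freeze on first exit
    return pos.mean()

assert abs(stopped_bm_mean(0.3) - 0.3) < 0.02
```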
Similarly, \(\bar{\delta }\left( 1,y\right) =1-y\) for every \(y\in \left[ 0,1\right] \). It is difficult to compute \(\bar{\delta }\left( x,y\right) \) for general \(x,y\in \left[ 0,1\right] \) though.
Second example: This example relies on stochastic differential equations (SDE). We refer the reader to [22] for a comprehensive introduction.
Let the state space and obs be the same as above. Let \(Q_{t}\left( x,\cdot \right) \) be the distribution of the solution to the SDE
$$ dX_{t}=X_{t}dB^0_{t}+\frac{1}{2}X_{t}dt\text { with }X_{0}=x. $$
It can be verified that the solution to this equation is the process \(X_{t}=xe^{B^0_{t}}\). In this case, we also have \(Q_{t}\left( 0\right) =\mathfrak {d}_{0}\). Again, for every \(x\in \left[ 0,1\right] \),
$$\begin{aligned} \delta _{1}\left( 0,x\right) & =\sup _{t\ge 0}c^{t}W\left( \delta _{0}\right) \left( Q_{t}\left( 0\right) ,Q_{t}\left( x\right) \right) =\sup _{t\ge 0}c^{t}\mathbb {E}[obs(xe^{B^0_{t}})]\\ & =\sup _{t\ge 0}c^{t}\left( x\mathbb {E}[e^{B^0_{t}} ~|~ t\le \tau ']+\mathbb {P}\left( t>\tau '\right) \right) , \end{aligned}$$
where \(\tau ':=\inf \left\{ s\ge 0 ~|~ B^0_{s}\ge -\ln x\right\} \) and \(\mathbb {E}\) and \(\mathbb {P}\) denote expected values and probabilities for the standard Brownian motion starting at 0. The distribution of \(\tau '\), as well as the joint distribution of \(\left( B^0_{t},\tau '\right) \), has been determined explicitly (see Chapter 2, Section 8 in [19] for instance). Even if we only consider the second term in the expression above, we have
$$ \delta _{1}\left( 0,x\right) \ge \sup _{t\ge 0}c^{t}\mathbb {P}\left( t>\tau '\right) =\sup _{t\ge 0}c^{t}\frac{2}{\sqrt{2\pi }}\int _{\frac{-\ln x}{\sqrt{t}}}^{\infty }e^{-\frac{y^{2}}{2}}dy. $$
It is possible for the right-hand side above to be greater than x. For example, if \(x=\frac{1}{e}\), then choosing \(t=4\) shows that the right-hand side is no smaller than \(c^{4}\frac{2}{\sqrt{2\pi }}\int _{\frac{1}{2}}^{\infty }e^{-\frac{y^{2}}{2}}dy\), which is (strictly) greater than \(\frac{1}{e}\) (i.e., x) provided that c is sufficiently close to 1. As a consequence, for this example, we have \(\bar{\delta }\left( 0,x\right) >x\).
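The final claim can be verified numerically via the reflection principle, \(\mathbb {P}(\tau ' < t) = \mathbb {P}(\max _{s\le t} B^0_s \ge -\ln x) = \text {erfc}\left( \frac{-\ln x}{\sqrt{2t}}\right) \). A small sketch:

```python
import math

def hitting_prob(x, t):
    # P(tau' < t) for tau' = first time B reaches -ln(x), by the
    # reflection principle: P(max_{s<=t} B_s >= a) = erfc(a / sqrt(2 t))
    a = -math.log(x)
    return math.erfc(a / math.sqrt(2.0 * t))

def lower_bound(x, c, t_grid):
    # sup_t c^t P(t > tau') <= delta_1(0, x), evaluated on a finite grid
    return max(c ** t * hitting_prob(x, t) for t in t_grid)

x = math.exp(-1)                       # the example x = 1/e
grid = [k / 10 for k in range(1, 201)] # times 0.1, 0.2, ..., 20.0
assert lower_bound(x, 0.99, grid) > x  # hence bar_delta(0, x) > x for c near 1
```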
These two examples demonstrate different behaviours of the two systems: while the first system "maintains" the mass at the starting point (the expectation is constantly x), the second system dissipates the mass to the right (the direction of larger values of obs). Consequently, processes starting from the same point x are at different distances from the absorbing state 0 in the two systems.
A very important observation from these examples based on Brownian motion is that even though we cannot explicitly compute values, we are still able to compare state behaviours.
8 Conclusion
In our previous work [7, 8], we showed that we needed to use trajectories in order to define a meaningful notion of behavioural equivalence. However, working with trajectories is extremely complex, as notions do not translate easily from states to trajectories; for instance, we said that a measurable set B of trajectories is time-obs-closed if for every two trajectories \(\omega , \omega '\) with \(obs(\omega (t)) = obs(\omega '(t))\) for every time t, we have \(\omega \in B\) if, and only if, \(\omega ' \in B\). The \(\sigma \)-algebra of all time-obs-closed sets cannot be simply described.
To explain why this present work does not deal with trajectories, we need to first discuss the example that led us to trajectories in [7]: consider Brownian motion on the real line equipped with the function \(obs = \mathbbm {1}_{\{ 0\}}\). There are four obs-closed sets: \(\emptyset , \{ 0\}, \mathbb {R} \setminus \{ 0\}, \mathbb {R}\) and for any \(x \ne 0\) and any time t, \(P_t(x, \{ 0\}) = 0\). This meant that one could not distinguish between the states 1 and 1000 in this specific case. Using trajectories enabled us to consider intervals of time. In this current work, we have decided to instead "smooth" the function obs so as to prevent singling out points without needing to deal with trajectories.
Let us go back to our examples. As shown in those examples, computing \(\overline{\delta }\) is quite an involved process. It would have been interesting to adapt our example in Section 7.1 to the real line with \(obs(x) = e^{- x^2}\) and consider other processes such as Brownian motion which stops once it hits 0, or the Ornstein-Uhlenbeck process (a variation of Brownian motion which is "attracted" to 0). Having to deal with transport theory, a supremum (over time) and the inductive definition of \(\overline{\delta }\) makes it virtually impossible to compute any harder example. As we have seen in the second example (Section 7.2), it may still be possible to compare the behaviours of two states by stating for instance that "the behaviour of the process starting from x is closer to that starting from y than from z". The problem of transport theory and the Kantorovich metric also exists in discrete time and there are interesting ways to deal with it, for instance through the MICo distance [6].
One of the advantages of optimal transport is that the Kantorovich duality gives us a way of computing bounds. As one notes however, there is again some difficulty with having a supremum over time in the definition of our functional \(\mathcal {F}_c\). In particular, we can only provide lower bounds for \(\overline{\delta }\).
One avenue of future work could be to study replacements for this supremum, such as integrals over time. Note that the real-valued logic would also need to be adapted, even though it generalizes the discrete-time logic well.
This work is, as far as we know, the first behavioural metric adapted to the continuous-time case. Clearly much remains to be done, particularly the exploration of examples and connections to broader classes of processes, such as those defined by stochastic differential equations.
Acknowledgments
Linan Chen and Prakash Panangaden were supported by NSERC Discovery Grants.
Florence Clerc was supported by the EPSRC-UKRI grant EP/Y000455/1, by a CREATE grant for the project INTER-MATH-AI, by NSERC and by IVADO through the DEEL Project CRDPJ 537462-18.
Disclosure of Interests
The authors have no competing interests to declare.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.