To solve the problem, one must first choose the type of strategies available to the players. As is standard in the theory of stochastic (in this case, piecewise deterministic) differential games, two classes of strategies are available. Open-loop strategies correspond to an entire path chosen for the control variable, while with closed-loop strategies (also known as feedback or Markov strategies) the control variables are functions of the observed state of the system. The latter class of strategies is more appealing, as it is more realistic than open-loop strategies. On the other hand, solving for feedback strategies is, in general, much more difficult. In the case of the problem at hand, we believe that it can be solved in closed-loop strategies with the help of backward induction and numerical techniques.
Let us begin with feedback strategies, also known as Markovian strategies. In this setup, the optimal controls are expressed in terms of the state variables, that is, \(p_i(t) = \psi _i\left( x_I(t),x_G(t)\right) \). This means that at each instant players choose an action based on the observed state of the system. In this case, the solution of the problem can be expressed in terms of a system of Hamilton-Jacobi-Bellman equations. Let \(V_i(x,m)\) denote the value function of player \(i \in \{I,G\}\) in regime \(m \in \{1,2,3\}\), with \(x = (x_I,x_G)\). Then, for the \(V_i(x,m)\) to be the solution of the piecewise-deterministic differential game, they must solve the following system:
$$\begin{aligned} \begin{aligned} r V_i(x,m) = \max _{p_i} \left\{ \pi _i(p_i,m) +\frac{\partial V_i(x,m)}{\partial x_I} f_I(x, p_I,p_G,h) +\right. \\ \frac{\partial V_i(x,m)}{\partial x_G} f_G(x, p_I,p_G,h) + \left. \sum _{n\ne m} \lambda _{m,n}\left( V_i(x,n)-V_i(x,m)\right) \right\} \end{aligned} \end{aligned}$$
(14)
Let us now focus on open-loop strategies. In this case, the actions of the players are expressed in terms of time, the regime of the system, and the state of the system at the last observed switch, that is, \(p_i(t) = \phi _i\left( m\left( s(t)\right) , x\left( s(t)\right) ,t-s(t)\right) \), where \(s(t)\) denotes the time at which the last switch of regime occurred before time \(t\). In this case, the value functions depend explicitly on time. Thus, for the \(V_i(x,m,t)\) to be the solution to the problem under piecewise open-loop strategies, they must solve the following system of Hamilton-Jacobi-Bellman equations:
$$\begin{aligned} \begin{aligned} r V_i(x,m,t) - \frac{\partial V_i(x,m,t)}{\partial t} = \max _{p_i} \left\{ \pi _i(p_i,m) +\frac{\partial V_i(x,m,t)}{\partial x_I} f_I(x, p_I,p_G,h) +\right. \\ \frac{\partial V_i(x,m,t)}{\partial x_G} f_G(x, p_I,p_G,h) + \left. \sum _{n\ne m} \lambda _{m,n}\left( V_i(x,n,0)-V_i(x,m,t)\right) \right\} \end{aligned} \end{aligned}$$
(15)
The systems of equations in (14) and (15) describe sufficient conditions for the solution of the problem sketched in Sect. 2. However, neither admits a closed-form solution, so we must use a mix of analytical and numerical techniques to derive the optimal policies followed by the players. One approach is to discretize the system of Hamilton-Jacobi-Bellman equations by means of a semi-Lagrangian scheme (Falcone and Ferretti 2013).
This entails splitting the time horizon into a sequence of equidistant steps of length \(h\). We then approximate the variable \(x(t)\) by means of the sequence \(x^h_n\).
\(x^h_n\). We will make use of the conditional probability that the process will jump from mode
i to mode
j in an analogous time step, which we approximate as
$$\begin{aligned} P^h_{x,i,j}(q) = 1-e^{-h \lambda _{i,j}(x,q)}. \end{aligned}$$
(16)
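Formula (16) is straightforward to evaluate; the sketch below computes it for a given intensity (the numerical values are illustrative, not taken from the model):

```python
import numpy as np

def switch_prob(h, lam):
    """P^h_{x,i,j}: probability of a switch from mode i to mode j
    within a step of length h, for intensity lam = lambda_{i,j}(x, q)."""
    return 1.0 - np.exp(-h * lam)

# For small h the probability is approximately h * lambda_{i,j},
# and 1 - exp(-x) < x guarantees it never exceeds that first-order value.
p = switch_prob(0.01, 2.0)
```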
The continuous-time optimal control problem is thus replaced by the following first-order discrete-time approximation
$$\begin{aligned} V^h(x,i) = \max _{q_1,q_2,\ldots } E_{x,i}\left\{ \sum _{l=0}^{\infty }\sum _{n=N_l}^{N_{l+1}-1}h \beta ^n\pi ^{\xi _l}(x^h_n,q_{n-N_l})\right\} , \quad i\in I\text {,} \end{aligned}$$
(17)
where we set the discount factor \(\beta = e^{-\omega h}\). Camilli (1997) shows that \(V^h(x,i)\) satisfies the following dynamic programming equation
$$\begin{aligned} V^h(x,i) = \max _{q} E_{x,i}\left\{ h \pi ^i(x,q) + \beta V^h(x^h_1,i)\right\} . \end{aligned}$$
(18)
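To see how (18) pins down \(V^h\), consider a degenerate toy case with a single mode, a single state point, and a constant profit (all of these are illustrative assumptions, not the model above): the equation reduces to \(V = h\pi + \beta V\), and iterating it converges to the fixed point \(h\pi /(1-\beta )\).

```python
import math

h, omega = 0.1, 0.5
beta = math.exp(-omega * h)   # discount factor beta = e^{-omega h}
profit = 1.0                  # placeholder constant profit

# Iterate the dynamic programming equation V = h*pi + beta*V;
# beta < 1 makes the map a contraction, so the iteration converges.
V = 0.0
for _ in range(1000):
    V = h * profit + beta * V

closed_form = h * profit / (1.0 - beta)  # exact fixed point
```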
Finally, (18) gives the discrete-time, infinite-dimensional system of equations satisfied by the value functions \(V^h(x)\overset{def}{=}\ \left\{ V^h(x,i):\; i \in I \right\} \):
$$\begin{aligned} V^h(x,i) = \mathscr {N}_i\bigl ( V^h(x)\bigr ) \quad i \in I, \end{aligned}$$
(19)
where the dynamic programming operators \(\mathscr {N}_i(\cdot )\) are defined by
$$\begin{aligned} \begin{aligned} \mathscr {N}_i\bigl (V^h(x)\bigr ) \overset{def}{=}\ \max _{q} \left\{ h \pi ^i(x,q) + \beta P^h_{x,i}(q)V^h\bigl (x+h G(x,q,i),i\bigr )+ \right. \\ \left. \beta \sum _{j \ne i } P^h_{x,i,j}(q)V^h(x,j)\right\} \end{aligned} \end{aligned}$$
(20)
Here \(P^h_{x,i}(q)\) denotes the conditional probability that no regime switch occurs during the time step. Problem (20) is still infinite-dimensional in the state variable. However, we can convert it into a set of finite-dimensional equations by partitioning the state space into a grid \(\Gamma = \{x_k : k=1,\ldots , K\}\) and solving (20) only for \(x \in \Gamma \). To make the scheme operational, we need to reconstruct the values \(V^h(x_k+h G(x_k,q,i),i)\), since in general the points \(x_k+h G(x_k,q,i)\) do not coincide with any point of \(\Gamma \) and must therefore be recovered by interpolation on the grid.
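Putting the pieces together, a minimal sketch of the resulting scheme could look as follows: a scalar state, two modes, constant switching intensities, and placeholder profit and drift functions (none of which come from the model above), with a uniform grid \(\Gamma \) and linear interpolation to reconstruct off-grid values.

```python
import numpy as np

h, omega = 0.1, 0.5
beta = np.exp(-omega * h)                    # discount factor
grid = np.linspace(0.0, 1.0, 101)            # the grid Gamma
controls = np.linspace(0.0, 1.0, 21)         # discretized control set
lam = np.array([[0.0, 0.5], [0.3, 0.0]])     # intensities lambda_{i,j}

def profit(x, q, i):                         # placeholder running profit
    return q * x - 0.5 * q**2

def G(x, q, i):                              # placeholder controlled drift
    return q - x

V = np.zeros((2, grid.size))
for _ in range(1000):
    V_new = np.empty_like(V)
    for i in range(2):
        p_stay = np.exp(-h * lam[i].sum())   # no-switch probability
        best = np.full(grid.size, -np.inf)
        for q in controls:
            # reconstruct off-grid values V^h(x + h*G, i) by interpolation
            x_next = np.clip(grid + h * G(grid, q, i), grid[0], grid[-1])
            cont = p_stay * np.interp(x_next, grid, V[i])
            for j in range(2):
                if j != i:
                    # on a switch the state stays at x, as in (20)
                    cont += (1.0 - np.exp(-h * lam[i, j])) * V[j]
            best = np.maximum(best, h * profit(grid, q, i) + beta * cont)
        V_new[i] = best
    if np.max(np.abs(V_new - V)) < 1e-10:    # fixed point of N_i reached
        V = V_new
        break
    V = V_new
```

Since \(\beta < 1\), the operator is a contraction and the fixed-point iteration converges; the same structure carries over to the two-dimensional state of the model at the cost of bilinear interpolation on a product grid.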