main-content

## Swipe to navigate through the articles of this issue

Published in:

Open Access 09-01-2021

# Dynamic Coordination Games with Activation Costs

Authors: Stefanny Ramirez, Dario Bauso

Published in: Dynamic Games and Applications | Issue 3/2021

print
PRINT
insite
SEARCH

## Abstract

Motivated by inventory control problems with set-up costs, we consider a coordination game where each player’s dynamics is an inventory model characterized by a controlled input and an uncontrolled output. An activation cost is shared among active players, namely players who control their dynamics at a given time. At each time, each player decides to be active or not depending on its inventory level. The main contribution of this paper is to show that strategies at a Nash equilibrium have a threshold structure on the number of active players. Furthermore, we provide an explicit expression for the lower and upper threshold is given both in the deterministic case, namely when the exogenous signal is known, and in the single-stage game. The relevance of the above results is discussed in the context of inventory control where Nash equilibrium reordering strategies imply that a single retailer reorders only if jointly with a number of other retailers and will reorder to restore a pre-assigned inventory level.
Disclaimer

## Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## 1 Introduction

This paper studies a discrete-state discrete-time dynamic game where players have to coordinate actions within a finite horizon window [2, 3]. Each player’s dynamics is an inventory model characterized by a controlled input and an uncontrolled output. The output flow is an uncontrolled exogenous signal. The input flow is controlled by the player and is subject to an activation cost. The state of the player is the accumulated discrepancy between input flow and output flow. The activation cost is shared among active players, namely those players who control their dynamics at a given time. The possibility of sharing the activation cost determines the need for coordination of control strategies on the part of the players. We study the cases under deterministic and stochastic disturbances. All results can be extended to the vector case by using the robust decomposition approach in [4, Section 3]. Applications arise in coordinated replenishment [8], and opportunistic maintenance [7].

### 1.1 Contribution

This study contributes in different ways to advance the theory on dynamic coordination games with activation costs for the control. An example of two-threshold strategy is the (sS) strategy used in inventory control, see [6] and [5, Chapter 4]. We recall that (sS) strategies are strategies where replenishments occur anytime the inventory level goes below a lower threshold s. Replenishments bring back the inventory level up to a higher threshold S. In particular, we highlight the following results.
• Strategies at a Nash equilibrium have a threshold structure. We obtain this result in two steps. First, we prove that Nash equilibria are associated with (sS) strategies via $${\mathscr {K}}$$-convex analysis. Second, we view the (sS) strategies as threshold strategies on the number of active retailers.
• Lower and upper thresholds have an explicit expression in the deterministic case, namely when the exogenous signal is known, or in single-stage games.
• We corroborate our results with a numerical analysis of a stylized inventory model.
This paper is organized as follows. In Sect. 2, we introduce the dynamic inventory game. In Sect. 4, we first show that all Nash equilibrium strategies have a two-threshold structure with a reorder level and an order-up-to level. We then provide a dual interpretation of such strategies as threshold strategies on the number of active players. In Sect. 5, we specialize our results to the case of single stage coordination game. In Sect. 6, we provide numerical analysis. Finally, in Sect. 7, we draw conclusions and discuss future works.

## 2 Dynamic Inventory Coordination Game

Consider a set of n retailers $$\varGamma =\{1,\ldots ,n\}$$. At stage $$t=0,\ldots ,N-1$$, the ith retailer holds inventory $$x_i^t\in {\mathbb {Z}}$$, faces a stochastic demand $$\omega _i^t \in \mathbb Z_+$$ and orders a quantity $$u_i^t \in U_i^t \subseteq {\mathbb {Z}}_+$$, where $$U_i^t$$ denotes the set of admissible decisions, $${\mathbb {Z}}$$ the set of integers, and $${\mathbb {Z}}_+$$ the set of nonnegative integers. Thus, for all retailers $$i \in \varGamma$$, inventory $$x_i^t$$, which we refer to as the state of retailer i, evolves according to a linear finite-state, discrete-time model of the form:
\begin{aligned} x_i^{t+1}=x_i^t+u_i^t-\omega _i^t. \end{aligned}
(1)
Here, we assume that there are no delays between orders and deliveries. For all retailers, we also suppose that the inventory at hand plus inventory ordered may not exceed the storage capacity denoted by $$C_{\text {{store}}}$$. Hence, we have $$x_i^t+u_i^t\le C_{{\text {store}}}.$$ We also assume that $$C_{\text {{store}}} \ge x^i_0$$ so to exclude an empty set of feasible orders.
Now, for each time t, let us introduce the vector of the retailers’ decisions $$u^t = [u_i^t]_{i \in \varGamma }$$ and the vector of decisions of all retailers other than i $$u_{-i}^t=[u_j^t]_{j\in \varGamma ,j\ne i} \in U_{-i}^t$$, where $$U_{-i}^t$$ denotes the Cartesian product of all sets $$U_j^t$$ $$j \ne i$$. At each stage, the ith retailer has a cost
\begin{aligned} \begin{array}{lll} g_{i}(x_i^t,u_i^t,u_{-i}^t) = \frac{K}{1+\sum \nolimits _{j=1,j\ne i}^n \delta (u_j^t)}\delta (u_i^t)+cu_i^t \\ \quad + p {\mathbb {E}} \{ \max (0,-x_i^{t+1}) \}+h {\mathbb {E}} \{\max (0,x_i^{t+1}) \},\end{array} \end{aligned}
(2)
where $${\mathbb {E}} \{.\}$$ indicates expectation, $$K \ge 0$$ represents the transportation cost, $$c \ge 0$$ is the purchase cost per stock unit, $$h\ge 0$$ is the penalty on holding, $$p \ge 0$$ the penalty on shortage. The term $$\delta (u_i^t)$$ is one if the ith retailer replenishes, i.e., is active, and zero otherwise.
We henceforth denote by $$a^t$$ the number of active retailers at stage t, i.e.:
\begin{aligned} a^t:=\sum \nolimits _{j=1}^n \delta (u_j^t). \end{aligned}
Note that the term $$\frac{K}{1+\sum \nolimits _{j=1,j\ne i}^n \delta (u_j^t)}\delta (u_i^t)$$, which describes the fixed cost paid by retailer i in (2), is equal to $$\frac{K}{a^t}$$ if retailer i is active and equal to zero otherwise.
After introducing the N stage decision vectors $${{\mathbf {u}}_{\mathbf {i}}}^{0\sim N-1}=[u_i^0,\ldots ,u_i^{N-1}]$$ and $${{\mathbf {u}}_{\mathbf {-i}}}^{0\sim N-1}=[u_{-i}^0,\ldots ,u_{-i}^{N-1}]$$, and denoting by $$\varPhi _i(x_i^{N})$$ a penalty term on final state, the cost over the horizon from 0 to N is of the form
\begin{aligned} \begin{array}{ll} J_i\left( x_i^0,{{\mathbf {u}}_{\mathbf {i}}}^{0\sim N-1},{{\mathbf {u}}_{\mathbf {-i}}}^{0\sim N-1}\right) = \varPhi _i\left( x_i^{N}\right) + \sum \nolimits _{t=0}^{N-1}g_i\left( x_i^{t},u_i^{t},u_{-i}^{t}\right) . \end{array} \end{aligned}
(3)
A challenging issue in the definition of the stage cost (2) is its dependence on the number of active retailers through the term $$\frac{K}{1+\sum \nolimits _{j=1,j\ne i}^n \delta (u_j^t)}\delta (u_i^t)$$. This term establishes that the transportation cost K is equally divided by all active retailers. This in turn implies that the cost of one retailer also depends on the decisions of all other retailers. Conditions (1)–(3) describe the dynamics and the costs of our game.
Other concepts we will make use of in the rest of the paper are Nash equilibrium strategies and $${\mathscr {K}}$$-convexity which we briefly recall next.
Definition 1
Decisions $${{\mathbf {u}}_{\mathbf {i}}}^{\star }$$ are at a Nash equilibrium, if it holds for all $$i \in \varGamma$$
\begin{aligned} J_i\left( x_i^0,{{\mathbf {u}}_{\mathbf {i}}}^{\star },{{\mathbf {u}}_{\mathbf {-i}}}^{\star }\right) \le J_i\left( x_i^0,{{\mathbf {u}}_{\mathbf {i}}},{{\mathbf {u}}_{\mathbf {-i}}}^{\star }\right) \; \forall \; {{\mathbf {u}}_{\mathbf {i}}} \in U_i^0\times \cdots \times U_i^{N-1}. \end{aligned}
For the inventory problem, once at a Nash equilibrium, no retailer benefits from changing its replenishment decisions. The following definition of $${\mathscr {K}}$$-convexity is borrowed from [6].
Definition 2
A function $$f: {\mathbb {R}} \rightarrow {\mathbb {R}}$$ is $${\mathscr {K}}$$-convex, where $${\mathscr {K}}\ge 0$$, if
\begin{aligned} {\mathscr {K}} + f(z+y) \ge f(y) + z \left[ \frac{f(y)-f(y-b)}{b} \right] \, \forall z\ge 0, \, b>0,\, y. \end{aligned}
$${\mathscr {K}}$$-convexity is used in [6] and reiterated in [5] to prove optimality of (sS) strategies. We will make use of $${\mathscr {K}}$$-convexity to prove the main result of this work.
In the following section, we consider threshold strategies according to which, given an inventory level $$x_i^t$$, there exists a threshold $$l_i^t\in \{1,\,2,\,\ldots ,\,n\}$$, such that retailer i reorders only if the number of active players $$a^t$$ is greater than or equal to such a threshold. Such strategies are given by
\begin{aligned} \mu _i(x_i^t,a^t)=\left\{ \begin{array}{ll} \hbox {reorder}&{} \qquad \hbox {if} \quad a^t \ge l_i^t,\\ \hbox {do not reorder} &{} \qquad \hbox {if} \quad a^t < l_i^t. \end{array}\right. \end{aligned}
(4)
As main result, we will show that all Nash equilibrium strategies have the threshold structure (4). To emphasize that $$l_i^t$$ depends on $$x_i^t$$, we sometimes write $$l_i^t(x_i^t)$$.
Note that orders depend on the history of the game as they are function of the state variable $$x_i^t$$ which in turn depends on the past orders of the retailer as in (1). Orders of a single retailer also depend on her competitors’ orders through variable $$a^t$$.
An additional concept that is important to explain is the one of subgame perfect equilibrium. We have borrowed and adapted from [9] the following definition:
Definition 3
A subgame perfect equilibrium is a n-tuple of strategies $${\mathbf {u}}^{\star }=[{{\mathbf {u}}_{\mathbf {i}}}^{\star }]_{i \in \varGamma }$$, such that for every $$i \in \varGamma$$ and every history $$x_i^t$$, we have that:
\begin{aligned} J_i\left( x_i^t,{{\mathbf {u}}_{\mathbf {i}}}^{\star },{{\mathbf {u}}_{\mathbf {-i}}}^{\star }\right) \le J_i\left( x_i^t,{{\mathbf {u}}_{\mathbf {i}}},{{\mathbf {u}}_{\mathbf {-i}}}^{\star }\right) \; \forall \; {{\mathbf {u}}_{\mathbf {i}}} \in U_i^0\times \cdots \times U_i^{N-1}.\end{aligned}
Therefore, note that, if strategy (4) returns a Nash equilibrium, then such equilibrium is also subgame perfect in the sense that the strategy returns the optimal order depending on the current state $$x_i^t$$ and irrespective of the fact that past orders might not be optimal.
To simplify the proofs and the graphs plotted in the following figures, in the rest of the paper, we assume that the penalty term on the final state $$\varPhi (x_i^n)$$ is null. However, the results that we prove still hold if $$\varPhi (x_i^n)$$ is a generic convex function with a minimum in $$x_i^n = 0$$.

## 3 On the Generality of the Model

Consider an n-dimensional inventory model characterized by discrete states $$x^t \in {\mathbb {Z}}^n$$, integer controls $$u^t \in {\mathbb {Z}}_+^n$$, and binary controls $$y^t \in \{0,1\}^n$$, and discrete stochastic disturbances $$w^t\in {\mathbb {Z}}_+^n$$, where $$t=0,1,\ldots$$ is the time index. The evolution of the state is described by a linear discrete-time (difference) equation in the general form (5) below, where A and E are matrices of compatible dimensions and $$x(0)=\xi _0 \ge 0$$ is a given initial state. Integer and binary controls are linked through the general capacity constraints (6), where the (scalar) parameter c is an upper bound on control, with the inequalities in (5) and (6) to be interpreted component-wise.
\begin{aligned} x^{t+1}&=A x^t + E w^t + u^t, \end{aligned}
(5)
\begin{aligned} \quad 0&\le u^t \le cy^t, \quad y^t\in \{0,1\}^n. \end{aligned}
(6)
The above dynamics are characterized by two discrete valued control variables per each state. Starting from nonnegative initial states, we wish to control the state to remain confined to the positive orthant, which may describe a safety region in engineering applications or reflect the desire to prevent shortfalls in inventory applications.
A common situation is where the disturbance seeks to push the state out of the desired region. Its value is given at the beginning and fixed that way. Each column of matrix E establishes how each disturbance component influences the evolution of the state vector. Then, it is reasonable to assume $$Ew(k) < 0,$$ where the inequality is to be interpreted component-wise.
With regard to (5), we can isolate the dependence of one component state on the other ones and rewrite (5) in a way that establishes similarity with standard lot sizing models [10]:
\begin{aligned} x^{t+1}=x^t + B x^t + E w^t + u^t. \end{aligned}
(7)
Equation (7) is a straightforward representation of (5) where
\begin{aligned}&B:= A-I =: \{b_{ij}\},\, b_{ij} = a_{ij} - \delta _{ij},\\&\quad \delta _{ij}:=\left\{ \begin{array}{ll}1, &{} \text{ if } i=j\text{, }\\ 0, &{} \text{ otherwise. } \end{array}\right. \end{aligned}
To preserve the nature of the problem, which has stabilizing control actions playing against unstabilizing disturbances, we assume that the influence of other states on state i is relatively “weak.” In other words, we assume that the influence of $$B x^t$$ is small if compared with the unstabilizing effects of disturbances captured by the term $$E w^t$$. This is captured by assuming that the sum $$\varDelta x^t+Ew^t$$ has same (negative) sign of $$Ew^t$$, namely
\begin{aligned} B x^t + E w^t < 0, \end{aligned}
where inequality is again component-wise and it holds almost everywhere. Essentially, the states’ mutual dependence expressed by $$B x^t$$ only emphasizes or reduces “weakly” the destabilizing effects of the disturbances. In the following, we present a robust decomposition approach that translates dynamics (7) into n scalar dynamics in “lot sizing” form [10].
With the term “robust decomposition” we mean a transformation through which dynamics (7) are replaced by n independent uncertain lot sizing models of the form (8) where $$x_i^t$$ is the inventory, $$d_i^t$$ the demand, $$u_i^t$$ the reordered quantity and $${\mathscr {D}}_i^t \subset {\mathbb {R}}$$ denotes the uncertainty set:
\begin{aligned} x_i^{t+1}= x_i^t - d_i^t + u_i^t, \quad d_i^t \in {\mathscr {D}}_i^t. \end{aligned}
(8)
Recall that in (7) the disturbance is given at the beginning and fixed that way. We use those values of the disturbance to determine set $${\mathscr {D}}_i^t$$ in (8), as explained in the following. Replacing (7) with (8) is possible once we relate the demand $$d_i^t$$ to the current values of all other state components and disturbances as expressed below:
\begin{aligned} \begin{array}{lll} d_i^t &{} = &{} -\left[ \sum \nolimits _{j=1}^n b_{ij} x_j^t + \sum \nolimits _{j=1}^n E_{ij} w_j^t \right] \\ &{} = &{} - \left[ \langle B_{i \bullet } x^t \rangle + \langle E_{i \bullet } w^t \rangle \right] , \end{array} \end{aligned}
(9)
where we denote by $$B_{i \bullet }$$ the ith row of the matrix B, with the same convention applying to $$E_{i \bullet }$$.
In other words, we assume that the influence that all other states have on state i enters into Eq. (8) through demand $$d_i^t$$ defined in (9).
Following the decomposition, each lot sizing model is controlled by an agent i (whose state is $$x_i$$) who plays against a virtual opponent which selects a worst-case demand, which can be viewed as a two-player game.
Our next step is to make the n dynamics in the form (8) mutually independent. Toward that end, we introduce $$X^t$$ as the set of $$x^t$$ and observe that this set is bounded for bounded $$d_i^t$$. The set $$X^t$$ can be defined in two steps. First, we assume that the states never leave a given region, then we compute the worst-case vector $$x^t$$ in the region, namely the vector $$x^t$$ that, once substituted in (9), has the effect of pushing the ith state out of the safe region. Then, we check whether the trajectory still lies within the region.
Boundedness of $$X^t$$ means that there exists a scalar $$\phi >0$$ such that $$\Vert x\Vert _{\infty }\le \phi$$ for all $$x\in X^t$$. In view of this, it is possible to decompose the system by replacing the current demand $$d_i^t$$ by the maximal or minimal demand as computed below:
\begin{aligned} {\overline{d}}_i^t= & {} \max _{\xi \in X^t} \left\{ - \langle B_{i \bullet } \xi \rangle - \langle E_{i \bullet } w^t \rangle \right\} = \sum \nolimits _{j}[B_{ij}]_- \phi - \langle E_{i \bullet } w^t \rangle \\ {\underline{d}}_i^t= & {} \min _{\xi \in X^t} \left\{ - \langle B_{i \bullet } \xi \rangle - \langle E_{i \bullet } w^t \rangle \right\} = \sum \nolimits _{j}[B_{ij}]_+ \phi - \langle E_{i \bullet } w^t \rangle , \end{aligned}
where $$[B_{ij}]_+$$ denotes the positive part of $$B_{ij}$$, i.e., $$\max \{B_{ij},0\}$$ and $$[B_{ij}]_-$$ the negative part.
From the above preamble, we derive the uncertainty set as
\begin{aligned} {\mathscr {D}}_i^t = \{\eta \in {\mathbb {R}}: \, {\underline{d}}_i^t\le \eta \le {\overline{d}}_i^t\}. \end{aligned}
Likewise, (10) describes the demand that would push the state out of the positive orthant in the longest time.

## 4 Nash Equilibrium Strategies

In this section, we show that all Nash equilibrium strategies are threshold strategies of type (4): retailer i reorders only if the number of active retailers is greater than or equal to a given threshold. For the general model explained in Sect. 3, proving that strategies at a Nash equilibrium have a threshold structure is not straight forward, for that reason in this section the results are given for a single retailer i. To show this, in the next subsection we prove the optimality of the (sS)-like strategies via $${\mathscr {K}}$$-convex analysis (see the definition in [5], chapter 4). We recall from [5] that (sS) strategies are strategies where replenishments occur anytime the inventory level goes below a lower threshold s. Replenishments bring back the inventory level up to a higher threshold S [6]. This is formally stated below where $$\mu (.)$$ is the strategy, x the inventory, and s and S lower and upper thresholds, respectively:
\begin{aligned} \mu (x)=\left\{ \begin{array}{ll} S-x &{} \qquad \hbox {if} \quad x < s,\\ 0 &{} \qquad \hbox {if} \quad x \ge s. \end{array}\right. \end{aligned}
(10)
We refer to (sS)-like strategies as (sS) strategies whose thresholds depend on the players and on time, i.e., we will have $$s:=s_i^t$$ and $$S:=S_i^t$$ for fixed i and t.
In Theorem 1, we prove the optimality of (sS)-like strategies. Before doing this, we need some preliminary analysis which is inspired by [5, Chapter 4].
Let $$K^t(u_{-i}^t)=\frac{K}{1 + \sum \nolimits _{i\in \varGamma ,j\ne i} \delta (u_j^t)}$$ be the transportation cost charged to each retailer i that replenishes at stage t. Fix decisions $$u_{-i}^0,\ldots ,u_{-i}^{N-1}$$ of all retailers other than i over the horizon and denote such decisions $${\bar{u}}_{-i}^0,\ldots ,{\bar{u}}_{-i}^{N-1}$$. Similarly, denote the resulting transportation costs by $$K^0,\ldots ,K^{N-1}$$. Note that $$K^t$$ is a function of $$u_{-i}^t$$ but for ease of notation sometimes we omit the dependence. Then, let us rewrite the stage cost (2) for retailer i as
\begin{aligned} g_{i}(x_i^t,u_i^t,{\bar{u}}_{-i}^t)&= K^t \delta (u_i^t)+cu_i^t\\&\quad + p {\mathbb {E}} \{ \max (0,-x_i^{t+1}) \} +h {\mathbb {E}} \{ \max (0,x_i^{t+1}) \}. \end{aligned}
Now, we can write the cost-to-go from stage t to the final stage recursively using dynamic programming and the Bellman equation. Let us use the superscript t to indicate the iteration. Then, we have
\begin{aligned} v^{t}_{i}(x_i^t,{\bar{u}}_{-i}^{t\sim N-1})&=\min _{u_{i}^t\in U} [g_{i}(x_i^t,u_i^t,{\bar{u}}_{-i}^t) \nonumber \\&\quad \, + {\mathbb {E}} \{ v^{t+1}_{i}(x_i^{t+1},{\bar{u}}_{-i}^{t+1\sim N-1}) \} ], \;\; t= 0,\ldots N-1, \end{aligned}
(11)
\begin{aligned} {J_i^N(x_i^N)=0}, \end{aligned}
(12)
where $$J^{0}_{i}(x_i^0,{\bar{u}}_{-i}^0)$$ is equal to the cost $$J_{i}(x_i^0,{u}_{-i}^0)$$ introduced in (3). Being $$y_i^t=x_i^t+u_i^t$$, the instantaneous inventory position, i.e., the inventory level just after the order has been issued, let us define the new function
\begin{aligned} G^{t}_{i}(y_i^t,\,{\bar{u}}_{-i}^{t+1\sim N-1})&= cy_i^t+p {\mathbb {E}} \{ \max (0,-(y_i^t-\omega _{i}^t)) \} \\&\quad +\,h {\mathbb {E}} \{ \max (0,y_i^t-\omega _i^t) \} + {\mathbb {E}} \{ v_{i}^{t+1}(x_i^{t+1},{\bar{u}}_{-i}^{t+1\sim N-1}) \}, \end{aligned}
and rewrite the Bellman Eq. (11) as follows
\begin{aligned} v_{i}^{t}(x_{i}^t,{\bar{u}}_{-i}^{t\sim N-1})&=-c_{i}x_{i}^t+\min _{y_{i}^t\ge x_i^t}[K^t(u_{-i}^t)\nonumber \\&\quad +\,G_{i}^t(y_i^t,{\bar{u}}_{-i}^{t+1\sim N-1}),\,G_i^t(x_i^t,{\bar{u}}_{-i}^{t+1\sim N-1})]. \end{aligned}
(13)
Note that if we can show that $$v_i^{t+1}$$ is $${\mathscr {K}}$$-convex with $${\mathscr {K}} = K^t$$ then $$G_i^t$$ is also $${\mathscr {K}}$$-convex for $${\mathscr {K}} = K^{t}$$ and the Bellman Eq. (13) has a unique minimizer. Indeed, it has been proved in [5], chapter 4.2, that $${\mathscr {K}}$$-convexity of $$G_i^t(y_i^t,{\bar{u}}_{-i}^{t+1})$$ implies $${\mathscr {K}}$$-convexity of $$v_i^t(x_i^t,{\bar{u}}_{-i}^t)$$. This represents a sufficient optimality condition for the (sS)-like strategies with thresholds depending on time t, that is, $$s:=s_i^t$$ and $$S:=S_i^t$$, where $$s_i^t$$ and $$S_i^t$$ satisfy:
\begin{aligned} S_i^t= & {} \arg \min _\gamma G_i^t(\gamma ,{\bar{u}}_{-i}^{t+1\sim N-1}),\\ G_i^t(s_i^t,{\bar{u}}_{-i}^{t+1\sim N-1})= & {} G_i^t(S_i^t,{\bar{u}}_{-i}^{t+1\sim N-1})+K^t(u_{-i}^t). \end{aligned}
The meaning of $$s_i^t$$ and $$S_i^t$$ is exactly the same as in the (sS) strategies (cfg. [5]), that is, $$s_i^t$$ represents the minimum threshold on inventory level below which retailers replenish to restore the inventory up to level $$S_i^t$$. Now, let us call $${\underline{s}}_i^t$$, the threshold which corresponds to the assumption that the ith retailer is charged the whole transportation cost, i.e.,
\begin{aligned} G_i^t\left( {\underline{s}}_i^t,{\bar{u}}_{-i}^{t+1 \sim N-1}\right) =G_i^t\left( S_i^t,{\bar{u}}_{-i}^{t+1 \sim N-1}\right) +K. \end{aligned}
In the above condition, we have set $$K^t=K$$.
Analogously, let us denote by $${\overline{s}}_i^t$$ the threshold computed as if all retailers would share equally the transportation cost, i.e.,
\begin{aligned} G_i^t({\overline{s}}_i^t,{\bar{u}}_{-i}^{t+1 \sim N-1})=G_i^t(S_i^t,{\bar{u}}_{-i}^{t+1 \sim N-1})+\frac{K}{n}. \end{aligned}
In essence, in the condition above, each retailer is charged a transportation cost $$K^t=\frac{K}{n}$$, namely one nth of the full cost K. Hence, we have $${\underline{s}}_i \le s_i^t\le {\overline{s}}_i$$.
The following theorem establishes the optimality of (sS)-like strategies, where each pair of thresholds is valid on different intervals of inventory levels.
Theorem 1
Let $$K^t$$ be nondecreasing. Solutions of the Bellman Eq. (11) are at most N different (sS)-like strategies $$(s_i^t,\,{S_i^t})$$, $$t=0,\ldots ,N-1$$, where $$S_i^t \in \{\sum \nolimits _{{\hat{t}}=t}^{t+j}\omega _i({\hat{t}})$$, $$j=t,\ldots ,N-1-t$$} and threshold $$s_i^t$$ verifies $$G_i^t(s_i^t,{\bar{u}}_{-i}^{t+1})=G_i^t(S_i^t,{\bar{u}}_{-i}^{t+1})+K^t$$.
Proof
The proof is by induction. Assume $$J_i^N(x_i^N)=0$$, and consider the convex function
\begin{aligned} G^{N-1}_{i}\left( y_{i}^{N-1},{\bar{u}}_{-i}^N\right)&= cy_{i}^{N-1}+p {\mathbb {E}} \{ \max \left( 0,-\left( y_{i}^{N-1}-\omega _{i}^{N-1}\right) \right) \} \nonumber \\&\quad +\, h {\mathbb {E}} \{ \max \left( 0,y_{i}^{N-1}-\omega _{i}^{N-1}\right) \} . \end{aligned}
(14)
Then, we say that $$G^{N-1}_i(\cdot )$$ is convex and hence, it is also $${\mathscr {K}}$$-convex where $${\mathscr {K}}=K^{N-1}$$ as shown in Fig. 1. Here, we also use the notation
\begin{aligned} H_i^{N-1}(x_i)&:=p {\mathbb {E}} \{ \max (0,-(x_{i}^{N-1}-\omega _{i}^{N-1}) \} \\&\quad +\,h {\mathbb {E}} \{ \max (0,x_{i}^{N-1}-\omega _{i}^{N-1}) \}. \end{aligned}
The above reasoning on $${\mathscr {K}}$$-convexity implies that the piecewise linear function
\begin{aligned} v_{i}^{N-1}\left( x_{i}^{N-1},{\bar{u}}_{-i}^{N-1}\right)&=-c_{i}x_{i}^{N-1} +\min _{y_{i}^{N-1}\ge x_i^{N-1}}\left[ K^{N-1}\left( u_{-i}^{N-1}\right) \right. \nonumber \\&\quad \left. +\,G_{i}^{N-1}\left( y_i^{N-1},{\bar{u}}_{-i}^N\right) , G_i^{N-1}\left( x_i^{N-1},{\bar{u}}_{-i}^N\right) \right] \end{aligned}
(15)
is $$K^{N-1}$$-convex, with a global minimum at $${S_i^{N-1}}:= \arg \min _{\gamma } G^{N-1}_{i}(\gamma ,{\bar{u}}_{-i}^N)$$ (in the deterministic case if the cost of purchase is relatively small then $${S_i^{N-1}}=\omega _i^{N-1}$$) (see, e.g., Fig. 1).
To obtain $$S_i^{N-1}$$, let a probability distribution function $$\phi ^{N-1}: {\mathbb {Z}}_+ \rightarrow [0,1]$$ be given, namely $$\omega \mapsto \phi ^{N-1}(\omega )$$ where $$\phi ^{N-1}(\omega )$$ is the probability that $$\omega _i^{N-1} = \omega$$ for all $$\omega \in {\mathbb {Z}}_+$$.
Then, the cost of reordering is given by
\begin{aligned}&K^{N-1}(u_{-i}^{N-1}) -c_{i}x_{i}^{N-1} +G_{i}^{N-1}(\gamma ,{\bar{u}}_{-i}^N) \\&\quad = K^{N-1}(u_{-i}^{N-1}) + c_i u_{i}^{N-1} \\&\qquad +p {\mathbb {E}} \{ \max (0,-(\gamma -\omega _{i}^{N-1}))\} + h {\mathbb {E}} \{ \max (0,\gamma -\omega _{i}^{N-1}) \} \\&\quad = K^{N-1}(u_{-i}^{N-1}) + c_i ( \gamma - x_{i}^{N-1} ) + h \mathbf{E}_h^{N-1}(\gamma ) + p {\mathbf {E}}_s^{N-1}(\gamma ), \end{aligned}
where $${\mathbf {E}}_h^t(\gamma )$$ and $${\mathbf {E}}_s^t(\gamma )$$ are the expected holding and shortage, respectively, defined as:
\begin{aligned} {\mathbf {E}}_h^t(\gamma ):={\mathbb {E}} \left\{ \max \left( 0,\gamma -\omega _{i}^t\right) \right\} , \end{aligned}
\begin{aligned} {\mathbf {E}}_s^t\left( \gamma ):={\mathbb {E}}\left\{ \max (0,-\left( \gamma -\omega _{i}^t\right) \right) \right\} . \end{aligned}
Let the discrete difference operator be given, $$\frac{{\text {d}}}{{\text {d}}S}$$ and let us apply such an operator to function $$G^{N-1}_{i}\left( \gamma ,{\bar{u}}_{-i}^N\right) = c_i \left( \gamma - x_{i}^{N-1} \right) + h {\mathbf {E}}_h(\gamma ) + p {\mathbf {E}}_s(\gamma ).$$ Then, we have
\begin{aligned} \frac{\text {{d}}}{\text {{d}}\gamma } G^{N-1}_{i}\left( \gamma ,{\bar{u}}_{-i}^N\right)&:= G^{N-1}_{i}\left( \gamma +1,{\bar{u}}_{-i}^N\right) - G^{N-1}_{i}\left( \gamma ,{\bar{u}}_{-i}^N\right) \\&\quad = c_i + h \varPhi ^{N-1}_\omega [\gamma ] - p\left( 1 - \varPhi ^{N-1}_\omega [\gamma ]\right) . \end{aligned}
where
\begin{aligned} \varPhi ^t_\omega [\gamma ]:=\sum \nolimits _{\omega =0}^\gamma \phi ^t_{\omega }, \quad 1- \varPhi ^t_\omega [\gamma ]:=\sum \nolimits _{\omega =\gamma +1}^\infty \phi ^t_{\omega }. \end{aligned}
In the above equations, we make use of the following conditions
\begin{aligned} \begin{array}{ll} \sum \nolimits _{\omega =0}^{\gamma +1 }(\gamma +1 -\omega ) \phi ^{N-1}_{\omega } = \sum \nolimits _{\omega =0}^{\gamma }(\gamma +1 -\omega ) \phi ^{N-1}_{\omega } \\ \qquad = \sum \nolimits _{\omega =0}^{\gamma }(\gamma -\omega ) \phi ^{N-1}_{\omega } + \sum \nolimits _{\omega =0}^{\gamma } \phi ^{N-1}_{\omega },\\ \sum \nolimits _{\omega =\gamma +2}^\infty (\omega - \gamma -1) \phi ^{N-1}_{\omega }= \sum \nolimits _{\omega =\gamma +1}^\infty (\omega - \gamma -1) \phi ^{N-1}_{\omega } \\ \qquad = \sum \nolimits _{\omega =\gamma +1}^\infty (\omega - \gamma ) \phi ^{N-1}_{\omega } - \sum \nolimits _{\omega =\gamma +1}^\infty \phi ^{N-1}_{\omega }.\end{array} \end{aligned}
(16)
The order-up-to level $$S_i^{N-1}$$ is the optimal $$\gamma$$, which is obtained from solving
\begin{aligned}&\min _\gamma \, \left\{ \gamma | \, \frac{\text {d}}{{\text {d}}\gamma } G^{N-1}_{i}\left( \gamma ,{\bar{u}}_{-i}^N\right) \ge 0\right\} \\&\quad = \min _\gamma \,\left\{ \gamma | \, c_i + h \varPhi ^{N-1}_\omega [\gamma ] - p \left( 1 - \varPhi ^{N-1}_\omega [\gamma ]\right) \ge 0\right\} . \end{aligned}
From the above, we then obtain
\begin{aligned} S_i^{N-1} = \arg \min _\gamma \Big \{\gamma | \, \varPhi ^{N-1}_\omega [\gamma ] \ge \frac{-c_i + p }{h +p} \Big \}. \end{aligned}
To obtain $$s_i^{N-1}$$, let us consider the cost of not reordering, which is given by
\begin{aligned} -c_{i}x_{i}^{N-1} +G_{i}^{N-1}(x_{i}^{N-1},{\bar{u}}_{-i}^N)&= p {\mathbb {E}} \{ \max (0,-(x_{i}^{N-1}-\omega _{i}^{N-1}))\} \\&\quad +\, h {\mathbb {E}} \{ \max (0,x_{i}^{N-1} -\omega _{i}^{N-1}) \} \\&= h {\mathbf {E}}_h(x_{i}^{N-1}) + p {\mathbf {E}}_s(x_{i}^{N-1}) \end{aligned}
Also, we have
\begin{aligned} s_i^{N-1}&:= \arg \min _{x_{i}^{N-1}} \left\{ x_{i}^{N-1} | \, h {\mathbf {E}}_h\left( x_{i}^{N-1}\right) + p {\mathbf {E}}_s\left( x_{i}^{N-1}\right) \right. \\&\le K^{N-1}\left( u_{-i}^{N-1}\right) -c_i \left( S_{i}^{N-1} - x_{i}^{N-1}\right) \\&\left. \quad +\, h {\mathbf {E}}_h\left( S_{i}^{N-1}\right) + p {\mathbf {E}}_s\left( S_{i}^{N-1}\right) \right\} . \end{aligned}
Now, we are going to assume that the statement is true for some $$t=m$$, and we are going to proof that it is also valid for $$t=m-1$$.
Consider now the convex function (see Fig. 2 which illustrate the example of $$t=N-2$$)
\begin{aligned} G_i^{m-1}(y_{i}^{m-1},{\bar{u}}_{-i}^{m})&= c_iy_{i}^{m-1}+p {\mathbb {E}} \left\{ \max \left( 0,-\left( y_{i}^{m-1}-\omega _{i}^{m-1}\right) \right) \right\} \nonumber \\&\quad +\,h {\mathbb {E}} \left\{ \max \left( 0,y_{i}^{m-1}-\omega _{i}^{m-1}\right) \right\} + {\mathbb {E}} \left\{ v_{i}^{m}\left( x_i^{m},{\bar{u}}_{-i}^{m}\right) \right\} \nonumber \\&= c_i y_{i}^{m-1} + h {\mathbf {E}}_h\left( y_{i}^{m-1}\right) + p {\mathbf { E}}_s\left( y_{i}^{m-1}\right) \nonumber \\&\quad + \sum \nolimits _{\omega =0}^{\infty } v_{i}^{m}\left( y_{i}^{m-1}-\omega , {\bar{u}}_{-i}^{m}\right) \phi ^{m-1}_{\omega }. \end{aligned}
(17)
We know that $$G_i^{m-1}$$ is $${\mathscr {K}}$$-convex, with $${\mathscr {K}}=K^{m-1}$$. This property implies that the function
\begin{aligned} v_{i}^{m-1}\left( x_{i}^{m-1},{\bar{u}}_{-i}^{m-1}\right)&= -c_{i}x_{i}^{m-1}+\min _{y_{i}^{m-1}\ge x_i^{m-1}}\left[ K^{m-1}\left( u_{-i}^{m-1}\right) \right. \nonumber \\&\quad \left. +\,G_{i}^{m-1}\left( y_{i}^{m-1},{\bar{u}}_{-i}^{m}\right) , G_{i}^{m-1}\left( x_{i}^{m-1},{\bar{u}}_{-i}^{m}\right) \right] , \end{aligned}
(18)
is $$K^{m-1}$$-convex, with a global minimum at $$S_i^{m-1}:=$$ $$argmin_{\gamma }G_i^{m-1}(\gamma ,{\bar{u}}_{-i}^m)$$. It is important to notice that we can ensure the existence of a unique minimum value in (18) thanks to the nondecreasing property of $$K^{m-1}$$.
The cost of reordering for $$t=m-1$$ is given by
\begin{aligned}&K^{m-1}\left( u_{-i}^{m-1}\right) -c_{i}x_{i}^{m-1} +G_{i}^{m-1}\left( \gamma ,{\bar{u}}_{-i}^m\right) \\&\quad = K^{m-1}\left( u_{-i}^{m-1}\right) + c_i u_{i}^{m-1} \\&\qquad +p {\mathbb {E}} \left\{ \max \left( 0,-\left( \gamma -\omega _{i}^{m-1}\right) \right) \right\} + h {\mathbb {E}} \left\{ \max \left( 0,\gamma -\omega _{i}^{m-1}\right) \right\} \\&\quad = K^{m-1}(u_{-i}^{m-1}) + c_i \left( \gamma - x_{i}^{m-1} \right) + h \mathbf{E}_h^{m-1}(\gamma ) + p {\mathbf {E}}_s^{m-1}(\gamma ). \end{aligned}
Applying operator $$\frac{{\text {d}}}{{\text {d}}\gamma }$$ to function $$G_i^{m-1}(\gamma ,{\bar{u}}_{-i}^m)$$, we have
\begin{aligned} \frac{\text {d}}{\text {d}\gamma } G^{m-1}_{i}(\gamma ,{\bar{u}}_{-i}^m)&:= G^{m-1}_{i}\left( \gamma +1,{\bar{u}}_{-i}^m\right) - G^{m-1}_{i}\left( \gamma ,{\bar{u}}_{-i}^m\right) \\&= c_i + h \varPhi ^{m-1}_\omega [\gamma ] - p \left( 1 - \varPhi ^{m-1}_\omega [\gamma ]\right) + \\&\quad \sum \nolimits _{\omega =0}^{\infty } \left[ v_{i}^{m}(\gamma +1-\omega , \cdot ) - v_{i}^{m}\left( \gamma -\omega , \cdot \right) \right] \phi ^{m-1}_{\omega }. \end{aligned}
Hence, the order-up-to level $$S_i^{m-1}$$ is the optimal $$\gamma$$, which is obtained from solving
\begin{aligned} S_i^{m-1}&= \arg \min _\gamma \,\left\{ \gamma | \, c_i + h \varPhi ^{m-1}_\omega [\gamma ] - p \left( 1 - \varPhi ^{m-1}_\omega [\gamma ]\right) \right. \\&\left. \quad + \sum \nolimits _{\omega =0}^{\infty } \left[ v_{i}^{m}\left( \gamma +1-\omega , \cdot \right) - v_{i}^{m}\left( \gamma -\omega , \cdot \right) \right] \phi ^{m-1}_{\omega } \ge 0\right\} . \end{aligned}
To obtain $$s_i^{m-1}$$, let us consider the cost of not reordering, which is given by
\begin{aligned}&-c_{i}x_{i}^{m-1} +G_{i}^{m-1}\left( x_{i}^{m-1},{\bar{u}}_{-i}^{m}\right) \\&\quad = h {\mathbf {E}}_h\left( x_{i}^{m-1}\right) + p {\mathbf {E}}_s(x_{i}^{m-1}) + \sum \nolimits _{\omega =0}^{\infty } v_{i}^{m}\left( x_{i}^{m-1}-\omega , {\bar{u}}_{-i}^{m}\right) \phi ^{m-1}_{\omega }. \end{aligned}
Then, we have
\begin{aligned} s_i^{m-1}&:= \arg \min _{x_{i}^{m-1}} \left\{ x_{i}^{m-1} | \, -c_{i}x_{i}^{m-1} +G_{i}^{m-1}(x_{i}^{m-1} ,{\bar{u}}_{-i}^{m}) \right. \\&\left. \quad + \sum \nolimits _{\omega =0}^{\infty } v_{i}^{m}(x_{i}^{m-1}-\omega , {\bar{u}}_{-i}^{m}) \phi ^{m-1}_{\omega }\right. \\&\left. \le K^{m-1}(u_{-i}^{m-1})-c_{i}S_{i}^{m-1} +G_{i}^{m-1}(S_{i}^{m-1},{\bar{u}}_{-i}^{m})\right. \\&\left. \quad + \sum \nolimits _{\omega =0}^{\infty } v_{i}^{m}(S_{i}^{m-1}-\omega , {\bar{u}}_{-i}^{m}) \phi ^{m-1}_{\omega }\right\} . \end{aligned}
The above can be rewritten as
\begin{aligned} s_i^{m-1}&:= \arg \min _{x_{i}^{m-1}} \Big \{x_{i}^{m-1} | \, h {\mathbf {E}}_h(x_{i}^{m-1}) + p {\mathbf {E}}_s(x_{i}^{m-1}) \\&\quad + \sum \nolimits _{\omega =0}^{\infty } v_{i}^{m}(x_{i}^{m-1}-\omega , {\bar{u}}_{-i}^{m}) \phi ^{m-1}_{\omega } \\&\le K^{m-1}(u_{-i}^{m-1})-c_i ( S_{i}^{m-1} - x_{i}^{m-1} ) \\&\quad + h {\mathbf {E}}_h(S_{i}^{m-1}) + p {\mathbf { E}}_s(S_{i}^{m-1})\\&\quad + \,\sum \nolimits _{\omega =0}^{\infty } v_{i}^{m}(S_{i}^{m-1}-\omega , {\bar{u}}_{-i}^{m}) \phi ^{m-1}_{\omega } \Big \}. \end{aligned}
Thus by induction backwards in time, we have proved Theorem 1. $$\square$$
We can reinterpret the (sS)-like strategies as threshold strategies on the number of active retailers. The result is that all Nash equilibrium strategies have the threshold structure (4).
In the following result on a single-stage inventory game (where we have dropped index t), we reinterpret a threshold on the inventory level as a threshold on the number of “active retailers”.
Theorem 2
For each inventory level $$x_i$$, there exists a threshold $$l_i\in \{1,\,2,\,\ldots ,\,n\}$$, such that the replenishment strategy
\begin{aligned} \mu _i(x_i,a)=\left\{ \begin{array}{cc} S_i-x_i, &{} \qquad \hbox {if} \quad a \ge l_i,\\ 0, &{} \qquad \hbox {if} \quad a < l_i, \end{array}\right. \end{aligned}
(19)
is a Nash equilibrium for the single-stage formulation of the inventory game. For the sake of simplicity, we have dropped dependence on time.
Proof
From Theorem 1, if $$N=1$$, we have a unique multi-period strategy $$(s_i,\,S_i)$$. This means that the retailers make decisions according to
\begin{aligned} u_i=\mu _i(x_i)=\left\{ \begin{array}{cc} S_i-x_i, &{} \qquad \hbox {if} \quad x_i < s_i,\\ 0, &{} \qquad \hbox {if} \quad x_i \ge s_i. \end{array}\right. \end{aligned}
(20)
Note that from $$G_i(s_i)= G_i(S_i)+(\frac{K}{a})$$ we have that $$s_i$$ depends on the number of active players a. Now, for given $$x_i$$, the idea is to find $$l_i$$ as the minimum number of active players such that the cost of replenishing does not exceed the cost of not replenishing. This can be expressed by the minimization below (in a single-stage optimization, we can drop the second argument $${\bar{u}}_{-i}^{t+1}$$ from $$G_i(.,.)$$)
\begin{aligned} \begin{array}{ll} l_i=\min _{a=1,\ldots ,n} \Big \{ \, a| \, G_i(x_i)\ge G_i(s_i),\, G_i(s_i)=G_i(S_i)+(K/a)\Big \}. \end{array} \end{aligned}
(21)
Strategy (20) implies (19) once we compute $$l_i$$ from (21) for fixed $$x_i$$.
In solving (21), we distinguish three cases.
• The inventory level is “low,” namely, $$x_i < {\underline{s}}_i$$. Then, the optimal decision is “replenish” independently of a. Actually, the minimization (21) returns $$l_i=1$$ and as it always holds $$a \ge l_i$$ we have $$\mu _i(x_i,a)=S_i-x_i$$.
• The inventory level is “high,” namely $$x_i \ge {\overline{s}}_i$$. Then, the optimal decision is “do not replenish.” Indeed, the minimization (21) is infeasible. With a little abuse of notation, we can take $$l_i=n+1$$ so that it always holds $$a<l_i$$ and therefore also $$\mu _i(x_i,a)=0$$.
• The inventory level verifies $${\underline{s}}_i \le x_i \le {\overline{s}}_i$$. To see this, note that the computation of $$l_i$$ as in (21) leads to $$1 \le l_i \le n$$. Then, if $$a\ge l_i$$ from (21), we have $$x_i<s_i$$ which substituted in (20) returns $$\mu _i(x_i,a)=S_i-x_i$$. Differently if $$a< l_i$$ from (21), we have $$x_i \ge s_i$$ which, again, substituted in (20) returns $$\mu _i(x_i,a)=0$$. The obtained strategy for $$\mu _i(x_i,a)$$ is in accordance with (19), and this concludes the proof. $$\square$$

## 5 Single-Stage Coordination

In this section, we specialize our results to the case of single-stage game. In particular, we provide explicit expressions for the two thresholds, as a function of the probability distribution function which determines the stochastic demand.
Let us start by noting that in the single-stage game function $$G^{t}_{i}(y_{i}^{t},{\bar{u}}_{-i}^{t+1 \sim N-1})$$ does not depend on $${\bar{u}}_{-i}^N$$ and therefore, we simply write $$G^{t}_{i}(y_{i}^{t})$$:
\begin{aligned} G^{t}_{i}\left( y_{i}^{t}\right) = cy_{i}^t+p {\mathbb {E}} \left\{ \max \left( 0,-\left( y_{i}^t-\omega _{i}^t\right) \right) \right\} + h {\mathbb {E}} \left\{ \max \left( 0,y_{i}^t-\omega _{i}^t\right) \right\} . \end{aligned}
(22)
Then, we have for the value function
\begin{aligned} v_{i}^t\left( x_{i}^t,{\bar{u}}_{-i}^t\right) =-c_{i}x_{i}^t+\min _{y_{i}^t\ge x_i^t}\left[ K^t\left( u_{-i}^t\right) +G_{i}^t\left( y_i^t\right) , G_i^t\left( x_i^t\right) \right] . \end{aligned}
(23)
To obtain $$S_i^t$$, consider the cost of reordering, which is given by
\begin{aligned}&K^t\left( u_{-i}^t\right) -c_{i}x_{i}^t +G_{i}^t(\gamma )\\&\quad =K^t\left( u_{-i}^t\right) + c_i u_{i}^t + p {\mathbb {E}} \{ \max \left( 0,-\left( \gamma -\omega _{i}^t\right) \right) \} \\&\qquad + h {\mathbb {E}} \{ \max \left( 0,\gamma -\omega _{i}^t\right) \} \\&\quad =K^t\left( u_{-i}^t\right) + c_i \left( \gamma - x_{i}^t \right) \\&\qquad +p {\mathbb {E}} \{ \max (0,-(\gamma -\omega _{i}^t))\} + h {\mathbb {E}} \{ \max \left( 0,\gamma -\omega _{i}^t\right) \} \\&\quad = K^t(u_{-i}^t) + c_i \left( \gamma - x_{i}^t \right) + h {\mathbf {E}}_h(\gamma ) + p {\mathbf {E}}_s(\gamma ). \end{aligned}
Let the discrete difference operator be given, $$\frac{{\text {d}}}{{\text {d}}S}$$ and let us apply such an operator to function
\begin{aligned} G^t_{i}(\gamma ) = c_i ( \gamma - x_{i}^t ) + h \underbrace{\sum \nolimits _{\omega =0}^\gamma (\gamma -\omega ) \phi ^t_{\omega }}_{{\mathbf {E}}_h(\gamma )} + p \underbrace{\sum \nolimits _{\omega =\gamma +1}^\infty (\omega - \gamma ) \phi ^t_{\omega }}_{{\mathbf {E}}_s(\gamma )}. \end{aligned}
By applying the difference operator to function $$G^t_{i}(\gamma )$$, we then have
\begin{aligned} \frac{\text {d} }{\text {d}\gamma } G^t_{i}(\gamma )&:= G^t_{i}(\gamma +1) - G^t_{i}(\gamma ) \\&= c_i \left( \gamma +1 - x_{i}^t\right) + h \sum \nolimits _{\omega =0}^{\gamma +1}(\gamma +1 -\omega ) \phi ^t_{\omega } \\&\quad + p \sum \nolimits _{\omega =\gamma +2}^\infty (\omega - \gamma -1) \phi ^t_{\omega } \\&\quad -\, c_i ( \gamma - x_{i}^t) - h \sum \nolimits _{\omega =0}^\gamma (\gamma -\omega ) \phi ^t_{\omega } - p \sum \nolimits _{\omega =\gamma +1}^\infty (\omega - \gamma ) \phi ^t_{\omega }. \end{aligned}
Further derivations yield
\begin{aligned} \frac{\text {d}}{\text {d}\gamma } G^t_{i}(\gamma )&= c_i \left( \gamma +1 - x_{i}^t\right) + h \left[ \sum \nolimits _{\omega =0}^{\gamma }(\gamma -\omega ) \phi ^t_{\omega } + \sum \nolimits _{\omega =0}^{\gamma }\phi ^t_{\omega }\right] \\&\quad + p \left[ \sum \nolimits _{\omega =\gamma +1}^\infty (\omega - \gamma ) \phi ^t_{\omega } - \sum \nolimits _{\omega =\gamma +1}^\infty \phi ^t_{\omega }\right] - c_i \left( \gamma - x_{i}^t\right) \\&\quad - h \sum \nolimits _{\omega =0}^\gamma (\gamma -\omega ) \phi ^t_{\omega } - p \sum \nolimits _{\omega =\gamma +1}^\infty (\omega - \gamma ) \phi ^t_{\omega } \\&= c_i + h \sum \nolimits _{\omega =0}^\gamma \phi ^t_\omega - p \sum \nolimits _{\omega =\gamma +1}^\infty \phi ^t_{\omega }\\&= c_i + h \varPhi ^t_\omega [\gamma ] - p \left( 1 - \varPhi ^t_\omega [\gamma ]\right) . \end{aligned}
In the above, we have used the following equalities
\begin{aligned}&\sum \nolimits _{\omega =0}^{\gamma +1 }(\gamma +1 -\omega ) \phi ^t_{\omega } = \sum \nolimits _{\omega =0}^{\gamma }(\gamma +1 -\omega ) \phi ^t_{\omega } \nonumber \\&\quad = \sum \nolimits _{\omega =0}^{\gamma }(\gamma -\omega ) \phi ^t_{\omega } + \sum \nolimits _{\omega =0}^{\gamma } \phi ^t_{\omega }, \nonumber \\&\qquad \sum \nolimits _{\omega =\gamma +2}^\infty (\omega - \gamma -1) \phi ^t_{\omega }= \sum \nolimits _{\omega =\gamma +1}^\infty (\omega - \gamma -1) \phi ^t_{\omega } \nonumber \\&\quad = \sum \nolimits _{\omega =\gamma +1}^\infty (\omega - \gamma ) \phi ^t_{\omega } - \sum \nolimits _{\omega =\gamma +1}^\infty \phi ^t_{\omega }. \end{aligned}
(24)
The order-up-to level $$S_i^t$$ is the optimal $$\gamma$$, which is obtained from solving
\begin{aligned}&\min _\gamma \, \{\gamma | \, \frac{\text {d}}{\text {d}\gamma } G^t_{i}(\gamma ) \ge 0\} \\&\quad = \min _\gamma \,\{ \gamma | \, c_i + h \varPhi ^t_\omega [\gamma ] - p (1 - \varPhi ^t_\omega [\gamma ]) \ge 0\}. \end{aligned}
From the above, we then obtain
\begin{aligned} S_i^t = \arg \min _\gamma \Big \{\gamma | \, \varPhi ^t_\omega [\gamma ] \ge \frac{-c_i + p }{h +p} \Big \}. \end{aligned}
(25)
To obtain $$s_i^t$$, let us consider the cost of not reordering, which is given by
\begin{aligned} -c_{i}x_{i}^t +G_{i}^t\left( x_{i}^t\right)&= p {\mathbb {E}} \{ \max \left( 0,-\left( x_{i}^t-\omega _{i}^t\right) \right) \}\nonumber \\&\quad +\, h {\mathbb {E}} \{ \max \left( 0,x_{i}^t -\omega _{i}^t\right) \} \nonumber \\&= h {\mathbf {E}}_h\left( x_{i}^t\right) + p {\mathbf {E}}_s\left( x_{i}^t\right) . \end{aligned}
(26)
From the above, we then obtain
\begin{aligned} \begin{array}{ll} s_i^t:= \arg \min _{x_{i}^t} \left\{ x_{i}^t | \, -c_{i}x_{i}^t +G_{i}^t\left( x_{i}^t\right) \le K^t\left( u_{-i}^t\right) -c_{i}S_{i}^t +G_{i}^t\left( S_{i}^t\right) \right\} . \end{array} \end{aligned}
In particular, we have
\begin{aligned} s_i^t&:= \arg \min _{x_{i}^t} \left\{ x_{i}^t | \, h {\mathbf {E}}_h\left( x_{i}^t\right) + p {\mathbf {E}}_s(x_{i}^t)\right. \nonumber \\&\left. \le K^t\left( u_{-i}^t\right) +c_i \left( S_{i}^t - x_{i}^t\right) + h \mathbf{E}_h\left( S_{i}^t\right) + p {\mathbf {E}}_s\left( S_{i}^t\right) \right\} . \end{aligned}
(27)
Equations (25) and (27) represent explicit expressions for the two thresholds and fully characterize then the reordering strategy once the probability distribution of the stochastic demand is given.
Once thresholds are obtained, we implement the control $$u_i^t$$ which is given by
\begin{aligned} u_i^t=\mu (x^t)=\left\{ \begin{array}{ll} S_i^t-x^t, &{} \qquad \hbox {if} \quad x^t < s_i^t,\\ 0, &{} \qquad \hbox {if} \quad x^t \ge s_i^t. \end{array}\right. \end{aligned}
(28)
The resulting dynamics is then
\begin{aligned} x_i^{t+1}=\left\{ \begin{array}{lll} S_i^t-\omega _i^t, &{} \qquad \hbox {if} \quad x_i^t < s_i^t,\\ x_i^t-\omega ^t, &{} \qquad \hbox {if} \quad x_i^t \ge s_i^t. \end{array} \right. \end{aligned}
(29)

## 6 Numerical Analysis

We consider an example where the demand $$\omega ^t \in \varOmega := \{0,1,2\}$$ and is uniformly distributed, namely after introducing the notation $$\phi _\omega$$ to indicate the probability that $$\omega ^t = \omega$$, we have $$\phi _\omega =\frac{1}{3}$$ for $$\omega = 0,1,2$$.
Assume that the proportional purchase cost is $$c=1$$, the shortage cost is $$p=10$$, and the holding cost is $$h=2$$. In the case of single stage optimization, we have that the order up to level is given by
\begin{aligned} S = \arg \min _\gamma \Big \{\gamma | \, \varPhi ^t_\omega [\gamma ] \ge \frac{-c + p }{h +p} \Big \}. \end{aligned}
From the above, we obtain $$S=2$$. Indeed for $$\gamma =2$$, we have
\begin{aligned} \varPhi ^t_\omega [2] =1 \ge \frac{-c + p }{h +p} = \frac{3}{4}.\end{aligned}
Differently, for $$\gamma =1$$ it holds
\begin{aligned}\varPhi ^t_\omega [1] = \frac{2}{3} \not \ge \frac{-c + p }{h +p} = \frac{3}{4},\end{aligned}
and therefore
\begin{aligned}S=\arg \min _\gamma \Big \{\gamma | \, \varPhi ^t_\omega [\gamma ] \ge \frac{-c + p }{h +p} \Big \} =2.\end{aligned}
As for the reorder level s, we have
\begin{aligned} s&:= \arg \min _{x} \Big \{x | \, h {\mathbf {E}}_h(x) + p {\mathbf { E}}_s(x) \\&\le K^t +c ( S - x ) + h {\mathbf {E}}_h(S) + p {\mathbf {E}}_s(S) \Big \}. \end{aligned}
We show next that we have $$s=1$$. Actually, for $$x=1$$ we obtain
\begin{aligned}&h {\mathbf {E}}_h(1) + p {\mathbf {E}}_s(1) = h \frac{1}{3} + p \frac{1}{3} = 4 \\&\quad \le K^t +c + h {\mathbf {E}}_h(2) + p {\mathbf {E}}_s(2)\\&\quad = K^t + c + h {\mathbf {E}}_h(2) =K^t + 3, \end{aligned}
which is satisfied by any $$K^t \ge 1$$.
For $$x=0$$, we have
\begin{aligned}&h {\mathbf {E}}_h(0) + p {\mathbf {E}}_s(0) = p {\mathbf {E}}_s(0) = 10 \\&\quad \nleq K^t + 2 c + h {\mathbf {E}}_h(2) + p {\mathbf {E}}_s(2)\\&\quad = K^t + 2c + h {\mathbf {E}}_h(2) =K^t + 4, \end{aligned}
which is satisfied by any $$K^t < 6$$. For any $$K^t < 6$$, we then have
\begin{aligned} \begin{array}{ll}s:= \arg \min _{x} \Big \{x | \, h {\mathbf {E}}_h(x) + p {\mathbf {E}}_s(x)\\ \qquad \le K^t +c ( S - x ) + h {\mathbf {E}}_h(S) + p {\mathbf {E}}_s(S) \Big \}=1. \end{array} \end{aligned}
We can conclude then that for any $$K^t$$ such that $$1 \le K^t < 6$$ we have the reorder level $$s=1$$ and the order-up-to level $$S=2$$.
Then, from (29) the microscopic dynamics is defined in the bounded support $$\{-1,0,1,2\}$$, namely $$x^t \in \{-1,0,1,2\}$$ for all $$t\ge 0$$ and is given by
\begin{aligned} x^{t+1}=\left\{ \begin{array}{ll} 2-\omega ^t, &{} \qquad \hbox {if} \quad x^t =-1, 0,\\ x^t-\omega ^t, &{} \qquad \hbox {if} \quad x^t = 1,2. \end{array}\right. \end{aligned}
(30)
Figure 3 displays the time plot of the microscopic dynamics for a single player. In other words, the plot shows the inventory level (the state) of a player. The player’s inventory is for most of the time in state 0 and 1, which is in accordance with the greater values of the distribution in those states.
In the following example, we consider a larger instance involving five agents, where the demand of each agent $$w^t \in \varOmega := \{0,1,\ldots ,20\}$$ and is uniformly distributed.
Assume the same purchase, shortage, and holding costs as in the previous example and consider a transportation cost $$K = 120$$, which will be divided among the active agents at each time $$t \in [0,50]$$.
Figure 4 shows the relation between the inventory levels and the transportation costs that each player is willing to pay in case of reordering as well as the minimum number of active agents in case of replenishment for any inventory level. It is possible to see that the inventory has an inverse relation with the transportation cost and an increase relation with the number of active agents. This means that if the inventory level of agent i is higher, the agent is willing to pay less in case of reordering and hence, it is expected to require a large number of active agents to coordinate with.
The last two figures (Figs. 56) display the inventory level of the five players over time. In Fig. 5, it is possible to see the moment in time when it is most convenient that the players coordinate for replenishment. On the other hand, Fig. 6 exhibits the relation of the inventory level and the number of active agents at each time. It is clear that the agents reorder when its inventory level is lower or equal to the threshold s, which also depends on the number of active agents, and they reorder up to the upper threshold $$S=15$$.

## 7 Conclusions and Future Works

We first developed an abstraction in the form of a dynamic coordination game model where each player’s dynamics is a scalar inventory model characterized by a controlled input and an uncontrolled output. The players have to pay a share of the activation cost to control their dynamics at a given time. First, we showed that if the retailers are rational players, then they benefit from using threshold strategies where the threshold is on the number of active players. We then turned to obtain an explicit expressions for the lower and upper thresholds under specific circumstances. A main key direction for future works is to explore the feasibility of the proposed coordination scheme in multi-vector energy systems (heat, gas, power) with special focus on coalitional bidding in decentralized energy trade. The ultimate goal is to investigate the benefits of aggregating independent wind power producers.

## Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
print
PRINT
Literature
1.
Başar T, Olsder GJ (1995) Dynamic Noncooperative Game Theory, 2nd edn. Academic Press, London MATH
2.
Bauso D, Giarrè L, Pesenti R (2008) Consensus in noncooperative dynamic games: a multi-retailer inventory application. IEEE Trans Autom Control 53(4):998–1003 CrossRef
3.
Bauso D, Giarrè L, Pesenti R (2009) Distributed consensus in noncooperative inventory games. Eur J Oper Res 192(3):866–878
4.
Bauso D, Zhu Q, Başar T (2016) Decomposition and mean-field approach to mixed integer optimal compensation problems. J Optim Theory Appl 169:606–630
5.
Bertsekas DP (1995) Dynamic programming and optimal control, 2nd edn. Athena, Bellmont, MA MATH
6.
Clark A, Scarf S (1960) Optimal policies for a multi-echelon inventory problem. Manag Sci 6(4):475–490 CrossRef
7.
Dekker R, Wildeman RE, van der Duyn Schouten FA (1997) A review of multi-component maintenance models with economic dependence. Math Methods Oper Res 45:344–357
8.
Federgruen A, Groenevelt H, Tijms HC (1984) Coordinated replenishments in a multi-item inventory system with compound poisson demands. Manag Sci 30(3):344–357 CrossRef
9.
Osborne Martin J, Rubinstein Ariel (1994) A course in game theory. MIT Press, Cambridge, MA MATH
10.
Pochet Y, Wolsey LA (1993) Lot sizing with constant batches: formulations and valid inequalities. Math Oper Res 18(4):767–785
Title
Dynamic Coordination Games with Activation Costs
Authors
Stefanny Ramirez
Dario Bauso
Publication date
09-01-2021
Publisher
Springer US
Published in
Dynamic Games and Applications / Issue 3/2021
Print ISSN: 2153-0785
Electronic ISSN: 2153-0793
DOI
https://doi.org/10.1007/s13235-020-00375-8

Go to the issue