Open Access 02-03-2021
Subgame Maxmin Strategies in Zero-Sum Stochastic Games with Tolerance Levels
Published in: Dynamic Games and Applications | Issue 4/2021
Abstract
We study subgame \(\phi \)-maxmin strategies in two-player zero-sum stochastic games with a countable state space, finite action spaces, and a bounded and universally measurable payoff function. Here, \(\phi \) denotes the tolerance function that assigns a nonnegative tolerated error level to every subgame. Subgame \(\phi \)-maxmin strategies are strategies of the maximizing player that guarantee the lower value in every subgame within the subgame-dependent tolerance level as given by \(\phi \). First, we provide necessary and sufficient conditions for a strategy to be a subgame \(\phi \)-maxmin strategy. As a special case, we obtain a characterization of subgame maxmin strategies, i.e., strategies that exactly guarantee the lower value at every subgame. Second, we present sufficient conditions for the existence of a subgame \(\phi \)-maxmin strategy. Finally, we show the possibly surprising result that each game admits a strictly positive tolerance function \(\phi ^*\) with the following property: if a player has a subgame \(\phi ^*\)-maxmin strategy, then he has a subgame maxmin strategy too. As a consequence, the existence of a subgame \(\phi \)-maxmin strategy for every positive tolerance function \(\phi \) is equivalent to the existence of a subgame maxmin strategy.
1 Introduction
We consider two-player zero-sum stochastic games with a countable state space, finite action spaces, and a bounded and universally measurable payoff function. Such a game models a repeated interaction between two players with opposite objectives. The environment in which the interaction takes place is fully characterized by the state variable. The transition from one state to the next is influenced by both players as well as by an element of chance. Throughout the paper, we take the perspective of the maximizing player. We are interested in strategies of the maximizing player that guarantee the lower value at every subgame and call such strategies subgame maxmin strategies. Under the assumptions made in this paper, the value may not exist, which explains why we consider the lower value instead.
As the name subgame maxmin strategy suggests, this concept is closely related to the concept of a subgame perfect equilibrium as defined in Selten ([19]). In two-player zero-sum games where the value exists, for conditions see Maitra and Sudderth ([13]) and Martin ([14]), the notions of a subgame maxmin strategy and a subgame minmax strategy coincide with the notion of a subgame optimal strategy. Moreover, in such games a strategy profile is a subgame perfect equilibrium if and only if it consists of a pair of subgame optimal strategies.
As illustrated by the Big Match, a game introduced in Gillette ([9]) and analyzed in Blackwell and Ferguson ([2]), even if the value exists, it is not guaranteed that optimal strategies exist, so a fortiori, subgame optimal strategies and subgame perfect equilibria may not exist. A large part of the literature therefore focuses on so-called subgame perfect \(\epsilon \)-equilibria as defined in Radner ([17]). This equilibrium concept is more permissive than a subgame perfect equilibrium and consists of a strategy pair such that every player obtains the value at each history up to a fixed error term of \(\epsilon /2\).
Instead of having a fixed error term at each subgame, we allow the error term to vary across different subgames. This error term is expressed as a function \(\phi \) of the histories and is called the tolerance function. The central topic of this paper is the concept of a subgame \(\phi \)-maxmin strategy. This is a strategy of the maximizing player that guarantees the lower value at every subgame within the allowed tolerance level. Intuitively, a subgame \(\phi \)-maxmin strategy performs sufficiently well across all subgames. This type of strategy is related to the concept of a \(\phi \)-tolerance equilibrium as defined in Flesch and Predtetchinski ([6]). Indeed, if the value exists, then a strategy profile in which both players use a subgame \((\phi /2)\)-optimal strategy is a \(\phi \)-tolerance equilibrium.
One motivation for letting the tolerance level vary across subgames is given by Mailath, Postlewaite, and Samuelson ([11]) when introducing the concept of a contemporaneous perfect \(\epsilon \)-equilibrium. The authors focus on games in which the payoff function of the players is given by the discounted sum of periodic rewards. Due to this discounting, there exists a period after which the maximal total discounted reward a player can receive is smaller than \(\epsilon \). If the allowed tolerance level \(\epsilon \) is fixed across all subgames, any strategy is an \(\epsilon \)-maxmin strategy of a subgame in such a period. The concept of a subgame \(\epsilon \)-maxmin strategy therefore does not impose any restrictions on the actions chosen in the very distant future. It would be more natural to discount not only the rewards but also the allowed tolerance level.
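To make this concrete, here is a small computation of ours (not from the paper): with per-period rewards in \([0,1]\) and discount factor \(\delta \), the total discounted reward obtainable from period \(T\) onward is at most \(\delta ^T/(1-\delta )\), so a constant tolerance \(\epsilon \) stops restricting behavior once this tail bound falls below \(\epsilon \).

```python
# Sketch (ours, not from the paper): once the tail bound delta**T / (1 - delta)
# on the remaining discounted reward drops below a fixed tolerance eps, *every*
# continuation strategy is eps-maxmin in those subgames, so a constant
# tolerance imposes no restriction there.

def tail_bound(delta: float, T: int) -> float:
    """Upper bound on the total discounted reward from period T onward."""
    return delta ** T / (1 - delta)

def first_vacuous_period(delta: float, eps: float) -> int:
    """Smallest T from which a constant tolerance eps restricts nothing."""
    T = 0
    while tail_bound(delta, T) >= eps:
        T += 1
    return T

print(first_vacuous_period(0.9, 0.01))  # with delta = 0.9 and eps = 0.01
```

Discounting the tolerance level as well, e.g., \(\phi (h)=\delta ^{\Vert h\Vert }\epsilon \), keeps the constraint meaningful at every history.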
Additional motivation for letting the tolerance level vary across subgames stems from the fact that the notion of what is considered sufficiently good might be relative. For instance, Tversky and Kahneman ([21]) observe that people evaluate decisions with respect to a reference level. They find that significantly more people were willing to exert extra effort to save $5 on a $15 purchase than to save $5 on a $125 purchase. To apply this to the context of zero-sum games, consider the following game, to which we will refer as the high stakes–low stakes game. In the first stage of the game, a chance move determines whether the player will engage in the high stakes or the low stakes variant of this game. The high stakes and the low stakes games are identical in terms of possible strategies. The only difference is that the payoffs in the high stakes game are a thousandfold the payoffs in the low stakes game. Furthermore, assume that in the high stakes subgame, the payoff of player 1 ranges between 0 and 2000 and the value is 1000, while in the low stakes subgame the payoff of player 1 ranges between 0 and 2 and the value is 1. Players that evaluate decisions with respect to a reference level may label a strategy which guarantees a payoff of 999 in the high stakes game as sufficiently good. However, in the low stakes game, 0 corresponds to the minimum payoff. Allowing the tolerance level to vary across subgames can therefore be used to accommodate such behavioral effects in the model of zero-sum stochastic games.
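A relative tolerance of the kind just described can be written as \(\phi (h)=\alpha \cdot v(h)\). The toy check below is our illustration (the numbers are those of the high stakes–low stakes game above): such a \(\phi \) accepts a guarantee of 999 in the high stakes subgame while rejecting a guarantee of 0 in the low stakes one, whereas any constant tolerance that accepts the former also accepts the latter.

```python
# Toy check (ours): a tolerance proportional to the subgame's value,
# phi(h) = alpha * v(h), applied to the high stakes-low stakes game above.
alpha = 0.01  # tolerate a 1% shortfall relative to the subgame's value

def acceptable(value: float, guaranteed: float) -> bool:
    """Is the guarantee within the relative tolerance phi = alpha * value?"""
    return guaranteed >= value - alpha * value

# Guaranteeing 999 out of a value of 1000 passes in the high stakes subgame...
assert acceptable(value=1000.0, guaranteed=999.0)
# ...but guaranteeing 0 out of a value of 1 is rejected in the low stakes one.
assert not acceptable(value=1.0, guaranteed=0.0)
# A *constant* tolerance accepting the first pair needs epsilon >= 1, and then
# it also accepts the second pair, since 0 >= 1 - epsilon.
print("relative tolerance separates the two subgames")
```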
Another case where history-dependent tolerance levels are natural is the following. A player may, in commonly occurring situations, use a familiar strategy irrespective of the scale of the payoffs. To see this, imagine a player who is highly experienced in playing a certain zero-sum game. He has a trusted strategy which guarantees him the value of this game within some error. Now, consider the high stakes–low stakes game again. The player might well use the trusted strategy in both the low stakes and the high stakes subgame. The error related to this strategy will therefore be relative to the value of the respective subgame.
Finally, in the class of stochastic games identified in Flesch, Thuijsman, and Vrieze ([8]), the only way to obtain \(\epsilon \)-optimality is to use strategies that are called improving. Improving strategies are closely related to subgame \(\phi \)-maxmin strategies in which the tolerance level in some subgames is smaller than the tolerance level at the root.
With respect to the concept of subgame \(\phi \)-maxmin strategies, this paper attempts to provide answers to the following two fundamental questions:
1.
For positive tolerance functions \( \phi ,\) when does a subgame \(\phi \)-maxmin strategy exist?
2.
How is the existence of a subgame \(\phi \)-maxmin strategy related to the existence of a subgame maxmin strategy?
To answer these questions, we will first derive necessary and sufficient conditions for a strategy to be a subgame \(\phi \)-maxmin strategy. This is done in Sect. 4. As a special case of these necessary and sufficient conditions, we obtain a characterization of subgame maxmin strategies. For the special class of positive and negative stochastic games, a related characterization of subgame maxmin strategies was obtained by Flesch, Predtetchinski, and Sudderth ([7]).
The necessary and sufficient conditions for strategies to be subgame \(\phi \)-maxmin can be split into a local condition and an equalizing condition. Informally, the local condition states that the lower value one expects to get in the next subgame should always be at least the lower value of the current subgame. The equalizing condition requires that, for every strategy of the other player, a subgame \( \phi \)-maxmin strategy almost surely results in a play with an eventually good enough payoff, where eventually good enough means being at least the lower value in very deep subgames up to the allowed tolerance level.
In Sect. 5, we consider the question of existence of subgame \(\phi \)-maxmin strategies for positive tolerance functions \(\phi \). We prove that for a positive tolerance function \(\phi \), a subgame \(\phi \)-maxmin strategy exists if every play is either a point of upper semicontinuity of the payoff function or such that the sequence of tolerance levels occurring along the play has a positive lower bound. The novel aspect of the theorem is the way in which it encompasses a number of important special cases, some of which have been studied extensively in the literature. One special case of interest is when the payoff function is upper semicontinuous everywhere. In this case, the existence of subgame maxmin strategies follows from a result of Laraki, Maitra, and Sudderth ([10]). Another is the case when along each play the tolerance levels remain bounded away from zero. This latter case subsumes the familiar class of all positive constant tolerance functions. When the tolerance level is equal to a constant \(\epsilon \) at each history, subgame \(\phi \)-maxmin strategies are exactly the so-called subgame \(\epsilon \)-optimal strategies. Existence of subgame \(\epsilon \)-optimal strategies for each \(\epsilon > 0\) is established in Mashiah–Yaakovi ([15]).
Our technique relies on switching between strategies: one starts by following a \(\tfrac{\phi (x_{0})}{2}\)-optimal strategy until a history (say h) is reached at which it ceases to be \(\phi (h)\)-optimal. At that moment, one discards the original strategy and switches to a \(\tfrac{\phi (h)}{2}\)-optimal strategy, and so on. This technique is well attested in the literature and is used in particular in Mashiah–Yaakovi ([15]). We refine the method to accommodate both types of plays: those along which the tolerance levels remain bounded away from zero, and those along which they converge to zero (the latter are, by assumption, plays at which the payoff function is upper semicontinuous).
In Sect. 6, we study the relation between the existence of subgame \(\phi \)-maxmin strategies and subgame maxmin strategies. Our main result states that the existence of a particular subgame \(\phi ^*\)-maxmin strategy, with \(\phi ^*>0\), is equivalent to the existence of a subgame maxmin strategy. Consequently, the existence of subgame \(\phi \)-maxmin strategies for all positive tolerance functions \(\phi \) is equivalent to the existence of a subgame maxmin strategy. In the special case of an upper semicontinuous payoff function, the results of Sects. 5 and 6 imply the existence of a subgame maxmin strategy. As noted above, this result is not new, and is a consequence of Laraki, Maitra, and Sudderth ([10]).
The connection between the existence of subgame \(\phi \)-maxmin strategies for every positive tolerance function \(\phi \) and the existence of subgame maxmin strategies is not only useful to further understand the results obtained by Laraki, Maitra, and Sudderth ([10]), but also highlights an important and surprising difference between subgame \(\phi \)-maxmin strategies and the closely related concept of subgame \(\epsilon \)-maxmin strategies. Indeed, the existence of a subgame \(\epsilon \)-maxmin strategy for every \(\epsilon >0\) does not imply the existence of a subgame maxmin strategy.
The rest of the paper is structured as follows. In Sect. 2, we formulate the model setup, and in Sect. 3, we formally define the main concepts. Then, in Sect. 4, we discuss the necessary and sufficient conditions for a strategy to be a subgame \(\phi \)-maxmin strategy and give a characterization of subgame maxmin strategies. We continue in Sect. 5 by providing sufficient conditions that guarantee the existence of a subgame \(\phi \)-maxmin strategy. Then, in Sect. 6, we explain why the existence of subgame \(\phi \)-maxmin strategies for every positive tolerance function \(\phi \) is equivalent to the existence of a subgame maxmin strategy. Finally, in Sect. 7, we discuss the importance of the assumptions we made and whether they might be relaxed.
2 Two-Player Zero-Sum Stochastic Games
We consider a two-player zero-sum stochastic game with finitely many actions and countably many states. The payoff function is required to be bounded and universally measurable. The model encompasses all two-player zero-sum games with perfect information and stochastic moves.
Actions, states, and histories The action sets of players 1 and 2 are given by the finite sets \({\mathcal {A}} \) and \( {\mathcal {B}}, \) respectively. The state space is given by the countable set \( {\mathcal {X}}. \) Let \(x_{0}\) denote the initial state and define the set \({\mathcal {Z}} = {\mathcal {A}}\times {\mathcal {B}}\times {\mathcal {X}}\). The game consists of an infinite sequence of one-shot games. At the initial state \(x_{0},\) the one-shot game \(G(x_{0})\) is played as follows: Players 1 and 2 simultaneously select an action from their respective action sets, denoted by \( a_{1} \) and \( b_{1}, \) respectively. Then, the next state \(x_1\) is selected according to the transition probability \(q(\cdot \vert x_{0},a_1,b_1)\). At the new state \(x_1,\) this process repeats itself and both players play the one-shot game \(G(x_1)\) by selecting actions \(a_2\) and \(b_2\) from their respective action sets. The next state \(x_2\) is selected according to the transition probability \(q(\cdot \vert x_{0},a_1,b_1,x_{1},a_{2},b_{2})\). This goes on indefinitely and creates a play \(p=(x_{0},a_1,b_1,x_1,a_2,b_2,\ldots )\). Note that the transition probability can depend on the entire history.
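The play-generation process just described can be sketched in a few lines. The sketch below is ours, not the paper's: the function names `simulate` and `sample` and the concrete two-state transition rule are our own illustrative choices.

```python
import random

# Minimal sketch (ours) of the stage process of Sect. 2.  A behavioral
# strategy maps a history to a distribution over actions; `q` maps a history
# plus the chosen action pair to a distribution over next states, so
# transitions may depend on the entire history, as in the text.
def simulate(x0, sigma, tau, q, periods):
    """Generate the prefix (x0, a1, b1, x1, ..., a_t, b_t, x_t) of a play."""
    history = (x0,)
    for _ in range(periods):
        a = sample(sigma(history))   # player 1's mixed action
        b = sample(tau(history))     # player 2's mixed action
        x = sample(q(history, a, b))  # chance move to the next state
        history = history + (a, b, x)
    return history

def sample(dist):
    """Draw from a dict {outcome: probability}."""
    outcomes, probs = zip(*dist.items())
    return random.choices(outcomes, weights=probs)[0]

# A two-state example: the action pair ('c', 'c') keeps the state, any other
# pair flips it; player 2 always plays 'c'.
q = lambda h, a, b: {h[-1]: 1.0} if (a, b) == ('c', 'c') else {1 - h[-1]: 1.0}
sigma = lambda h: {'c': 0.5, 'q': 0.5}
tau = lambda h: {'c': 1.0}
print(simulate(0, sigma, tau, q, 5))
```

Note that `q` receives the entire history, reflecting that transitions need not be Markovian in the current state.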
For every \(t\in {\mathbb {N}}=\lbrace 0,1,2,\ldots \rbrace \), let \({\mathcal {H}}^t=\lbrace x_{0}\rbrace \times {\mathcal {Z}}^{t}\) denote the set of all histories that are generated after t one-shot games. The set \({\mathcal {H}}^0\) consists of the single element \(x_{0}\). For \(t \ge 1,\) elements of \({\mathcal {H}}^t\) are of the form \((x_{0},a_1,b_1,\ldots ,a_t,b_t,x_t)\). Let \({\mathcal {H}} = \cup _{t\in {\mathbb {N}}}{\mathcal {H}}^t\) denote the set of all histories. For all \(h \in {\mathcal {H}},\) let \(\Vert h\Vert = \Vert (x_{0},a_1,b_1,\ldots ,a_t,b_t,x_t)\Vert = t\) denote the number of one-shot games that occurred during the history h. We refer to \(\Vert h\Vert \) as the length of the history h. For all \(t\le \Vert h\Vert , \) the restriction of the history h to the first t one-shot games is denoted by \(h_{\vert t}=(x_{0},a_1,b_1,\ldots ,a_t,b_t,x_t)\). We write \(h' \preceq h\) if there exists \(t\le \Vert h\Vert \) such that \(h_{\vert t}=h'\), so if the history h extends the history \(h'\).
The space of plays Define \({\mathcal {P}} = \lbrace x_{0}\rbrace \times {\mathcal {Z}}^{{\mathbb {N}}}\) to be the set of plays. Elements of \({\mathcal {P}}\) are of the form \(p = (x_{0},a_1,b_1,x_1,a_2,b_2,\ldots )\). For \(t \in {\mathbb {N}},\) let \(p_{\vert t}\) denote the prefix of p of length t: that is \(p_{\vert 0} = x_{0}\) and \(p_{\vert t} = (x_{0},a_1,b_1,\ldots ,a_t,b_t,x_t)\) for \(t \ge 1\). We write \(h \prec p\) if a history h is a prefix of p. For every \(h \in {\mathcal {H}},\) let \({\mathcal {P}}(h) = \lbrace p \in {\mathcal {P}}\vert h\prec p\rbrace \) denote the cylinder set consisting of all plays which extend history h.
We endow \({\mathcal {Z}}\) with the discrete topology and \({\mathcal {P}}\) with the product topology. The collection of all cylinder sets is a basis for the product topology on \({\mathcal {P}}\).
For \(t \in {\mathbb {N}}\), let \({\mathcal {F}}^t\) be the sigma-algebra generated by the collection of cylinder sets \(\{{\mathcal {P}}(h) \mid h \in {\mathcal {H}}^{t}\}\). Each set in \({\mathcal {F}}^t\) can be written as the union of sets in \(\{{\mathcal {P}}(h) \mid h \in {\mathcal {H}}^{t}\}\). Let \({\mathcal {F}}^\infty \) be the sigma-algebra generated by \(\displaystyle \cup _{t\in {\mathbb {N}}} {\mathcal {F}}^t\). This is exactly the Borel sigma-algebra generated by the product topology on \({\mathcal {P}}\). The sigma-algebra of universally measurable subsets of \({\mathcal {P}}\) is denoted by \({\mathcal {F}}\). Elements of \({\mathcal {F}}\) are sets that belong to the completion of each Borel probability measure on \({\mathcal {P}}\). For details of the definition of the sigma-algebra \({\mathcal {F}}, \) the reader is referred to Appendix A. It holds that \({\mathcal {F}}^{t} \subseteq {\mathcal {F}}^{t+1} \subseteq \cdots \subseteq {\mathcal {F}}^\infty \subseteq {\mathcal {F}}\). The set of plays \({\mathcal {P}}\) together with the universally measurable sigma-algebra \({\mathcal {F}}\) determines a measurable space \(({\mathcal {P}},{\mathcal {F}})\). A stochastic variable is a universally measurable function from \({\mathcal {P}}\) to \({\mathbb {R}}\).
Strategies Let \(\Delta ({\mathcal {A}})\) denote the set of probability measures over the action set of player 1 and \(\Delta ({\mathcal {B}})\) the set of probability measures over the action set of player 2. A behavioral strategy for player 1 is a function \(\sigma : {\mathcal {H}} \rightarrow \Delta ({\mathcal {A}})\). Hence, at each history player 1 chooses a mixed action. A pure strategy for player 1 is a function that assigns to every history an action, with a minor abuse of notation, \(\sigma :{\mathcal {H}}\rightarrow {\mathcal {A}}\). Similarly, one can define a behavioral and a pure strategy \(\tau \) for player 2. Let \({\mathcal {S}}_1\) and \({\mathcal {S}}_2\) denote the sets of behavioral strategies of players 1 and 2, respectively.
It follows from the Ionescu Tulcea extension theorem that every history \(h\in {\mathcal {H}}\) and strategy profile \((\sigma ,\tau ) \in {{{\mathcal {S}}}}_{1} \times {{{\mathcal {S}}}}_{2}\) determine a probability measure \({\mathbb {P}}_{h,\sigma ,\tau }\) on the measurable space \(({\mathcal {P}}(h),{\mathcal {F}}^\infty _{{\mathcal {P}}(h)})\), where \({\mathcal {F}}^\infty _{{\mathcal {P}}(h)}\) denotes the Borel sigma-algebra over the set of plays extending the history h. The measure \({\mathbb {P}}_{h,\sigma ,\tau }\) can be extended to the measurable space \(({\mathcal {P}},{\mathcal {F}}^\infty )\) in the obvious way. Taking the completion of the probability space \(({\mathcal {P}},{\mathcal {F}}^\infty ,{\mathbb {P}}_{h,\sigma ,\tau })\) yields the probability space \(({\mathcal {P}}, {\mathcal {F}}, {\mathbb {P}}_{h,\sigma ,\tau }^{\mathrm{c}})\). With a minor abuse of notation, we will omit the superscript \(\mathrm{c}\) and write \({\mathbb {P}}_{h,\sigma ,\tau }\) in the remainder of this paper.
Payoff function We assume that the payoff function \(u:{\mathcal {P}}\rightarrow {\mathbb {R}}\) of player 1 is bounded, where we define \( M = \sup _{p\in {\mathcal {P}}} \vert u(p)\vert , \) and universally measurable. We also assume the game to be zero-sum. The payoff function of player 2 is therefore given by \(-u\). We denote the game as described above by \(\Gamma _{x_{0}}(u)\). Throughout the paper, we take the point of view of player 1. This is without loss of generality, since the situation of player 2 in the game \(\Gamma _{x_{0}}(u)\) is identical to that of player 1 in the game \(\Gamma _{x_{0}}(-u)\).
The expected payoff of player 1 corresponding to strategy profile \((\sigma ,\tau ) \in {{{\mathcal {S}}}}_{1} \times {{{\mathcal {S}}}}_{2} \) at history \( h \in {{{\mathcal {H}}}} \) is given by \({\mathbb {E}}_{h,\sigma ,\tau }\left[ u\right] , \) where the expectation is taken with respect to the probability measure \({\mathbb {P}}_{h,\sigma ,\tau }\). The expected payoff of player 1 at the history \(x_0\) is denoted by \({\mathbb {E}}_{\sigma ,\tau }\left[ u\right] \).
Above, we imposed the condition that the payoff function u is universally measurable. As Borel measurable functions are also universally measurable, this condition is weaker than requiring that u be Borel measurable. The main difference between these two conditions is the following. When the payoff function u is Borel measurable, the game is known to admit a value (cf. Sect. 3 for the formal definition). However, it is not provable from the standard ZFC axioms of set theory^{1} that our games with a universally measurable payoff function also admit a value. Therefore, instead of the value, in the remainder of the paper we will use the lower value (cf. Sect. 3 for the formal definition). We would like to emphasize that, except for the difference between the concepts of the value and the lower value, assuming Borel measurable payoffs would not lead to any simplification in the proofs. A reader unfamiliar with universally measurable functions can thus read the entire paper imagining Borel measurable payoffs.
3 Subgame \(\phi \)-Maxmin Strategies
We start this section by defining subgame \(\phi \)-maxmin strategies. Then, we discuss an illustrative example. Finally, we define and examine guarantee levels of strategies.
3.1 Definition of Subgame \(\phi \)-Maxmin Strategies
For every game \(\Gamma _{x_{0}}(u), \) for every history \(h\in {\mathcal {H}}, \) we define the lower value \({\underline{v}}(h)\) and the upper value \({\overline{v}}(h)\) by
$$\begin{aligned} {\underline{v}}(h)&=\sup \limits _{\sigma \in {\mathcal {S}}_1}\inf \limits _{\tau \in {\mathcal {S}}_2} {\mathbb {E}}_{h,\sigma ,\tau }\left[ u\right] , \end{aligned}$$
(3.1)
$$\begin{aligned} {\overline{v}}(h)&=\inf \limits _{\tau \in {\mathcal {S}}_2}\sup \limits _{\sigma \in {\mathcal {S}}_1} {\mathbb {E}}_{h,\sigma ,\tau }\left[ u\right] . \end{aligned}$$
(3.2)
Because the payoff function u is assumed to be bounded, we have that \({\underline{v}}(h), {\overline{v}}(h)\in {\mathbb {R}}\). Therefore, the lower and upper value exist in every subgame of \(\Gamma _{x_{0}}(u). \) Furthermore, we have that \({\underline{v}}(h)\le {\overline{v}}(h)\). Whenever \({\underline{v}}(h)= {\overline{v}}(h)\), we say that the subgame at history h has a value and we denote it by v(h). The lower value, the upper value, and the value at the initial state \(x_{0}\) are denoted by \({\underline{v}}\), \({\overline{v}},\) and v, respectively. If u is Borel measurable, then the value exists by Maitra and Sudderth ([13]) and Martin ([14]). Since we do not assume u to be Borel measurable, we present our results in terms of the lower value.
Even if the value exists, player 1 may not have a strategy that guarantees the value in each subgame, and the literature has therefore studied subgame \(\epsilon \)-optimal strategies. These are strategies which guarantee the value in each subgame up to an allowed error term of \(\epsilon \). If the payoff function is bounded and Borel measurable, it has been shown by Mashiah–Yaakovi ([15]) that for each \(\epsilon >0\) player 1 has a subgame \(\epsilon \)-optimal strategy. The concept of a subgame \(\epsilon \)-optimal strategy has a constant error term \(\epsilon \) across all subgames. However, as argued in the introduction, there are instances in which it is more natural to consider a variable error term. Therefore, instead of considering a constant error term, we follow Flesch and Predtetchinski ([6]), who allow the error term to vary across histories in their investigation of the \(\phi \)-tolerance equilibrium. This leads us to the study of subgame \(\phi \)-maxmin strategies, where \(\phi :{\mathcal {H}}\rightarrow [0,\infty )\) is a tolerance function assigning an allowed tolerance level to each history.
Definition 3.1
Let \(\phi :{\mathcal {H}}\rightarrow {[}0,\infty )\) be a tolerance function. A strategy \(\sigma \in {{{\mathcal {S}}}}_{1} \) is a subgame \(\phi \)-maxmin strategy in the game \( \Gamma _{x_{0}}(u) \) if for every history \(h\in {\mathcal {H}}\) it holds that
$$\begin{aligned} \forall \tau \in {\mathcal {S}}_2,~{\mathbb {E}}_{h,\sigma ,\tau }\left[ u\right] \ge {\underline{v}}(h)-\phi (h). \end{aligned}$$
(3.3)
In case \( \phi \) is identically equal to zero, we omit it from the notation, and simply refer to a subgame maxmin strategy.
A subgame \(\phi \)-maxmin strategy guarantees at each history h of the game the lower value up to the tolerance level \(\phi (h)\). If the tolerance function is such that for some \( \epsilon \ge 0, \) for every \(h\in {\mathcal {H}}\), \(\phi (h)=\epsilon \), then we refer to the strategy as a subgame \( \epsilon \)-maxmin strategy. If the value exists, then the notion of a subgame \(\epsilon \)-maxmin strategy coincides with the notion of a subgame \(\epsilon \)-optimal strategy.
The following example illustrates that even for a strictly positive tolerance function \(\phi \), player 1 may not have a subgame \(\phi \)-maxmin strategy. Interestingly, however, player 1 has a subgame \(\epsilon \)-maxmin strategy for every \(\epsilon >0, \) which in fact is pure. We also give an example of a tolerance function \( \phi \) for which player 1 has a subgame \(\phi \)-maxmin strategy, but no pure such strategy.
Example 3.2
The decision problem depicted in Fig. 1 corresponds to a two-player zero-sum stochastic game in which the state space is trivial and the second player is a dummy player. Whenever the state space or the action sets are degenerate, the corresponding states and actions are omitted from the notation in examples. The set of actions of player 1 is \( {{{\mathcal {A}}}} = \{c,q\}\), where c stands for continue and q for quit. The game stops as soon as player 1 chooses to quit. If player 1 decides to quit at period t, then his payoff is \( t/(t+1)\). If player 1 never quits, his payoff is 0.
In this game, player 1 has a subgame \(\epsilon \)-maxmin strategy for every \(\epsilon >0\): for example, the strategy which quits whenever quitting leads to a payoff of at least \(1-\epsilon \).
We now turn to the existence of a subgame \(\phi \)-maxmin strategy. Clearly, there exists no subgame maxmin strategy. As we will see later, Theorem 6.1 then implies that there is some strictly positive tolerance function \(\phi \) for which no subgame \(\phi \)-maxmin strategy exists. Intuitively, such a tolerance function forces player 1 to continue with such a large probability that the total probability of never quitting becomes nearly one.
Interestingly, there is a tolerance function \(\phi \) for which player 1 has a subgame \(\phi \)-maxmin strategy, but for which he has no pure strategy that is subgame \(\phi \)-maxmin. To see this, define the tolerance function \(\phi \) by \(\phi (c^t)=1-\frac{1}{2}\cdot \frac{t}{t+1}-\frac{1}{2}\cdot \frac{t+1}{t+2}\) for each \(t\in {\mathbb {N}}\). Now consider the strategy \(\sigma \) which always chooses action c with probability 1/2 and action q with probability 1/2. In the subgame at any history \(c^t\), it quits immediately with probability 1/2 and quits at a later period with probability 1/2. In the former case, it gives payoff \(\frac{t}{t+1}\), whereas in the latter case, it gives an expected payoff of at least \(\frac{t+1}{t+2}\). Hence, the strategy \(\sigma \) is a subgame \(\phi \)-maxmin strategy. Yet, there is no pure strategy that is subgame \(\phi \)-maxmin. Indeed, the tolerance function does not allow a pure strategy to quit at any period, whereas a pure strategy that always continues is not subgame \( \phi \)-maxmin either. As we show in the working paper version of our paper, Flesch, Herings, Maes, and Predtetchinski ([5]), a pure subgame \(\phi \)-maxmin strategy exists if and only if the following two conditions hold: (1) for every \(t\in {\mathbb {N}}, \) \( \phi (c^t)>0, \) and (2) for infinitely many \( t \in {\mathbb {N}}, \) \( \phi ^{\prime }(c^{t}) = \displaystyle \min _{n \le t}\phi (c^n)\ge \frac{1}{t+1}\). \(\square \)
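The claim that \(\sigma \) is subgame \(\phi \)-maxmin in this example can also be checked numerically (this verification is ours, not part of the paper). Under \(\sigma \), the expected payoff in the subgame at \(c^t\) is \(\sum _{k\ge 0}(1/2)^{k+1}\,\frac{t+k}{t+k+1}\), and the lower value at \(c^t\) is 1:

```python
# Numerical check (ours) of Example 3.2: under the strategy sigma that quits
# with probability 1/2 in every period, the expected payoff at history c^t is
#   E_t = sum_{k >= 0} (1/2)**(k+1) * (t+k)/(t+k+1),
# and the claim is E_t >= 1 - phi(c^t), with
#   phi(c^t) = 1 - (1/2)*t/(t+1) - (1/2)*(t+1)/(t+2).
def phi(t: int) -> float:
    return 1 - 0.5 * t / (t + 1) - 0.5 * (t + 1) / (t + 2)

def expected_payoff(t: int, horizon: int = 200) -> float:
    # Truncating at `horizon` loses at most (1/2)**horizon, negligible here.
    return sum(0.5 ** (k + 1) * (t + k) / (t + k + 1) for k in range(horizon))

for t in range(50):
    assert expected_payoff(t) >= 1 - phi(t)   # the lower value at c^t is 1
print("phi-maxmin check passed")
```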
3.2 Guarantee Levels
To identify subgame \(\phi \)-maxmin strategies, it is useful to define the function \({\underline{u}}: {\mathcal {S}}_1\times {\mathcal {H}}\rightarrow {\mathbb {R}}\) by
$$\begin{aligned} {\underline{u}}(\sigma ,h)=\inf _{\tau \in {\mathcal {S}}_2} {\mathbb {E}}_{h,\sigma ,\tau }\left[ u\right] . \end{aligned}$$
(3.4)
The payoff \({\underline{u}}(\sigma ,h)\) corresponds to the guarantee level that player 1 is expected to receive at history h when playing the strategy \(\sigma \). A strategy \(\sigma \in {{{\mathcal {S}}}}_{1} \) is called a \(\phi (h)\)-maxmin strategy for the subgame at history h if \({\underline{u}}(\sigma ,h) \ge {\underline{v}}(h) - \phi (h)\).
For every strategy profile \((\sigma ,\tau ) \in {\mathcal {S}}_{1} \times {\mathcal {S}}_{2}\), for every \(t \in {\mathbb {N}}\), define the stochastic variables \(U_{\sigma ,\tau }^{t}\), \({\underline{U}}_{\sigma }^{t}\), and \({\underline{V}}^{t}\) by letting \(U_{\sigma ,\tau }^{t}(p) = {\mathbb {E}}_{p_{\vert t},\sigma ,\tau }[u]\), \({\underline{U}}_{\sigma }^{t}(p) = {\underline{u}}(\sigma ,p_{\vert t})\), and \({\underline{V}}^{t}(p) = {\underline{v}}(p_{\vert t})\), respectively, for each play \(p \in {\mathcal {P}}\). All three stochastic variables are \({\mathcal {F}}^{t}\)-measurable. We have \({\underline{U}}_{\sigma }^{t} \le U_{\sigma ,\tau }^{t}\) and \({\underline{U}}_{\sigma }^{t} \le {\underline{V}}^{t}\) everywhere on \({\mathcal {P}}\).
The next lemma states the submartingale property of guarantee levels. It says that the guarantee level that player 1 can expect to receive weakly increases over time.
Lemma 3.3
(Submartingale property of guarantee levels) Let a strategy profile \((\sigma , \tau ) \in {{{\mathcal {S}}}}_{1} \times {{{\mathcal {S}}}}_{2}, \) \(t \in {\mathbb {N}}, \) and a history \(h \in {\mathcal {H}}^{t}\) of length t be given. The process \(({\underline{U}}_{\sigma }^{t + n})_{n \in {\mathbb {N}}}\) is a \({\mathbb {P}}_{h,\sigma ,\tau }\)-submartingale. In particular, it holds that \({\underline{u}}(\sigma ,h) \le {\mathbb {E}}_{h,\sigma ,\tau }[{\underline{U}}_{\sigma }^{t+1}]\).
Proof
We first prove the second claim. Take \(\delta > 0. \) Let \(\tau ' \in {\mathcal {S}}_2\) be such that \( \tau ^{\prime }(h) = \tau (h) \) and for each \((a,b,x) \in {\mathcal {Z}} \) it holds that
$$\begin{aligned}{\mathbb {E}}_{(h,a,b,x),\sigma ,\tau '}[u] \le {\underline{u}}(\sigma ,(h,a,b,x)) + \delta .\end{aligned}$$
We have that
$$\begin{aligned} {\underline{u}}(\sigma ,h)&\le {\mathbb {E}}_{h,\sigma ,\tau '}[u]\\&= \sum _{(a,b,x) \in {\mathcal {Z}}}\sigma (h)(a) \cdot \tau (h)(b) \cdot q(x\vert h,a,b) \cdot {\mathbb {E}}_{(h,a,b,x),\sigma ,\tau '}[u]\\&\le \sum _{(a,b,x) \in {\mathcal {Z}}} \sigma (h)(a) \cdot \tau (h)(b) \cdot q(x\vert h,a,b) \cdot ({\underline{u}}(\sigma ,(h,a,b,x)) + \delta )\\&= {\mathbb {E}}_{h,\sigma ,\tau }[{\underline{U}}_{\sigma }^{t+1}] + \delta . \end{aligned}$$
It follows that \( {\underline{u}}(\sigma ,h) \le {\mathbb {E}}_{h,\sigma ,\tau }[{\underline{U}}_{\sigma }^{t+1}] \) since \( \delta > 0 \) is arbitrary.
To prove that the process \(({\underline{U}}_{\sigma }^{t + n})_{n \in {\mathbb {N}}}\) is a \({\mathbb {P}}_{h,\sigma ,\tau }\)-submartingale, note that the conditional expectation \({\mathbb {E}}_{h,\sigma ,\tau }({\underline{U}}_{\sigma }^{t+n+1} \mid {\mathcal {F}}^{t+n})\), when evaluated at a play \(p \in {\mathcal {P}}(h)\), equals \({\mathbb {E}}_{p_{\vert t+n},\sigma ,\tau }({\underline{U}}_{\sigma }^{t+n+1})\), which, by the claim already proven, is at least \({\underline{u}}(\sigma ,p_{\vert t+n}) = {\underline{U}}_{\sigma }^{t+n}(p)\). \(\square \)
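As a sanity check of ours (not the paper's), the one-step inequality of Lemma 3.3 can be verified numerically in the quit game of Example 3.2, where player 2 is a dummy and the infimum over \(\tau \) is vacuous. Under the strategy that quits with probability 1/2 in every period, \({\underline{u}}(\sigma ,c^t) = \frac{1}{2}\cdot \frac{t}{t+1} + \frac{1}{2}\cdot {\underline{u}}(\sigma ,c^{t+1})\), so the submartingale inequality holds, here with equality:

```python
# Sketch (ours): the one-step submartingale inequality of Lemma 3.3, checked
# in the quit game of Example 3.2 under the 1/2-1/2 quitting strategy.
def guarantee(t: int, horizon: int = 400) -> float:
    """Expected payoff from the subgame at c^t under the 1/2-1/2 strategy."""
    # Quitting at period t + k (probability (1/2)**(k+1)) pays (t+k)/(t+k+1);
    # truncation at `horizon` loses at most (1/2)**horizon.
    return sum(0.5 ** (k + 1) * (t + k) / (t + k + 1) for k in range(horizon))

for t in range(30):
    # One step ahead: quit now (prob 1/2, payoff t/(t+1)) or reach c^{t+1}.
    one_step = 0.5 * t / (t + 1) + 0.5 * guarantee(t + 1)
    assert guarantee(t) <= one_step + 1e-9   # submartingale inequality
print("submartingale check passed")
```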
4 Conditions for Strategies to be Subgame \(\phi \)-Maxmin
4.1 n-Day Maxmin Strategies and Equalizing Strategies
The goal of this section is to provide necessary and sufficient conditions for a strategy to be subgame \(\phi \)-maxmin and to provide a characterization of subgame maxmin strategies.
Definition 4.1
A strategy \(\sigma \in {\mathcal {S}}_{1}\) is an n-day \(\phi \)-maxmin strategy in the game \(\Gamma _{x_{0}}(u)\) if for every \(t \in {\mathbb {N}}\), for every history \(h \in {\mathcal {H}}^{t}\) of length t, and for every strategy \(\tau \in {\mathcal {S}}_2\),
$$\begin{aligned} {\mathbb {E}}_{h,\sigma ,\tau }[{\underline{V}}^{t+n}]\ge {\underline{v}}(h)-\phi (h). \end{aligned}$$
(4.1)
Definition 4.2
A strategy \(\sigma \in {\mathcal {S}}_{1}\) is \(\phi \)-equalizing in the game \( \Gamma _{x_{0}}(u) \) if for every \(t \in {\mathbb {N}}\), for every history \(h \in {\mathcal {H}}^{t}\) of length t, and for every strategy \(\tau \in {\mathcal {S}}_2, \)
$$\begin{aligned} u\ge \limsup _{t \rightarrow \infty } {\underline{V}}^{t}-\phi (h), \quad {\mathbb {P}}_{h,\sigma ,\tau }\text {-almost surely.} \end{aligned}$$
(4.2)
When \(\phi = 0\), we use the terms n-day maxmin and equalizing to mean n-day 0-maxmin and 0-equalizing, respectively.
The first definition is very intuitive. It says that player 1 should play in such a way that, on average, the lower value does not decrease over time. The notion of 1-day maxmin strategies is particularly well known in dynamic programming and stochastic games; see Puterman ([16]). A simple characterization of 1-day maxmin strategies is provided in the following theorem.
Theorem 4.3
Consider a strategy \(\sigma \in {\mathcal {S}}_{1}\) in the game \(\Gamma _{x_{0}}(u)\). The following three conditions are equivalent:
1.
For each \(n \in {\mathbb {N}}\), \(\sigma \) is an nday maxmin strategy.
2.
\(\sigma \) is a 1-day maxmin strategy.
3.
For each history \(h \in {\mathcal {H}}^{t}\) of length t and each strategy \(\tau \in {\mathcal {S}}_{2},\) the process \(({\underline{V}}^{t + n})_{n \in {\mathbb {N}}}\) is a \({\mathbb {P}}_{h,\sigma ,\tau }\)-submartingale.
Proof
That condition 1 implies condition 2 is obvious. That condition 2 implies condition 3 follows by the law of iterated expectations. Finally, that condition 3 implies condition 1 follows from the properties of a submartingale. \(\square \)
Note that the equivalence in Theorem 4.3 is no longer true for nonzero tolerance functions. For example, if \(\phi >0\), then a 1-day \(\phi \)-maxmin strategy is generally not 2-day \(\phi \)-maxmin.
A strategy is \(\phi \)equalizing if, roughly speaking, it almost surely results in a play with an eventually good enough payoff, where eventually good enough means being about as large as the lower value in very deep subgames.
The following example illustrates both the notion of a 1-day maxmin strategy and that of an equalizing strategy.
Example 4.4
Consider the centipede game shown in Fig. 2. At every history, the active player can choose to continue (c) or to quit (q). As soon as a player decides to quit, the game ends and the payoff is as given in Fig. 2. If the game continues indefinitely, then the payoff is 0. It is easily verified that for every \( t \in {\mathbb {N}}, \) \( {\underline{v}}(c^{2t}) = {\underline{v}}(c^{2t+1}) = (t+1)/(t+2). \) A pure strategy \(\sigma \in {\mathcal {S}}_1\) is then easily seen to be a 1-day maxmin strategy if and only if \(\sigma (c^{2t})=c\) for every \(t\in {\mathbb {N}}\).
On the other hand, a pure strategy \(\sigma \in {\mathcal {S}}_{1}\) is equalizing if and only if for infinitely many \( t \in {\mathbb {N}} \) it holds that \(\sigma (c^{2t})=q.\) Indeed, if \( \sigma \) chooses q for infinitely many \( t \in {\mathbb {N}}, \) then after each history \( h \in \mathcal{H} \) the generated play ends with q, irrespective of \( \tau \in {{{\mathcal {S}}}}_{2}, \) which shows that \( \sigma \) is equalizing. If, on the other hand, \( \sigma \) chooses q for only finitely many \( t \in {\mathbb {N}}, \) then there is a history \( h \in {{{\mathcal {H}}}} \) after which player 1 always continues. If \( \tau \in {{{\mathcal {S}}}}_{2} \) is such that player 2 always chooses continue, then the play \( c^{\infty } \) with \( u(c^{\infty }) = 0 \) results, and 0 is strictly less than \( \limsup _{t\rightarrow \infty } {\underline{v}}(c^t)=1, \) so such a \( \sigma \) is not equalizing.
It follows that player 1 does not have a pure strategy which is both 1-day maxmin and equalizing. \(\square \)
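The tension in Example 4.4 can be checked numerically. The quit payoffs used below are hypothetical (Fig. 2 is not reproduced here) but are chosen to be consistent with the values \( {\underline{v}}(c^{2t}) = (t+1)/(t+2) \) stated in the example; the sketch verifies that the always-continue strategy never lets the lower value drop, yet falls short of the limit of the values.

```python
from fractions import Fraction

# Hypothetical quit payoffs (Fig. 2 is not reproduced here), chosen to be
# consistent with Example 4.4: player 1 quitting at c^{2t} pays t/(t+1),
# and the infinite play c^infinity pays 0.

def value(t):
    """Lower value at history c^t, as stated in Example 4.4."""
    return Fraction(t // 2 + 1, t // 2 + 2)

# "Always continue" is 1-day maxmin: the lower value never drops along the play.
assert all(value(t + 1) >= value(t) for t in range(100))

# Quitting at c^{2t} is not 1-day maxmin under the hypothetical payoffs:
# the quit payoff t/(t+1) lies below the current value (t+1)/(t+2).
assert all(Fraction(t, t + 1) < value(2 * t) for t in range(100))

# But always continuing is not equalizing: if player 2 also always
# continues, the realized payoff is 0, while value(t) tends to 1.
payoff_always_continue = Fraction(0)
assert payoff_always_continue < 1
```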
The following theorem states sufficient conditions under which a strategy \( \sigma \) of player 1 is a subgame \(\phi \)-maxmin strategy.
Theorem 4.5
(Sufficient condition) Let \(\phi :{\mathcal {H}}\rightarrow [0,\infty )\) be a tolerance function. The strategy \(\sigma \in {{{\mathcal {S}}}}_{1} \) is a subgame \(\phi \)-maxmin strategy in the game \(\Gamma _{x_{0}}(u)\) if there exist tolerance functions \(\phi _1:{\mathcal {H}}\rightarrow [0,\infty )\) and \(\phi _2:{\mathcal {H}}\rightarrow [0,\infty )\) such that \( \phi _1+\phi _2 \le \phi \) and
1.
for every \(n\in {\mathbb {N}}\), \(\sigma \) is n-day \(\phi _1\)-maxmin,
2.
\(\sigma \) is \(\phi _2\)-equalizing.
Proof
Let \(\phi _1, \) \(\phi _2, \) and \( \sigma \) be such that the conditions in the theorem are satisfied. We show that \(\sigma \) is a subgame \(\phi \)-maxmin strategy.
Fix a history \(h\in {\mathcal {H}}^{t}\) and a strategy \(\tau \in {\mathcal {S}}_2\). Then, we have that
$$\begin{aligned} {\underline{v}}(h)-\phi (h)&\le {\underline{v}}(h)-\phi _1(h)-\phi _2(h)\\&\le \limsup _{n\rightarrow \infty }{\mathbb {E}}_{h,\sigma ,\tau }[{\underline{V}}^{t+n}] - \phi _2(h)\\&\le {\mathbb {E}}_{h,\sigma ,\tau }[\limsup _{n\rightarrow \infty } {\underline{V}}^{t+n}] - \phi _2(h)\\&\le {\mathbb {E}}_{h,\sigma ,\tau }[u], \end{aligned}$$
where the second inequality holds since \(\sigma \) is an n-day \(\phi _1\)-maxmin strategy, the third inequality is by Fatou's lemma, and the last inequality holds since \(\sigma \) is \(\phi _2\)-equalizing. \(\square \)
According to Theorem 4.5, to conclude that \( \sigma \) is a subgame \( \phi \)-maxmin strategy, it is enough to find tolerance functions \( \phi _1 \) and \( \phi _2 \) such that at each history their sum does not exceed \(\phi \) and the strategy \( \sigma \) is both n-day \(\phi _1\)-maxmin and \(\phi _2\)-equalizing.
Particularly natural are situations where the tolerance level does not increase as time progresses. More formally, the tolerance function \(\phi \) is said to be nonincreasing if \(\phi (h) \ge \phi (h')\) whenever \(h \prec h'\). The following result states necessary conditions for a strategy to be subgame \( \phi \)-maxmin.
Theorem 4.6
(Necessary condition) Let \(\sigma \in {{{\mathcal {S}}}}_{1} \) be a subgame \(\phi \)-maxmin strategy in the game \(\Gamma _{x_{0}}(u).\) Then, it holds that:
1.
For every \(n\in {\mathbb {N}}\), \(\sigma \) is an n-day \(\phi \)-maxmin strategy.
2.
If the tolerance function \(\phi \) is nonincreasing, then \(\sigma \) is \(\phi \)-equalizing.
Proof
Let \(\sigma \in {{{\mathcal {S}}}}_{1} \) be a subgame \(\phi \)-maxmin strategy in the game \(\Gamma _{x_{0}}(u). \) Take a history \(h \in {\mathcal {H}}^{t}\) of length t and a strategy \( \tau \in \mathcal{S}_{2}. \)
We prove condition 1. We have
$$\begin{aligned}{\mathbb {E}}_{h,\sigma ,\tau }[{\underline{V}}^{t+n}] \ge {\mathbb {E}}_{h,\sigma ,\tau }[{\underline{U}}_{\sigma }^{t+n}] \ge {\mathbb {E}}_{h,\sigma ,\tau }[{\underline{U}}_{\sigma }^{t}] = {\underline{u}}(\sigma ,h) \ge {\underline{v}}(h)-\phi (h),\end{aligned}$$
where the first inequality holds since for each play \(p \in {\mathcal {P}}(h)\) we have \({\underline{V}}^{t+n}(p) = {\underline{v}}(p_{\vert t + n}) \ge {\underline{u}}(\sigma ,p_{\vert t + n}) = {\underline{U}}_{\sigma }^{t+n}(p)\). The second inequality holds by Lemma 3.3. The next equality holds since t is the length of the history h, and the final inequality holds since \(\sigma \) is a subgame \(\phi \)-maxmin strategy.
We prove condition 2. We have, \({\mathbb {P}}_{h,\sigma ,\tau }\)-almost surely,
$$\begin{aligned}u(p) = \lim _{t \rightarrow \infty }{\mathbb {E}}_{p_{\vert t},\sigma ,\tau }[u] \ge \limsup _{t \rightarrow \infty }({\underline{v}}(p_{\vert t})-\phi (p_{\vert t})) \ge \limsup _{t \rightarrow \infty }{\underline{V}}^{t}(p)-\phi (h),\end{aligned}$$
where the equality is by Levy's zero-one law (Lemma A.1 in Appendix A), the first inequality follows since \( \sigma \) is a subgame \( \phi \)-maxmin strategy, and the second inequality holds since \(\phi \) is nonincreasing. \(\square \)
Notice that in the case of a nonzero tolerance function, the necessary and sufficient conditions do not coincide, and we do not obtain a characterization of subgame \( \phi \)-maxmin strategies. We now turn to the case where the tolerance function \(\phi \) is identically equal to 0.
Corollary 4.7
A strategy \(\sigma \in {{{\mathcal {S}}}}_{1} \) is a subgame maxmin strategy in the game \(\Gamma _{x_{0}}(u)\) if and only if it is 1-day maxmin and equalizing.
When we compare the sufficient conditions of Theorem 4.5 to those of Corollary 4.7, we notice that in the case of a nonzero tolerance function we require the strategy to be n-day \(\phi \)-maxmin, whereas in the case of a zero tolerance function it suffices that the strategy is 1-day maxmin. The reason for this difference is that we must prevent strategies from accumulating the allowed tolerance levels, which would make them too permissive over time. The following example illustrates this issue.
Example 4.8
Consider the decision problem depicted in Fig. 3. Each period the decision maker can choose to continue (c) or to quit (q). Notice that \( v(c^t)= 1/(t+1) \) and hence \(\displaystyle \lim _{t\rightarrow \infty } v(c^t)=0\). Any strategy of player 1 is therefore equalizing. Furthermore, it is clear that in this decision problem the subgame maxmin strategy is unique and requires the decision maker to quit at every period. Now suppose the decision maker has the following tolerance function:
$$\begin{aligned} \phi (c^t)=v(c^t)-v(c^{t+1})={\frac{1}{t+1}}-{\frac{1}{t+2}}, \quad t \in {\mathbb {N}}. \end{aligned}$$
Consider the strategy \(\sigma \) where the decision maker always chooses to continue. From the definition of the tolerance function, it follows that the strategy \(\sigma \) is a 1-day \(\phi \)-maxmin strategy. Indeed, it holds that \(v(c^{t+1})\ge v(c^t)-\phi (c^t)\). It is also easily seen that the strategy \( \sigma \) is equalizing. Nevertheless, it is clear that \( \sigma \) is not a subgame \(\phi \)-maxmin strategy.
The underlying problem with the strategy \( \sigma \) is that every time the decision maker chooses to continue, this causes an acceptable loss in the value, but over time these losses add up to an unacceptable loss. The requirement that for every \(n\in {\mathbb {N}}\) a strategy is n-day \(\phi \)-maxmin guarantees that the accumulated losses over any finite period of time remain acceptable.
If we require a strategy to be subgame maxmin, then we do not tolerate any losses. Therefore, the accumulation problem mentioned above never occurs, and it is sufficient to only require that the strategy is 1-day maxmin. \(\Diamond \)
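The telescoping losses described above can be made concrete with a short computation based on the data of Example 4.8; all quantities below are taken directly from the example.

```python
from fractions import Fraction

def v(t):
    """Value at history c^t in the decision problem of Fig. 3."""
    return Fraction(1, t + 1)

def phi(t):
    """The tolerance phi(c^t) = v(c^t) - v(c^{t+1}) of Example 4.8."""
    return v(t) - v(t + 1)

# Each single step of "continue" loses exactly the tolerated amount,
# so always-continue is 1-day phi-maxmin.
assert all(v(t + 1) == v(t) - phi(t) for t in range(50))

# But the losses telescope: after n steps the accumulated loss is
# v(c^0) - v(c^n) = 1 - 1/(n+1), which approaches 1, not 0.
n = 1000
accumulated = sum(phi(t) for t in range(n))
assert accumulated == v(0) - v(n) == 1 - Fraction(1, n + 1)
```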
Example 4.9 presents a game in which player 1 has a maxmin strategy in every subgame, but has no subgame maxmin strategy. From Corollary 4.7, it follows that any subgame maxmin strategy needs to be both 1-day maxmin and equalizing. In the example, all strategies are 1-day maxmin but none of them is equalizing, and therefore subgame maxmin strategies do not exist.
Example 4.9
Consider the following perfect information game. Both players have two actions, left (L) and right (R), so \({\mathcal {A}}={\mathcal {B}}=\left\{ L, R\right\} \). The players take turns playing an action, which generates a play \(p\in \left\{ L, R\right\} ^{\mathbb {N}}\). Let \(r_1(p)\) and \(r_2(p)\) denote the number of times that player 1 and player 2, respectively, play action R during the play p and define \(r(p)=\min \lbrace r_1(p),r_2(p)\rbrace . \) Player 1 obtains a payoff of 0 if both players choose R infinitely often. When at least one of them chooses R only a finite number of times, player 1 receives a payoff of \( r(p)/(r(p)+1). \) The payoff function u is therefore obtained by defining, for \( p \in \left\{ L, R\right\} ^{\mathbb {N}}, \)
$$\begin{aligned} u(p)= \left\{ \begin{array}{ll} \frac{r(p)}{r(p)+1}, &{} \text{ if } r_1(p) \ne \infty \text{ or } r_2(p) \ne \infty ,\\ 0, &{} \text{ if } r_1(p)=r_2(p)=\infty . \end{array} \right. \end{aligned}$$
(4.3)
At each history \(h\in {\mathcal {H}}\), the value of the game exists and is given by \(v(h)= r_2(h)/(r_2(h)+1)\), where \(r_2(h)\) denotes the number of times player 2 has chosen R in the history h. Indeed, player 1 can guarantee this payoff by choosing the action R \(\max \{r_2(h)-r_1(h),0\} \) times after history h. Player 2 can guarantee to lose at most this amount by playing only left after history h. Hence at every history \(h\in {\mathcal {H}}\), player 1 has a maxmin strategy.
For every history \(h\in {\mathcal {H}}\) where player 1 takes an action and for every action \(a\in {\mathcal {A}} \), we have that \(r_2(ha)=r_2(h)\) and \(v(ha)=v(h)\). Therefore, all strategies of player 1 are 1-day maxmin.
On the other hand, no equalizing strategy exists for player 1. To see this, take any strategy \(\sigma \in {{{\mathcal {S}}}}_{1} \) for player 1 and consider the strategy \(\tau \in {{{\mathcal {S}}}}_{2} \) in which player 2 always chooses R. Let p be the play generated by the strategy profile \((\sigma ,\tau )\). Then, we have that \(u(p)<1\) and \(\displaystyle \lim _{t\rightarrow \infty }v(p_{\vert t})=1\). It follows that \( \sigma \) is not equalizing. Using Corollary 4.7, we conclude that player 1 does not have a subgame maxmin strategy. \(\square \)
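A small numerical sketch of Example 4.9, using None as a stand-in for "R is chosen infinitely often":

```python
from fractions import Fraction

def v(r2_in_h):
    """Value at a history in which player 2 has chosen R r2_in_h times."""
    return Fraction(r2_in_h, r2_in_h + 1)

def payoff(r1, r2):
    """The payoff u(p) of (4.3), with None standing for infinity."""
    if r1 is None and r2 is None:
        return Fraction(0)
    r = min(x for x in (r1, r2) if x is not None)  # r(p) = min(r1, r2)
    return Fraction(r, r + 1)

# Against the strategy tau in which player 2 always chooses R (so that
# r2 = infinity), every strategy of player 1 yields a payoff below 1 ...
for r1 in list(range(100)) + [None]:
    assert payoff(r1, None) < 1

# ... while the values v(p|t) along the play tend to 1, so no strategy
# of player 1 is equalizing.
assert v(10**6) == 1 - Fraction(1, 10**6 + 1)
```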
4.2 The Case of an Upper Semicontinuous Payoff Function
The remainder of this section is devoted to the case where the payoff function is upper semicontinuous. We argue that in this case, any strategy of player 1 is equalizing. Because all upper semicontinuous functions are Borel measurable and because we assumed finite action sets and a countable state space, the value exists, see Maitra and Sudderth ([13]) and Martin ([14]), and the lower value equals the value.
The function u is upper semicontinuous at a play \(p \in {\mathcal {P}}\) if for every sequence \(\{p_{t}\}_{t \in {\mathbb {N}}}\) of plays converging to p it holds that
$$\begin{aligned}\limsup _{t \rightarrow \infty }u(p_{t}) \le u(p).\end{aligned}$$
Lemma 4.10
Let the payoff function u be upper semicontinuous at the play p. Then, we have that
$$\begin{aligned} u(p)\ge \limsup _{t\rightarrow \infty } {\underline{v}}(p_{\vert t}). \end{aligned}$$
(4.4)
Proof
Fix \(\epsilon >0\). For every \(t \in {\mathbb {N}}, \) define \( h_{t} = p_{\vert t} \) and let \( p_{t} \in {\mathcal {P}}(h_{t})\) be such that \(u(p_t) \ge {\underline{v}}(h_{t})-\epsilon \). Such a play \( p_{t} \) exists as player 1 can guarantee a payoff of at least \({\underline{v}}(h_{t})-\epsilon \) at history \(h_{t}\). Since the sequence \( \{p_{t}\}_{t \in {\mathbb {N}}}\) converges to p, we have
$$\begin{aligned} u(p)\ge \limsup _{t \rightarrow \infty } u(p_t)\ge \limsup _{t \rightarrow \infty } {\underline{v}}(h_{t})-\epsilon . \end{aligned}$$
Since this holds for every \(\epsilon >0,\) the lemma follows. \(\square \)
Corollary 4.11
Let \( \Gamma _{x_{0}}(u) \) be such that u is upper semicontinuous. Then, each strategy of player 1 is equalizing. The strategy \(\sigma \in {\mathcal {S}}_1\) is a subgame \(\phi \)-maxmin strategy if and only if for every \( n \in {\mathbb {N}} \) it is an n-day \(\phi \)-maxmin strategy. The strategy \(\sigma \in {{{\mathcal {S}}}}_{1}\) is a subgame maxmin strategy if and only if it is a 1-day maxmin strategy.
Example 4.12
(Staying in the set game) Let some subset \({\mathcal {X}}^{*}\) of \({\mathcal {X}}\) be given. For a play \(p = (x_{0},a_1,b_1,x_1,a_2,b_2,\dots )\), we define u(p) to be 1 if \(x_t \in {\mathcal {X}}^{*}\) for every \(t \in {\mathbb {N}}\) and to be 0 otherwise. Maitra and Sudderth ([12]) refer to such a payoff function as “staying in a set” and in the computer science literature it goes under the name of “safety objective,” see Bruyère ([4]). Since u is upper semicontinuous, any strategy \( \sigma \in {{{\mathcal {S}}}}_{1} \) is equalizing. \(\square \)
Example 4.13
Consider again the centipede game shown in Fig. 2, but with one slight modification. If the game continues indefinitely, then player 1 receives a payoff of 2 instead of 0. The payoff function is now upper semicontinuous. As argued in Example 4.4, the strategy \( \sigma \) in which player 1 continues at each history is 1-day maxmin. By Corollary 4.11, we conclude that the strategy \(\sigma \) is a subgame maxmin strategy. \(\square \)
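As a rough numerical check of Example 4.13, one can compute the worst-case payoff of the always-continue strategy at every subgame. The quit payoffs below are hypothetical (Fig. 2 is not reproduced here) but are consistent with the values stated in Example 4.4.

```python
from fractions import Fraction

# Hypothetical quit payoffs for Fig. 2: player 2 quitting at c^{2t+1}
# pays (t+1)/(t+2); in this modification the infinite play pays 2.

def v(s):
    """Lower value at history c^s, as in Example 4.4."""
    return Fraction(s // 2 + 1, s // 2 + 2)

def guarantee_always_continue(s, horizon=200):
    """Worst-case payoff of 'always continue' at subgame c^s: player 2
    either quits at some odd period >= s, or never quits (payoff 2)."""
    quit_payoffs = [Fraction(t + 1, t + 2)
                    for t in range(horizon) if 2 * t + 1 >= s]
    return min(quit_payoffs + [Fraction(2)])

# Always-continue guarantees the value at every subgame, illustrating
# that it is a subgame maxmin strategy in the modified game.
assert all(guarantee_always_continue(s) == v(s) for s in range(50))
```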
Note that the limit average payoff is generally not upper semicontinuous. Hence, in games where the payoff function is given by the limit average payoff, there may exist strategies of the maximizing player which are not equalizing.
5 Existence of Subgame \(\phi \)-Maxmin Strategies
The goal of this section is to give sufficient conditions for the existence of a subgame \(\phi \)-maxmin strategy when the tolerance function \( \phi \) is positive. We will prove the following theorem.
Theorem
Let a game \( \Gamma _{x_{0}}(u) \) and a tolerance function \(\phi > 0\) be given such that for every \(p\in {\mathcal {P}}\) at least one of the following two conditions holds:
1.
(point of upper semicontinuity) The function u is upper semicontinuous at p.
2.
(positive limit inferior) \(\liminf _{t\rightarrow \infty } \phi (p_{\vert t}) > 0\).
Then, there exists a subgame \(\phi \)-maxmin strategy in the game \(\Gamma _{x_{0}}(u).\)
Two special cases of the above theorem are worth noting. One is when the payoff function is everywhere upper semicontinuous. In this special case, the existence of a subgame \(\phi \)-maxmin strategy (in fact, of a subgame maxmin strategy) follows by a result of Laraki, Maitra, and Sudderth ([10]). We note that the latter result is not entirely subsumed by the theorem above, since the authors allow for a general Borel state space. Another special case of interest is when along each play the tolerance remains bounded away from zero (i.e., when each play satisfies condition 2 of the theorem). This special case, to the best of our knowledge, has not been explicitly considered in the literature. However, it does encompass the familiar situation of a constant tolerance function \(\phi (h) \equiv \epsilon > 0\). For such tolerance functions, subgame \(\phi \)-maxmin strategies are exactly subgame \(\epsilon \)-optimal strategies, and existence follows by Proposition 11 in Mashiah-Yaakovi ([15]). Note that the case when all plays satisfy condition 2 of the theorem is more general than the case of a constant tolerance function, since no uniform lower bound on the tolerance is imposed. A novel feature of the above theorem is the way it combines these various special cases. We note that the theorem remains original even if we restrict our attention to Borel measurable payoff functions.
The main technique used in the proof of this theorem is that of switching strategies. The idea of switching between progressively “more optimal” strategies is certainly not new. A similar construction is used, for example, in Rosenberg, Solan, and Vieille ([18]), Solan and Vieille ([20]), and Mashiah-Yaakovi ([15]). However, the method needs to be fine-tuned in order for it to accommodate both the plays where the tolerance remains bounded away from zero (condition 2) and those where the payoff function is upper semicontinuous (condition 1). It is not straightforward to use this proof technique for a general Borel state space, as the selection of switching strategies would need to satisfy measurability conditions. It is therefore an interesting open question whether the result can be extended to that case.
The switching strategy \( \sigma ^{\phi } \) is formally defined in Sect. 5.1. To analyze its properties, we define in Sect. 5.2, for every \( k \in {\mathbb {N}}, \) a strategy \( \sigma ^{k} \) that coincides with \( \sigma ^{\phi } \) as long as at most k switches have been made and stops switching thereafter. We also present three technical lemmas on finite switching strategies in this subsection, the most important one showing that the guaranteed expected payoff of the strategy \( \sigma ^{k} \) is nondecreasing in k. Sect. 5.3 uses the lemmas of Sect. 5.2 to establish the desired existence result, Theorem 5.7.
5.1 Definition of the Switching Strategy \(\sigma ^\phi \)
Fix a tolerance function \(\phi >0\). For every history \(h\in {\mathcal {H}}, \) player 1 has a \( (\phi (h)/2)\)-maxmin strategy for the subgame at history h, denoted by \(\sigma ^{h}\). The function \( \psi : {{{\mathcal {H}}}} \rightarrow {{{\mathcal {H}}}} \) maps histories into histories and is such that player 1 uses the strategy \( \sigma ^{\psi (h)} \) at the subgame at history \( h \in {{{\mathcal {H}}}}. \) The function \( \psi \) describes when player 1 switches strategies and is recursively defined by setting \( \psi (x_{0})=x_{0}\) and, for every \(h\in {\mathcal {H}}\) and every \(z\in {\mathcal {Z}},\)
$$\begin{aligned} \psi (hz)= \left\{ \begin{array}{ll} \psi (h), &{} \text{ if } {\underline{u}}(\sigma ^{\psi (h)},hz) \ge {\underline{v}}(hz)-\phi (hz),\\ hz, &{} \text{ otherwise }. \end{array} \right. \end{aligned}$$
The condition \({\underline{u}}(\sigma ^{\psi (h)},hz) \ge {\underline{v}}(hz)-\phi (hz)\) verifies whether the strategy to which player 1 switched last, \(\sigma ^{\psi (h)}\), is a \(\phi (hz)\)-maxmin strategy for the subgame at history hz. If this is the case, then there is no need to switch and \( \psi (hz) = \psi (h). \) Otherwise, player 1 switches to \( \sigma ^{hz}, \) which is achieved by setting \( \psi (hz) = hz. \) Formally, we define the switching strategy \(\sigma ^\phi :{\mathcal {H}}\rightarrow \Delta ({\mathcal {A}})\) by
$$\begin{aligned} \sigma ^\phi (h)=\sigma ^{\psi (h)}(h), \quad h \in {{{\mathcal {H}}}}. \end{aligned}$$
(5.1)
The following example illustrates the switching strategy \( \sigma ^{\phi } \) and shows that it may fail to be subgame \(\phi \)-maxmin.
Example 5.1
Consider again the centipede game depicted in Fig. 2. We recall that, for every \( t \in {\mathbb {N}}, \) \( {\underline{v}}(c^{2t}) = {\underline{v}}(c^{2t+1}) = (t+1)/(t+2). \) Take a tolerance function \( \phi \) with the property that, for every \( t \in {\mathbb {N}}, \) \(\phi (c^{2t})< 1/((t+1)(t+2))\).
Let \( h \in {{{\mathcal {H}}}} \) be the active history and let \( k \in {\mathbb {N}} \) be such that \( h = c^{2k} \) or \( h = c^{2k+1}. \) The strategy \( \sigma ^{h} \) in which player 1 chooses continue at periods \(0,2,\ldots ,2k\) and chooses quit at every later period, i.e.,
$$\begin{aligned} \sigma ^{h}(c^{2t})= \left\{ \begin{array}{ll} c, &{} \text{ if } 2t\le 2k, \\ q, &{} \text{ otherwise }, \end{array} \right. \end{aligned}$$
is a maxmin strategy at the subgame h.
We now consider the switching strategy \( \sigma ^{\phi }. \) We show by induction that for every \( h \in {{{\mathcal {H}}}} \) and every \( z \in {{{\mathcal {Z}}}}, \) it holds that \( \psi (hz) = hz \) if hz is an active history of player 1 and \( \psi (hz) = h \) if hz is an active history of player 2. The statement trivially holds for the initial history. Let h be an active history of player 2 and let \( t \in {\mathbb {N}} \) be such that \( h = c^{2t+1}. \) Since \( \sigma ^{h} = \sigma ^{c^{2t}} \) is a maxmin strategy at the subgame h, it holds that \( \psi (c^{2t+1}) = c^{2t}. \) Let h be an active history of player 1 and let \( t \in {\mathbb {N}} \setminus \{0\} \) be such that \( h = c^{2t}. \) We have that
$$\begin{aligned} {\underline{u}}(\sigma ^{\psi (c^{2t-1})},c^{2t}) = {\underline{u}}(\sigma ^{c^{2t-2}},c^{2t}) = \tfrac{t}{t+1} = {\underline{v}}(c^{2t}) - \tfrac{1}{(t+1)(t+2)} < {\underline{v}}(c^{2t}) - \phi (c^{2t}), \end{aligned}$$
so \( \psi (c^{2t}) = c^{2t}. \) Since the tolerance function \(\phi \) is so stringent, it forces player 1 to switch at each of his active histories. For every \( t \in {\mathbb {N}}, \) it holds that \(\sigma ^{\phi }(c^{2t})=\sigma ^{c^{2t}}(c^{2t})=c, \) so under \( \sigma ^{\phi } \) player 1 chooses c at each of his active histories. The strategy \( \sigma ^{\phi } \) is not a subgame \(\phi \)-maxmin strategy as it fails to be \(\phi \)-equalizing, see Example 4.4. \(\square \)
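The switching dynamics of Example 5.1 can be traced in a few lines of code. The guarantee levels below follow the example; the quit payoff \( t/(t+1) \) at \( c^{2t} \) is a hypothetical reading of Fig. 2 that is consistent with the displayed computation of \( {\underline{u}}(\sigma ^{c^{2t-2}},c^{2t}) \).

```python
from fractions import Fraction

def v(s):
    """Lower value at history c^s, as in Example 5.1."""
    return Fraction(s // 2 + 1, s // 2 + 2)

def phi(s):
    """A tolerance with phi(c^{2t}) < 1/((t+1)(t+2)), as in Example 5.1."""
    t = s // 2
    return Fraction(1, 2 * (t + 1) * (t + 2))

def guarantee(switch_at, s):
    """Guarantee of sigma^{c^{switch_at}} at subgame c^s (s >= switch_at):
    the strategy is maxmin until its quit date and quits afterwards,
    with hypothetical quit payoff t/(t+1) at c^{2t}."""
    if s // 2 <= switch_at // 2:
        return v(switch_at)
    return Fraction(s // 2, s // 2 + 1)

psi = 0            # psi(x_0) = x_0, encoded by the length of the history
switches = []
for s in range(1, 21):                   # walk down the play c, c, c, ...
    if guarantee(psi, s) < v(s) - phi(s):
        psi = s                          # switch to the subgame's maxmin strategy
        switches.append(s)

# As Example 5.1 shows, a switch occurs at every active history of
# player 1 after time 0, i.e., at every even period.
assert switches == list(range(2, 21, 2))
```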
5.2 Definition and Analysis of Finite Switching Strategies
Given the switching strategy \( \sigma ^{\phi }, \) for every \(k\in {\mathbb {N}} \), we define the strategy \(\sigma ^k:{\mathcal {H}}\rightarrow \Delta ({\mathcal {A}})\) such that it coincides with \(\sigma ^{\phi }\) as long as at most k switches have been made and stops switching thereafter. Formally, we recursively define the function \(\kappa : {{{\mathcal {H}}}} \rightarrow {\mathbb {N}} \), which counts the number of switches along a history h, by setting \(\kappa (x_{0}) = 0\) and, for all histories \(h,hz\in {\mathcal {H}}\),
$$\begin{aligned} \kappa (hz)= \left\{ \begin{array}{ll} \kappa (h), &{} \text{ if } {\underline{u}}(\sigma ^{\psi (h)},hz) \ge {\underline{v}}(hz)-\phi (hz),\\ \kappa (h)+1, &{} \text{ otherwise }. \end{array} \right. \end{aligned}$$
For every \(k\in {\mathbb {N}}, \) we define the stopping time \(T_k:{\mathcal {P}}\rightarrow {\mathbb {N}}\cup \lbrace \infty \rbrace \) by
$$\begin{aligned} T_k(p) = \inf \lbrace t \in {\mathbb {N}}\vert \kappa (p_{\vert t})=k \rbrace , \quad p \in {{{\mathcal {P}}}}. \end{aligned}$$
(5.2)
The stopping time \( T_{k} \) indicates the time at which switch k occurs. Since, for \(t<\infty ,\) the event \(T_k(p)\le t\) only depends on the history up to period t, the set \(\lbrace p\in {\mathcal {P}}\vert T_k(p)\le t \rbrace \) is \({\mathcal {F}}^{t}\)-measurable and \(T_k\) is indeed a stopping time.
We now formally define \(\sigma ^k:{\mathcal {H}}\rightarrow \Delta ({\mathcal {A}}). \) Take any \(p\in {\mathcal {P}}(h)\) and let
$$\begin{aligned} \sigma ^k(h)=\left\{ \begin{array}{ll} \sigma ^{\phi }(h), &{} \text{ if } \kappa (h)\le k, \\ \sigma ^{h_{\vert T_k(p)}}(h), &{}\text{ otherwise. } \end{array} \right. \end{aligned}$$
(5.3)
If \( \kappa (h) > k, \) then the time at which switch k occurs is the same for every \(p\in {\mathcal {P}}(h), \) so \( \sigma ^{k} \) is well-defined.
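As a sketch of how \( \kappa \) and \( T_{k} \) interact, the snippet below computes them from a given set of switch times, using the play of Example 5.1 (a switch at every even period) as input; None plays the role of \( \infty \).

```python
# kappa counts the switches along a history; T_k is the first time kappa
# reaches k, with None playing the role of infinity. As input we use the
# play of Example 5.1, where a switch occurs at every even period >= 2.

def kappa(s, switch_times):
    """Number of switches up to and including period s."""
    return sum(1 for r in switch_times if r <= s)

def T(k, switch_times, horizon):
    """First period at which the k-th switch has occurred (None if it
    does not occur before the horizon)."""
    return next((s for s in range(horizon + 1)
                 if kappa(s, switch_times) == k), None)

switch_times = list(range(2, 101, 2))

assert kappa(9, switch_times) == 4       # switches at periods 2, 4, 6, 8
assert T(3, switch_times, 100) == 6      # the third switch occurs at period 6
assert T(60, switch_times, 100) is None  # fewer than 60 switches occur
```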
For every \(k\in {\mathbb {N}}, \) let \({\mathcal {R}}_k\subseteq {\mathcal {P}}\) be the set of plays along which at least k switches occur,
$$\begin{aligned} {\mathcal {R}}_k=\lbrace p\in {\mathcal {P}}\vert T_k(p)<\infty \rbrace , \end{aligned}$$
(5.4)
so \({\mathcal {R}}_1\supseteq {\mathcal {R}}_2 \supseteq {\mathcal {R}}_3 \supseteq \cdots \). Furthermore, let \({\mathcal {R}}_{\infty } \subseteq {\mathcal {P}} \) denote the set of plays along which infinitely many switches occur,
$$\begin{aligned} {\mathcal {R}}_\infty = \displaystyle \cap _{k=1}^{\infty } {\mathcal {R}}_k. \end{aligned}$$
(5.5)
The remainder of the section is devoted to the proof of three technical lemmas concerning the finite switching strategies. These lemmas are needed in the analysis of \( \sigma ^{\phi }. \)
Consider the strategies \(\sigma ^{k}, \sigma ^{k+1},\dots \). All these strategies agree with \(\sigma ^{\phi }\) for as long as \(\sigma ^{\phi }\) does not require more than k switches. Consequently, the measures that these strategies induce on \({\mathcal {P}}\) assign the same probability to any event that is “determined” before switch \( k+1 \) occurs, i.e., to any event in the sigma-algebra \({\mathcal {F}}^{T_{k+1}}\).
Lemma 5.2
Let a strategy \(\tau \in {{{\mathcal {S}}}}_{2}, \) a history \(h \in {\mathcal {H}}, \) and some \(k \in {\mathbb {N}}\) be given. For \(\sigma = \sigma ^{k}, \sigma ^{k+1},\dots ,\sigma ^{\phi }, \) the probability measures \({\mathbb {P}}_{h,\sigma ,\tau }\) all coincide on the sigma-algebra \({\mathcal {F}}^{T_{k+1}}\). Furthermore, these probability measures all agree on each universally measurable subset of \({\mathcal {P}} \setminus {\mathcal {R}}_{k + 1}\).
Proof
A set A of the universally measurable sigma-algebra \( {\mathcal {F}}\) is called agreeable if for \(\sigma = \sigma ^{k}, \sigma ^{k+1},\dots ,\sigma ^{\phi }\) the measures \({\mathbb {P}}_{h,\sigma ,\tau }\) all assign the same probability to A.
We argue first that each cylinder set in \({\mathcal {F}}^{T_{k+1}}\) is agreeable. A cylinder set \({\mathcal {P}}(h)\) is a member of \({\mathcal {F}}^{T_{k+1}}\) if and only if \(\kappa (h) \le k+1\). Let a cylinder set \({\mathcal {P}}(h')\) in \({\mathcal {F}}^{T_{k+1}}\) be given. Since \(\kappa (h') \le k + 1\), we know that \(\kappa (h'') \le k\) for each history \(h''\) preceding \(h'\). It follows that \(\sigma ^{k}(h'') = \sigma ^{k + 1}(h'') = \cdots = \sigma ^{\phi }(h'')\). Since this applies to each history \(h''\) that precedes \(h^{\prime }\), the set \({\mathcal {P}}(h')\) is agreeable.
Now take any \(E \in {\mathcal {F}}^{T_{k+1}}\). For \(t \in {\mathbb {N}},\) let \(E_{t} = E \cap \{p \in {{{\mathcal {P}}}} \mid T_{k+1}(p) = t\}\) and let \( E_{\infty } = E \cap \{p \in {{{\mathcal {P}}}} \mid T_{k+1}(p) = \infty \}\). To show that E is agreeable, it suffices to show that the sets \(E_{t}\) and \(E_{\infty }\) are.
Let some \( t \in {\mathbb {N}} \) be given. We know that \(E_{t}\) is a member of \({\mathcal {F}}^{t}\). Consequently, \(E_{t}\) can be written as a disjoint union of cylinder sets in \({\mathcal {F}}^{t}, \) say \(E_{t} = \cup \{C_{n} \mid n \in {\mathbb {N}}\}\), with each \(C_{n}\) a member of \({\mathcal {F}}^{t}\). Since each \(C_{n}\) is a subset of the set \(\{p \in {{{\mathcal {P}}}} \mid T_{k+1}(p)=t\}\), it is a member of \({\mathcal {F}}^{T_{k+1}}, \) so is agreeable by the result of the second paragraph in the proof. It now follows that \(E_{t}\) is agreeable.
To show that \(E_{\infty }\) is agreeable, we make use of the fact that \( E_{\infty } \) is a Borel set and of the regularity of the measures \( {\mathbb {P}}_{h,\sigma ,\tau } \) on the Borel sigma-algebra. Let O be any open subset of \({\mathcal {P}}\) containing \(E_{\infty }\). The set O can be written as a disjoint union of cylinder sets, say \(O = \cup \{{\mathcal {P}}(h_{n}) \mid n \in {\mathbb {N}}\}\). Without loss of generality, for every \(n \in {\mathbb {N}}\) the set \({\mathcal {P}}(h_{n})\) has a point in common with \(E_{\infty }\). Thus in particular, there is \(p \in {\mathcal {P}}(h_{n})\) with \(T_{k+1}(p) = \infty \). This implies that \(\kappa (h_{n}) \le k\) and hence that \({\mathcal {P}}(h_{n})\) is a member of \({\mathcal {F}}^{T_{k+1}}\). We conclude that each \({\mathcal {P}}(h_{n})\) is agreeable by the result of the second paragraph in the proof. It follows that O is an agreeable set. By regularity, the probability of \(E_{\infty }\) under each of the measures is the infimum of the probabilities of the open sets containing \(E_{\infty }\), so \(E_{\infty }\) is agreeable as well.
To prove the second claim, we notice that all Borel subsets of \({\mathcal {P}} \setminus {\mathcal {R}}_{k + 1} = \{p \in {{{\mathcal {P}}}} \mid T_{k+1}(p) = \infty \}\) are members of \({\mathcal {F}}^{T_{k+1}}\), so are agreeable. The result for universally measurable subsets of \(\{p \in {{{\mathcal {P}}}} \mid T_{k+1}(p) = \infty \}\) follows since each such set differs from a Borel set by a negligible set. \(\square \)
Let T be a stopping time. We define the stochastic variable \({\underline{U}}_{\sigma }^{T}\) by letting it agree with \({\underline{U}}_{\sigma }^{t}\) on the set \(\{p \in {\mathcal {P}}:T(p) = t\}\) for each \(t \in {\mathbb {N}}\) and by letting it agree with \({\underline{U}}_{\sigma }^{\infty }\) on the set \(\{p \in {\mathcal {P}}:T(p) = \infty \}, \) where \( {\underline{U}}_{\sigma }^{\infty } \) is some suitably chosen \({\mathcal {F}}^{\infty }\)-measurable stochastic variable. The next lemma relates the guaranteed expected payoffs of the strategies \( \sigma ^{k} \) and \( \sigma ^{k+1} \) at the moment of switch \(k+1\). We write \(\Phi ^t\) to denote the \({\mathcal {F}}^{t}\)-measurable stochastic variable given by \(\Phi ^t(p)=\phi (p_{\vert t})\).
Lemma 5.3
Let \(k \in {\mathbb {N}}\) be given, and let \({\underline{U}}_{\sigma ^{k}}^{\infty }\) and \({\underline{U}}_{\sigma ^{k+1}}^{\infty }\) be \({\mathcal {F}}^{\infty }\)-measurable stochastic variables such that \({\underline{U}}_{\sigma ^{k}}^{\infty } = {\underline{U}}_{\sigma ^{k+1}}^{\infty }\). Then, it holds that
$$\begin{aligned} {\underline{U}}^{T_{k+1}}_{\sigma ^{k}} \le {\underline{U}}^{T_{k+1}}_{\sigma ^{k+1}} - \tfrac{1}{2} \Phi ^{T_{k+1}} \cdot I(T_{k+1} < \infty ). \end{aligned}$$
(5.6)
Proof
Let some \(p\in {\mathcal {P}}\) be given. We distinguish the following two cases.
Case 1: \(T_{k+1}(p)<\infty \). In this case, at least \(k+1\) switches occur along the play p. For \(h = p_{\vert T_{k+1}(p)}\), we have the inequalities
$$\begin{aligned}{\underline{u}}(\sigma ^k, h) < {\underline{v}}(h) - \phi (h) = {\underline{v}}(h) - \tfrac{1}{2}\phi (h) - \tfrac{1}{2}\phi (h) \le {\underline{u}}(\sigma ^{k+1}, h) - \tfrac{1}{2}\phi (h),\end{aligned}$$
where the first inequality holds since \(\sigma ^k\) is not a \(\phi (h)\)-maxmin strategy for the subgame at history h and the second inequality holds because the strategy \(\sigma ^{k+1}\) is a \( (\phi (h)/2)\)-maxmin strategy for the subgame at history h. Since \(I(T_{k+1}(p) < \infty ) = 1\), (5.6) holds.
Case 2: \(T_{k+1}(p)=\infty \). In this case, we have
$$\begin{aligned}{\underline{U}}^{T_{k+1}}_{\sigma ^{k}}(p) = {\underline{U}}_{\sigma ^{k}}^{\infty }(p) = {\underline{U}}_{\sigma ^{k+1}}^{\infty }(p) = {\underline{U}}^{T_{k+1}}_{\sigma ^{k+1}}(p).\end{aligned}$$
Thus, (5.6) holds as an equality. \(\square \)
The following lemma states the intuitive property that, for histories with fewer than \( k+1 \) switches or histories at which switch \( k+1 \) occurs, strategy \( \sigma ^{k+1} \) guarantees player 1 at least the same payoff as strategy \( \sigma ^{k}. \)
Lemma 5.4
Let \( t \in {\mathbb {N}} \) and a history \( h \in {{{\mathcal {H}}}}^{t} \) of length t be given. Let \( k \in {\mathbb {N}} \) be such that \( T_{k+1}(p) \ge t \) for every \( p \in {{{\mathcal {P}}}}(h). \) Then, it holds that \({\underline{u}}(\sigma ^{k},h) \le {\underline{u}}(\sigma ^{k + 1},h)\).
Proof
Fix strategy \( \tau \in {\mathcal {S}}_2 \) of player 2.
We first define \({\underline{U}}_{\sigma ^{k}}^{\infty }\). Consider the probability measure \({\mathbb {Q}}\) on the measurable space \(({\mathcal {P}},{\mathcal {F}})\) given by \({\mathbb {Q}}(A) = \sum _{k \in {\mathbb {N}}} 2^{-k-1} {\mathbb {P}}_{h,\sigma ^{k},\tau }(A)\) for each \(A \in {\mathcal {F}}\). Let \({\bar{u}}\) be an \({\mathcal {F}}^{\infty }\)-measurable stochastic variable with the property that \({\bar{u}} = u\), \({\mathbb {Q}}\)-almost surely. Since \({\mathbb {P}}_{h,\sigma ^{k},\tau }\) is absolutely continuous with respect to \({\mathbb {Q}}\), it holds that \( {\bar{u}} = u\), \({\mathbb {P}}_{h,\sigma ^{k},\tau }\)-almost surely, for every \(k \in {\mathbb {N}}\). We define \( {\underline{U}}_{\sigma ^{k}}^{\infty }\) to be equal to \({\bar{u}}\), for every \(k \in {\mathbb {N}}\).
We now obtain the following inequalities. First, by Lemma 3.3 and the optional sampling theorem with unbounded stopping times as presented in Theorem 8.16 of Yeh ([22], p. 139), we have
$$\begin{aligned}{\underline{u}}(\sigma ^{k},h) \le {\mathbb {E}}_{h,\sigma ^k,\tau }[{\underline{U}}^{T_{k+1}}_{\sigma ^k}].\end{aligned}$$
Secondly, from the fact that \({\underline{U}}^{T_{k+1}}_{\sigma ^k}\) is an \({\mathcal {F}}^{T_{k+1}}\)-measurable stochastic variable, we obtain by Lemma 5.2 that
$$\begin{aligned}{\mathbb {E}}_{h,\sigma ^k,\tau }[{\underline{U}}^{T_{k+1}}_{\sigma ^k}] = {\mathbb {E}}_{h,\sigma ^{k+1},\tau }[{\underline{U}}^{T_{k+1}}_{\sigma ^k}].\end{aligned}$$
Thirdly, we know that
$$\begin{aligned} {\underline{U}}^{T_{k+1}}_{\sigma ^k} \le {\underline{U}}^{T_{k+1}}_{\sigma ^{k+1}} \le {\mathbb {E}}_{h,\sigma ^{k+1},\tau }[u \vert {\mathcal {F}}^{T_{k+1}}], \quad {\mathbb {P}}_{h,\sigma ^{k+1},\tau }\text {-almost surely}, \end{aligned}$$
where the first of these inequalities follows from Lemma 5.3 and the second one follows by Lemma 3.3 and the optional sampling theorem with unbounded stopping times as presented in Theorem 8.16 of Yeh ([22], p. 139). Taking the expectation of the last array of inequalities with respect to \({\mathbb {P}}_{h,\sigma ^{k+1},\tau }\) and making use of the law of iterated expectation yields
$$\begin{aligned} {\mathbb {E}}_{h,\sigma ^{k+1},\tau }[{\underline{U}}^{T_{k+1}}_{\sigma ^k}] \le {\mathbb {E}}_{h,\sigma ^{k+1},\tau }[u]. \end{aligned}$$
Combining these facts yields \({\underline{u}}(\sigma ^{k},h) \le {\mathbb {E}}_{h,\sigma ^{k+1},\tau }[u]\). Taking the infimum over all strategies \(\tau \) of player 2 completes the proof. \(\square \)
5.3 Properties of the Switching Strategy \(\sigma ^\phi \)
We first show that the switching strategy \( \sigma ^{\phi } \) is n-day \(\phi \)-maxmin for every \(n \in {\mathbb {N}}\).
Lemma 5.5
Let a game \( \Gamma _{x_{0}}(u) \) and a tolerance function \( \phi > 0 \) be given. For every \( n \in {\mathbb {N}}, \) the switching strategy \(\sigma ^\phi \) is n-day \(\phi \)-maxmin.
Proof
Let a history \( h \in {{{\mathcal {H}}}} \) of length t be given. Let \( k = \kappa (h)\) denote the number of switches that have occurred along the history h. For every \( \tau \in {{{\mathcal {S}}}}_{2}, \) we obtain the chain of inequalities
$$\begin{aligned} {\underline{v}}(h) - \phi (h) \le {\underline{u}}(\sigma ^{k},h) \le {\underline{u}}(\sigma ^{t+n},h) \le {\mathbb {E}}_{h,\sigma ^{t+n},\tau }[{\underline{U}}_{\sigma ^{t+n}}^{t+n}] \le&{\mathbb {E}}_{h,\sigma ^{t+n},\tau }[{\underline{V}}^{t+n}]\\ =&{\mathbb {E}}_{h,\sigma ^{\phi },\tau }[{\underline{V}}^{t+n}], \end{aligned}$$
where the first inequality holds since \(\sigma ^{k}\) is a \(\phi (h)\)-maxmin strategy for the subgame at history h, the second inequality holds by Lemma 5.4 since \( k \le t \le t+n\), the third inequality holds by Lemma 3.3, and the fourth one follows since \({\underline{U}}_{\sigma ^{t+n}}^{t+n} \le {\underline{V}}^{t+n}. \) The final equality follows from the fact that the stochastic variable \({\underline{V}}^{t+n}\) is \({\mathcal {F}}^{t+n}\)-measurable and the fact that the strategy \(\sigma ^{t+n}\) coincides with \(\sigma ^{\phi }\) at least until time \(t+n\). \(\square \)
The next lemma says that, along almost every play \(p \in {\mathcal {P}}\), either only finitely many switches occur or the tolerance level goes to zero. This lemma is crucial to show the equalizing property of switching strategies.
Lemma 5.6
For every history \(h\in {\mathcal {H}},\) for every strategy \( \tau \in {{{\mathcal {S}}}}_{2} \) of player 2, it holds that
$$\begin{aligned} \lim _{k \rightarrow \infty } \Phi ^{T_k} \cdot I(T_k<\infty ) = 0, \quad {\mathbb {P}}_{h,\sigma ^{\phi },\tau }\text {-almost surely.} \end{aligned}$$
Proof
Although it is possible that player 1 switches infinitely many times along a play p, and therefore incurs infinitely many increases in his guarantee level along this play, we first show that the total increase in his guarantee level is bounded, since the payoff function itself is bounded. More formally, let \( t \in {\mathbb {N}}, \) a history \( h \in {\mathcal {H}}^{t} \) of length t, and a strategy \( \tau \in {\mathcal {S}}_2 \) of player 2 be given. Let \( \kappa (h)\) denote the number of switches that have occurred along the history h. We show that
$$\begin{aligned} \sum _{k=\kappa (h)+1}^{\infty } {\mathbb {E}}_{h,\sigma ^{\phi },\tau }\left[ \tfrac{1}{2}\Phi ^{T_k} \cdot I\left( T_k<\infty \right) \right] \le 2M, \end{aligned}$$
(5.7)
where we recall that \( M = \sup _{p\in {\mathcal {P}}}\vert u(p)\vert . \)
As in the proof of Lemma 5.4, let \({\bar{u}}\) be any \({\mathcal {F}}^{\infty }\)-measurable stochastic variable with the property that, for every \( k \in {\mathbb {N}}, \) \({\bar{u}} = u,\) \( {\mathbb {P}}_{h,\sigma ^{k},\tau }\)-almost surely. For every \(k \in {\mathbb {N}}\), we define \({\underline{U}}_{\sigma ^{k}}^{\infty } = {\bar{u}}\).
Let some \(k > \kappa (h) \) be given. For every \( p \in {{{\mathcal {P}}}}(h), \) it holds that \( t \le T_{k}(p) \le T_{k+1}(p), \) so Lemma 3.3 and the optional sampling theorem with unbounded stopping times as presented in Theorem 8.16 of Yeh ([22], p. 139) implies
$$\begin{aligned} {\underline{U}}^{T_k}_{\sigma ^k} \le {\mathbb {E}}_{h,\sigma ^k,\tau }[{\underline{U}}^{T_{k+1}}_{\sigma ^k} \vert {\mathcal {F}}^{T_k}], \quad {\mathbb {P}}_{h,\sigma ^k,\tau }\text {-almost surely}. \end{aligned}$$
(5.8)
We now have
$$\begin{aligned} {\mathbb {E}}_{h,\sigma ^{\phi },\tau }[{\underline{U}}^{T_k}_{\sigma ^k}] = {\mathbb {E}}_{h,\sigma ^{k},\tau }[{\underline{U}}^{T_k}_{\sigma ^k}] \le {\mathbb {E}}_{h,\sigma ^{k},\tau }[{\underline{U}}^{T_{k+1}}_{\sigma ^k}] = {\mathbb {E}}_{h,\sigma ^{\phi },\tau }[{\underline{U}}^{T_{k+1}}_{\sigma ^k}], \end{aligned}$$
where the two equalities follow from Lemma 5.2 and the fact that both \({\underline{U}}^{T_k}_{\sigma ^k}\) and \({\underline{U}}^{T_{k+1}}_{\sigma ^k}\) are \({\mathcal {F}}^{T_{k+1}}\)-measurable stochastic variables, and the inequality follows by taking the expectation on both sides of inequality (5.8) and applying the law of iterated expectation.
Using Lemma 5.3, we can conclude that
$$\begin{aligned} {\mathbb {E}}_{h,\sigma ^{\phi },\tau }\left[ \tfrac{1}{2}\Phi ^{T_{k+1}} \cdot I(T_{k+1}<\infty ) \right]&\le {\mathbb {E}}_{h,\sigma ^\phi ,\tau } \left[ {\underline{U}}^{T_{k+1}}_{\sigma ^{k+1}} \right] - {\mathbb {E}}_{h,\sigma ^\phi ,\tau } \left[ {\underline{U}}^{T_{k+1}}_{\sigma ^{k}} \right] \\&\le {\mathbb {E}}_{h,\sigma ^{\phi },\tau }\left[ {\underline{U}}^{T_{k+1}}_{\sigma ^{k+1}}\right] - {\mathbb {E}}_{h,\sigma ^{\phi },\tau }\left[ {\underline{U}}^{T_k}_{\sigma ^k}\right] . \end{aligned}$$
Summing the preceding inequality over \( k = \kappa (h)+1, \dots , K, \) we obtain
$$\begin{aligned} \sum _{k=\kappa (h)+1}^{K} {\mathbb {E}}_{h,\sigma ^{\phi },\tau }\left[ \tfrac{1}{2}\Phi ^{T_k} \cdot I(T_{k}<\infty ) \right] \le {\mathbb {E}}_{h,\sigma ^{\phi },\tau }\left[ {\underline{U}}^{T_{K}}_{\sigma ^{K}} \right] -{\mathbb {E}}_{h,\sigma ^{\phi },\tau }\left[ {\underline{U}}^{T_{\kappa (h)+1}}_{\sigma ^{\kappa (h)+1}} \right] \le 2M. \end{aligned}$$
The result follows by taking the limit as \(K \rightarrow \infty \).
Let us write \(X_{k} = \Phi ^{T_k} \cdot I(T_k<\infty )\) and \( k^{\prime } = \kappa (h) + 1\). Since \(X_{k} \ge 0\), the monotone convergence theorem implies that \({\mathbb {E}}_{h,\sigma ^{\phi },\tau } [\sum _{k = k^{\prime }}^{\infty }X_{k}] = \sum _{k = k^{\prime }}^{\infty }{\mathbb {E}}_{h,\sigma ^{\phi },\tau } [X_{k}]\). The latter expression is finite by (5.7). Thus \(\sum _{k = k^{\prime }}^{\infty }X_{k}\) has a finite mean with respect to the probability measure \({\mathbb {P}}_{h,\sigma ^{\phi },\tau }\). Hence \(\sum _{k = k^{\prime }}^{\infty }X_{k} < \infty \) holds \({\mathbb {P}}_{h,\sigma ^{\phi },\tau }\)almost surely. This implies that \(X_{k} \rightarrow 0\) holds \({\mathbb {P}}_{h,\sigma ^{\phi },\tau }\)almost surely. \(\square \)
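The final step is a standard fact and can be illustrated numerically. The simulation below is ours and not part of the paper; the variables \(X_{k}\) are made up, taking the value k with probability \(2^{-k}\) and 0 otherwise, so that \(\sum _{k} {\mathbb {E}}[X_{k}] = \sum _{k} k\, 2^{-k} = 2\) is finite, and along (almost) every sampled path the terms vanish.

```python
import random

# Illustration (not from the paper): nonnegative X_k with a finite expected
# sum have an almost surely finite sum, hence X_k -> 0 almost surely.
# Here X_k = k with probability 2**-k and X_k = 0 otherwise.

random.seed(0)  # deterministic illustration

def sample_path(n_terms=200):
    return [k if random.random() < 2.0 ** (-k) else 0.0
            for k in range(1, n_terms + 1)]

paths = [sample_path() for _ in range(1000)]
mean_sum = sum(sum(path) for path in paths) / len(paths)      # close to 2
tails_vanish = all(max(path[100:]) == 0.0 for path in paths)  # X_k -> 0
```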
We are now in a position to state the main result of this section, which gives sufficient conditions for the existence of subgame \( \phi \)-maxmin strategies.
Theorem 5.7
Let a game \( \Gamma _{x_{0}}(u) \) and a tolerance function \(\phi > 0\) be given such that for every \(p\in {\mathcal {P}}\) at least one of the following two conditions holds:
1.
(point of upper semicontinuity) The function u is upper semicontinuous at p.
2.
(positive limit inferior) \(\liminf _{t\rightarrow \infty } \phi (p_{\vert t}) > 0\).
Then, there exists a subgame \(\phi \)-maxmin strategy in the game \(\Gamma _{x_{0}}(u).\)
Proof
We define the tolerance function \(\phi ^{\prime }\) by
$$\begin{aligned} \phi ^{\prime }(h) = \tfrac{1}{2}\min \lbrace \phi (h')\vert h'\preceq h\rbrace , \quad h \in {\mathcal {H}}. \end{aligned}$$
Thus, \(\phi ^{\prime }\) is a nonincreasing tolerance function with \(\phi ^{\prime } \le \tfrac{1}{2} \phi \).
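The prefix-minimum construction of \(\phi ^{\prime }\) can be sketched in code as follows. This is an illustration of ours, evaluated along a single play with hypothetical tolerance levels; note that the output is nonincreasing and bounded by half the input, and that a positive limit inferior of the input is inherited.

```python
# Evaluate phi' along one play (a sketch; the phi values are hypothetical):
# phi'(p|t) = (1/2) * min_{s <= t} phi(p|s), half the running prefix minimum.

def phi_prime_along_play(phi_values):
    out, running = [], float("inf")
    for value in phi_values:
        running = min(running, value)  # minimum over all prefixes seen so far
        out.append(0.5 * running)
    return out
```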
We show that \(\sigma ^{\phi ^{\prime }} \) is \(\phi ^{\prime }\)-equalizing. Since \(2\phi ^{\prime } \le \phi \) and \(\sigma ^{\phi ^{\prime }}\) is an n-day \(\phi ^{\prime }\)-maxmin strategy for every \(n \in {\mathbb {N}}\) by Lemma 5.5, it then follows from Theorem 4.5 that \(\sigma ^{\phi ^{\prime }}\) is a subgame \(\phi \)-maxmin strategy.
Let \({\mathcal {U}}\) denote the set of plays \(p \in {\mathcal {P}}\) at which u is upper semicontinuous and let \({\mathcal {I}}\) denote the set of plays \(p \in {\mathcal {P}}\) such that \(\liminf _{t\rightarrow \infty } \phi (p_{\vert t}) > 0\). By the assumption of the theorem it holds that \({\mathcal {P}} = {\mathcal {U}} \cup {\mathcal {I}}\). By the definition of \(\phi ^{\prime }\), we have \(\liminf _{t\rightarrow \infty } \phi ^{\prime }(p_{\vert t}) > 0\) for each \(p \in {\mathcal {I}}\).
Let some \(t \in {\mathbb {N}}\), a history \(h \in {\mathcal {H}}^{t}\), and a strategy \(\tau \in {\mathcal {S}}_{2}\) be given.
Step 1 For every \( p \in {{{\mathcal {U}}}}, \) \( u(p) \ge \limsup _{n\rightarrow \infty } {\underline{V}}^n(p) - \phi ^{\prime }(h). \)
This follows directly from Lemma 4.10.
For every \( k \in {\mathbb {N}}, \) we define \( {\mathcal {J}}_{k} = \{p \in {\mathcal {I}} \cap ({\mathcal {R}}_{k} \setminus {\mathcal {R}}_{k + 1}) \mid \lim _{n \rightarrow \infty } U_{\sigma ^{k},\tau }^{n}(p) = u(p)\} \) and \( {\mathcal {J}} = \bigcup _{k \in {\mathbb {N}}}{\mathcal {J}}_{k}, \) where we remind the reader that \( {{{\mathcal {R}}}}_{k} \) contains the plays along which at least k switches occur.
Step 2 For every \( p \in {{{\mathcal {J}}}}, \) \( u(p) \ge \limsup _{n\rightarrow \infty } {\underline{V}}^n(p) - \phi ^{\prime }(h). \)
Let \(k \in {\mathbb {N}}\) and \(p \in {\mathcal {J}}_{k}\) be given. Exactly k switches occur along the play p and the last switch occurs at time \(T_{k}(p)\). By our construction of \(\sigma ^{\phi ^{\prime }}\), this means that for each time \(n > T_{k}(p)\), the strategy \(\sigma ^k\) is a \(\phi ^{\prime }(p_{\vert n})\)-maxmin strategy for the subgame at history \(p_{\vert n}\), so \({\underline{u}}(\sigma ^{k},p_{\vert n}) \ge {\underline{v}}(p_{\vert n})-\phi ^{\prime }(p_{\vert n})\). Since \(U_{\sigma ^{k},\tau }^{n}(p) \ge {\underline{u}}(\sigma ^{k},p_{\vert n})\) and since for \(n \ge t\) we have \(\phi ^{\prime }(p_{\vert n}) \le \phi ^{\prime }(h)\), we conclude that for \(n \ge t\)
$$\begin{aligned} U_{\sigma ^{k},\tau }^{n}(p)\ge {\underline{V}}^{n}(p) -\phi ^{\prime }(h). \end{aligned}$$
Taking the limit as n goes to infinity, and making use of the fact that \(p \in {\mathcal {J}}_{k}\), we obtain
$$\begin{aligned} u(p) = \lim _{n\rightarrow \infty } U_{\sigma ^{k},\tau }^n(p) \ge \limsup _{n\rightarrow \infty } {\underline{V}}^{n}(p) - \phi ^{\prime }(h). \end{aligned}$$
Step 3: \({\mathbb {P}}_{h,\sigma ^{\phi ^{\prime }},\tau }({\mathcal {I}} \cap {\mathcal {R}}_\infty ) = 0\).
Recall that by Lemma 5.6, \(\Phi ^{T_k} \cdot I(T_k<\infty )\) converges to 0 as k goes to infinity, \({\mathbb {P}}_{h,\sigma ^{\phi ^{\prime }},\tau }\)almost surely. Also recall that \({\mathcal {R}}_{\infty }\) is the set of plays where infinitely many switches occur. Thus \(I(T_k < \infty )\) is identically equal to 1 on \({\mathcal {R}}_{\infty }\). Furthermore, \(\liminf _{k \rightarrow \infty }\Phi ^{T_{k}} > 0\) everywhere on \({\mathcal {I}}\). We conclude that \(\Phi ^{T_{k}} \cdot I(T_k < \infty )\) does not converge to zero on \({\mathcal {I}} \cap {\mathcal {R}}_\infty \) and the result follows.
Step 4: \({\mathbb {P}}_{h,\sigma ^{\phi ^{\prime }},\tau }({\mathcal {U}} \cup {\mathcal {J}}) = 1\).
For every \( k \in {\mathbb {N}}, \) it holds by Lévy's zero-one law (Lemma A.1 in Appendix A) that
$$\begin{aligned} {\mathbb {P}}_{h,\sigma ^{k},\tau }({\mathcal {J}}_{k}) = {\mathbb {P}}_{h,\sigma ^{k},\tau }({\mathcal {I}} \cap ({{{\mathcal {R}}}}_{k} \setminus {\mathcal {R}}_{k + 1})). \end{aligned}$$
Using Lemma 5.2 twice yields
$$\begin{aligned} {\mathbb {P}}_{h,\sigma ^{\phi ^{\prime }},\tau }({\mathcal {J}}_{k}) = {\mathbb {P}}_{h,\sigma ^{k},\tau }({\mathcal {J}}_{k}) = {\mathbb {P}}_{h,\sigma ^{k},\tau }({\mathcal {I}} \cap ({{{\mathcal {R}}}}_{k} \setminus {\mathcal {R}}_{k + 1})) = {\mathbb {P}}_{h,\sigma ^{\phi ^{\prime }},\tau }({\mathcal {I}} \cap ({\mathcal {R}}_{k} \setminus {\mathcal {R}}_{k + 1})). \end{aligned}$$
We now have
$$\begin{aligned} {\mathbb {P}}_{h,\sigma ^{\phi ^{\prime }},\tau }({\mathcal {J}}) = {\mathbb {P}}_{h,\sigma ^{\phi ^{\prime }},\tau }({\mathcal {I}} \setminus {\mathcal {R}}_{\infty }) = {\mathbb {P}}_{h,\sigma ^{\phi ^{\prime }},\tau }({\mathcal {I}}), \end{aligned}$$
where the last equality follows from Step 3. Finally, we obtain
$$\begin{aligned} {\mathbb {P}}_{h,\sigma ^{\phi ^{\prime }},\tau }({\mathcal {U}} \cup {\mathcal {J}}) \ge {\mathbb {P}}_{h,\sigma ^{\phi ^{\prime }},\tau }({\mathcal {U}} \setminus {\mathcal {I}}) + {\mathbb {P}}_{h,\sigma ^{\phi ^{\prime }},\tau }({\mathcal {J}}) = {\mathbb {P}}_{h,\sigma ^{\phi ^{\prime }},\tau }({\mathcal {U}} \setminus {\mathcal {I}}) + {\mathbb {P}}_{h,\sigma ^{\phi ^{\prime }},\tau }({\mathcal {I}}) = 1, \end{aligned}$$
where the last equality follows from the fact that the sets \({\mathcal {U}}\) and \({\mathcal {I}}\) cover \({\mathcal {P}}(h)\). \(\square \)
Theorem 5.7 generalizes Proposition 11 in Mashiah–Yaakovi ([15]), where the existence of a subgame \(\epsilon \)-optimal strategy in a two-player zero-sum stochastic game with Borel measurable payoff functions is shown. The tolerance function \( \phi \) there is given by \(\phi (h)=\epsilon \) for every \(h\in {\mathcal {H}}, \) so for every play the tolerance function has a positive limit inferior. Theorem 5.7 yields the existence of a subgame \(\epsilon \)-maxmin strategy. Because the Borel measurability of the payoff function guarantees the existence of the value (Maitra and Sudderth, [13], and Martin, [14]), this is equivalent to the existence of a subgame \(\epsilon \)-optimal strategy.
6 Subgame Maxmin Strategies
The goal of this section is to explore the relationship between the concept of a subgame maxmin strategy and that of a subgame \(\phi \)-maxmin strategy for \( \phi > 0. \) The main result of this section is the following theorem.
Theorem 6.1
For every game \( \Gamma _{x_{0}}(u) \), there exists a tolerance function \(\phi ^* > 0 \) such that the following statements are equivalent:
1.
The game \( \Gamma _{x_{0}}(u) \) has a subgame maxmin strategy.
2.
The game \( \Gamma _{x_{0}}(u) \) has a subgame \(\phi ^*\)-maxmin strategy.
Because every subgame maxmin strategy is a subgame \(\phi \)-maxmin strategy, statement 1 clearly implies statement 2. To prove the converse, we define a tolerance function \(\phi ^*>0\); the precise definition is given in Subsection 6.3. Intuitively, we fix a positive sequence \((\delta _t)_{t\in {\mathbb {N}}}\) with the property that \(\sum _{t=0}^\infty \delta _t<\infty \). Then, we choose \(\phi ^*\) such that (1) \(\phi ^*(p_{\vert t})<\delta _t\) for every play \(p\in {\mathcal {P}}\) and every \(t\in {\mathbb {N}}\) and (2) every one-day \(\phi ^*\)-maxmin action is \(\delta _t\)-close to the region of optimal mixed actions in the one-day game. The first condition implies that the tolerance function \(\phi ^*\) decreases rapidly to zero, i.e., \(\sum _{t=0}^\infty \phi ^*(p_{\vert t})<\infty \). By assumption, player 1 has a subgame \(\phi ^*\)-maxmin strategy. We obtain a subgame maxmin strategy as follows: at every history \(h_t\) we pick a mixed action which guarantees the lower value \({\underline{v}}(h_t)\) in the one-day game and is \(\delta _t\)-close to the mixed action chosen by the subgame \(\phi ^{*}\)-maxmin strategy. This construction yields a strategy which is clearly one-day maxmin. The condition that the tolerance function \(\phi ^{*}\) decreases rapidly to 0 is then used to show that the expected payoffs from a history \(h_t\) with length t obtained by the subgame \(\phi ^{*}\)-maxmin strategy and by the constructed strategy are within a distance of \(M\cdot \sum _{n=t}^\infty \delta _n\), where \(M=\sup _{p\in {{{\mathcal {P}}}}}\vert u(p)\vert \). From this, the equalizing property of the constructed strategy follows. Then, Corollary 4.7 from Sect. 4 lets us conclude that the constructed strategy is indeed a subgame maxmin strategy.
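The key estimate in this proof sketch, namely that stagewise \(\delta _t\)-closeness of the chosen mixed actions forces the expected payoffs to stay within \(M\cdot \sum _t \delta _t\) of each other, can be checked numerically. The code below is not the paper's construction: for simplicity it uses independent per-stage distributions rather than history-dependent ones, and the distributions and the payoff function are made-up toy data.

```python
import itertools

def expected_payoff(stage_dists, u):
    """Exact expectation of u over plays, a play being a tuple of outcomes."""
    total = 0.0
    for play in itertools.product(*(range(len(d)) for d in stage_dists)):
        prob = 1.0
        for t, outcome in enumerate(play):
            prob *= stage_dists[t][outcome]
        total += prob * u(play)
    return total

# Hypothetical stagewise distributions of two strategies (three stages,
# two outcomes per stage) and a payoff function bounded by M = 1.
p = [[0.5, 0.5], [0.9, 0.1], [0.3, 0.7]]
q = [[0.6, 0.4], [0.9, 0.1], [0.25, 0.75]]
u = lambda play: 1.0 if sum(play) % 2 == 0 else -1.0
M = 1.0

deltas = [sum(abs(a - b) for a, b in zip(dp, dq)) for dp, dq in zip(p, q)]
gap = abs(expected_payoff(p, u) - expected_payoff(q, u))
bound = M * sum(deltas)  # the inequality gap <= bound always holds
```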
Theorem 6.1 has the following corollary.
Corollary 6.2
The following statements are equivalent:
1.
The game \( \Gamma _{x_{0}}(u) \) has a subgame maxmin strategy.
2.
For every \( \phi > 0, \) the game \( \Gamma _{x_{0}}(u) \) has a subgame \(\phi \)-maxmin strategy.
We would like to make two remarks. First, as the payoff function u need not be continuous, one cannot simply use a continuity argument to prove that statement 2 of Corollary 6.2 implies statement 1. Second, note that the existence of a subgame \(\epsilon \)-maxmin strategy for every \(\epsilon >0\) is a weaker requirement than statements 1 and 2 of Corollary 6.2. Indeed, as Example 3.2 already illustrated, there exist games which admit a subgame \(\epsilon \)-maxmin strategy for every \(\epsilon >0\) but do not admit a subgame maxmin strategy.
The special case where the payoff function is upper semicontinuous deserves some additional attention. From Maitra and Sudderth ([13]) and Martin ([14]), it follows that every two-player zero-sum stochastic game with a countable state space, finite action sets, and a Borel measurable payoff function admits a value. Because every upper semicontinuous payoff function is Borel measurable, the existence of the value is guaranteed. From Theorem 5.7, we obtain the existence of a subgame \(\phi \)-optimal strategy for every \(\phi >0\). Combining this with Corollary 6.2 shows that player 1 has a subgame optimal strategy. In this way, we obtain a special case of the result by Laraki, Maitra, and Sudderth ([10]), where the authors allow the state space to be a Borel subset of a Polish space and transition probabilities to be determined by a Borel transition function.
When we are interested in pure strategies that guarantee the maxmin levels, we can strengthen the result of Theorem 6.1. In the context of simultaneous move games, pure strategies are of course rather restrictive. Still, there are important classes of games, such as perfect information games, in which they are natural and play a prominent role.
Theorem 6.3
For every game \(\Gamma _{x_{0}}(u)\), there exists a tolerance function \( \phi ^* > 0 \) such that the following statements are equivalent:
1.
The pure strategy \(\sigma \in {{{\mathcal {S}}}}_{1} \) is a subgame maxmin strategy.
2.
The pure strategy \(\sigma \in {{{\mathcal {S}}}}_{1} \) is a subgame \(\phi ^*\)-maxmin strategy.
The tolerance function \( \phi ^*\) in Theorem 6.3 will be defined based on the data of the game and the lower values in the subgames.
This section is structured as follows. We start by proving Theorem 6.3, which is easier than Theorem 6.1 and helps us explain the main ideas. Then, we turn to the proof of Theorem 6.1.
6.1 The proof of Theorem 6.3
In this subsection, we prove Theorem 6.3. Since the statement of this theorem is about pure strategies, the proof is less technical and the intuition is more transparent.
We only need to prove that there is \( \phi ^* > 0 \) such that statement 2 of Theorem 6.3 implies statement 1 of Theorem 6.3. From now on, for any \(t\in {\mathbb {N}}\), history \(h\in {\mathcal {H}}^t\) with final state x, and mixed actions \(m_1\in \Delta ({\mathcal {A}})\) and \(m_2\in \Delta ({\mathcal {B}})\), we denote the expectation of the lower value at the next stage by
$$\begin{aligned} {\mathbb {E}}_{h,m_1,m_2}\left[ {\underline{V}}^{t+1}\right] \;=\;\sum _{a\in {\mathcal {A}}}\sum _{b\in {\mathcal {B}}}\sum _{x'\in {\mathcal {X}}}m_1(a)\cdot m_2(b)\cdot q(x'\vert x,a,b)\cdot {\underline{v}}(h,a,b,x'). \end{aligned}$$
(6.1)
The proof consists of two steps.
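Written out with the mixed-action weights, the expectation in (6.1) is a triple sum that translates directly into code. The following sketch is illustrative only; the transition kernel q and the next-stage lower values v are hypothetical toy data for one fixed history, keyed by pure action pairs.

```python
def expected_next_lower_value(q, v, m1, m2):
    """E_{h,m1,m2}[V^{t+1}]: average the next-stage lower values v[(a, b, x)]
    over actions (a, b) drawn from (m1, m2) and the next state x drawn from
    the transition kernel q[(a, b)] (all data for one fixed history h)."""
    return sum(
        m1[a] * m2[b] * q[(a, b)][x] * v[(a, b, x)]
        for a in m1
        for b in m2
        for x in q[(a, b)]
    )

# Hypothetical data: two actions for player 1, one for player 2, two states.
q = {(0, 0): {0: 1.0}, (1, 0): {0: 0.25, 1: 0.75}}
v = {(0, 0, 0): 4.0, (1, 0, 0): 0.0, (1, 0, 1): 8.0}
value = expected_next_lower_value(q, v, {0: 0.5, 1: 0.5}, {0: 1.0})
```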
Step 1. Definition of \(\phi ^* > 0. \)
For every \(t\in {\mathbb {N}}, \) for every history \(h\in {\mathcal {H}}^t\), there exists a number \(d(h) >0\) such that for every action \(a\in {\mathcal {A}}\) of player 1 one of the following two conditions holds:

- Action a guarantees that the lower value does not drop in expectation: for every \(m_2\in \Delta ({\mathcal {B}})\), \({\mathbb {E}}_{h,a,m_2}\left[ {\underline{V}}^{t+1}\right] \ge {\underline{v}}(h)\).

- There exists a mixed action \(m_2\in \Delta ({\mathcal {B}})\) for player 2 such that, if player 1 uses action a, the lower value drops in expectation by more than d(h): \({\mathbb {E}}_{h,a,m_2}\left[ {\underline{V}}^{t+1}\right] < {\underline{v}}(h)-d(h)\).

This statement is true because the action set \({\mathcal {A}}\) of player 1 is finite. We define \( \phi ^* > 0 \) as follows. For every \(t\in {\mathbb {N}}, \) for every history \(h\in {\mathcal {H}}^t\), let \(\phi ^*(h)=\min \{d(h),2^{-t}\}\). The term \(2^{-t}\) is included so that the tolerance levels tend to 0 along each play: for every \(p\in {\mathcal {P}}\), it holds that \(\displaystyle \lim _{t\rightarrow \infty }\phi ^*(p_{\vert t})=0\).
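Since \({\mathbb {E}}_{h,a,m_2}[{\underline{V}}^{t+1}]\) is linear in \(m_2\), its infimum over mixed actions is attained at a pure action of player 2, so a valid d(h) can be read off from a finite table of one-day expectations. The sketch below is ours, not the paper's; the payoff table and the lower value are hypothetical.

```python
def gap_level(one_day_payoff, lower_value, default=1.0):
    """Return a valid d(h) > 0 for Step 1.

    one_day_payoff[a][b] = E_{h,a,b}[V^{t+1}] for pure actions a, b.  Because
    the expectation is linear in m2, an action a violates condition 1 exactly
    when min_b one_day_payoff[a][b] < lower_value; taking half of the smallest
    such shortfall makes the drop in condition 2 strictly larger than d(h).
    """
    shortfalls = []
    for row in one_day_payoff:
        guarantee = min(row)  # worst case over player 2's pure actions
        if guarantee < lower_value:
            shortfalls.append(lower_value - guarantee)
    return min(shortfalls) / 2.0 if shortfalls else default

# Hypothetical table: three actions for player 1, two for player 2, v(h) = 2.
d = gap_level([[2.0, 3.0], [1.0, 4.0], [2.0, 2.5]], lower_value=2.0)
```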
Step 2. If the pure strategy \( \sigma \in \mathcal{S}_{1} \) is a subgame \( \phi ^{*}\)-maxmin strategy, then \( \sigma \) is a subgame maxmin strategy.
Let the pure strategy \( \sigma \in {{{\mathcal {S}}}}_{1} \) be subgame \(\phi ^*\)-maxmin. We verify that \(\sigma \) satisfies the conditions of Corollary 4.7.
First, we show that \(\sigma \) is 1-day maxmin. Fix \( t \in {\mathbb {N}} \) and \(h\in {\mathcal {H}}^t. \) Let \(a=\sigma (h)\) denote the action that \(\sigma \) recommends at h. Then, for every mixed action \(m_2\in \Delta ({\mathcal {B}})\) and strategy \(\tau \in {\mathcal {S}}_2\) with \(m_2=\tau (h)\), we have
$$\begin{aligned} {\mathbb {E}}_{h,a,m_2}\left[ {\underline{V}}^{t+1}\right] ={\mathbb {E}}_{h,\sigma ,\tau }\left[ {\underline{V}}^{t+1}\right] \ge {\underline{v}}(h)-\phi ^*(h)\ge {\underline{v}}(h)-d(h), \end{aligned}$$
where the first inequality follows from the fact that \(\sigma \) is subgame \(\phi ^*\)-maxmin and from condition 1 of Theorem 4.6, and the second inequality follows from the definition of \(\phi ^*(h). \) Therefore, by the choice of d(h) in Step 1, for every \(m_2\in \Delta ({\mathcal {B}})\), it holds that
$$\begin{aligned} {\mathbb {E}}_{h,a,m_2}\left[ {\underline{V}}^{t+1}\right] \ge {\underline{v}}(h). \end{aligned}$$
Hence, for every strategy \(\tau \in {\mathcal {S}}_2\), it holds that
$$\begin{aligned} {\mathbb {E}}_{h,\sigma ,\tau }\left[ {\underline{V}}^{t+1}\right] ={\mathbb {E}}_{h,a,\tau (h)}\left[ {\underline{V}}^{t+1}\right] \ge {\underline{v}}(h). \end{aligned}$$
Thus, \(\sigma \) is 1-day maxmin.
We show next that \(\sigma \) is equalizing. Take some \( \tau \in \mathcal{S}_{2}. \) Because \(\sigma \) is subgame \(\phi ^*\)-maxmin, for every \(p\in {\mathcal {P}}(h), \) for every \(n\ge t, \) we have
$$\begin{aligned} U_{\sigma ,\tau }^n(p)={\mathbb {E}}_{p_{\vert n},\sigma ,\tau }\left[ u\right] \ge {\underline{v}}(p_{\vert n})-\phi ^*(p_{\vert n})={\underline{V}}^n(p)-\Phi ^{*n}(p), \end{aligned}$$
where \(\Phi ^{*n}(p)=\phi ^*(p_{\vert n})\). We conclude that
$$\begin{aligned} \lim _{n\rightarrow \infty }U_{\sigma ,\tau }^n \ge \limsup _{n\rightarrow \infty }\left( {\underline{V}}^n-\Phi ^{*n} \right) =\limsup _{n\rightarrow \infty } {\underline{V}}^n-\lim _{n\rightarrow \infty }\Phi ^{*n} = \limsup _{n\rightarrow \infty } {\underline{V}}^n, \end{aligned}$$
where the last equality follows from the fact that for all \(p\in {\mathcal {P}}\) we have \(\displaystyle \lim _{n\rightarrow \infty }\phi ^*(p_{\vert n})=0. \) Hence
$$\begin{aligned} u = \lim _{n\rightarrow \infty }U_{\sigma ,\tau }^n \ge \limsup _{n\rightarrow \infty } {\underline{V}}^n,\quad {\mathbb {P}}_{h,\sigma ,\tau }\text {-almost surely,} \end{aligned}$$
where the equality follows from Lemma A.1 in Appendix A. We have shown that \(\sigma \) is equalizing.
6.2 The OneShot Game \(\Upsilon _h\)
To prove Theorem 6.1, we first analyze a one-shot zero-sum game in this subsection. For each history, the one-shot game is such that the payoff is given by the lower value at the next stage. In the next subsection, we use these one-shot games to construct the tolerance function \(\phi ^* > 0\).
For some \( t \in {\mathbb {N}}, \) let \( h \in {\mathcal {H}}^t \) be a history in the game \(\Gamma _{x_{0}}(u). \) The one-shot zero-sum game \(\Upsilon _h\) is played as follows. Player 1 chooses an action \(a\in {\mathcal {A}}\) and player 2 simultaneously chooses an action \(b\in {\mathcal {B}}\). Then, player 1 receives from player 2 the amount \({\mathbb {E}}_{h,a,b}\left[ {\underline{V}}^{t+1}\right] \). As the action sets \({\mathcal {A}}\) and \({\mathcal {B}}\) are finite, the game \(\Upsilon _h\) has a value, which we denote by w(h). Furthermore, both players have optimal mixed actions in the game \(\Upsilon _h\).
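Because \({\mathcal {A}}\) and \({\mathcal {B}}\) are finite, the value of such a matrix game can be computed exactly by linear programming. The sketch below instead uses fictitious play, a simple iterative scheme that brackets the value from both sides; it is a numerical illustration of ours, and the payoff matrix (standing in for the amounts \({\mathbb {E}}_{h,a,b}[{\underline{V}}^{t+1}]\)) is made-up toy data with a saddle point of value 2.

```python
def fictitious_play(payoff, iterations=1000):
    """Bracket the value of the zero-sum game with matrix payoff[a][b]
    (the amount player 2 pays player 1) by iterated best responses."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    counts1, counts2 = [0] * n_rows, [0] * n_cols
    counts1[0] = counts2[0] = 1  # arbitrary initial plays

    for _ in range(iterations):
        # Player 1 best-responds to player 2's empirical mixed action.
        br1 = max(range(n_rows),
                  key=lambda a: sum(payoff[a][b] * counts2[b] for b in range(n_cols)))
        # Player 2 best-responds (minimizing) to player 1's empirical mixed action.
        br2 = min(range(n_cols),
                  key=lambda b: sum(payoff[a][b] * counts1[a] for a in range(n_rows)))
        counts1[br1] += 1
        counts2[br2] += 1

    m1 = [c / sum(counts1) for c in counts1]
    m2 = [c / sum(counts2) for c in counts2]
    # Rigorous bracket: playing m1 guarantees player 1 at least `lower`, and
    # playing m2 concedes at most `upper`, so lower <= value <= upper.
    lower = min(sum(payoff[a][b] * m1[a] for a in range(n_rows)) for b in range(n_cols))
    upper = max(sum(payoff[a][b] * m2[b] for b in range(n_cols)) for a in range(n_rows))
    return lower, upper

# Toy matrix with a saddle point at (row 1, column 1), so the value is 2.
lower, upper = fictitious_play([[3.0, 1.0], [2.0, 2.0]])
```

Fictitious play is only a stand-in here; any linear-programming solver for matrix games computes w(h) and the optimal mixed actions exactly.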
The following lemma states that the value w(h) of the one-shot game \(\Upsilon _h\) equals the lower value \({\underline{v}}(h)\) at the history h in the original game \(\Gamma _{x_{0}}(u)\). In case the value exists, this is a classic result of dynamic programming.
Lemma 6.4
For every history \(h\in {\mathcal {H}}, \) we have \(w(h)={\underline{v}}(h)\).
Proof
The proof is by contradiction. Fix \( t \in {\mathbb {N}} \) and a history \(h\in {\mathcal {H}}^t. \) Now suppose that \(w(h)\ne {\underline{v}}(h)\). Then, we have either \(w(h)>{\underline{v}}(h)\) or \(w(h)<{\underline{v}}(h)\).
Case 1: \(w(h)>{\underline{v}}(h)\).
Let \(\delta =w(h)-{\underline{v}}(h)\). We derive a contradiction by showing that, in the subgame of \(\Gamma _{x_{0}}(u)\) at history h, player 1 can guarantee an expected payoff of at least \({\underline{v}}(h)+\delta /2\).
Let \(m_1\in \Delta ({\mathcal {A}})\) be an optimal mixed action for player 1 in the one-shot game \(\Upsilon _h\). Let \( \sigma \in {\mathcal {S}}_1 \) be such that \(\sigma (h)=m_1\) and such that it induces a \((\delta /2)\)-maxmin strategy for the subgame at each history in period \(t+1\), i.e., for every \(h'\in {\mathcal {H}}^{t+1},\) for every \(\tau \in {\mathcal {S}}_2,\)
$$\begin{aligned} {\mathbb {E}}_{h',\sigma ,\tau }\left[ u\right] \ge {\underline{v}}(h')-\tfrac{\delta }{2}. \end{aligned}$$
(6.2)
Then, for every \(\tau \in {\mathcal {S}}_2,\) it holds that
$$\begin{aligned} {\mathbb {E}}_{h,\sigma ,\tau }\left[ u\right]&= {\mathbb {E}}_{h,\sigma ,\tau }\left[ U^{t}_{\sigma ,\tau }\right] ={\mathbb {E}}_{h,\sigma ,\tau }\left[ U^{t+1}_{\sigma ,\tau }\right] \\&\ge {\mathbb {E}}_{h,\sigma ,\tau }\left[ {\underline{V}}^{t+1}\right] -\tfrac{\delta }{2} = {\mathbb {E}}_{h,m_1,\tau (h)}\left[ {\underline{V}}^{t+1}\right] -\tfrac{\delta }{2} \ge w(h) -\tfrac{\delta }{2} ={\underline{v}}(h)+\tfrac{\delta }{2}, \end{aligned}$$
where the second equality follows from the fact that \((U^n_{\sigma ,\tau })_{n\ge t}\) is a \({\mathbb {P}}_{h,\sigma ,\tau }\)-martingale, the first inequality follows from (6.2), and the second inequality follows from the choice of \(m_1\).
Case 2: \(w(h)<{\underline{v}}(h)\).
Let \(\delta ={\underline{v}}(h)-w(h)\). We derive a contradiction by showing that, for every strategy of player 1, there is a strategy for player 2 such that the expected payoff is at most \({\underline{v}}(h)-\delta /2\) in the subgame of \(\Gamma _{x_{0}}(u)\) at history h.
Fix \(\sigma \in {\mathcal {S}}_1\) and let \(m_1=\sigma (h)\). Let \(m_2\in \Delta ({\mathcal {B}})\) be an optimal mixed action for player 2 in the one-shot game \(\Upsilon _h\). Let \(\tau \in {\mathcal {S}}_2\) be such that \(\tau (h)=m_2\) and the expected payoff under \((\sigma ,\tau )\) in the subgame at each history \(h'\) at period \(t+1\) is at most \({\underline{v}}(h') + \delta /2\), i.e., for every \(h'\in {\mathcal {H}}^{t+1},\)
$$\begin{aligned} {\mathbb {E}}_{h',\sigma ,\tau }\left[ u\right] \le {\underline{v}}(h')+\tfrac{\delta }{2}. \end{aligned}$$
(6.3)
We have that
$$\begin{aligned} {\underline{v}}(h)-\tfrac{\delta }{2}&= w(h)+\tfrac{\delta }{2} \ge {\mathbb {E}}_{h,m_1,m_2}\left[ {\underline{V}}^{t+1}\right] +\tfrac{\delta }{2} \\&\ge {\mathbb {E}}_{h,m_1,m_2}\left[ U^{t+1}_{\sigma ,\tau }\right] ={\mathbb {E}}_{h,\sigma ,\tau }\left[ U^{t+1}_{\sigma ,\tau }\right] ={\mathbb {E}}_{h,\sigma ,\tau }\left[ U^{t}_{\sigma ,\tau }\right] ={\mathbb {E}}_{h,\sigma ,\tau }\left[ u\right] , \end{aligned}$$
where the first inequality follows from the choice of \(m_2\), the second inequality follows from (6.3), and the penultimate equality follows from the fact that \((U^n_{\sigma ,\tau })_{n\ge t}\) is a \({\mathbb {P}}_{h,\sigma ,\tau }\)-martingale. Because \( \sigma \) was chosen arbitrarily, we have derived a contradiction with the definition of the lower value \({\underline{v}}(h)\). \(\square \)
The total variation distance between two mixed actions \(m_1,n_1\in \Delta ({\mathcal {A}})\) of player 1 is defined as
$$\begin{aligned} \Vert m_1-n_1\Vert _{\mathrm{TV}}=\sum _{a\in {\mathcal {A}}}\vert m_1(a)-n_1(a)\vert . \end{aligned}$$
The total variation distance between two probability measures \( {\mathbb {P}} \) and \( {\mathbb {P}}^{\prime } \) on \( ({{{\mathcal {P}}}}, {{{\mathcal {F}}}}) \) is defined as
$$\begin{aligned} \Vert {\mathbb {P}} - {\mathbb {P}}^{\prime }\Vert _{{\mathrm{TV}}} = \sup \Big \{\sum _{i=1}^{n} \vert {\mathbb {P}}(F_{i}) - {\mathbb {P}}^{\prime }(F_{i})\vert : F_{1}, \ldots , F_{n} \in {{{\mathcal {F}}}} \text{ and } \{F_{1}, \ldots , F_{n}\} \text{ is a partition of } {{{\mathcal {P}}}}\Big \}. \end{aligned}$$
Let \( t \in {\mathbb {N}} \) and a history \(h\in {\mathcal {H}}^t\) be given. Let \(O_h \subseteq \Delta ({{{\mathcal {A}}}}) \) denote the set of optimal mixed actions of player 1 in the one-shot game \(\Upsilon _h\). By Lemma 6.4, it holds that
$$\begin{aligned} O_h = \left\{ m_1\in \Delta ({\mathcal {A}}) \mid \text{ for every } m_2\in \Delta ({\mathcal {B}}),~{\mathbb {E}}_{h,m_1,m_2}\left[ {\underline{V}}^{t+1}\right] \ge {\underline{v}}(h) \right\} . \end{aligned}$$
Note that \(O_h\) is a compact subset of \(\Delta ({\mathcal {A}})\). For every \(m_1\in \Delta ({\mathcal {A}}), \) the distance of \(m_1\) to \(O_h\) is defined by
$$\begin{aligned} \Vert m_1-O_h\Vert _{\mathrm{TV}}=\displaystyle \min _{n_1\in O_h}\Vert m_1-n_1\Vert _{\mathrm{TV}}. \end{aligned}$$
Due to the compactness of \(O_h, \) the minimum is attained. For \(\delta >0\), let \(D_h^\delta \) be the set of mixed actions of player 1 which have a distance of at least \(\delta \) to the set \(O_h, \) so
$$\begin{aligned} D_h^\delta \;=\;\left\{ m_1\in \Delta ({\mathcal {A}}) \mid \Vert m_1-O_h\Vert _{\mathrm{TV}}\ge \delta \right\} . \end{aligned}$$
(6.4)
The mixed actions in \(D_h^\delta \) are not optimal in the one-shot game \(\Upsilon _h\). The following lemma says that the loss in utility caused by these mixed actions has a positive lower bound.
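As a concrete illustration, both total variation distances above are elementary to compute when the action set is finite. The following Python sketch (not from the paper; the action set and mixed actions are hypothetical) computes \(\Vert m_1-n_1\Vert _{\mathrm{TV}}\) and the distance from a mixed action to a finite collection of mixed actions, a toy stand-in for the compact set \(O_h\):

```python
# Total variation distance between mixed actions over a finite action set,
# and the distance from a mixed action to a finite collection of mixed
# actions (a stand-in for the compact set O_h of optimal mixed actions).

def tv_distance(m1, n1):
    """||m1 - n1||_TV = sum over actions a of |m1(a) - n1(a)|."""
    return sum(abs(m1[a] - n1[a]) for a in m1)

def distance_to_set(m1, collection):
    """min over n1 in the finite collection of ||m1 - n1||_TV."""
    return min(tv_distance(m1, n1) for n1 in collection)

# Hypothetical mixed actions on a two-element action set {0, 1}:
m = {0: 0.5, 1: 0.5}
n = {0: 0.8, 1: 0.2}
assert abs(tv_distance(m, n) - 0.6) < 1e-9   # |0.5-0.8| + |0.5-0.2|
assert distance_to_set(m, [m, n]) == 0.0     # m itself is in the collection
```

In the paper \(O_h\) is compact rather than finite, but the minimum in (6.4)'s definition is attained either way.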
Lemma 6.5
Let \(h\in {\mathcal {H}}\) and \(\delta >0\) be given. If \( D_h^\delta \) is nonempty, then there is \( \epsilon >0\) such that for every \(m_1\in D_h^\delta \) there exists \(b\in {\mathcal {B}}\) such that
$$\begin{aligned}{\mathbb {E}}_{h,m_1,b}\left[ {\underline{V}}^{t+1}\right] \le {\underline{v}}(h)-\epsilon .\end{aligned}$$
Proof
Assume \( D^{\delta }_{h} \) is nonempty. Consider the function \( e_h^\delta : D_h^\delta \rightarrow {\mathbb {R}}\) defined by
$$\begin{aligned} e_h^\delta (m_1)\;=\;{\underline{v}}(h)-\min _{b\in {\mathcal {B}}}{\mathbb {E}}_{h,m_1,b}\left[ {\underline{V}}^{t+1}\right] , \quad m_1\in D_h^\delta . \end{aligned}$$
(6.5)
Since \({\mathcal {B}}\) is finite, the minimum exists. For every \(m_1\in D_h^\delta \), we have \(m_1\notin O_h\), and therefore there exists \(m_2\in \Delta ({\mathcal {B}})\) such that \({\mathbb {E}}_{h,m_1,m_2}\left[ {\underline{V}}^{t+1}\right] <{\underline{v}}(h)\); since this expectation is an average of the expectations under the pure actions \(b\in {\mathcal {B}}\), some pure action satisfies the same strict inequality, so \(e_h^\delta (m_1)>0\). The function \( e^{\delta }_{h} \) is therefore a positive and continuous function on a compact set, so it has a positive minimum. \(\square \)
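To make the role of \(e_h^\delta \) concrete, consider a toy one-shot game in the spirit of matching pennies. The following Python sketch (illustrative numbers only, not from the paper) evaluates \(e_h^\delta (m_1)\) for mixed actions \(m_1=(q,1-q)\) of player 1, with continuation values chosen so that the unique optimal mixed action is \(q=0.5\) and the one-shot value is \(0.5\):

```python
# Toy one-shot game (hypothetical numbers): player 1 mixes m1 = (q, 1-q)
# over two actions; player 2 has two pure actions b0 and b1.

def e_delta(q, v_lower=0.5):
    """e_h^delta(m1) = v(h) - min_b E_{h,m1,b}[V^{t+1}] for m1 = (q, 1-q)."""
    exp_b0 = q * 1.0 + (1 - q) * 0.0   # continuation value against b0
    exp_b1 = q * 0.0 + (1 - q) * 1.0   # continuation value against b1
    return v_lower - min(exp_b0, exp_b1)

# The optimal mixed action q = 0.5 loses nothing; mixed actions bounded
# away from it lose a positive amount, as in Lemma 6.5:
assert e_delta(0.5) == 0.0
assert abs(e_delta(0.8) - 0.3) < 1e-9
```

On any set of mixed actions bounded away from \(q=0.5\), this loss has a positive minimum, which is the \(\epsilon \) of Lemma 6.5 in this toy game.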
6.3 Definition of the Tolerance Function \(\phi ^*\)
In this subsection, we define a positive tolerance function \(\phi ^*\). Fix a positive and decreasing sequence \((\delta _t)_{t\in {\mathbb {N}}}\) such that \(\sum _{t=0}^{\infty } \delta _t<\infty \), for example \(\delta _t=2^{-t}\) for every \(t\in {\mathbb {N}}\). Note that \(\lim _{t\rightarrow \infty }\delta _t=0\).
For every \( t \in {\mathbb {N}} \) and every history \(h\in {\mathcal {H}}^t, \) we define the constant c(h) as follows: if the set \(D_h^{\delta _t}\) is nonempty, then c(h) is equal to the positive number \( \epsilon \) of Lemma 6.5, and \( c(h) = \delta _t \) otherwise. We define
$$\begin{aligned} \phi ^*(h)=\tfrac{\min \left\{ c(h),\delta _{t}\right\} }{2}. \end{aligned}$$
(6.6)
Notice that \(0<\phi ^*(h)<\delta _t\).
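The construction (6.6) can be sketched numerically. Assuming the example sequence \(\delta _t=2^{-t}\) and treating \(c(h)\) as a given positive input (in the paper it is the \(\epsilon \) of Lemma 6.5, or \(\delta _t\) when \(D_h^{\delta _t}\) is empty), a minimal Python sketch is:

```python
# Sketch of the tolerance function phi* from (6.6), with the example
# sequence delta_t = 2^{-t} and c(h) supplied as an input.

def delta(t):
    return 2.0 ** (-t)   # positive, decreasing, summable

def phi_star(c_h, t):
    return min(c_h, delta(t)) / 2.0   # (6.6): phi*(h) = min{c(h), delta_t}/2

# The defining property 0 < phi*(h) < delta_t holds for any c(h) > 0:
for t in range(5):
    for c_h in (0.01, 0.5, 3.0):
        assert 0 < phi_star(c_h, t) < delta(t)
```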
We summarize the properties of the tolerance function \(\phi ^*\):
1.
For every history \(h\in {\mathcal {H}}\), we have \(\phi ^*(h)>0\).
2.
For every play \(p\in {\mathcal {P}}\), we have \(\sum _{t=0}^{\infty } \phi ^*(p_{\vert t})\le \sum _{t=0}^{\infty } \delta _t<\infty \).
3.
For every play \(p\in {\mathcal {P}}\), \(\displaystyle \lim _{t\rightarrow \infty }\phi ^*(p_{\vert t})=0\).
4.
If the set \(D_h^{\delta _t}\) is nonempty, then by the choice of c(h), for every \(m_1\in D_h^{\delta _t}\) there exists \(b\in {\mathcal {B}}\) such that
$$\begin{aligned} {\mathbb {E}}_{h,m_1,b}\left[ {\underline{V}}^{t+1}\right] \;\le \;{\underline{v}}(h)-c(h) \;<\;{\underline{v}}(h)-\phi ^*(h). \end{aligned}$$
The importance of the condition \(\sum _{t=0}^{\infty } \delta _t<\infty \) is underlined by the following lemma. Recall that \( M=\sup _{p\in {\mathcal {P}}} \vert u(p)\vert . \)
Lemma 6.6
Let the strategies \( \sigma , \sigma ' \in {{{\mathcal {S}}}}_{1} \) be such that, for every \(t\in {\mathbb {N}} \) and every history \(h\in {\mathcal {H}}^t,\) \( \Vert \sigma (h)-\sigma '(h)\Vert _{\mathrm{TV}}\le \delta _{t}. \) Then, for every strategy \(\tau \in \mathcal{S}_{2}, \) for every \(t\in {\mathbb {N}}, \) and for every history \(h\in {\mathcal {H}}^t,\)
$$\begin{aligned}\left| {\mathbb {E}}_{h,\sigma ,\tau }[u]-{\mathbb {E}}_{h,\sigma ',\tau }[u]\right| \le M\cdot \sum _{n=t}^{\infty }\delta _n.\end{aligned}$$
Proof
Let \(\tau \in {{{\mathcal {S}}}}_{2} \) be given. It follows from a more general result in Abate, Redig, and Tkachev ([1], Theorem 1) that, for every \(t\in {\mathbb {N}}\) and every history \(h\in {\mathcal {H}}^t,\)
$$\begin{aligned} \Vert {\mathbb {P}}_{h,\sigma ,\tau }-{\mathbb {P}}_{h,\sigma ',\tau }\Vert _{\mathrm{TV}}\le \sum _{n=t}^{\infty }\delta _n. \end{aligned}$$
(6.7)
For completeness, we provide a direct proof of this inequality in Lemma B.1 in Appendix B. Since \(\vert u\vert \le M\), the claim of Lemma 6.6 follows directly. \(\square \)
Lemma 6.6 says the following. Consider two arbitrary strategies \(\sigma , \sigma ' \in {{{\mathcal {S}}}}_{1} \) such that the total variation distance between their mixed actions at every history in period t is at most \(\delta _t\). Now consider \( t \in {\mathbb {N}}, \) a history \( h \in {\mathcal {H}}^t, \) and a strategy \( \tau \in {{{\mathcal {S}}}}_{2} \) of player 2. Then, the expected payoffs under \((\sigma ,\tau )\) and \((\sigma ',\tau )\) in the subgame at h differ by at most the constant M times \(\sum _{n=t}^{\infty }\delta _n\). Note that this bound does not depend on the strategy \(\tau \), and it depends on the history h only through its period t. Moreover, these bounds tend to 0 as t goes to infinity.
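For the example sequence \(\delta _n=2^{-n}\), the bound of Lemma 6.6 can be evaluated in closed form: the tail sum \(\sum _{n\ge t} 2^{-n}\) equals \(2^{1-t}\). A short Python sketch (with a hypothetical payoff bound \(M=1\), not a quantity from the paper):

```python
# Bound of Lemma 6.6 for delta_n = 2^{-n} and a hypothetical payoff
# bound M = 1: the tail sum over n >= t is 2^{1-t}, so the bound on the
# payoff gap, M * 2^{1-t}, vanishes as t grows.

M = 1.0

def payoff_gap_bound(t, terms=200):
    # Truncated tail sum M * sum_{n=t}^{t+terms-1} 2^{-n}.
    return M * sum(2.0 ** (-n) for n in range(t, t + terms))

assert abs(payoff_gap_bound(0) - 2.0) < 1e-9        # sum_{n>=0} 2^{-n} = 2
assert abs(payoff_gap_bound(10) - 2.0 ** (-9)) < 1e-12
```

This matches the remark above: the bound depends on h only through t, and it tends to 0 as t goes to infinity.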
The importance of property 4 of the tolerance function is shown by the following lemma. It says that if \( \sigma \in {{{\mathcal {S}}}}_{1} \) is a subgame \(\phi ^*\)-maxmin strategy, then for every \( h \in {\mathcal {H}} \) the mixed action \( \sigma (h) \) is close to the set of optimal mixed actions \( O_{h} \) in the one-shot game \(\Upsilon _h\).
Lemma 6.7
Let \(\sigma \in {{{\mathcal {S}}}}_{1} \) be a subgame \(\phi ^*\)-maxmin strategy. Then, for every \( t \in {\mathbb {N}}, \) for every history \( h \in {\mathcal {H}}^t, \) we have \(\sigma (h)\notin D_h^{\delta _t}, \) so \( \Vert \sigma (h)-O_h\Vert _{\mathrm{TV}} < \delta _t. \)
Proof
Because \(\sigma \) is a subgame \(\phi ^*\)-maxmin strategy, it follows from condition 1 of Theorem 4.6 that, for every mixed action \(m_2\in \Delta ({\mathcal {B}})\) of player 2,
$$\begin{aligned}{\mathbb {E}}_{h,\sigma (h),m_2}\left[ {\underline{V}}^{t+1}\right] \ge {\underline{v}}(h)-\phi ^*(h).\end{aligned}$$
If \(D_h^{\delta _t}\) is empty, then there is nothing to prove. If \(D_h^{\delta _t}\) is nonempty, then property 4 of \(\phi ^*\) shows that \(\sigma (h)\notin D_h^{\delta _t}\). \(\square \)
6.4 The Proof of Theorem 6.1
Proof
Let \( \Gamma _{x_{0}}(u) \) be a game and take the tolerance function \(\phi ^*\) as defined in Subsection 6.3. We only need to show that statement 2 implies statement 1. Let \( \sigma \in \mathcal{S}_{1} \) be a subgame \(\phi ^*\)-maxmin strategy of \( \Gamma _{x_{0}}(u). \)
With the help of \( \sigma , \) we define a strategy \( \sigma ^{*} \in {{{\mathcal {S}}}}_{1} \) in Step 1 of the proof. In Steps 2 and 3, we then show that \( \sigma ^{*} \) is a subgame maxmin strategy of \( \Gamma _{x_{0}}(u) \) by verifying that \( \sigma ^{*} \) satisfies the conditions of Corollary 4.7.
Step 1: Definition of \( \sigma ^{*} \in {{{\mathcal {S}}}}_{1}. \)
Take \( t \in {\mathbb {N}} \) and a history \(h\in {\mathcal {H}}^t. \) In view of Lemma 6.7, it holds that \(\Vert \sigma (h)-O_h\Vert _{\mathrm{TV}}<\delta _{t}\). Therefore, there exists \( m^{*}(h) \in O_h \) such that
$$\begin{aligned} \Vert \sigma (h)-m^{*}(h)\Vert _{\mathrm{TV}} <\delta _t. \end{aligned}$$
(6.8)
Now define \(\sigma ^{*}(h)=m^{*}(h)\).
Step 2: \(\sigma ^{*}\) is 1-day maxmin.
Consider some \( \tau \in {{{\mathcal {S}}}}_{2}. \) For every \( t \in {\mathbb {N}} \) and every \( h\in {\mathcal {H}}^t, \) since \(\sigma ^*(h)\in O_h\) we have that
$$\begin{aligned} {\mathbb {E}}_{h,\sigma ^{*},\tau }\left[ {\underline{V}}^{t+1}\right] \ge {\underline{v}}(h). \end{aligned}$$
Step 3: \(\sigma ^{*}\) is equalizing.
Consider some \( \tau \in {{{\mathcal {S}}}}_{2}. \) In view of (6.8), we can apply Lemma 6.6 to conclude that, for every \( t \in {\mathbb {N}} \) and every \( h \in {\mathcal {H}}^t, \)
$$\begin{aligned} \left| {\mathbb {E}}_{h,\sigma ,\tau }[u]-{\mathbb {E}}_{h,\sigma ^*,\tau }[u]\right| \le M \cdot \sum _{n=t}^{\infty } \delta _n. \end{aligned}$$
Hence, for every \( t \in {\mathbb {N}}, \) for every history \(h\in {\mathcal {H}}^t\), and for every \(n\ge t,\)
$$\begin{aligned} \left| U^n_{\sigma ,\tau }-U^n_{\sigma ^*,\tau }\right| \le M\cdot \sum _{i=n}^{\infty }\delta _i, \quad {\mathbb {P}}_{h,\sigma ^{*},\tau }\text {-almost surely.} \end{aligned}$$
(6.9)
Because \(\sigma \) is subgame \(\phi ^*\)-maxmin, for every history \(h\in {\mathcal {H}}, \) we have
$$\begin{aligned} {\mathbb {E}}_{h,\sigma ,\tau }[u]\ge {\underline{v}}(h)-\phi ^*(h). \end{aligned}$$
(6.10)
Thus, for every history \(h\in {\mathcal {H}}\) it holds that
$$\begin{aligned} u=\lim _{t\rightarrow \infty }U_{\sigma ^{*},\tau }^t=\lim _{t\rightarrow \infty }U_{\sigma ,\tau }^t\ge \limsup _{t\rightarrow \infty }\left( {\underline{V}}^t - \Phi ^{*t}\right) =\limsup _{t\rightarrow \infty } {\underline{V}}^t, \quad {\mathbb {P}}_{h,\sigma ^{*},\tau }\text {-almost surely}, \end{aligned}$$
where the first equality is due to Lemma A.1 in Appendix A, the second equality follows from (6.9), the inequality is by (6.10), and the last equality is a consequence of property 3 of \(\phi ^*\) from Subsection 6.3. \(\square \)
Note that it is not necessary that the initial \(\delta _t\) are small, or that the tolerance level \(\phi ^*(h)\) is small at initial histories. The conditions imposed on the sequence \((\delta _t)_{t\in {\mathbb {N}}}\) are there to ensure that the equalizing condition for subgame maxmin strategies is satisfied. The equalizing condition essentially only concerns the payoff in deep subgames. The requirement that the sequence \((\delta _t)_{t\in {\mathbb {N}}}\) decreases to zero very fast guarantees that \(\lim _{t\rightarrow \infty }\Vert {\mathbb {P}}_{p_{\vert t}, \sigma ,\tau }-{\mathbb {P}}_{p_{\vert t}, \sigma ^{*},\tau }\Vert _{\mathrm{TV}}=0\). This allows us to use the \(\phi ^*\)-equalizing property of the subgame \(\phi ^*\)-maxmin strategy to show that the strategy \(\sigma ^*\) satisfies the equalizing condition.
7 Discussion
Countable action sets. We have assumed throughout the paper that the action sets are finite. If one player has countably many actions, several of our arguments and results are no longer valid. For example, consider the following game. In the initial state, player 1’s action set is \({\mathbb {N}}\), player 2 has only one action, and if player 1 chooses action \(n\in {\mathbb {N}}\), then regardless of how the play continues, the payoff is \(1-\frac{1}{n+1}\). This is essentially a one-shot game. Clearly, for every tolerance function \(\phi >0\), player 1 has a pure subgame \(\phi \)-maxmin strategy, i.e., he can play an action n with n sufficiently large. Yet, player 1 has no subgame maxmin strategy. So the statements of Theorems 6.1 and 6.3 and Corollary 6.2 are not valid for this game.
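The example can be made explicit: given a tolerance \(\phi >0\) at the initial history, any action \(n\) with \(n+1\ge 1/\phi \) guarantees the lower value \(1\) within \(\phi \), while no single action attains \(1\). A small Python sketch of this computation (illustrative, following the example above):

```python
import math

# One-shot game from the example: player 1 picks n in N, the payoff is
# 1 - 1/(n+1), and the lower value is 1, which no single action attains.

def payoff(n):
    return 1.0 - 1.0 / (n + 1)

def phi_maxmin_action(phi):
    # Smallest n with 1 - 1/(n+1) >= 1 - phi, i.e. n >= 1/phi - 1.
    return max(0, math.ceil(1.0 / phi) - 1)

# For every tolerance phi > 0 there is a pure phi-maxmin action, yet
# every action falls strictly short of the lower value 1:
n = phi_maxmin_action(0.01)
assert payoff(n) >= 1 - 0.01
assert all(payoff(k) < 1 for k in range(1000))
```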
One-player games. Suppose that player 2 has a trivial action set, i.e., \({\mathcal {B}}\) is a singleton. Even though this is essentially a game with only one player, as shown in Example 3.2, it may happen that player 1 has a subgame \(\phi \)-maxmin strategy for a certain tolerance function \(\phi \), but no pure strategy that is subgame \(\phi \)-maxmin. Nevertheless, if \({\mathcal {B}}\) is a singleton and the conditions of Theorem 5.7 are satisfied, then player 1 has a pure subgame \(\phi \)-maxmin strategy. Indeed, player 1 then has a pure \(\delta \)-maxmin strategy for each \(\delta >0\), so the switching strategy used in the proof of Theorem 5.7 is also pure.
Discounted games. Our model encompasses discounted stochastic games with finite action and countable state spaces, provided that the immediate payoffs are bounded. Since these games admit a value and the discounted payoffs are continuous, it follows by Theorem 5.7 that player 1 has a subgame \(\phi \)-optimal strategy for every tolerance function \(\phi >0\). Therefore, Corollary 6.2 implies the well-known fact that player 1 has a subgame 0-optimal strategy. Our proof does not directly imply that this strategy can be chosen to be stationary.
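As a sanity check on the continuity invoked above, a normalized discounted payoff with immediate payoffs bounded by 1 changes by at most \(\beta ^T\) when truncated after \(T\) stages. A minimal Python sketch (hypothetical rewards and discount factor, not from the paper):

```python
# Normalized discounted payoff (1 - beta) * sum_t beta^t * r_t.
# With |r_t| <= 1, truncation after T stages changes the payoff by at
# most the tail (1 - beta) * sum_{t >= T} beta^t = beta^T.

def discounted_payoff(rewards, beta):
    return sum((beta ** t) * (1 - beta) * r for t, r in enumerate(rewards))

# Constant reward 1 over T = 10 stages with beta = 0.5 gives 1 - beta^10:
p = discounted_payoff([1.0] * 10, 0.5)
assert abs(p - (1 - 0.5 ** 10)) < 1e-12
```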
Open AccessThis article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.