Published in: Dynamic Games and Applications 4/2021

Open Access 02.03.2021

Subgame Maxmin Strategies in Zero-Sum Stochastic Games with Tolerance Levels

Authors: János Flesch, P. Jean-Jacques Herings, Jasmine Maes, Arkadi Predtetchinski



Abstract

We study subgame \(\phi \)-maxmin strategies in two-player zero-sum stochastic games with a countable state space, finite action spaces, and a bounded and universally measurable payoff function. Here, \(\phi \) denotes the tolerance function that assigns a nonnegative tolerated error level to every subgame. Subgame \(\phi \)-maxmin strategies are strategies of the maximizing player that guarantee the lower value in every subgame within the subgame-dependent tolerance level as given by \(\phi \). First, we provide necessary and sufficient conditions for a strategy to be a subgame \(\phi \)-maxmin strategy. As a special case, we obtain a characterization for subgame maxmin strategies, i.e., strategies that exactly guarantee the lower value at every subgame. Secondly, we present sufficient conditions for the existence of a subgame \(\phi \)-maxmin strategy. Finally, we show the possibly surprising result that each game admits a strictly positive tolerance function \(\phi ^*\) with the following property: if a player has a subgame \(\phi ^*\)-maxmin strategy, then he has a subgame maxmin strategy too. As a consequence, the existence of a subgame \(\phi \)-maxmin strategy for every positive tolerance function \(\phi \) is equivalent to the existence of a subgame maxmin strategy.
Notes
We would like to thank the Associate Editor and two anonymous Reviewers for their helpful comments.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

1 Introduction

We consider two-player zero-sum stochastic games with a countable state space, finite action spaces, and a bounded and universally measurable payoff function. Such a game models a repeated interaction between two players with opposite objectives. The environment in which the interaction takes place is fully characterized by the state variable. The transition from one state to the next is influenced by both players as well as by chance. Throughout the paper, we take the perspective of the maximizing player. We are interested in strategies of the maximizing player that guarantee the lower value at every subgame and call such strategies subgame maxmin strategies. Under the assumptions made in this paper, the value may not exist, which explains why we consider the lower value instead.
As the name subgame maxmin strategy suggests, this concept is closely related to the concept of a subgame perfect equilibrium as defined in Selten ([19]). In two-player zero-sum games where the value exists (for conditions, see Maitra and Sudderth ([13]) and Martin ([14])), the notions of a subgame maxmin strategy and a subgame minmax strategy coincide with the notion of a subgame optimal strategy. Moreover, in such games a strategy profile is a subgame perfect equilibrium if and only if it consists of a pair of subgame optimal strategies.
As illustrated by the Big Match, a game introduced in Gillette ([9]) and analyzed in Blackwell and Ferguson ([2]), even if the value exists, it is not guaranteed that optimal strategies exist, so a fortiori, subgame optimal strategies and subgame perfect equilibria may not exist. A large part of the literature therefore focuses on so-called subgame perfect \(\epsilon \)-equilibria as defined in Radner ([17]). This equilibrium concept is more permissive than a subgame perfect equilibrium and consists of a strategy pair such that every player obtains the value at each history up to a fixed error term of \(\epsilon /2\).
Instead of having a fixed error term at each subgame, we allow the error term to vary across different subgames. This error term is expressed as a function \(\phi \) of the histories and is called the tolerance function. The central topic of this paper is the concept of a subgame \(\phi \)-maxmin strategy. This is a strategy of the maximizing player that guarantees the lower value at every subgame within the allowed tolerance level. Intuitively, a subgame \(\phi \)-maxmin strategy performs sufficiently well across all subgames. This type of strategy is related to the concept of \(\phi \)-tolerance equilibrium as defined in Flesch and Predtetchinski ([6]). Indeed, if the value exists, then a strategy profile in which both players use a subgame \((\phi /2)\)-optimal strategy is a \(\phi \)-tolerance equilibrium.
One motivation for letting the tolerance level vary across subgames is given by Mailath, Postlewaite, and Samuelson ([11]) when introducing the concept of a contemporaneous perfect \(\epsilon \)-equilibrium. The authors focus on games in which the payoff function of the players is given by the discounted sum of periodic rewards. Due to this discounting, there exists a period after which the maximal total discounted reward a player can receive is smaller than \(\epsilon \). If the allowed tolerance level \(\epsilon \) is fixed across all subgames, then any strategy is an \(\epsilon \)-maxmin strategy in the subgames starting after such a period. The concept of a subgame \(\epsilon \)-maxmin strategy therefore does not impose any restrictions on the actions chosen in the very distant future. It would be more natural to discount not only the rewards but also the allowed tolerance level.
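For instance, in a game where payoffs are \(\delta \)-discounted sums of periodic rewards, one natural choice (our illustration, not a definition taken from the paper) is to discount the tolerance at the same rate as the rewards,
$$\begin{aligned} \phi (h)=\delta ^{\Vert h\Vert }\,\epsilon , \quad h \in {\mathcal {H}}, \end{aligned}$$
where \(\Vert h\Vert \) denotes the length of the history h (formally defined in Sect. 2), so that the allowed error shrinks in step with the stakes that remain.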
Additional motivation for letting the tolerance level vary across subgames stems from the fact that the notion of what is considered sufficiently good might be relative. For instance, Tversky and Kahneman [21] observe that people evaluate decisions with respect to a reference level. They find that significantly more people were willing to exert extra effort to save $5 on a $15 purchase than to save $5 on a $125 purchase. To apply this to the context of zero-sum games, consider the following game, to which we will refer as the high stakes-low stakes game. In the first stage of the game, a chance move determines whether the player will engage in the high stakes or the low stakes variant of this game. The high stakes and the low stakes games are identical in terms of possible strategies. The only difference is that the payoffs in the high stakes game are a thousandfold the payoffs in the low stakes game. Furthermore, assume that in the high stakes subgame, the payoff of player 1 ranges between 0 and 2000 and the value is 1000, while in the low stakes subgame the payoff of player 1 ranges between 0 and 2 and the value is 1. Players that evaluate decisions with respect to a reference level may label a strategy which guarantees a payoff of 999 in the high stakes game as sufficiently good. By the same absolute standard, a strategy guaranteeing the value minus 1 in the low stakes game, i.e., a payoff of 0, would have to be labeled sufficiently good as well, even though 0 is the minimum payoff of that subgame. Allowing the tolerance level to vary across subgames can therefore be used to accommodate such behavioral effects in the model of zero-sum stochastic games.
Another case where history-dependent tolerance levels are natural is the following. A player may routinely use a familiar strategy irrespective of the scale of the payoffs. To see this, imagine a player who is highly experienced in playing a certain zero-sum game. He has a trusted strategy which guarantees him the value of this game within some error. Now, consider the high stakes-low stakes game again. The player might well use the trusted strategy in both the low stakes and the high stakes subgame. The error associated with this strategy is therefore relative to the value of the respective subgame.
Finally, in the class of stochastic games identified in Flesch, Thuijsman, and Vrieze ([8]), the only way to obtain \(\epsilon \)-optimality is to use strategies that are called improving. Improving strategies are closely related to subgame \(\phi \)-maxmin strategies for tolerance functions \(\phi \) whose level in some subgames is smaller than the level at the root.
With respect to the concept of subgame \(\phi \)-maxmin strategies, this paper attempts to provide answers to the following two fundamental questions:
1.
For positive tolerance functions \( \phi ,\) when does a subgame \(\phi \)-maxmin strategy exist?
 
2.
How is the existence of a subgame \(\phi \)-maxmin strategy related to the existence of a subgame maxmin strategy?
 
To answer these questions we will first derive necessary and sufficient conditions for a strategy to be a subgame \(\phi \)-maxmin strategy. This is done in Sect. 4. As a special case of these necessary and sufficient conditions, we obtain a characterization of subgame maxmin strategies. For the special class of positive and negative stochastic games, a related characterization of subgame maxmin strategies was obtained by Flesch, Predtetchinski, and Sudderth ([7]).
The necessary and sufficient conditions for strategies to be subgame \(\phi \)-maxmin can be split into a local condition and an equalizing condition. Informally, the local condition states that the lower value one expects to get in the next subgame should always be at least the lower value of the current subgame. The equalizing condition requires that for every strategy of the other player, a subgame \( \phi \)-maxmin strategy almost surely results in a play with an eventually good enough payoff, where eventually good enough means being at least the lower value in very deep subgames up to the allowed tolerance level.
In Sect. 5, we consider the question of existence of subgame \(\phi \)-maxmin strategies for positive tolerance functions \(\phi \). We prove that for a positive tolerance function \(\phi \), a subgame \(\phi \)-maxmin strategy exists if every play is either a point of upper semicontinuity of the payoff function or if the sequence of tolerance levels which occur along the play has a positive lower bound. The novel aspect of the theorem is the way in which it encompasses a number of important special cases, some of which have been studied extensively in the literature. One special case of interest is when the payoff function is upper semicontinuous everywhere. In this case, the existence of subgame maxmin strategies follows by a result of Laraki, Maitra, and Sudderth ([10]). Another is the case when along each play tolerance levels remain bounded away from zero. This latter case subsumes the familiar class of all positive constant tolerance functions. When the tolerance level is equal to a constant \(\epsilon \) at each history, subgame \(\phi \)-maxmin strategies are exactly the so-called subgame \(\epsilon \)-optimal strategies. Existence of subgame \(\epsilon \)-optimal strategies for each \(\epsilon > 0\) is established in Mashiah–Yaakovi ([15]).
Our technique relies on switching between strategies: one starts by following a \(\tfrac{\phi (x_{0})}{2}\)-optimal strategy until a history (say h) is reached where it ceases to be \(\phi (h)\)-optimal. At that moment, one discards the original strategy and switches instead to a \(\tfrac{\phi (h)}{2}\)-optimal strategy, and so on. This technique is well-attested in the literature, and in particular is used in Mashiah–Yaakovi ([15]). We refine the method to accommodate both types of plays: those where the tolerance levels remain bounded away from zero, and those where they converge to zero (the latter are, by assumption, the plays where the payoff function is upper semicontinuous).
In Sect. 6, we study the relation between the existence of subgame \(\phi \)-maxmin strategies and subgame maxmin strategies. Our main result states that the existence of a particular subgame \(\phi ^*\)-maxmin strategy, with \(\phi ^*>0\), is equivalent to the existence of a subgame maxmin strategy. Consequently, the existence of subgame \(\phi \)-maxmin strategies for all positive tolerance functions \(\phi \) is equivalent to the existence of a subgame maxmin strategy. In the special case of an upper semi-continuous payoff function, the results of Sects. 5 and 6 imply the existence of a subgame maxmin strategy. As noted above, this result is not new, and is a consequence of Laraki, Maitra, and Sudderth ([10]).
The connection between existence of subgame \(\phi \)-maxmin strategies for every positive tolerance function \(\phi \) and the existence of subgame maxmin strategies is not only useful to further understand the results obtained by Laraki, Maitra, and Sudderth ([10]) but also highlights an important and surprising difference between subgame \(\phi \)-maxmin strategies and the closely related concept of subgame \(\epsilon \)-maxmin strategies. Indeed, the existence of a subgame \(\epsilon \)-maxmin strategy for every \(\epsilon >0\) does not imply the existence of a subgame maxmin strategy.
The rest of the paper is structured as follows. In Sect. 2, we formulate the model setup, and in Sect. 3, we formally define the main concepts. Then in Sect. 4, we discuss the necessary and sufficient conditions for a strategy to be a subgame \(\phi \)-maxmin strategy and give a characterization of subgame maxmin strategies. We continue in Sect. 5 by providing sufficient conditions that guarantee the existence of a subgame \(\phi \)-maxmin strategy. Then in Sect. 6, we explain why the existence of subgame \(\phi \)-maxmin strategies for every positive tolerance function \(\phi \) is equivalent to the existence of a subgame maxmin strategy. Finally, in Sect. 7, we discuss the importance of the assumptions we made and whether they might be relaxed.

2 Two-Player Zero-Sum Stochastic Games

We consider a two-player zero-sum stochastic game with finitely many actions and countably many states. The payoff function is required to be bounded and universally measurable. The model encompasses all two-player zero-sum games with perfect information and stochastic moves.
Actions, states, and histories The action sets of players 1 and 2 are given by the finite sets \({\mathcal {A}} \) and \( {\mathcal {B}}, \) respectively. The state space is given by the countable set \( {\mathcal {X}}. \) Let \(x_{0}\) denote the initial state and define the set \({\mathcal {Z}} = {\mathcal {A}}\times {\mathcal {B}}\times {\mathcal {X}}\). The game consists of an infinite sequence of one-shot games. At the initial state \(x_{0},\) the one-shot game \(G(x_{0})\) is played as follows: Players 1 and 2 simultaneously select an action from their respective action sets, denoted by \( a_{1} \) and \( b_{1}, \) respectively. Then, the next state \(x_1\) is selected according to the transition probability \(q(\cdot \vert x_{0},a_1,b_1)\). At the new state \(x_1,\) this process repeats itself and both players play the one-shot game \(G(x_1)\) by selecting actions \(a_2\) and \(b_2\) from their respective action sets. The next state \(x_2\) is selected according to the transition probability \(q(\cdot \vert x_{0},a_1,b_1,x_{1},a_{2},b_{2})\). This goes on indefinitely and creates a play \(p=(x_{0},a_1,b_1,x_1,a_2,b_2,\ldots )\). Note that the transition probability can depend on the entire history.
For every \(t\in {\mathbb {N}}=\lbrace 0,1,2,\ldots \rbrace \), let \({\mathcal {H}}^t=\lbrace x_{0}\rbrace \times {\mathcal {Z}}^{t}\) denote the set of all histories that are generated after t one-shot games. The set \({\mathcal {H}}^0\) consists of the single element \(x_{0}\). For \(t \ge 1,\) elements of \({\mathcal {H}}^t\) are of the form \((x_{0},a_1,b_1,\ldots ,a_t,b_t,x_t)\). Let \({\mathcal {H}} = \cup _{t\in {\mathbb {N}}}{\mathcal {H}}^t\) denote the set of all histories. For all \(h \in {\mathcal {H}},\) let \(\Vert h\Vert = \Vert (x_{0},a_1,b_1,\ldots ,a_t,b_t,x_t)\Vert = t\) denote the number of one-shot games that occurred during the history h. We refer to \(\Vert h\Vert \) as the length of the history h. For all \(t\le \Vert h\Vert , \) the restriction of the history h to the first t one-shot games is denoted by \(h_{\vert t}=(x_{0},a_1,b_1,\ldots ,a_t,b_t,x_t)\). We write \(h' \preceq h\) if there exists \(t\le \Vert h\Vert \) such that \(h_{\vert t}=h'\), so if the history h extends the history \(h'\).
The space of plays Define \({\mathcal {P}} = \lbrace x_{0}\rbrace \times {\mathcal {Z}}^{{\mathbb {N}}}\) to be the set of plays. Elements of \({\mathcal {P}}\) are of the form \(p = (x_{0},a_1,b_1,x_1,a_2,b_2,\ldots )\). For \(t \in {\mathbb {N}},\) let \(p_{\vert t}\) denote the prefix of p of length t: that is \(p_{\vert 0} = x_{0}\) and \(p_{\vert t} = (x_{0},a_1,b_1,\ldots ,a_t,b_t,x_t)\) for \(t \ge 1\). We write \(h \prec p\) if a history h is a prefix of p. For every \(h \in {\mathcal {H}},\) let \({\mathcal {P}}(h) = \lbrace p \in {\mathcal {P}}\vert h\prec p\rbrace \) denote the cylinder set consisting of all plays which extend history h.
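For concreteness, the dynamics described above can be sketched in a few lines of Python. This is a minimal illustration only: the interfaces sigma(h), tau(h), and q(h, a, b) and the encoding of histories as tuples \((x_{0},z_{1},\ldots ,z_{t})\) with \(z_{i}=(a_{i},b_{i},x_{i})\) are our own assumptions, not part of the formal model.

import random

def simulate_play(x0, sigma, tau, q, periods=10):
    """Generate a history of the given length, i.e., a finite prefix of a play.

    Assumed (hypothetical) interfaces:
      sigma(h), tau(h) -> dict mapping actions to probabilities (mixed actions)
      q(h, a, b)       -> dict mapping next states to probabilities
    Histories are tuples (x0, (a1, b1, x1), ..., (at, bt, xt)).
    """
    draw = lambda dist: random.choices(list(dist), weights=list(dist.values()))[0]
    h = (x0,)
    for _ in range(periods):
        a, b = draw(sigma(h)), draw(tau(h))   # simultaneous action choices
        x = draw(q(h, a, b))                  # chance selects the next state
        h = h + ((a, b, x),)                  # extend the history by z = (a, b, x)
    return h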
We endow \({\mathcal {Z}}\) with the discrete topology and \({\mathcal {P}}\) with the product topology. The collection of all cylinder sets is a basis for the product topology on \({\mathcal {P}}\).
For \(t \in {\mathbb {N}}\), let \({\mathcal {F}}^t\) be the sigma-algebra generated by the collection of cylinder sets \(\{{\mathcal {P}}(h) \mid h \in {\mathcal {H}}^{t}\}\). Each set in \({\mathcal {F}}^t\) can be written as the union of sets in \(\{{\mathcal {P}}(h) \mid h \in {\mathcal {H}}^{t}\}\). Let \({\mathcal {F}}^\infty \) be the sigma-algebra generated by \(\displaystyle \cup _{t\in {\mathbb {N}}} {\mathcal {F}}^t\). This is exactly the Borel sigma-algebra generated by the product topology on \({\mathcal {P}}\). The sigma-algebra of universally measurable subsets of \({\mathcal {P}}\) is denoted by \({\mathcal {F}}\). Elements of \({\mathcal {F}}\) are sets that belong to the completion of each Borel probability measure on \({\mathcal {P}}\). For details of the definition of the sigma-algebra \({\mathcal {F}}, \) the reader is referred to Appendix A. It holds that \({\mathcal {F}}^{t} \subseteq {\mathcal {F}}^{t+1} \subseteq \cdots \subseteq {\mathcal {F}}^\infty \subseteq {\mathcal {F}}\). The set of plays \({\mathcal {P}}\) together with the universally measurable sigma-algebra \({\mathcal {F}}\) determines a measurable space \(({\mathcal {P}},{\mathcal {F}})\). A stochastic variable is a universally measurable function from \({\mathcal {P}}\) to \({\mathbb {R}}\).
Strategies Let \(\Delta ({\mathcal {A}})\) denote the set of probability measures over the action set of player 1 and \(\Delta ({\mathcal {B}})\) the set of probability measures over the action set of player 2. A behavioral strategy for player 1 is a function \(\sigma : {\mathcal {H}} \rightarrow \Delta ({\mathcal {A}})\). Hence, at each history player 1 chooses a mixed action. A pure strategy for player 1 is a function that assigns to every history an action, with a minor abuse of notation, \(\sigma :{\mathcal {H}}\rightarrow {\mathcal {A}}\). Similarly, one can define a behavioral and a pure strategy \(\tau \) for player 2. Let \({\mathcal {S}}_1\) and \({\mathcal {S}}_2\) denote the sets of behavioral strategies of players 1 and 2, respectively.
It follows from the Ionescu Tulcea extension theorem that every history \(h\in {\mathcal {H}}\) and strategy profile \((\sigma ,\tau ) \in {{{\mathcal {S}}}}_{1} \times {{{\mathcal {S}}}}_{2}\) determine a probability measure \({\mathbb {P}}_{h,\sigma ,\tau }\) on the measurable space \(({\mathcal {P}}(h),{\mathcal {F}}^\infty _{{\mathcal {P}}(h)})\), where \({\mathcal {F}}^\infty _{{\mathcal {P}}(h)}\) denotes the Borel sigma-algebra over the set of plays extending the history h. The measure \({\mathbb {P}}_{h,\sigma ,\tau }\) can be extended to the measurable space \(({\mathcal {P}},{\mathcal {F}}^\infty )\) in the obvious way. Taking the completion of the probability space \(({\mathcal {P}},{\mathcal {F}}^\infty ,{\mathbb {P}}_{h,\sigma ,\tau })\) yields the probability space \(({\mathcal {P}}, {\mathcal {F}}, {\mathbb {P}}_{h,\sigma ,\tau }^{\mathrm{c}})\). With a minor abuse of notation, we will omit the superscript \(\mathrm{c}\) and write \({\mathbb {P}}_{h,\sigma ,\tau }\) in the remainder of this paper.
Payoff function We assume that the payoff function \(u:{\mathcal {P}}\rightarrow {\mathbb {R}}\) of player 1 is bounded, where we define \( M = \sup _{p\in {\mathcal {P}}} \vert u(p)\vert , \) and universally measurable. We also assume the game to be zero-sum. The payoff function of player 2 is therefore given by \(-u\). We denote the game as described above by \(\Gamma _{x_{0}}(u)\). Throughout the paper, we take the point of view of player 1. This is without loss of generality, since the situation of player 2 in the game \(\Gamma _{x_{0}}(u)\) is identical to that of player 1 in the game \(\Gamma _{x_{0}}(-u)\).
The expected payoff of player 1 corresponding to strategy profile \((\sigma ,\tau ) \in {{{\mathcal {S}}}}_{1} \times {{{\mathcal {S}}}}_{2} \) at history \( h \in {{{\mathcal {H}}}} \) is given by \({\mathbb {E}}_{h,\sigma ,\tau }\left[ u\right] , \) where the expectation is taken with respect to the probability measure \({\mathbb {P}}_{h,\sigma ,\tau }\). The expected payoff of player 1 at the history \(x_0\) is denoted by \({\mathbb {E}}_{\sigma ,\tau }\left[ u\right] \).
We imposed the condition above that the payoff function u is universally measurable. As Borel measurable functions are also universally measurable, this condition is weaker than requiring that u be Borel measurable. The main difference between these two conditions is the following. When the payoff function u is Borel measurable, the game is known to admit a value (cf. Sect. 3 for the formal definition). However, it is not provable from the standard ZFC axioms of set theory that our games with a universally measurable payoff function also admit a value. Therefore, instead of the value, in the remaining part of the paper we will use the lower value (cf. Sect. 3 for the formal definition). We would like to emphasize that except for the difference between the concepts of the value and the lower value, assuming Borel measurable payoffs would not lead to any simplification in the proofs. So the reader unfamiliar with universally measurable functions can read the entire paper imagining Borel measurable payoffs.

3 Subgame \(\phi \)-Maxmin Strategies

We start this section by defining subgame \(\phi \)-maxmin strategies. Then, we discuss an illustrative example. Finally, we define and examine guarantee levels of strategies.

3.1 Definition of Subgame \(\phi \)-Maxmin Strategies

For every game \(\Gamma _{x_{0}}(u), \) for every history \(h\in {\mathcal {H}}, \) we define the lower value \({\underline{v}}(h)\) and the upper value \({\overline{v}}(h)\) by
$$\begin{aligned} {\underline{v}}(h)&=\sup \limits _{\sigma \in {\mathcal {S}}_1}\inf \limits _{\tau \in {\mathcal {S}}_2} {\mathbb {E}}_{h,\sigma ,\tau }\left[ u\right] , \end{aligned}$$
(3.1)
$$\begin{aligned} {\overline{v}}(h)&=\inf \limits _{\tau \in {\mathcal {S}}_2}\sup \limits _{\sigma \in {\mathcal {S}}_1} {\mathbb {E}}_{h,\sigma ,\tau }\left[ u\right] . \end{aligned}$$
(3.2)
Because the payoff function u is assumed to be bounded, we have that \({\underline{v}}(h), {\overline{v}}(h)\in {\mathbb {R}}\). Therefore, the lower and upper value exist in every subgame of \(\Gamma _{x_{0}}(u). \) Furthermore, we have that \({\underline{v}}(h)\le {\overline{v}}(h)\). Whenever \({\underline{v}}(h)= {\overline{v}}(h)\) we say that the subgame at history h has a value and we denote it by v(h). The lower value, the upper value, and the value at the initial state \(x_{0}\) are denoted by \({\underline{v}}\), \({\overline{v}},\) and v,  respectively. If u is Borel measurable, then the value exists by Maitra and Sudderth ([13]) and Martin ([14]). Since we do not assume u to be Borel measurable, we present our results in terms of the lower value.
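For intuition, the lower value of a single one-shot zero-sum game with finite action sets can be computed by linear programming. The sketch below is ours and not the paper's method: (3.1) quantifies over behavioral strategies of the infinite game, whereas here \(U\) is just a one-shot payoff matrix for player 1 and we maximize \(v\) subject to \(x^{\top }U \ge v\) over mixed actions \(x\).

import numpy as np
from scipy.optimize import linprog

def matrix_game_lower_value(U):
    """Lower value max_x min_y x^T U y of a one-shot zero-sum game."""
    m, n = U.shape
    c = np.zeros(m + 1)
    c[0] = -1.0                                   # maximize v by minimizing -v
    A_ub = np.hstack([np.ones((n, 1)), -U.T])     # v - (U^T x)_j <= 0 for every column j
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.zeros((1, 1)), np.ones((1, m))])  # x lies in the simplex
    b_eq = [1.0]
    bounds = [(None, None)] + [(0, None)] * m     # v is free, x_i >= 0
    res = linprog(c, A_ub, b_ub, A_eq, b_eq, bounds)
    return res.x[0]

# Matching pennies: the (lower) value is 0, attained by the uniform mixed action.
print(matrix_game_lower_value(np.array([[1.0, -1.0], [-1.0, 1.0]])))

By von Neumann's minimax theorem, the lower and upper values of such a one-shot game coincide, so this sketch also returns the value of the matrix game.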
Even if the value exists, player 1 may not have a strategy that guarantees the value in each subgame and the literature has therefore studied subgame \(\epsilon \)-optimal strategies. These are strategies which guarantee the value in each subgame up to an allowed error term of \(\epsilon \). If the payoff function is bounded and Borel measurable, it has been shown by Mashiah–Yaakovi ([15]) that for each \(\epsilon >0\) player 1 has a subgame \(\epsilon \)-optimal strategy. The concept of a subgame \(\epsilon \)-optimal strategy has a constant error term \(\epsilon \) across all subgames. However, as argued in the introduction, there are instances in which it is more natural to consider a variable error term. Therefore, instead of considering a constant error term, we follow Flesch and Predtetchinski ([6]), who allow the error term to vary across histories in their investigation of the \(\phi \)-tolerance equilibrium. This leads us to the study of subgame \(\phi \)-maxmin strategies, where \(\phi :{\mathcal {H}}\rightarrow [0,\infty )\) is a tolerance function assigning an allowed tolerance level to each history.
Definition 3.1
Let \(\phi :{\mathcal {H}}\rightarrow {[}0,\infty )\) be a tolerance function. A strategy \(\sigma \in {{{\mathcal {S}}}}_{1} \) is a subgame \(\phi \)-maxmin strategy in the game \( \Gamma _{x_{0}}(u) \) if for every history \(h\in {\mathcal {H}}\) it holds that
$$\begin{aligned} \forall \tau \in {\mathcal {S}}_2,~{\mathbb {E}}_{h,\sigma ,\tau }\left[ u\right] \ge {\underline{v}}(h)-\phi (h). \end{aligned}$$
(3.3)
In case \( \phi \) is identically equal to zero, we omit it from the notation, and simply refer to a subgame maxmin strategy.
A subgame \(\phi \)-maxmin strategy guarantees at each history h of the game the lower value up to the tolerance level \(\phi (h)\). If the tolerance function is such that for some \( \epsilon \ge 0, \) for every \(h\in {\mathcal {H}}\), \(\phi (h)=\epsilon \), then we refer to the strategy as a subgame \( \epsilon \)-maxmin strategy. If the value exists, then the notion of subgame \(\epsilon \)-maxmin strategy coincides with the notion of subgame \(\epsilon \)-optimal strategy.
The following example illustrates that even for a strictly positive tolerance function \(\phi \), player 1 may not have a subgame \(\phi \)-maxmin strategy. Interestingly, however, player 1 has a subgame \(\epsilon \)-maxmin strategy for every \(\epsilon >0, \) which in fact is pure. We also give an example of a tolerance function \( \phi \) for which player 1 has a subgame \(\phi \)-maxmin strategy, but no pure one.
Example 3.2
The decision problem depicted in Fig. 1 corresponds to a two-player zero-sum stochastic game in which the state space is trivial and the second player is a dummy player. Whenever the state space or action sets are degenerate, the corresponding states and actions are omitted from the notation in examples. The set of actions of player 1 is \( {{{\mathcal {A}}}} = \{c,q\}\), where c stands for continue and q for quit. The game stops as soon as player 1 chooses to quit. If player 1 decides to quit at period t, then his payoff is \( t/(t+1)\). If player 1 never quits, his payoff is 0.
In this game, player 1 has a subgame \(\epsilon \)-maxmin strategy for every \(\epsilon >0\): for example, the pure strategy which quits as soon as quitting leads to a payoff of at least \(1-\epsilon \).
We now turn to the existence of a subgame \(\phi \)-maxmin strategy. Clearly, there exists no subgame maxmin strategy. As we will see later, Theorem 6.1 then implies that there is some strictly positive tolerance function \(\phi \) for which there does not exist a subgame \(\phi \)-maxmin strategy. Intuitively, such a tolerance function forces player 1 to continue with such a large probability that the total probability of never quitting becomes nearly one.
Interestingly, there is a tolerance function \(\phi \) for which player 1 has a subgame \(\phi \)-maxmin strategy, but for which he has no pure strategy that is subgame \(\phi \)-maxmin. To see this, define the tolerance function \(\phi \) by \(\phi (c^t)=1-\frac{1}{2}\cdot \frac{t}{t+1}-\frac{1}{2}\cdot \frac{t+1}{t+2}\) for each \(t\in {\mathbb {N}}\). Now consider the strategy \(\sigma \) which always chooses action c with probability 1/2 and action q with probability 1/2. In the subgame at any history \(c^t\), it quits immediately with probability 1/2 and quits at a later period with probability 1/2. In the former case, it gives payoff \(\frac{t}{t+1}\), whereas in the latter case, it gives an expected payoff of at least \(\frac{t+1}{t+2}\). Hence, the strategy \(\sigma \) is a subgame \(\phi \)-maxmin strategy. Yet, there is no pure strategy that is subgame \(\phi \)-maxmin. Indeed, the tolerance function does not allow a pure strategy to quit at any period, whereas a pure strategy that always continues is not subgame \( \phi \)-maxmin either. As we show in the working paper version of our paper, Flesch, Herings, Maes, and Predtetchinski ([5]), a pure subgame \(\phi \)-maxmin strategy exists if and only if the following two conditions hold: (1) for every \(t\in {\mathbb {N}}, \) \( \phi (c^t)>0, \) and (2) for infinitely many \( t \in {\mathbb {N}}, \) \( \phi ^{\prime }(c^{t}) = \displaystyle \min _{n \le t}\phi (c^n)\ge \frac{1}{t+1}\). \(\square \)
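As a numerical sanity check on this example (a Python sketch; the truncation horizon and function names are our own), one can verify that the 50/50 strategy \(\sigma \) meets the bound \({\underline{v}}(c^{t})-\phi (c^{t})\) in every subgame, using that the lower value equals 1 at every history:

def phi(t):
    # phi(c^t) = 1 - (1/2) t/(t+1) - (1/2)(t+1)/(t+2), as defined in Example 3.2
    return 1 - 0.5 * t / (t + 1) - 0.5 * (t + 1) / (t + 2)

def expected_payoff(t, horizon=60):
    # Under sigma, quitting happens first at period t + j with probability 2^-(j+1),
    # yielding payoff (t+j)/(t+j+1); truncating the tail only underestimates the sum.
    return sum(2.0 ** -(j + 1) * (t + j) / (t + j + 1) for j in range(horizon))

for t in range(20):
    assert expected_payoff(t) >= 1 - phi(t)   # the lower value is 1 in every subgame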

3.2 Guarantee Levels

To identify subgame \(\phi \)-maxmin strategies, it is useful to define the function \({\underline{u}}: {\mathcal {S}}_1\times {\mathcal {H}}\rightarrow {\mathbb {R}}\) by
$$\begin{aligned} {\underline{u}}(\sigma ,h)=\inf _{\tau \in {\mathcal {S}}_2} {\mathbb {E}}_{h,\sigma ,\tau }\left[ u\right] . \end{aligned}$$
(3.4)
The payoff \({\underline{u}}(\sigma ,h)\) corresponds to the guarantee level that player 1 is expected to receive at history h when playing the strategy \(\sigma \). A strategy \(\sigma \in {{{\mathcal {S}}}}_{1} \) is called a \(\phi (h)\)-maxmin strategy for the subgame at history h if \({\underline{u}}(\sigma ,h) \ge {\underline{v}}(h) - \phi (h)\).
For every strategy profile \((\sigma ,\tau ) \in {\mathcal {S}}_{1} \times {\mathcal {S}}_{2}\), for every \(t \in {\mathbb {N}}\), define the stochastic variables \(U_{\sigma ,\tau }^{t}\), \({\underline{U}}_{\sigma }^{t}\), and \({\underline{V}}^{t}\) by letting \(U_{\sigma ,\tau }^{t}(p) = {\mathbb {E}}_{p_{\vert t},\sigma ,\tau }[u]\), \({\underline{U}}_{\sigma }^{t}(p) = {\underline{u}}(\sigma ,p_{\vert t})\), and \({\underline{V}}^{t}(p) = {\underline{v}}(p_{\vert t})\), respectively, for each play \(p \in {\mathcal {P}}\). All three stochastic variables are \({\mathcal {F}}^{t}\)-measurable. We have \({\underline{U}}_{\sigma }^{t} \le U_{\sigma ,\tau }^{t}\) and \({\underline{U}}_{\sigma }^{t} \le {\underline{V}}^{t}\) everywhere on \({\mathcal {P}}\).
The next lemma states the submartingale property of guarantee levels. It says that the guarantee level that player 1 can expect to receive is non-decreasing over time.
Lemma 3.3
(Submartingale property of guarantee levels) Let a strategy profile \((\sigma , \tau ) \in {{{\mathcal {S}}}}_{1} \times \mathcal{S}_{2}, \) \(t \in {\mathbb {N}}, \) and a history \(h \in {\mathcal {H}}^{t}\) of length t be given. The process \(({\underline{U}}_{\sigma }^{t + n})_{n \in {\mathbb {N}}}\) is a \({\mathbb {P}}_{h,\sigma ,\tau }\)-submartingale. In particular, it holds that \({\underline{u}}(\sigma ,h) \le {\mathbb {E}}_{h,\sigma ,\tau }[{\underline{U}}_{\sigma }^{t+1}]\).
Proof
We first prove the second claim. Take \(\delta > 0. \) Let \(\tau ' \in {\mathcal {S}}_2\) be such that \( \tau ^{\prime }(h) = \tau (h) \) and for each \((a,b,x) \in {\mathcal {Z}} \) it holds that
$$\begin{aligned}{\mathbb {E}}_{(h,a,b,x),\sigma ,\tau '}[u] \le {\underline{u}}(\sigma ,(h,a,b,x)) + \delta .\end{aligned}$$
We have that
$$\begin{aligned} {\underline{u}}(\sigma ,h)&\le {\mathbb {E}}_{h,\sigma ,\tau '}[u]\\&= \sum _{(a,b,x) \in {\mathcal {Z}}}\sigma (h)(a) \cdot \tau (h)(b) \cdot q(x|h,a,b) \cdot {\mathbb {E}}_{(h,a,b,x),\sigma ,\tau '}[u]\\&\le \sum _{(a,b,x) \in {\mathcal {Z}}} \sigma (h)(a) \cdot \tau (h)(b) \cdot q(x|h,a,b) \cdot ({\underline{u}}(\sigma ,(h,a,b,x)) + \delta )\\&= {\mathbb {E}}_{h,\sigma ,\tau }[{\underline{U}}_{\sigma }^{t+1}] + \delta . \end{aligned}$$
It follows that \( {\underline{u}}(\sigma ,h) \le {\mathbb {E}}_{h,\sigma ,\tau }[{\underline{U}}_{\sigma }^{t+1}] \) since \( \delta > 0 \) is arbitrary.
To prove that the process \(({\underline{U}}_{\sigma }^{t + n})_{n \in {\mathbb {N}}}\) is a \({\mathbb {P}}_{h,\sigma ,\tau }\)-submartingale, note that the conditional expectation \({\mathbb {E}}_{h,\sigma ,\tau }({\underline{U}}_{\sigma }^{t+n+1} \mid {\mathcal {F}}^{t+n})\), when evaluated at a play \(p \in {\mathcal {P}}(h)\), equals \({\mathbb {E}}_{p_{\vert t+n},\sigma ,\tau }({\underline{U}}_{\sigma }^{t+n+1})\), which, by the claim already proven, is at least \({\underline{u}}(\sigma ,p_{\vert t+n}) = {\underline{U}}_{\sigma }^{t+n}(p)\). \(\square \)

4 Conditions for Strategies to be Subgame \(\phi \)-Maxmin

4.1 n-Day Maxmin Strategies and Equalizing Strategies

The goal of this section is to provide necessary and sufficient conditions for a strategy to be subgame \(\phi \)-maxmin and to provide a characterization of subgame maxmin strategies.
Definition 4.1
A strategy \(\sigma \in {\mathcal {S}}_{1}\) is an n-day \(\phi \)-maxmin strategy in the game \(\Gamma _{x_{0}}(u)\) if for every \(t \in {\mathbb {N}}\), for every history \(h \in {\mathcal {H}}^{t}\) of length t, and for every strategy \(\tau \in {\mathcal {S}}_2\),
$$\begin{aligned} {\mathbb {E}}_{h,\sigma ,\tau }[{\underline{V}}^{t+n}]\ge {\underline{v}}(h)-\phi (h). \end{aligned}$$
(4.1)
Definition 4.2
A strategy \(\sigma \in {\mathcal {S}}_{1}\) is \(\phi \)-equalizing in the game \( \Gamma _{x_{0}}(u) \) if for every history \(h \in {\mathcal {H}}\) and for every strategy \(\tau \in {\mathcal {S}}_2, \)
$$\begin{aligned} u\ge \limsup _{t \rightarrow \infty } {\underline{V}}^{t}-\phi (h), \quad {\mathbb {P}}_{h,\sigma ,\tau } \text{-almost } \text{ surely. } \end{aligned}$$
(4.2)
When \(\phi = 0\), we use the terms n-day maxmin and equalizing to mean n-day 0-maxmin and 0-equalizing, respectively.
The first definition is very intuitive. It says that player 1 should play in such a way that, in expectation, the lower value does not decrease over time by more than the tolerance level. The notion of 1-day maxmin strategies is particularly well-known in dynamic programming and stochastic games, see Puterman ([16]). A simple characterization of 1-day maxmin strategies is provided in the following theorem.
Theorem 4.3
Consider a strategy \(\sigma \in {\mathcal {S}}_{1}\) in the game \(\Gamma _{x_{0}}(u)\). The following three conditions are equivalent:
1.
For each \(n \in {\mathbb {N}}\), \(\sigma \) is an n-day maxmin strategy.
 
2.
\(\sigma \) is a 1-day maxmin strategy.
 
3.
For each history \(h \in {\mathcal {H}}^{t}\) of length t and each strategy \(\tau \in {\mathcal {S}}_{2},\) the process \(({\underline{V}}^{t + n})_{n \in {\mathbb {N}}}\) is a \({\mathbb {P}}_{h,\sigma ,\tau }\)-submartingale.
 
Proof
That condition 1 implies condition 2 is obvious. That condition 2 implies condition 3 follows by the law of iterated expectations. Finally, that condition 3 implies condition 1 follows from the properties of a submartingale. \(\square \)
Note that the equivalence in Theorem 4.3 is no longer true for nonzero tolerance functions. For example, if \(\phi >0\), then a 1-day \(\phi \)-maxmin strategy is generally not 2-day \(\phi \)-maxmin.
A strategy is \(\phi \)-equalizing if, roughly speaking, it almost surely results in a play with an eventually good enough payoff, where eventually good enough means being about as large as the lower value in very deep subgames.
The following example illustrates both the notion of a 1-day maxmin strategy and an equalizing strategy.
Example 4.4
Consider the centipede game shown in Fig. 2. At every history, the active player can choose to continue (c) or to quit (q). As soon as a player decides to quit, the game ends, and in that case the payoff is as given in Fig. 2. If the game continues indefinitely, then the payoff is 0. It is easily verified that for every \( t \in {\mathbb {N}}, \) \( {\underline{v}}(c^{2t}) = {\underline{v}}(c^{2t+1}) = (t+1)/(t+2). \) A pure strategy \(\sigma \in {\mathcal {S}}_1\) is then easily seen to be a 1-day maxmin strategy if and only if \(\sigma (c^{2t})=c\) for every \(t\in {\mathbb {N}}\).
On the other hand, a pure strategy \(\sigma \in {\mathcal {S}}_{1}\) is equalizing if and only if \(\sigma (c^{2t})=q\) for infinitely many \( t \in {\mathbb {N}}. \) Indeed, for a strategy \( \sigma \) that chooses q at infinitely many \( t \in {\mathbb {N}}, \) the play generated after any history \( h \in {{{\mathcal {H}}}} \) ends with q, irrespective of \( \tau \in {{{\mathcal {S}}}}_{2}, \) which shows that \( \sigma \) is equalizing. On the other hand, if \( \sigma \) is such that q is chosen at only finitely many \( t \in {\mathbb {N}}, \) then there is a history \( h \in {{{\mathcal {H}}}} \) such that player 1 always continues after this history. If \( \tau \in {{{\mathcal {S}}}}_{2} \) is such that player 2 always chooses continue, then the play \( c^{\infty } \) with \( u(c^{\infty }) = 0 \) results, and 0 is strictly less than \( \limsup _{t\rightarrow \infty } {\underline{v}}(c^t)=1, \) so such a \( \sigma \) is not equalizing.
It follows that player 1 does not have a pure strategy which is both 1-day maxmin and equalizing. \(\square \)
The following theorem states sufficient conditions under which a strategy \( \sigma \) of player 1 is a subgame \(\phi \)-maxmin strategy.
Theorem 4.5
(Sufficient condition) Let \(\phi :{\mathcal {H}}\rightarrow [0,\infty )\) be a tolerance function. The strategy \(\sigma \in {{{\mathcal {S}}}}_{1} \) is a subgame \(\phi \)-maxmin strategy in the game \(\Gamma _{x_{0}}(u)\) if there exist tolerance functions \(\phi _1:{\mathcal {H}}\rightarrow [0,\infty )\) and \(\phi _2:{\mathcal {H}}\rightarrow [0,\infty )\) such that \( \phi _1+\phi _2 \le \phi \) and
1.
for every \(n\in {\mathbb {N}}\), \(\sigma \) is n-day \(\phi _1\)-maxmin,
 
2.
\(\sigma \) is \(\phi _2\)-equalizing.
 
Proof
Let \(\phi _1, \) \(\phi _2, \) and \( \sigma \) be such that the conditions in the theorem are satisfied. We show that \(\sigma \) is a subgame \(\phi \)-maxmin strategy.
Fix a history \(h\in {\mathcal {H}}^{t}\) and a strategy \(\tau \in {\mathcal {S}}_2\). Then, we have that
$$\begin{aligned} {\underline{v}}(h)-\phi (h)&\le {\underline{v}}(h)-\phi _1(h)-\phi _2(h)\\&\le \limsup _{n\rightarrow \infty }{\mathbb {E}}_{h,\sigma ,\tau }[{\underline{V}}^{t+n}] -\phi _2(h)\\&\le {\mathbb {E}}_{h,\sigma ,\tau }[\limsup _{n\rightarrow \infty } {\underline{V}}^{t+n}] -\phi _2(h)\\&\le {\mathbb {E}}_{h,\sigma ,\tau }[u], \end{aligned}$$
where the second inequality holds since \(\sigma \) is an n-day \(\phi _1\)-maxmin strategy, the third inequality is by Fatou's lemma in its reverse form (which applies since \({\underline{V}}^{t+n}\) is bounded by M), and the last inequality holds since \(\sigma \) is \(\phi _2\)-equalizing. \(\square \)
According to Theorem 4.5, to conclude that \( \sigma \) is a subgame \( \phi \)-maxmin strategy, it is enough to find tolerance functions \( \phi _1 \) and \( \phi _2 \) such that at each history their sum does not exceed \(\phi \) and the strategy \( \sigma \) is both n-day \(\phi _1\)-maxmin and \(\phi _2\)-equalizing.
Particularly natural are situations where the tolerance level does not increase as time progresses. More formally, the tolerance function \(\phi \) is said to be non-increasing if \(\phi (h) \ge \phi (h')\) whenever \(h \prec h'\). The following result states necessary conditions for a strategy to be subgame \( \phi \)-maxmin.
Theorem 4.6
(Necessary condition) Let \(\sigma \in {{{\mathcal {S}}}}_{1} \) be a subgame \(\phi \)-maxmin strategy in the game \(\Gamma _{x_{0}}(u).\) Then, it holds that:
1.
For every \(n\in {\mathbb {N}}\), \(\sigma \) is an n-day \(\phi \)-maxmin strategy.
 
2.
If the tolerance function \(\phi \) is non-increasing, then \(\sigma \) is \(\phi \)-equalizing.
 
Proof
Let \(\sigma \in {{{\mathcal {S}}}}_{1} \) be a subgame \(\phi \)-maxmin strategy in the game \(\Gamma _{x_{0}}(u). \) Take a history \(h \in {\mathcal {H}}^{t}\) of length t and a strategy \( \tau \in \mathcal{S}_{2}. \)
We prove condition 1. We have
$$\begin{aligned}{\mathbb {E}}_{h,\sigma ,\tau }[{\underline{V}}^{t+n}] \ge {\mathbb {E}}_{h,\sigma ,\tau }[{\underline{U}}_{\sigma }^{t+n}] \ge {\mathbb {E}}_{h,\sigma ,\tau }[{\underline{U}}_{\sigma }^{t}] = {\underline{u}}(\sigma ,h) \ge {\underline{v}}(h)-\phi (h),\end{aligned}$$
where the first inequality holds since for each play \(p \in {\mathcal {P}}(h)\) we have \({\underline{V}}^{t+n}(p) = {\underline{v}}(p_{\vert t + n}) \ge {\underline{u}}(\sigma ,p_{\vert t + n}) = {\underline{U}}_{\sigma }^{t+n}(p)\). The second inequality holds by Lemma 3.3. The equality holds since t is the length of the history h, and the final inequality holds since \(\sigma \) is a subgame \(\phi \)-maxmin strategy.
We prove condition 2. We have, \({\mathbb {P}}_{h,\sigma ,\tau }\)-almost surely,
$$\begin{aligned}u(p) = \lim _{t \rightarrow \infty }{\mathbb {E}}_{p_{\vert t},\sigma ,\tau }[u] \ge \limsup _{t \rightarrow \infty }({\underline{v}}(p_{\vert t})-\phi (p_{\vert t})) \ge \limsup _{t \rightarrow \infty }{\underline{V}}^{t}(p)-\phi (h),\end{aligned}$$
where the equality is by Lévy's zero-one law (Lemma A.1 in Appendix A), the first inequality follows since \( \sigma \) is a subgame \( \phi \)-maxmin strategy, and the second inequality holds since \(\phi \) is non-increasing. \(\square \)
Notice that in the case of a nonzero tolerance function, the necessary and sufficient conditions do not coincide, and we do not obtain a characterization of subgame \( \phi \)-maxmin strategies. We now turn to the case where the tolerance function \(\phi \) is identically equal to 0.
Corollary 4.7
A strategy \(\sigma \in {{{\mathcal {S}}}}_{1} \) is a subgame maxmin strategy in the game \(\Gamma _{x_{0}}(u)\) if and only if it is 1-day maxmin and equalizing.
When we compare the sufficient conditions of Theorem 4.5 to the characterization in Corollary 4.7, we notice that in the case of a non-zero tolerance function we require the strategy to be n-day \(\phi \)-maxmin for every n. In the case of a zero tolerance function, the corresponding condition only requires the strategy to be 1-day maxmin. The reason for this difference is that we must prevent strategies from accumulating the allowed tolerance levels, which would make them too permissive over time. The following example illustrates this issue.
Example 4.8
Consider the decision problem depicted in Fig. 3. Each period, the decision maker can choose to continue (c) or to quit (q). Notice that \( v(c^t)= 1/(t+1) \) and hence \(\displaystyle \lim _{t\rightarrow \infty } v(c^t)=0\). Any strategy of player 1 is therefore equalizing. Furthermore, it is clear that in this decision problem the subgame maxmin strategy is unique and requires the decision maker to quit immediately at every history. Now suppose the decision maker has the following tolerance function:
$$\begin{aligned} \phi (c^t)=v(c^t)-v(c^{t+1})={\frac{1}{t+1}}-{\frac{1}{t+2}}, \quad t \in {\mathbb {N}}. \end{aligned}$$
Consider the strategy \(\sigma \) where the decision maker always chooses to continue. From the definition of the tolerance function, it follows that the strategy \(\sigma \) is a 1-day \(\phi \)-maxmin strategy. Indeed, it holds that \(v(c^{t+1})\ge v(c^t)-\phi (c^t)\). It is also easily seen that the strategy \( \sigma \) is equalizing. Nevertheless, it is clear that \( \sigma \) is not a subgame \(\phi \)-maxmin strategy.
The underlying problem with the strategy \( \sigma \) is that every time the decision maker chooses to continue this causes an acceptable loss in the value, but over time these losses add up to an unacceptable loss. The requirement that for every \(n\in {\mathbb {N}}\) a strategy is n-day \(\phi \)-maxmin guarantees that the accumulated losses over any finite period of time remain acceptable.
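The accumulation can be made explicit by a telescoping sum: the tolerated one-step losses add up to the entire value at the root,
$$\begin{aligned} \sum _{t=0}^{T}\phi (c^{t})=\sum _{t=0}^{T}\left( \frac{1}{t+1}-\frac{1}{t+2}\right) =1-\frac{1}{T+2}\rightarrow 1=v(x_{0}) \quad \text{ as } T\rightarrow \infty , \end{aligned}$$
so although each one-shot loss is within the tolerance, always continuing eventually forfeits the full value of 1.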
If we require a strategy to be subgame maxmin, then we do not tolerate any losses. Therefore, the accumulation problem mentioned above will never occur, and it is sufficient to only require that the strategy is 1-day maxmin. \(\square \)
Example 4.9 is such that player 1 has a maxmin strategy in every subgame, but has no subgame maxmin strategy. From Corollary 4.7, it follows that any subgame maxmin strategy needs to be both 1-day maxmin and equalizing. The example presents a game where all the strategies are 1-day maxmin but none of them is equalizing and therefore subgame maxmin strategies do not exist.
Example 4.9
Consider the following perfect information game. Both players have two actions, left (L) and right (R),  so \({\mathcal {A}}={\mathcal {B}}=\left\{ L, R\right\} \). The players take turns playing an action, which generates a play \(p\in \left\{ L, R\right\} ^{\mathbb {N}}\). Let \(r_1(p)\) and \(r_2(p)\) denote the number of times that player 1 and player 2, respectively, play action R during the play p and define \(r(p)=\min \lbrace r_1(p),r_2(p)\rbrace . \) Player 1 obtains a payoff of 0 if both players choose R infinitely often. When at least one of them chooses R only a finite number of times, then player 1 receives a payoff of \( r(p)/(r(p)+1). \) The payoff function u is therefore obtained by defining, for \( p \in \left\{ L, R\right\} ^{\mathbb {N}}, \)
$$\begin{aligned} u(p)= \left\{ \begin{array}{ll} \frac{r(p)}{r(p)+1}, &{} \text{ if } r_1(p) \ne \infty \text{ or } r_2(p) \ne \infty ,\\ 0, &{} \text{ if } r_1(p)=r_2(p)=\infty . \end{array} \right. \end{aligned}$$
(4.3)
At each history \(h\in {\mathcal {H}}\), the value of the game exists and is given by \(v(h)= r_2(h)/(r_2(h)+1)\), where \(r_2(h)\) denotes the number of times player 2 has chosen R in the history h. Indeed, player 1 can guarantee this payoff by choosing the action R \(\max \{r_2(h)-r_1(h),0\} \) times after history h. Player 2 can guarantee to lose at most this amount by playing only left after history h. Hence at every history \(h\in {\mathcal {H}}\), player 1 has a maxmin strategy.
For every history \(h\in {\mathcal {H}}\) where player 1 takes an action and for every action \(a\in {\mathcal {A}} \), we have that \(r_2(ha)=r_2(h)\) and \(v(ha)=v(h)\). Therefore, all strategies of player 1 are 1-day maxmin.
On the other hand, no equalizing strategy exists for player 1. To see this, take any strategy \(\sigma \in {{{\mathcal {S}}}}_{1} \) for player 1 and consider the strategy \(\tau \in {{{\mathcal {S}}}}_{2} \) in which player 2 always chooses R. Let p be the play generated by the strategy profile \((\sigma ,\tau )\). Then, we have that \(u(p)<1\) and \(\displaystyle \lim _{t\rightarrow \infty }v(p_{\vert t})=1\). It follows that \( \sigma \) is not equalizing. Using Corollary 4.7, we conclude that player 1 does not have a subgame maxmin strategy. \(\square \)

4.2 The Case of an Upper Semi-Continuous Payoff Function

The remainder of this section is devoted to the case where the payoff function is upper semi-continuous. We argue that in this case, any strategy of player 1 is equalizing. Because all upper semi-continuous functions are Borel measurable and because we assumed finite action sets and a countable state space, the value exists, see Maitra and Sudderth ([13]) and Martin ([14]), and the lower value equals the value.
The function u is upper semi-continuous at a play \(p \in {\mathcal {P}}\) if for every sequence \(\{p_{t}\}_{t \in {\mathbb {N}}}\) of plays converging to p it holds that
$$\begin{aligned}\limsup _{t \rightarrow \infty }u(p_{t}) \le u(p).\end{aligned}$$
Lemma 4.10
Let the payoff function u be upper semi-continuous at the play p. Then, we have that
$$\begin{aligned} u(p)\ge \limsup _{t\rightarrow \infty } {\underline{v}}(p_{\vert t}). \end{aligned}$$
(4.4)
Proof
Fix \(\epsilon >0\). For every \(t \in {\mathbb {N}}, \) define \( h_{t} = p_{\vert t} \) and let \( p_{t} \in {\mathcal {P}}(h_{t})\) be such that \(u(p_t) \ge {\underline{v}}(h_{t})-\epsilon \). Such a play \( p_{t} \) exists as player 1 can guarantee a payoff of at least \({\underline{v}}(h_{t})-\epsilon \) at history \(h_{t}\). Since the sequence \( \{p_{t}\}_{t \in {\mathbb {N}}}\) converges to p,  we have
$$\begin{aligned} u(p)\ge \limsup _{t \rightarrow \infty } u(p_t)\ge \limsup _{t \rightarrow \infty } {\underline{v}}(h_{t})-\epsilon . \end{aligned}$$
Since this holds for every \(\epsilon >0,\) the lemma follows. \(\square \)
In view of Lemma 4.10, we obtain the following corollary to Theorems 4.5 and 4.6 and Corollary 4.7.
Corollary 4.11
Let \( \Gamma _{x_{0}}(u) \) be such that u is upper semi-continuous. Then, each strategy of player 1 is equalizing. The strategy \(\sigma \in {\mathcal {S}}_1\) is a subgame \(\phi \)-maxmin strategy if and only if for every \( n \in {\mathbb {N}} \) it is an n-day \(\phi \)-maxmin strategy. The strategy \(\sigma \in {{{\mathcal {S}}}}_{1}\) is a subgame maxmin strategy if and only if it is a 1-day maxmin strategy.
Example 4.12
(Staying in the set game) Let some subset \({\mathcal {X}}^{*}\) of \({\mathcal {X}}\) be given. For a play \(p = (x_{0},a_1,b_1,x_1,a_2,b_2,\dots )\), we define u(p) to be 1 if \(x_t \in {\mathcal {X}}^{*}\) for every \(t \in {\mathbb {N}}\) and to be 0 otherwise. Maitra and Sudderth ([12]) refer to such a payoff function as “staying in a set” and in the computer science literature it goes under the name of “safety objective,” see Bruyère ([4]). Since u is upper semi-continuous, any strategy \( \sigma \in {{{\mathcal {S}}}}_{1} \) is equalizing. \(\square \)
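A minimal sketch of this payoff function in Python (our illustration; since a play is infinite, the check can in practice only be evaluated on a finite prefix of the visited states, where a single violation already forces the payoff to 0):

def staying_in_set_payoff(visited_states, X_star):
    # u(p) = 1 iff every state along the play lies in X_star (the "safety objective");
    # on a finite prefix, one state outside X_star already determines u(p) = 0.
    return 1.0 if all(x in X_star for x in visited_states) else 0.0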
Example 4.13
Consider again the centipede game shown in Fig. 2, but with one slight modification. If the game continues indefinitely, then player 1 receives a payoff of 2 instead of 0. The payoff function is now upper semi-continuous. As argued in Example 4.4, the strategy \( \sigma \) in which player 1 continues at each history is 1-day maxmin. Because of Corollary 4.11, we can conclude that the strategy \(\sigma \) is a subgame maxmin strategy. \(\square \)
Note that the limit average payoff is generally not upper semi-continuous. Hence, in games where the payoff function is given by the limit average payoff, there may exist strategies of the maximizing player which are not equalizing.

5 Existence of Subgame \(\phi \)-Maxmin Strategies

The goal of this section is to give sufficient conditions for the existence of a subgame \(\phi \)-maxmin strategy if the tolerance function \( \phi \) is positive. We will prove the following theorem.
Theorem
Let a game \( \Gamma _{x_{0}}(u) \) and a tolerance function \(\phi > 0\) be given such that for every \(p\in {\mathcal {P}}\) at least one of the following two conditions holds:
1.
(point of upper semicontinuity) The function u is upper semi-continuous at p.
 
2.
(positive limit inferior) \(\liminf _{t\rightarrow \infty } \phi (p_{\vert t}) > 0\).
 
Then, there exists a subgame \(\phi \)-maxmin strategy in the game \(\Gamma _{x_{0}}(u).\)
Two special cases of the above theorem are worth noting. One is when the payoff function is everywhere upper semicontinuous. In this special case, the existence of a subgame \(\phi \)-maxmin strategy (in fact, of a subgame maxmin strategy) follows from a result of Laraki, Maitra, and Sudderth ([10]). We note that the latter result is not entirely subsumed by the theorem above, since the authors allow for a general Borel state space. Another special case of interest is when along each play the tolerance remains bounded away from zero (i.e., when each play satisfies condition 2 of the theorem). This special case, to the best of our knowledge, has not been explicitly considered in the literature. However, it does encompass the familiar situation of a constant tolerance function: \(\phi (h) \equiv \epsilon > 0\). For such tolerance functions, subgame \(\phi \)-maxmin strategies are exactly the subgame \(\epsilon \)-optimal strategies, and existence follows from Proposition 11 in Mashiah–Yaakovi ([15]). Note that the case when all plays satisfy condition 2 of the theorem is more general than the case of a constant tolerance function, since no uniform lower bound on the tolerance is imposed. A novel feature of the above theorem is the way it combines these various special cases. We note that the theorem remains original even if we restrict our attention to Borel measurable payoff functions.
The main technique used in the proof of this theorem is switching between strategies. The idea of switching between progressively “more optimal” strategies is certainly not new. A similar construction is used, for example, in Rosenberg, Solan, and Vieille ([18]), Solan and Vieille ([20]), and Mashiah–Yaakovi ([15]). However, the method needs to be fine-tuned in order for it to accommodate both the plays where the tolerance remains bounded away from zero (condition 2) and those where the payoff function is upper semicontinuous (condition 1). It is not straightforward to use this proof technique for a general Borel measurable state space, as the selection of switching strategies would need to satisfy measurability conditions. It is therefore an interesting open question whether the result can be extended to that case.
The switching strategy \( \sigma ^{\phi } \) is formally defined in Sect. 5.1. To analyze its properties, we define in Sect. 5.2, for every \( k \in {\mathbb {N}}, \) a strategy \( \sigma ^{k} \) that coincides with \( \sigma ^{\phi } \) as long as at most k switches have been made and it stops switching thereafter. We also present three technical lemmas for finite switching strategies in this subsection, the most important one showing that the guaranteed expected payoff of a strategy \( \sigma ^{k} \) is increasing in k. Sect. 5.3 uses the lemmas of Sect. 5.2 to show the desired existence result in Theorem 5.7.

5.1 Definition of the Switching Strategy \(\sigma ^\phi \)

Fix a tolerance function \(\phi >0\). For every history \(h\in {\mathcal {H}}, \) player 1 has a \( (\phi (h)/2)\)-maxmin strategy for the subgame at history h,  denoted by \(\sigma ^{h}\). The function \( \psi : {{{\mathcal {H}}}} \rightarrow {{{\mathcal {H}}}} \) maps histories into histories and is such that player 1 is going to use strategy \( \sigma ^{\psi (h)} \) at subgame \( h \in {{{\mathcal {H}}}}. \) The function \( \psi \) is used to describe when player 1 switches strategies and is recursively defined by setting \( \psi (x_{0})=x_{0}\) and, for every \(h\in {\mathcal {H}},\) for every \(z\in {\mathcal {Z}},\)
$$\begin{aligned} \psi (hz)= \left\{ \begin{array}{ll} \psi (h), &{} \text{ if } {\underline{u}}(\sigma ^{\psi (h)},hz) \ge {\underline{v}}(hz)-\phi (hz),\\ hz, &{} \text{ otherwise }. \end{array} \right. \end{aligned}$$
The condition \({\underline{u}}(\sigma ^{\psi (h)},hz) \ge {\underline{v}}(hz)-\phi (hz)\) verifies whether the strategy to which player 1 switched last, \(\sigma ^{\psi (h)}\), is a \(\phi (hz)\)-maxmin strategy for the subgame at history hz. If this is the case, then there is no need to switch and \( \psi (hz) = \psi (h). \) Otherwise, player 1 switches to \( \sigma ^{hz}, \) which is achieved by setting \( \psi (hz) = hz. \) Formally, we define the switching strategy \(\sigma ^\phi :{\mathcal {H}}\rightarrow \Delta ({\mathcal {A}})\) by
$$\begin{aligned} \sigma ^\phi (h)=\sigma ^{\psi (h)}(h), \quad h \in {{{\mathcal {H}}}}. \end{aligned}$$
(5.1)
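The recursion for \( \psi \) and the definition (5.1) can be sketched in Python as follows. This is an illustration under assumed interfaces: lower_value(h), guarantee(sigma, h), and half_phi_maxmin(h) are hypothetical oracles for \({\underline{v}}(h)\), \({\underline{u}}(\sigma ,h)\), and the \((\phi (h)/2)\)-maxmin strategies \(\sigma ^{h}\), none of which is computable in general; histories are encoded as in the sketch of Sect. 2.

def make_switching_strategy(phi, lower_value, guarantee, half_phi_maxmin):
    """Return sigma_phi(h), the mixed action of the switching strategy (5.1) at h.

    Histories are tuples (x0, z1, ..., zt) with zi = (ai, bi, xi); the oracles
    lower_value, guarantee, and half_phi_maxmin are assumed, not implemented.
    """
    def psi(h):
        anchor = h[:1]                            # psi(x0) = x0
        for t in range(2, len(h) + 1):
            hz = h[:t]                            # prefix with t - 1 one-shot games
            sigma = half_phi_maxmin(anchor)       # the strategy switched to last
            if guarantee(sigma, hz) < lower_value(hz) - phi(hz):
                anchor = hz                       # no longer phi-maxmin at hz: switch
        return anchor

    return lambda h: half_phi_maxmin(psi(h))(h)   # play sigma^{psi(h)} at h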
The following example illustrates the switching strategy \( \sigma ^{\phi } \) and shows that it may not be subgame \(\phi \)-maxmin.
Example 5.1
Consider again the centipede game depicted in Fig. 2. We recall that, for every \( t \in {\mathbb {N}}, \) \( {\underline{v}}(c^{2t}) = {\underline{v}}(c^{2t+1}) = (t+1)/(t+2). \) Take a tolerance function \( \phi \) with the property that, for every \( t \in {\mathbb {N}}, \) \(\phi (c^{2t})< 1/((t+1)(t+2))\).
Let \( h \in {{{\mathcal {H}}}} \) be the active history and let \( k \in {\mathbb {N}} \) be such that \( h = c^{2k} \) or \( h = c^{2k+1}. \) The strategy \( \sigma ^{h} \) in which player 1 chooses continue at periods \(0,2,\ldots ,2k\) and chooses quit at every later period, i.e.,
$$\begin{aligned} \sigma ^{h}(c^{2t})= \left\{ \begin{array}{ll} c, &{} \text{ if } 2t\le 2k, \\ q, &{} \text{ otherwise }, \end{array} \right. \end{aligned}$$
is a maxmin strategy at subgame h.
We now consider the switching strategy \( \sigma ^{\phi }. \) We show by induction that for every \( h \in {{{\mathcal {H}}}}, \) for every \( z \in {{{\mathcal {Z}}}}, \) it holds that \( \psi (hz) = hz \) if hz is an active history of player 1 and \( \psi (hz) = h \) if hz is an active history of player 2. The statement trivially holds for the initial history. Let h be an active history of player 2 and let \( t \in {\mathbb {N}} \) be such that \( h = c^{2t+1}. \) Since \( \sigma ^{h} = \sigma ^{c^{2t}} \) is a maxmin strategy at subgame h, it holds that \( \psi (c^{2t+1}) = c^{2t}. \) Let h be an active history of player 1 and let \( t \in {\mathbb {N}} \setminus \{0\} \) be such that \( h = c^{2t}. \) We have that
$$\begin{aligned} {\underline{u}}(\sigma ^{\psi (c^{2t-1})},c^{2t}) = {\underline{u}}(\sigma ^{c^{2t-2}},c^{2t}) = \tfrac{t}{t+1} = {\underline{v}}(c^{2t}) - \tfrac{1}{(t+1)(t+2)} < {\underline{v}}(c^{2t}) - \phi (c^{2t}), \end{aligned}$$
so \( \psi (c^{2t}) = c^{2t}. \) Since the tolerance function \(\phi \) is so stringent, it forces player 1 to switch at each of his active histories. For every \( t \in {\mathbb {N}}, \) it holds that \(\sigma ^{\phi }(c^{2t})=\sigma ^{c^{2t}}(c^{2t})=c, \) so under \( \sigma ^{\phi } \) player 1 chooses c at each of his active histories. The strategy \( \sigma ^{\phi } \) is not a subgame \(\phi \)-maxmin strategy as it fails to be \(\phi \)-equalizing, see Example 4.4. \(\square \)
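The identity used in the display above is elementary arithmetic; as a quick numerical check (a sketch only, with the centipede values hard-coded), one can verify that the gap between the inherited guarantee and the current lower value is exactly \(1/((t+1)(t+2))\), so any tolerance below this gap triggers a switch.
```python
from fractions import Fraction

for t in range(1, 8):
    inherited = Fraction(t, t + 1)          # guarantee of sigma^{c^{2t-2}} at c^{2t}
    lower_val = Fraction(t + 1, t + 2)      # lower value at c^{2t}
    gap = lower_val - inherited
    assert gap == Fraction(1, (t + 1) * (t + 2))
    # Any phi(c^{2t}) < gap forces psi(c^{2t}) = c^{2t}, i.e., a switch.
```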

5.2 Definition and Analysis of Finite Switching Strategies

Given the switching strategy \( \sigma ^{\phi }, \) for every \(k\in {\mathbb {N}} \), we define the strategy \(\sigma ^k:{\mathcal {H}}\rightarrow \Delta ({\mathcal {A}})\) such that it coincides with \(\sigma ^{\phi }\) as long as at most k switches have been made and it stops switching thereafter. Formally, we recursively define the function \(\kappa : {{{\mathcal {H}}}} \rightarrow {\mathbb {N}} \) which counts the number of switches along a history h by setting \(\kappa (x_{0}) = 0\) and, for all histories \(h,hz\in {\mathcal {H}}\),
$$\begin{aligned} \kappa (hz)= \left\{ \begin{array}{ll} \kappa (h), &{} \text{ if } {\underline{u}}(\sigma ^{\psi (h)},hz) \ge {\underline{v}}(hz)-\phi (hz),\\ \kappa (h)+1, &{} \text{ otherwise }. \end{array} \right. \end{aligned}$$
For every \(k\in {\mathbb {N}}, \) we define the stopping time \(T_k:{\mathcal {P}}\rightarrow {\mathbb {N}}\cup \lbrace \infty \rbrace \) by
$$\begin{aligned} T_k(p) = \inf \lbrace t \in {\mathbb {N}}\vert \kappa (p_{\vert t})=k \rbrace , \quad p \in {{{\mathcal {P}}}}. \end{aligned}$$
(5.2)
The stopping time \( T_{k} \) indicates the time at which switch k occurs. Since, for \(t<\infty ,\) whether \(T_k(p)\le t\) holds depends only on the history up to period t, the set \(\lbrace p\in {\mathcal {P}}\vert T_k(p)\le t \rbrace \) is \({\mathcal {F}}^{t}\)-measurable, so \(T_k\) is indeed a stopping time.
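In the same illustrative Python style as before (hypothetical oracles, histories as tuples, and reusing the `psi_of` function from the sketch above), the switch counter \( \kappa \) and the stopping time \( T_k \) can be sketched as follows.
```python
import math

def make_switch_counter(psi_of, phi, lower_value, guarantee, half_phi_maxmin):
    memo = {}

    def kappa(h):
        # kappa(h): number of switches along h; zero at the initial history.
        if len(h) == 1:
            return 0
        if h not in memo:
            prev = h[:-1]
            current = half_phi_maxmin(psi_of(prev))
            switched = guarantee(current, h) < lower_value(h) - phi(h)
            memo[h] = kappa(prev) + int(switched)
        return memo[h]

    def T(k, prefixes):
        # Stopping time (5.2): first period t with kappa(p|t) = k, else infinity.
        for t, h in enumerate(prefixes):
            if kappa(h) == k:
                return t
        return math.inf

    return kappa, T
```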
We now formally define \(\sigma ^k:{\mathcal {H}}\rightarrow \Delta ({\mathcal {A}}). \) For every \( h \in {{{\mathcal {H}}}}, \) take any \(p\in {\mathcal {P}}(h)\) and let
$$\begin{aligned} \sigma ^k(h)=\left\{ \begin{array}{ll} \sigma ^{\phi }(h), &{} \text{ if } \kappa (h)\le k, \\ \sigma ^{h_{\vert T_k(p)}}(h), &{}\text{ otherwise. } \end{array} \right. \end{aligned}$$
(5.3)
If \( \kappa (h) > k, \) then the time at which switch k has occurred is the same for every \(p\in {\mathcal {P}}(h), \) so it holds that \( \sigma ^{k} \) is well-defined.
For every \(k\in {\mathbb {N}}, \) let \({\mathcal {R}}_k\subseteq {\mathcal {P}}\) be the set of plays along which at least k switches occur,
$$\begin{aligned} {\mathcal {R}}_k=\lbrace p\in {\mathcal {P}}\vert T_k(p)<\infty \rbrace , \end{aligned}$$
(5.4)
so \({\mathcal {R}}_1\supseteq {\mathcal {R}}_2 \supseteq {\mathcal {R}}_3 \supseteq \cdots \). Furthermore, let \({\mathcal {R}}_{\infty } \subseteq {\mathcal {P}} \) denote the set of plays along which infinitely many switches occur,
$$\begin{aligned} {\mathcal {R}}_\infty = \displaystyle \cap _{k=1}^{\infty } {\mathcal {R}}_k. \end{aligned}$$
(5.5)
The remainder of the section is devoted to the proof of three technical lemmas concerning the finite switching strategies. These lemmas are needed in the analysis of \( \sigma ^{\phi }. \)
Consider the strategies \(\sigma ^{k}, \sigma ^{k+1},\dots \). All these strategies agree with \(\sigma ^{\phi }\) for as long as \(\sigma ^{\phi }\) does not require more than k switches. Consequently, the measures that these strategies induce on \({\mathcal {P}}\) assign the same probability to any event that is “determined” before switch \( k+1 \) occurs, i.e., to any event in the sigma-algebra \({\mathcal {F}}^{T_{k+1}}\).
Lemma 5.2
Let a strategy \(\tau \in {{{\mathcal {S}}}}_{2}, \) a history \(h \in {\mathcal {H}}, \) and some \(k \in {\mathbb {N}}\) be given. For \(\sigma = \sigma ^{k}, \sigma ^{k+1},\dots ,\sigma ^{\phi }, \) the probability measures \({\mathbb {P}}_{h,\sigma ,\tau }\) all coincide on the sigma-algebra \({\mathcal {F}}^{T_{k+1}}\). Furthermore, these probability measures all agree on each universally measurable subset of \({\mathcal {P}} \setminus {\mathcal {R}}_{k + 1}\).
Proof
A set A of the universally measurable sigma-algebra \( {\mathcal {F}}\) is called agreeable if for \(\sigma = \sigma ^{k}, \sigma ^{k+1},\dots ,\sigma ^{\phi }\) the measures \({\mathbb {P}}_{h,\sigma ,\tau }\) all assign the same probability to A.
We argue first that each cylinder set in \({\mathcal {F}}^{T_{k+1}}\) is agreeable. A cylinder set \({\mathcal {P}}(h)\) is a member of \({\mathcal {F}}^{T_{k+1}}\) if and only if \(\kappa (h) \le k+1\). Let a cylinder set \({\mathcal {P}}(h')\) in \({\mathcal {F}}^{T_{k+1}}\) be given. Since \(\kappa (h') \le k + 1\), we know that \(\kappa (h'') \le k\) for each history \(h''\) preceding \(h'\). It follows that \(\sigma ^{k}(h'') = \sigma ^{k + 1}(h'') = \cdots = \sigma ^{\phi }(h'')\). Since this applies to each history \(h''\) that precedes \(h^{\prime }\), the set \({\mathcal {P}}(h')\) is agreeable.
Now take any \(E \in {\mathcal {F}}^{T_{k+1}}\). For \(t \in {\mathbb {N}},\) let \(E_{t} = E \cap \{p \in {{{\mathcal {P}}}} \mid T_{k+1}(p) = t\}\) and let \( E_{\infty } = E \cap \{p \in {{{\mathcal {P}}}} \mid T_{k+1}(p) = \infty \}\). To show that E is agreeable, it suffices to show that the sets \(E_{t}\) and \(E_{\infty }\) are.
Let some \( t \in {\mathbb {N}} \) be given. We know that \(E_{t}\) is a member of \({\mathcal {F}}^{t}\). Consequently, \(E_{t}\) can be written as a disjoint union of cylinder sets in \({\mathcal {F}}^{t}, \) say \(E_{t} = \cup \{C_{n} \mid n \in {\mathbb {N}}\}\), with each \(C_{n}\) a member of \({\mathcal {F}}^{t}\). Since each \(C_{n}\) is a subset of the set \(\{p \in {{{\mathcal {P}}}} \mid T_{k+1}(p)=t\}\), it is a member of \({\mathcal {F}}^{T_{k+1}}, \) so is agreeable by the result of the second paragraph in the proof. It now follows that \(E_{t}\) is agreeable.
To show that \(E_{\infty }\) is agreeable, we make use of the fact that \( E_{\infty } \) is a Borel set and of the regularity of the probability measures \({\mathbb {P}}_{h,\sigma ,\tau }\) on the Borel sigma-algebra. Let O be any open subset of \({\mathcal {P}}\) containing \(E_{\infty }\). The set O can be written as a disjoint union of cylinder sets, say \(O = \cup \{{\mathcal {P}}(h_{n}) \mid n \in {\mathbb {N}}\}\). Without loss of generality it holds that for every \(n \in {\mathbb {N}}\) the set \({\mathcal {P}}(h_{n})\) has a point in common with \(E_{\infty }\). Thus in particular, there is \(p \in {\mathcal {P}}(h_{n})\) with \(T_{k+1}(p) = \infty \). This implies that \(\kappa (h_{n}) \le k\) and hence that \({\mathcal {P}}(h_{n})\) is a member of \({\mathcal {F}}^{T_{k+1}}\). We conclude that each \({\mathcal {P}}(h_{n})\) is agreeable by the result of the second paragraph in the proof. It follows that O is an agreeable set. By regularity, each of the probability measures assigns to \(E_{\infty }\) the infimum of the probabilities of the open sets containing \(E_{\infty }\); since the measures agree on every such open set, they agree on \(E_{\infty }\), so \(E_{\infty }\) is agreeable.
To prove the second claim, we notice that all Borel subsets of \({\mathcal {P}} \setminus {\mathcal {R}}_{k + 1} = \{p \in {{{\mathcal {P}}}} \mid T_{k+1}(p) = \infty \}\) are members of \({\mathcal {F}}^{T_{k+1}}\), so are agreeable. The result for universally measurable subsets of \(\{p \in {{{\mathcal {P}}}} \mid T_{k+1}(p) = \infty \}\) follows since each such set differs from a Borel set by a negligible set. \(\square \)
Let T be a stopping time. We define the stochastic variable \({\underline{U}}_{\sigma }^{T}\) by letting it agree with \({\underline{U}}_{\sigma }^{t}\) on the set \(\{p \in {\mathcal {P}}:T(p) = t\}\) for each \(t \in {\mathbb {N}}\) and by letting it agree with \({\underline{U}}_{\sigma }^{\infty }\) on the set \(\{p \in {\mathcal {P}}:T(p) = \infty \}, \) where \( {\underline{U}}_{\sigma }^{\infty } \) is some suitably chosen \({\mathcal {F}}^{\infty }\)-measurable stochastic variable. The next lemma relates the guaranteed expected payoffs of strategies \( \sigma ^{k} \) and \( \sigma ^{k+1} \) at the moment of switch \(k+1\). We write \(\Phi ^t\) to denote the \({\mathcal {F}}^{t}\)-measurable stochastic variable given by \(\Phi ^t(p)=\phi (p_{\vert t})\).
Lemma 5.3
Let \(k \in {\mathbb {N}}\) and \({\mathcal {F}}^{\infty }\)-measurable stochastic variables \({\underline{U}}_{\sigma ^{k}}^{\infty }, {\underline{U}}_{\sigma ^{k+1}}^{\infty }\) such that \({\underline{U}}_{\sigma ^{k}}^{\infty } = {\underline{U}}_{\sigma ^{k+1}}^{\infty }\) be given. Then, it holds that
$$\begin{aligned} {\underline{U}}^{T_{k+1}}_{\sigma ^{k}} \le {\underline{U}}^{T_{k+1}}_{\sigma ^{k+1}} - \tfrac{1}{2} \Phi ^{T_{k+1}} \cdot I(T_{k+1} < \infty ). \end{aligned}$$
(5.6)
Proof
Let some \(p\in {\mathcal {P}}\) be given. We distinguish the following two cases.
Case 1: \(T_{k+1}(p)<\infty \). In this case at least \(k+1\) switches occur along the play p. For \(h = p_{\vert T_{k+1}(p)}\), we have the following inequalities
$$\begin{aligned}{\underline{u}}(\sigma ^k, h) < {\underline{v}}(h) - \phi (h) = {\underline{v}}(h) - \tfrac{1}{2}\phi (h) - \tfrac{1}{2}\phi (h) \le {\underline{u}}(\sigma ^{k+1}, h) - \tfrac{1}{2}\phi (h),\end{aligned}$$
where the first inequality holds since \(\sigma ^k\) is not a \(\phi (h)\)-maxmin strategy for the subgame at history h and the second inequality holds because the strategy \(\sigma ^{k+1}\) is a \( (\phi (h)/2)\)-maxmin strategy for the subgame at history h. Since \(I(T_{k+1}(p) < \infty ) = 1\), (5.6) holds.
Case 2: \(T_{k+1}(p)=\infty \). In this case, we have
$$\begin{aligned}{\underline{U}}^{T_{k+1}}_{\sigma ^{k}}(p) = {\underline{U}}_{\sigma ^{k}}^{\infty }(p) = {\underline{U}}_{\sigma ^{k+1}}^{\infty }(p) = {\underline{U}}^{T_{k+1}}_{\sigma ^{k+1}}(p).\end{aligned}$$
Thus, (5.6) holds as an equality. \(\square \)
The following lemma states the intuitive property that, for histories with fewer than \( k+1 \) switches or histories at which switch \( k+1 \) occurs, strategy \( \sigma ^{k+1} \) guarantees player 1 at least the same payoff as strategy \( \sigma ^{k}. \)
Lemma 5.4
Let \( t \in {\mathbb {N}} \) and a history \( h \in {{{\mathcal {H}}}}^{t} \) of length t be given. Let \( k \in {\mathbb {N}} \) be such that \( T_{k+1}(p) \ge t \) for every \( p \in {{{\mathcal {P}}}}(h). \) Then, it holds that \({\underline{u}}(\sigma ^{k},h) \le {\underline{u}}(\sigma ^{k + 1},h)\).
Proof
Fix strategy \( \tau \in {\mathcal {S}}_2 \) of player 2.
We first define \({\underline{U}}_{\sigma ^{k}}^{\infty }\). Consider the probability measure \({\mathbb {Q}}\) on the measurable space \(({\mathcal {P}},{\mathcal {F}})\) given by \({\mathbb {Q}}(A) = \sum _{k \in {\mathbb {N}}} 2^{-k-1} {\mathbb {P}}_{h,\sigma ^{k},\tau }(A)\) for each \(A \in {\mathcal {F}}\). Let \({\bar{u}}\) be an \({\mathcal {F}}^{\infty }\)-measurable stochastic variable with the property that \({\bar{u}} = u\), \({\mathbb {Q}}\)-almost surely. Since \({\mathbb {P}}_{h,\sigma ^{k},\tau }\) is absolutely continuous with respect to \({\mathbb {Q}}\), it holds that \( {\bar{u}} = u\), \({\mathbb {P}}_{h,\sigma ^{k},\tau }\)-almost surely, for every \(k \in {\mathbb {N}}\). We define \( {\underline{U}}_{\sigma ^{k}}^{\infty }\) to be equal to \({\bar{u}}\), for every \(k \in {\mathbb {N}}\).
We now obtain the following inequalities. First, by Lemma 3.3 and the optional sampling theorem with unbounded stopping times as presented in Theorem 8.16 of Yeh ([22], p. 139), we have
$$\begin{aligned}{\underline{u}}(\sigma ^{k},h) \le {\mathbb {E}}_{h,\sigma ^k,\tau }[{\underline{U}}^{T_{k+1}}_{\sigma ^k}].\end{aligned}$$
Secondly, from the fact that \({\underline{U}}^{T_{k+1}}_{\sigma ^k}\) is an \({\mathcal {F}}^{T_{k+1}}\)-measurable stochastic variable, we obtain by Lemma 5.2 that
$$\begin{aligned}{\mathbb {E}}_{h,\sigma ^k,\tau }[{\underline{U}}^{T_{k+1}}_{\sigma ^k}] = {\mathbb {E}}_{h,\sigma ^{k+1},\tau }[{\underline{U}}^{T_{k+1}}_{\sigma ^k}].\end{aligned}$$
Thirdly, we know that
$$\begin{aligned} {\underline{U}}^{T_{k+1}}_{\sigma ^k} \le {\underline{U}}^{T_{k+1}}_{\sigma ^{k+1}} \le {\mathbb {E}}_{h,\sigma ^{k+1},\tau }[u \vert {\mathcal {F}}^{T_{k+1}}], \quad {\mathbb {P}}_{h,\sigma ^{k+1},\tau } \text{-almost } \text{ surely }, \end{aligned}$$
where the first of these inequalities follows from Lemma 5.3 and the second one follows by Lemma 3.3 and the optional sampling theorem with unbounded stopping times as presented in Theorem 8.16 of Yeh ([22], p. 139). Taking the expectation of the last array of inequalities with respect to \({\mathbb {P}}_{h,\sigma ^{k+1},\tau }\) and making use of the law of iterated expectation yields
$$\begin{aligned} {\mathbb {E}}_{h,\sigma ^{k+1},\tau }[{\underline{U}}^{T_{k+1}}_{\sigma ^k}] \le {\mathbb {E}}_{h,\sigma ^{k+1},\tau }[u]. \end{aligned}$$
Combining these facts yields \({\underline{u}}(\sigma ^{k},h) \le {\mathbb {E}}_{h,\sigma ^{k+1},\tau }[u]\). Taking the infimum over all strategies \(\tau \) of player 2 completes the proof. \(\square \)

5.3 Properties of the Switching Strategy \(\sigma ^\phi \)

We first show that the switching strategy \( \sigma ^{\phi } \) is n-day \(\phi \)-maxmin for every \(n \in {\mathbb {N}}\).
Lemma 5.5
Let a game \( \Gamma _{x_{0}}(u) \) and a tolerance function \( \phi > 0 \) be given. For every \( n \in {\mathbb {N}}, \) the switching strategy \(\sigma ^\phi \) is n-day \(\phi \)-maxmin.
Proof
Let a history \( h \in {{{\mathcal {H}}}} \) with length t be given. Let \( k = \kappa (h)\) denote the number of switches that have occurred along the history h. For every \( \tau \in {{{\mathcal {S}}}}_{2}, \) we obtain the chain of inequalities
$$\begin{aligned} {\underline{v}}(h) - \phi (h) \le {\underline{u}}(\sigma ^{k},h) \le {\underline{u}}(\sigma ^{t+n},h) \le {\mathbb {E}}_{h,\sigma ^{t+n},\tau }[{\underline{U}}_{\sigma ^{t+n}}^{t+n}] \le&{\mathbb {E}}_{h,\sigma ^{t+n},\tau }[{\underline{V}}^{t+n}]\\ =&{\mathbb {E}}_{h,\sigma ^{\phi },\tau }[{\underline{V}}^{t+n}], \end{aligned}$$
where the first inequality holds since \(\sigma ^{k}\) is a \(\phi (h)\)-maxmin strategy for the subgame at history h, the second inequality holds by Lemma 5.4 since \( k \le t \le t+n\), the third inequality holds by Lemma 3.3, and the fourth one follows since \({\underline{U}}_{\sigma ^{t+n}}^{t+n} \le {\underline{V}}^{t+n}. \) The final equality follows from the fact that the stochastic variable \({\underline{V}}^{t+n}\) is \({\mathcal {F}}^{t+n}\)-measurable and the fact that the strategy \(\sigma ^{t+n}\) coincides with \(\sigma ^{\phi }\) at least until time \(t+n\). \(\square \)
The next lemma says that along almost any play \(p \in {\mathcal {P}}\) only finitely many switches occur or the tolerance level goes to zero. This lemma is crucial to show the equalizing property of switching strategies.
Lemma 5.6
For every history \(h\in {\mathcal {H}},\) for every strategy \( \tau \in {{{\mathcal {S}}}}_{2} \) of player 2, it holds that
$$\begin{aligned} \lim _{k \rightarrow \infty } \Phi ^{T_k} \cdot I(T_k<\infty ) = 0, \quad {\mathbb {P}}_{h,\sigma ^{\phi },\tau }\text {-almost surely.} \end{aligned}$$
Proof
Although it is possible that player 1 switches infinitely many times along a play p, and therefore incurs infinitely many increases in his guarantee level along this play, we show first that the total overall increase in his guarantee level is bounded, since the payoff function itself is a bounded function. More formally, let \( t \in {\mathbb {N}}, \) a history \( h \in {\mathcal {H}}^{t} \) of length t,  and a strategy \( \tau \in {\mathcal {S}}_2 \) of player 2 be given. Let \( \kappa (h)\) denote the number of switches that have occurred along the history h. We show that
$$\begin{aligned} \sum _{k=\kappa (h)+1}^{\infty } {\mathbb {E}}_{h,\sigma ^{\phi },\tau }\left[ \tfrac{1}{2}\Phi ^{T_k} \cdot I\left( T_k<\infty \right) \right] \le 2M, \end{aligned}$$
(5.7)
where we recall that \( M = \sup _{p\in {\mathcal {P}}}\vert u(p)\vert . \)
As in the proof of Lemma 5.4, let \({\bar{u}}\) be any \({\mathcal {F}}^{\infty }\)-measurable stochastic variable with the property that, for every \( k \in {\mathbb {N}}, \) \({\bar{u}} = u,\) \( {\mathbb {P}}_{h,\sigma ^{k},\tau }\)-almost surely. For every \(k \in {\mathbb {N}}\), we define \({\underline{U}}_{\sigma ^{k}}^{\infty } = {\bar{u}}\).
Let some \(k > \kappa (h) \) be given. For every \( p \in {{{\mathcal {P}}}}(h), \) it holds that \( t \le T_{k}(p) \le T_{k+1}(p), \) so Lemma 3.3 and the optional sampling theorem with unbounded stopping times as presented in Theorem 8.16 of Yeh ([22], p. 139) imply
$$\begin{aligned} {\underline{U}}^{T_k}_{\sigma ^k} \le {\mathbb {E}}_{h,\sigma ^k,\tau }[{\underline{U}}^{T_{k+1}}_{\sigma ^k} \vert {\mathcal {F}}^{T_k}], \quad {\mathbb {P}}_{h,\sigma ^k,\tau }\text {-almost surely}. \end{aligned}$$
(5.8)
We now have
$$\begin{aligned} {\mathbb {E}}_{h,\sigma ^{\phi },\tau }[{\underline{U}}^{T_k}_{\sigma ^k}] = {\mathbb {E}}_{h,\sigma ^{k},\tau }[{\underline{U}}^{T_k}_{\sigma ^k}] \le {\mathbb {E}}_{h,\sigma ^{k},\tau }[{\underline{U}}^{T_{k+1}}_{\sigma ^k}] = {\mathbb {E}}_{h,\sigma ^{\phi },\tau }[{\underline{U}}^{T_{k+1}}_{\sigma ^k}], \end{aligned}$$
where the two equalities follow from Lemma 5.2 and the fact that both \({\underline{U}}^{T_k}_{\sigma ^k}\) and \({\underline{U}}^{T_{k+1}}_{\sigma ^k}\) are \({\mathcal {F}}^{T_{k+1}}\)-measurable stochastic variables, and the inequality follows by taking the expectation on both sides of inequality (5.8) and the law of iterated expectation.
Using Lemma 5.3, we can conclude that
$$\begin{aligned} {\mathbb {E}}_{h,\sigma ^{\phi },\tau }\left[ \tfrac{1}{2}\Phi ^{T_{k+1}} \cdot I(T_{k+1}<\infty ) \right]&\le {\mathbb {E}}_{h,\sigma ^\phi ,\tau } \left[ {\underline{U}}^{T_{k+1}}_{\sigma ^{k+1}} \right] - {\mathbb {E}}_{h,\sigma ^\phi ,\tau } \left[ {\underline{U}}^{T_{k+1}}_{\sigma ^{k}} \right] \\&\le {\mathbb {E}}_{h,\sigma ^{\phi },\tau }\left[ {\underline{U}}^{T_{k+1}}_{\sigma ^{k+1}}\right] - {\mathbb {E}}_{h,\sigma ^{\phi },\tau }\left[ {\underline{U}}^{T_k}_{\sigma ^k}\right] . \end{aligned}$$
Summing the preceding inequality over \( k = \kappa (h)+1, \dots , K, \) we obtain
$$\begin{aligned} \sum _{k=\kappa (h)+1}^{K} {\mathbb {E}}_{h,\sigma ^{\phi },\tau }\left[ \tfrac{1}{2}\Phi ^{T_k} \cdot I(T_{k}<\infty ) \right] \le {\mathbb {E}}_{h,\sigma ^{\phi },\tau }\left[ {\underline{U}}^{T_{K}}_{\sigma ^{K}} \right] -{\mathbb {E}}_{h,\sigma ^{\phi },\tau }\left[ {\underline{U}}^{T_{\kappa (h)+1}}_{\sigma ^{\kappa (h)+1}} \right] \le 2M. \end{aligned}$$
Inequality (5.7) follows by taking the limit as \(K \rightarrow \infty \).
Let us write \(X_{k} = \Phi ^{T_k} \cdot I(T_k<\infty )\) and \( k^{\prime } = \kappa (h) + 1\). Since \(X_{k} \ge 0\), the monotone convergence theorem implies that \({\mathbb {E}}_{h,\sigma ^{\phi },\tau } [\sum _{k = k^{\prime }}^{\infty }X_{k}] = \sum _{k = k^{\prime }}^{\infty }{\mathbb {E}}_{h,\sigma ^{\phi },\tau } [X_{k}]\). The latter expression is finite by (5.7). Thus \(\sum _{k = k^{\prime }}^{\infty }X_{k}\) has a finite mean with respect to the probability measure \({\mathbb {P}}_{h,\sigma ^{\phi },\tau }\). Hence \(\sum _{k = k^{\prime }}^{\infty }X_{k} < \infty \) holds \({\mathbb {P}}_{h,\sigma ^{\phi },\tau }\)-almost surely. This implies that \(X_{k} \rightarrow 0\) holds \({\mathbb {P}}_{h,\sigma ^{\phi },\tau }\)-almost surely. \(\square \)
We are now in a position to state the main result of this section, which gives sufficient conditions for the existence of subgame \( \phi \)-maxmin strategies.
Theorem 5.7
Let a game \( \Gamma _{x_{0}}(u) \) and a tolerance function \(\phi > 0\) be given such that for every \(p\in {\mathcal {P}}\) at least one of the following two conditions holds:
1.
(point of upper semicontinuity) The function u is upper semi-continuous at p.
 
2.
(positive limit inferior) \(\liminf _{t\rightarrow \infty } \phi (p_{\vert t}) > 0\).
 
Then, there exists a subgame \(\phi \)-maxmin strategy in the game \(\Gamma _{x_{0}}(u).\)
Proof
We define the tolerance function \(\phi ^{\prime }\) by
$$\begin{aligned} \phi ^{\prime }(h) = \tfrac{1}{2}\min \lbrace \phi (h')\vert h'\preceq h\rbrace , \quad h \in {\mathcal {H}}. \end{aligned}$$
Thus, \(\phi ^{\prime }\) is a non-increasing tolerance function with \(\phi ^{\prime } \le \tfrac{1}{2} \phi \).
We show that \(\sigma ^{\phi ^{\prime }} \) is \(\phi ^{\prime }\)-equalizing. Since \(2\phi ^{\prime } \le \phi \) and \(\sigma ^{\phi ^{\prime }}\) is an n-day \(\phi ^{\prime }\)-maxmin strategy for every \(n \in {\mathbb {N}}\) by Lemma 5.5, it then follows from Theorem 4.5 that \(\sigma ^{\phi ^{\prime }}\) is a subgame \(\phi \)-maxmin strategy.
Let \({\mathcal {U}}\) denote the set of plays \(p \in {\mathcal {P}}\) at which u is upper semi-continuous and let \({\mathcal {I}}\) denote the set of plays \(p \in {\mathcal {P}}\) such that \(\liminf _{t\rightarrow \infty } \phi (p_{\vert t}) > 0\). By the assumption of the theorem it holds that \({\mathcal {P}} = {\mathcal {U}} \cup {\mathcal {I}}\). By the definition of \(\phi ^{\prime }\), we have \(\liminf _{t\rightarrow \infty } \phi ^{\prime }(p_{\vert t}) > 0\) for each \(p \in {\mathcal {I}}\).
Let some \(t \in {\mathbb {N}}\), a history \(h \in {\mathcal {H}}^{t}\), and a strategy \(\tau \in {\mathcal {S}}_{2}\) be given.
Step 1: For every \( p \in {{{\mathcal {U}}}}, \) \( u(p) \ge \limsup _{n\rightarrow \infty } {\underline{V}}^n(p) - \phi ^{\prime }(h). \)
This follows directly from Lemma 4.10.
For every \( k \in {\mathbb {N}}, \) we define \( {\mathcal {J}}_{k} = \{p \in {\mathcal {I}} \cap ({\mathcal {R}}_{k} \setminus {\mathcal {R}}_{k + 1}) \mid \lim _{n \rightarrow \infty } U_{\sigma ^{k},\tau }^{n}(p) = u(p)\} \) and \( {\mathcal {J}} = \bigcup _{k \in {\mathbb {N}}}{\mathcal {J}}_{k}, \) where we remind the reader that \( {{{\mathcal {R}}}}_{k} \) contains the plays along which at least k switches occur.
Step 2: For every \( p \in {{{\mathcal {J}}}}, \) \( u(p) \ge \limsup _{n\rightarrow \infty } {\underline{V}}^n(p) - \phi ^{\prime }(h). \)
Let \(k \in {\mathbb {N}}\) and \(p \in {\mathcal {J}}_{k}\) be given. Exactly k switches occur along the play p and the last switch occurs at time \(T_{k}(p)\). By our construction of \(\sigma ^{\phi ^{\prime }}\), this means that for each time \(n > T_{k}(p)\), the strategy \(\sigma ^k\) is a \(\phi ^{\prime }(p_{|n})\)-maxmin strategy for the subgame at history \(p_{|n}\), so \({\underline{u}}(\sigma ^{k},p_{|n}) \ge {\underline{v}}(p_{|n})-\phi ^{\prime }(p_{|n})\). Since \(U_{\sigma ^{k},\tau }^{n}(p) \ge {\underline{u}}(\sigma ^{k},p_{|n})\) and since for \(n \ge t\) we have \(\phi ^{\prime }(p_{|n}) \le \phi ^{\prime }(h)\), we conclude that for \(n \ge t\)
$$\begin{aligned} U_{\sigma ^{k},\tau }^{n}(p)\ge {\underline{V}}^{n}(p) -\phi ^{\prime }(h). \end{aligned}$$
Taking the limit as n goes to infinity, and making use of the fact that \(p \in {\mathcal {J}}_{k}\), we obtain
$$\begin{aligned} u(p) = \lim _{n\rightarrow \infty } U_{\sigma ^{k},\tau }^n(p) \ge \limsup _{n\rightarrow \infty } {\underline{V}}^{n}(p) - \phi ^{\prime }(h). \end{aligned}$$
Step 3: \({\mathbb {P}}_{h,\sigma ^{\phi ^{\prime }},\tau }({\mathcal {I}} \cap {\mathcal {R}}_\infty ) = 0\).
Recall that by Lemma 5.6, \(\Phi ^{T_k} \cdot I(T_k<\infty )\) converges to 0 as k goes to infinity, \({\mathbb {P}}_{h,\sigma ^{\phi ^{\prime }},\tau }\)-almost surely. Also recall that \({\mathcal {R}}_{\infty }\) is the set of plays where infinitely many switches occur. Thus \(I(T_k < \infty )\) is identically equal to 1 on \({\mathcal {R}}_{\infty }\). Furthermore, \(\liminf _{k \rightarrow \infty }\Phi ^{T_{k}} > 0\) everywhere on \({\mathcal {I}} \cap {\mathcal {R}}_\infty \). We conclude that \(\Phi ^{T_{k}} \cdot I(T_k < \infty )\) does not converge to zero on \({\mathcal {I}} \cap {\mathcal {R}}_\infty \) and the result follows.
Step 4: \({\mathbb {P}}_{h,\sigma ^{\phi ^{\prime }},\tau }({\mathcal {U}} \cup {\mathcal {J}}) = 1\).
For every \( k \in {\mathbb {N}}, \) it holds by Lévy’s zero-one law (Lemma A.1 in Appendix A) that
$$\begin{aligned} {\mathbb {P}}_{h,\sigma ^{k},\tau }({\mathcal {J}}_{k}) = {\mathbb {P}}_{h,\sigma ^{k},\tau }({\mathcal {I}} \cap ({{{\mathcal {R}}}}_{k} \setminus {\mathcal {R}}_{k + 1})). \end{aligned}$$
Using Lemma 5.2 twice yields
$$\begin{aligned} {\mathbb {P}}_{h,\sigma ^{\phi ^{\prime }},\tau }({\mathcal {J}}_{k}) = {\mathbb {P}}_{h,\sigma ^{k},\tau }({\mathcal {J}}_{k}) = {\mathbb {P}}_{h,\sigma ^{k},\tau }({\mathcal {I}} \cap ({{{\mathcal {R}}}}_{k} \setminus {\mathcal {R}}_{k + 1})) = {\mathbb {P}}_{h,\sigma ^{\phi ^{\prime }},\tau }({\mathcal {I}} \cap (\mathcal{R}_{k} \setminus {\mathcal {R}}_{k + 1})). \end{aligned}$$
We now have
$$\begin{aligned} {\mathbb {P}}_{h,\sigma ^{\phi ^{\prime }},\tau }({\mathcal {J}}) = {\mathbb {P}}_{h,\sigma ^{\phi ^{\prime }},\tau }({\mathcal {I}} \setminus {\mathcal {R}}_{\infty }) = {\mathbb {P}}_{h,\sigma ^{\phi ^{\prime }},\tau }({\mathcal {I}}), \end{aligned}$$
where the last equality follows from Step 3. Finally, we obtain
$$\begin{aligned} {\mathbb {P}}_{h,\sigma ^{\phi ^{\prime }},\tau }({\mathcal {U}} \cup {\mathcal {J}}) \ge {\mathbb {P}}_{h,\sigma ^{\phi ^{\prime }},\tau }({\mathcal {U}} \setminus {\mathcal {I}}) + {\mathbb {P}}_{h,\sigma ^{\phi ^{\prime }},\tau }({\mathcal {J}}) = {\mathbb {P}}_{h,\sigma ^{\phi ^{\prime }},\tau }({\mathcal {U}} \setminus {\mathcal {I}}) + {\mathbb {P}}_{h,\sigma ^{\phi ^{\prime }},\tau }({\mathcal {I}}) = 1, \end{aligned}$$
where the last equality follows from the fact that the sets \({\mathcal {U}}\) and \({\mathcal {I}}\) cover \({\mathcal {P}}(h)\). \(\square \)
Theorem 5.7 generalizes Proposition 11 in Mashiah-Yaakovi ([15]), where the existence of a subgame \(\epsilon \)-optimal strategy in a two-player zero-sum stochastic game with Borel measurable payoff functions is shown. The tolerance function \( \phi \) there is given by \(\phi (h)=\epsilon \) for every \(h\in {\mathcal {H}}, \) so for every play the tolerance function has a positive limit inferior. Theorem 5.7 yields the existence of a subgame \(\epsilon \)-maxmin strategy. Because the Borel measurability of the payoff function guarantees the existence of the value (Maitra and Sudderth, [13], and Martin, [14]), this is equivalent to proving the existence of a subgame \(\epsilon \)-optimal strategy.
In Sect. 6, we argue that Theorem 5.7 also provides further insight into the main result of Laraki, Maitra, and Sudderth ([10]). There the authors prove among other things that if the payoff function is bounded and upper semi-continuous, then the first player has a subgame optimal strategy.

6 Subgame Maxmin Strategies

The goal of this section is to explore the relationship between the concept of a subgame maxmin strategy and that of a subgame \(\phi \)-maxmin strategy for \( \phi > 0. \) The main result of this section is the following theorem.
Theorem 6.1
For every game \( \Gamma _{x_{0}}(u) \), there exists a tolerance function \(\phi ^* > 0 \) such that the following statements are equivalent:
1.
The game \( \Gamma _{x_{0}}(u) \) has a subgame maxmin strategy.
 
2.
The game \( \Gamma _{x_{0}}(u) \) has a subgame \(\phi ^*\)-maxmin strategy.
 
Because every subgame maxmin strategy is a subgame \(\phi \)-maxmin strategy, statement 1 clearly implies statement 2. To prove the converse, we define a tolerance function \(\phi ^*>0\); the precise definition is given in Subsection 6.3. Intuitively, we fix a positive sequence \((\delta _t)_{t\in {\mathbb {N}}}\) with the property that \(\sum _{t=0}^\infty \delta _t<\infty \). Then, we choose \(\phi ^*\) such that (1) \(\phi ^*(p_{\vert t})<\delta _t\) for every play \(p\in {\mathcal {P}}\) and every \(t\in {\mathbb {N}}\) and (2) every one-day \(\phi ^*\)-maxmin action is \(\delta _t\)-close to the region of optimal mixed actions in the one-day game. The first condition implies that the tolerance function \(\phi ^*\) decreases rapidly to zero, i.e., \(\sum _{t=0}^\infty \phi ^*(p_{\vert t})<\infty \). By assumption, player 1 has a subgame \(\phi ^*\)-maxmin strategy. We obtain a subgame maxmin strategy as follows: at every history \(h_t\) we pick a mixed action which guarantees the lower value \({\underline{v}}(h_t)\) in the one-day game and is \(\delta _t\)-close to the mixed action chosen by the subgame \(\phi ^{*}\)-maxmin strategy. This method creates a strategy which is clearly one-day maxmin. The condition that the tolerance function \(\phi ^{*}\) decreases rapidly to 0 is then used to show that the expected payoffs from a history \(h_t\) with length t obtained by the subgame \(\phi ^{*}\)-maxmin strategy and the constructed strategy are within a distance of \(M\cdot \sum _{n=t}^\infty \delta _n\), where \(M=\sup _{p\in {{{\mathcal {P}}}}} |u(p)|\). From this, the equalizing property of the constructed strategy follows. Then, Corollary 4.7 from Sect. 4 lets us conclude that the constructed strategy is indeed a subgame maxmin strategy.
Theorem 6.1 has the following corollary.
Corollary 6.2
The following statements are equivalent:
1.
The game \( \Gamma _{x_{0}}(u) \) has a subgame maxmin strategy.
 
2.
For every \( \phi > 0, \) the game \( \Gamma _{x_{0}}(u) \) has a subgame \(\phi \)-maxmin strategy.
 
We would like to make two remarks. First, as the payoff function u need not be continuous, one cannot simply use a continuity argument to prove that statement 2 of Corollary 6.2 implies statement 1. Second, note that the existence of a subgame \(\epsilon \)-maxmin strategy for every \(\epsilon >0\) is a weaker requirement than statements 1 and 2 of Corollary 6.2. Indeed, as Example 3.2 already illustrated, there exist games which admit a subgame \(\epsilon \)-maxmin strategy for every \(\epsilon >0\) but do not admit a subgame maxmin strategy.
The special case where the payoff function is upper semi-continuous deserves some additional attention. From Maitra and Sudderth ([13]) and Martin ([14]), it follows that every two-player zero-sum stochastic game with a countable state space, finite action sets, and a Borel measurable payoff function admits a value. Because every upper semi-continuous payoff function is Borel measurable, the existence of the value is guaranteed. From Theorem 5.7, we obtain the existence of a subgame \(\phi \)-optimal strategy for every \(\phi >0\). Combining this with Corollary 6.2 shows that player 1 has a subgame optimal strategy. Hereby, we obtain a special case of the result by Laraki, Maitra, and Sudderth ([10]), where the authors allow the state space to be a Borel subset of a Polish space and transition probabilities to be determined by a Borel transition function.
When we are interested in pure strategies that guarantee the maxmin levels, we can strengthen the result of Theorem 6.1. In the context of simultaneous move games, pure strategies are of course rather restrictive. Still, there are important classes of games, such as perfect information games, in which they are natural and play a prominent role.
Theorem 6.3
For every game \(\Gamma _{x_{0}}(u)\), there exists a tolerance function \( \phi ^* > 0 \) such that the following statements are equivalent:
1.
The pure strategy \(\sigma \in {{{\mathcal {S}}}}_{1} \) is a subgame maxmin strategy.
 
2.
The pure strategy \(\sigma \in {{{\mathcal {S}}}}_{1} \) is a subgame \(\phi ^*\)-maxmin strategy.
 
The tolerance function \( \phi ^*\) in Theorem 6.3 will be defined based on the data of the game and the lower values in the subgames.
This section is structured as follows. We start by proving Theorem 6.3, which is easier than Theorem 6.1 and helps us explain the main ideas. Then, we turn to the proof of Theorem 6.1.

6.1 The Proof of Theorem 6.3

In this subsection, we prove Theorem 6.3. Since the statement of this theorem is about pure strategies, the proof is less technical and the intuition is more transparent.
We only need to prove that there is \( \phi ^* > 0 \) such that statement 2 of Theorem 6.3 implies statement 1 of Theorem 6.3. From now on, for any \(t\in {\mathbb {N}}\), history \(h\in {\mathcal {H}}^t\) with some final state x, and mixed actions \(m_1\in \Delta ({\mathcal {A}})\) and \(m_2\in \Delta ({\mathcal {B}})\), we denote the expectation of the lower value at the next stage by
$$\begin{aligned} {\mathbb {E}}_{h,m_1,m_2}\left[ {\underline{V}}^{t+1}\right] \;=\;\sum _{a\in {\mathcal {A}}}\sum _{b\in {\mathcal {B}}}\sum _{x'\in {\mathcal {X}}}m_1(a)\cdot m_2(b)\cdot q(x'|h,a,b)\cdot {\underline{v}}(h,a,b,x'). \end{aligned}$$
(6.1)
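A direct transcription of (6.1) into Python may help; this is a sketch under the assumption that the transition kernel q and the lower values are available as callables, with mixed actions represented as dictionaries from actions to probabilities (all names illustrative).
```python
def expected_next_lower_value(h, m1, m2, q, lower_value, states):
    # Equation (6.1): the expected lower value at the next stage when the
    # mixed actions m1 (player 1) and m2 (player 2) are played at history h.
    total = 0.0
    for a, pa in m1.items():
        for b, pb in m2.items():
            for x in states:
                total += pa * pb * q(x, h, a, b) * lower_value(h + (a, b, x))
    return total
```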
The proof consists of two steps.
Step 1. Definition of \(\phi ^* > 0. \)
For every \(t\in {\mathbb {N}}, \) for every history \(h\in {\mathcal {H}}^t\), there exists a number \(d(h) >0\) such that for every action \(a\in {\mathcal {A}}\) of player 1 either one of the following holds:
  • Action a guarantees that the lower value does not drop in expectation: for every \(m_2\in \Delta ({\mathcal {B}})\), \({\mathbb {E}}_{h,a,m_2}\left[ {\underline{V}}^{t+1}\right] \ge {\underline{v}}(h)\).
  • There exists a mixed action \(m_2\in \Delta ({\mathcal {B}})\) for player 2 such that, if player 1 uses action a, the lower value drops in expectation by more than d(h): \({\mathbb {E}}_{h,a,m_2}\left[ {\underline{V}}^{t+1}\right] < {\underline{v}}(h)-d(h)\).
This statement is true because the action set \({\mathcal {A}}\) of player 1 is finite. We define \( \phi ^* > 0 \) as follows. For every \(t\in {\mathbb {N}}, \) for every history \(h\in {\mathcal {H}}^t\), let \(\phi ^*(h)=\min \{d(h),2^{-t}\}\). The term \(2^{-t}\) is included so that the tolerance levels tend to 0 along each play. For every \(p\in {\mathcal {P}}\), it holds that \(\displaystyle \lim _{t\rightarrow \infty }\phi ^*(p_{\vert t})=0\).
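For a finite matrix of next-stage expectations, d(h) and \(\phi ^*(h)\) can be computed directly; the sketch below assumes `G[a][b]` equals \({\mathbb {E}}_{h,a,b}\left[ {\underline{V}}^{t+1}\right] \) and takes d(h) to be half of the minimal positive drop, which is one concrete choice making the strict inequality in the second bullet point hold.
```python
def phi_star_pure(G, v_h, t):
    # Sketch of Step 1: G[a][b] stands for E_{h,a,b}[V^{t+1}], v_h for v_(h).
    drops = []
    for row in G:                       # one row per action a of player 1
        worst = min(row)                # min over pure b equals min over mixed m2
        if worst < v_h:                 # action a can make the lower value drop
            drops.append(v_h - worst)
    d_h = min(drops) / 2 if drops else 1.0  # any positive number works if no drops
    return min(d_h, 2.0 ** (-t))            # phi*(h) = min{d(h), 2^{-t}}
```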
Step 2. If the pure strategy \( \sigma \in \mathcal{S}_{1} \) is a subgame \( \phi ^{*}\)-maxmin strategy, then \( \sigma \) is a subgame maxmin strategy.
Let the pure strategy \( \sigma \in {{{\mathcal {S}}}}_{1} \) be subgame \(\phi ^*\)-maxmin. We verify that \(\sigma \) satisfies the conditions of Corollary 4.7.
First, we show that \(\sigma \) is 1-day maxmin. Fix \( t \in {\mathbb {N}} \) and \(h\in {\mathcal {H}}^t. \) Let \(a=\sigma (h)\) denote the action that \(\sigma \) recommends at h. Then, for every mixed action \(m_2\in \Delta ({\mathcal {B}})\) and strategy \(\tau \in {\mathcal {S}}_2\) with \(m_2=\tau (h)\), we have
$$\begin{aligned} {\mathbb {E}}_{h,a,m_2}\left[ {\underline{V}}^{t+1}\right] ={\mathbb {E}}_{h,\sigma ,\tau }\left[ {\underline{V}}^{t+1}\right] \ge {\underline{v}}(h)-\phi ^*(h)\ge {\underline{v}}(h)-d(h), \end{aligned}$$
where the first inequality follows from the fact that \(\sigma \) is subgame \(\phi ^*\)-maxmin and by condition 1 of Theorem 4.6, and the second inequality follows from the definition of \(\phi ^*(h). \) Therefore, by the choice of d(h) in Step 1, for every \(m_2\in \Delta ({\mathcal {B}})\), it holds that
$$\begin{aligned} {\mathbb {E}}_{h,a,m_2}\left[ {\underline{V}}^{t+1}\right] \ge {\underline{v}}(h). \end{aligned}$$
Hence, for every strategy \(\tau \in {\mathcal {S}}_2\), it holds that
$$\begin{aligned} {\mathbb {E}}_{h,\sigma ,\tau }\left[ {\underline{V}}^{t+1}\right] ={\mathbb {E}}_{h,a,\tau (h)}\left[ {\underline{V}}^{t+1}\right] \ge {\underline{v}}(h). \end{aligned}$$
Thus, \(\sigma \) is 1-day maxmin.
We show that \(\sigma \) is equalizing. Take some \( \tau \in \mathcal{S}_{2}. \) Because \(\sigma \) is subgame \(\phi ^*\)-maxmin, for every \(p\in {\mathcal {P}}(h), \) for every \(n\ge t, \) we have
$$\begin{aligned} U_{\sigma ,\tau }^n(p)={\mathbb {E}}_{p_{\vert n},\sigma ,\tau }\left[ u\right] \ge {\underline{v}}(p_{\vert n})-\phi ^*(p_{\vert n})={\underline{V}}^n(p)-\Phi ^{*n}(p), \end{aligned}$$
where \(\Phi ^{*n}(p)=\phi ^*(p_{\vert n})\). We conclude that
$$\begin{aligned} \lim _{n\rightarrow \infty }U_{\sigma ,\tau }^n \ge \limsup _{n\rightarrow \infty }\left( {\underline{V}}^n-\Phi ^{*n} \right) =\limsup _{n\rightarrow \infty } {\underline{V}}^n-\lim _{n\rightarrow \infty }\Phi ^{*n} = \limsup _{n\rightarrow \infty } {\underline{V}}^n, \end{aligned}$$
where the last equality follows from the fact that for all \(p\in {\mathcal {P}}\) we have \(\displaystyle \lim _{n\rightarrow \infty }\phi ^*(p_{\vert n})=0, \) so
$$\begin{aligned} u = \lim _{n\rightarrow \infty }U_{\sigma ,\tau }^n \ge \limsup _{n\rightarrow \infty } {\underline{V}}^n,\quad {\mathbb {P}}_{h,\sigma ,\tau }\text {-almost surely,} \end{aligned}$$
where the equality follows from Lemma A.1 in Appendix A. We have shown that \(\sigma \) is equalizing.

6.2 The One-Shot Game \(\Upsilon _h\)

To prove Theorem 6.1, we first analyze a one-shot zero-sum game in this subsection. For each history, the one-shot game is such that the payoff is given by the lower value at the next stage. In the next subsection, we use these one-shot games to construct the tolerance function \(\phi ^* > 0\).
For some \( t \in {\mathbb {N}}, \) let \( h \in {\mathcal {H}}^t \) be a history in the game \(\Gamma _{x_{0}}(u). \) The one-shot zero-sum game \(\Upsilon _h\) is played as follows. Player 1 chooses an action \(a\in {\mathcal {A}}\) and player 2 simultaneously chooses an action \(b\in {\mathcal {B}}\). Then, player 1 receives from player 2 the amount \({\mathbb {E}}_{h,a,b}\left[ {\underline{V}}^{t+1}\right] \). As the action sets \({\mathcal {A}}\) and \({\mathcal {B}}\) are finite, the game \(\Upsilon _h\) has a value, which we denote by w(h). Furthermore, both players have optimal mixed actions in the game \(\Upsilon _h\).
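Since \(\Upsilon _h\) is a finite matrix game, its value and an optimal mixed action can be computed by linear programming; the following sketch (using `scipy`, with the matrix `G` standing in for the payoffs \({\mathbb {E}}_{h,a,b}\left[ {\underline{V}}^{t+1}\right] \)) illustrates this standard construction.
```python
import numpy as np
from scipy.optimize import linprog

def one_shot_value(G):
    # Value w and an optimal mixed action m1 of the row player in the matrix
    # game G (rows: actions of player 1, columns: actions of player 2).
    G = np.asarray(G, dtype=float)
    n_a, n_b = G.shape
    # Variables (m_1, ..., m_{n_a}, w); maximize w, i.e., minimize -w.
    c = np.concatenate([np.zeros(n_a), [-1.0]])
    # For every column b:  w - sum_a m_a * G[a, b] <= 0.
    A_ub = np.hstack([-G.T, np.ones((n_b, 1))])
    b_ub = np.zeros(n_b)
    A_eq = np.append(np.ones(n_a), 0.0).reshape(1, -1)   # sum_a m_a = 1
    b_eq = np.array([1.0])
    bounds = [(0, None)] * n_a + [(None, None)]          # m >= 0, w free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1], res.x[:n_a]

# Sanity check on matching pennies: value 0, optimal mixed action (1/2, 1/2).
w, m1 = one_shot_value([[1.0, -1.0], [-1.0, 1.0]])
```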
The following lemma states that the value w(h) of the one-shot game \(\Upsilon _h\) equals the lower value \({\underline{v}}(h)\) at the history h in the original game \(\Gamma _{x_{0}}(u)\). In case the value exists, this is a classic result of dynamic programming.
Lemma 6.4
For every history \(h\in {\mathcal {H}}, \) we have \(w(h)={\underline{v}}(h)\).
Proof
The proof is by contradiction. Fix \( t \in {\mathbb {N}} \) and a history \(h\in {\mathcal {H}}^t. \) Now suppose that \(w(h)\ne {\underline{v}}(h)\). Then, we have either \(w(h)>{\underline{v}}(h)\) or \(w(h)<{\underline{v}}(h)\).
Case 1: \(w(h)>{\underline{v}}(h)\).
Let \(\delta =w(h)-{\underline{v}}(h)\). We derive a contradiction by showing that, in the subgame of \(\Gamma _{x_{0}}(u)\) at history h, player 1 can guarantee an expected payoff of at least \({\underline{v}}(h)+\delta /2\).
Let \(m_1\in \Delta ({\mathcal {A}})\) be an optimal mixed action for player 1 in the one-shot game \(\Upsilon _h\). Let \( \sigma \in {\mathcal {S}}_1 \) be such that \(\sigma (h)=m_1\) and such that it induces a \((\delta /2)\)-maxmin strategy for the subgame at each history in period \(t+1\), i.e., for every \(h'\in {\mathcal {H}}^{t+1},\) for every \(\tau \in {\mathcal {S}}_2,\)
$$\begin{aligned} {\mathbb {E}}_{h',\sigma ,\tau }\left[ u\right] \ge {\underline{v}}(h')-\tfrac{\delta }{2}. \end{aligned}$$
(6.2)
Then, for every \(\tau \in {\mathcal {S}}_2,\) it holds that
$$\begin{aligned} {\mathbb {E}}_{h,\sigma ,\tau }\left[ u\right]= & {} {\mathbb {E}}_{h,\sigma ,\tau }\left[ U^{t}_{\sigma ,\tau }\right] ={\mathbb {E}}_{h,\sigma ,\tau }\left[ U^{t+1}_{\sigma ,\tau }\right] \\\ge & {} {\mathbb {E}}_{h,\sigma ,\tau }\left[ {\underline{V}}^{t+1}\right] -\tfrac{\delta }{2} = {\mathbb {E}}_{h,m_1,\tau (h)}\left[ {\underline{V}}^{t+1}\right] -\tfrac{\delta }{2} \ge w(h) -\tfrac{\delta }{2} ={\underline{v}}(h)+\tfrac{\delta }{2}, \end{aligned}$$
where the second equality follows from the fact that \((U^n_{\sigma ,\tau })_{n\ge t}\) is a \({\mathbb {P}}_{h,\sigma ,\tau }\)-martingale, the first inequality follows from (6.2), and the second inequality follows from the choice of \(m_1\).
Case 2: \(w(h)<{\underline{v}}(h)\).
Let \(\delta ={\underline{v}}(h)-w(h)\). We derive a contradiction by showing that, for every strategy of player 1, there is a strategy for player 2 such that the expected payoff is at most \({\underline{v}}(h)-\delta /2\) in the subgame of \(\Gamma _{x_{0}}(u)\) at history h.
Fix \(\sigma \in {\mathcal {S}}_1\) and let \(m_1=\sigma (h)\). Let \(m_2\in \Delta ({\mathcal {B}})\) be an optimal mixed action for player 2 in the one-shot game \(\Upsilon _h\). Let \(\tau \in {\mathcal {S}}_2\) be such that \(\tau (h)=m_2\) and the expected payoff under \((\sigma ,\tau )\) in the subgame at each history \(h'\) at period \(t+1\) is at most \({\underline{v}}(h') + \delta /2\), i.e., for every \(h'\in {\mathcal {H}}^{t+1},\)
$$\begin{aligned} {\mathbb {E}}_{h',\sigma ,\tau }\left[ u\right] \le {\underline{v}}(h')+\tfrac{\delta }{2}. \end{aligned}$$
(6.3)
We have that
$$\begin{aligned} {\underline{v}}(h)-\tfrac{\delta }{2}= & {} w(h)+\tfrac{\delta }{2} \ge {\mathbb {E}}_{h,m_1,m_2}\left[ {\underline{V}}^{t+1}\right] +\tfrac{\delta }{2} \\\ge & {} {\mathbb {E}}_{h,m_1,m_2}\left[ U^{t+1}_{\sigma ,\tau }\right] ={\mathbb {E}}_{h,\sigma ,\tau }\left[ U^{t+1}_{\sigma ,\tau }\right] ={\mathbb {E}}_{h,\sigma ,\tau }\left[ U^{t}_{\sigma ,\tau }\right] ={\mathbb {E}}_{h,\sigma ,\tau }\left[ u\right] , \end{aligned}$$
where the first inequality follows from the choice of \(m_2\), the second inequality follows from (6.3), and the penultimate equality follows from the fact that \((U^n_{\sigma ,\tau })_{n\ge t}\) is a \({\mathbb {P}}_{h,\sigma ,\tau }\)-martingale. Because \( \sigma \) is chosen arbitrarily, we have derived a contradiction with the definition of the lower value \({\underline{v}}(h)\). \(\square \)
The total variation distance between two mixed actions \(m_1,n_1\in \Delta ({\mathcal {A}})\) of player 1 is defined as
$$\begin{aligned} \Vert m_1-n_1\Vert _{\mathrm{TV}}=\sum _{a\in {\mathcal {A}}}\vert m_1(a)-n_1(a)\vert . \end{aligned}$$
The total variation distance between two probability measures \( {\mathbb {P}} \) and \( {\mathbb {P}}^{\prime } \) on \( ({{{\mathcal {P}}}}, {{{\mathcal {F}}}}) \) is defined as
$$\begin{aligned} \Vert {\mathbb {P}} - {\mathbb {P}}^{\prime }\Vert _{{\mathrm{TV}}} = \sup \{\sum _{i=1}^{n} |{\mathbb {P}}(F_{i}) - {\mathbb {P}}^{\prime }(F_{i})| : F_{1}, \ldots , F_{n} \in {{{\mathcal {F}}}} \text{ and } \{F_{1}, \ldots , F_{n}\} \text{ is } \text{ a } \text{ partition } \text{ of } {{{\mathcal {P}}}}\}. \end{aligned}$$
Let \( t \in {\mathbb {N}} \) and a history \(h\in {\mathcal {H}}^t\) be given. Let \(O_h \subseteq \Delta ({{{\mathcal {A}}}}) \) denote the set of optimal mixed actions of player 1 in the one-shot game \(\Upsilon _h\). By Lemma 6.4 it holds that
$$\begin{aligned} O_h = \left\{ m_1\in \Delta ({\mathcal {A}}) \mid \text{ for } \text{ every } m_2\in \Delta ({\mathcal {B}}),~{\mathbb {E}}_{h,m_1,m_2}\left[ {\underline{V}}^{t+1}\right] \ge {\underline{v}}(h) \right\} . \end{aligned}$$
Note that \(O_h\) is a compact subset of \(\Delta ({\mathcal {A}})\). For every \(m_1\in \Delta ({\mathcal {A}}), \) the distance of \(m_1\) to \(O_h\) is defined by
$$\begin{aligned} \Vert m_1-O_h\Vert _{\mathrm{TV}}=\displaystyle \min _{n_1\in O_h}\Vert m_1-n_1\Vert _{\mathrm{TV}}. \end{aligned}$$
Due to the compactness of \(O_h, \) the minimum is attained. For \(\delta >0\), let \(D_h^\delta \) be the set of mixed actions of player 1 which have a distance of at least \(\delta \) to the set \(O_h, \) so
$$\begin{aligned} D_h^\delta \;=\;\left\{ m_1\in \Delta ({\mathcal {A}})\vert \Vert m_1-O_h\Vert _{\mathrm{TV}}\ge \delta \right\} . \end{aligned}$$
(6.4)
The mixed actions in \(D_h^\delta \) are not optimal in the one-shot game \(\Upsilon _h\). The following lemma says that the loss in utility caused by these mixed actions has a positive lower bound.
Lemma 6.5
Let \( t \in {\mathbb {N}}, \) a history \(h\in {\mathcal {H}}^t, \) and \(\delta >0\) be given. If \( D_h^\delta \) is non-empty, then there is \( \epsilon >0\) such that for every \(m_1\in D_h^\delta \) there exists \(b\in {\mathcal {B}}\) such that
$$\begin{aligned}{\mathbb {E}}_{h,m_1,b}\left[ {\underline{V}}^{t+1}\right] \le {\underline{v}}(h)-\epsilon .\end{aligned}$$
Proof
Assume \( D^{\delta }_{h} \) is non-empty. Consider the function \( e_h^\delta : D_h^\delta \rightarrow {\mathbb {R}}\) defined by
$$\begin{aligned} e_h^\delta (m_1)\;=\;{\underline{v}}(h)-\min _{b\in {\mathcal {B}}}{\mathbb {E}}_{h,m_1,b}\left[ {\underline{V}}^{t+1}\right] , \quad m_1\in D_h^\delta . \end{aligned}$$
(6.5)
Since \({\mathcal {B}}\) is finite, the minimum exists. For every \(m_1\in D_h^\delta \), we have \(m_1\notin O_h\), and therefore there exists \(m_2\in \Delta ({\mathcal {B}})\) such that \({\mathbb {E}}_{h,m_1,m_2}\left[ {\underline{V}}^{t+1}\right] <{\underline{v}}(h)\); as this expectation is a convex combination of the expectations \({\mathbb {E}}_{h,m_1,b}\left[ {\underline{V}}^{t+1}\right] \) over pure actions \(b\in {\mathcal {B}}, \) it follows that \(e_h^\delta (m_1)>0\). The function \( e^{\delta }_{h} \) is therefore positive and continuous on a compact set, so it attains a positive minimum; taking \(\epsilon \) equal to this minimum proves the lemma. \(\square \)

6.3 Definition of the Tolerance Function \(\phi ^*\)

In this subsection, we define a positive tolerance function \(\phi ^*\). Fix a positive and decreasing sequence \((\delta _t)_{t\in {\mathbb {N}}}\) such that \(\sum _{t=0}^{\infty } \delta _t<\infty \). For example, \(\delta _t=2^{-t}\) for every \(t\in {\mathbb {N}}\). Note that \(\lim _{t\rightarrow \infty }\delta _t=0\).
For every \( t \in {\mathbb {N}}, \) for every history \(h\in {\mathcal {H}}^t, \) we define the constant c(h) as follows. If the set \(D_h^{\delta _t}\) is non-empty, then c(h) is equal to the positive number \( \epsilon \) of Lemma 6.5 applied with \( \delta = \delta _{t}\); otherwise, \( c(h) = \delta _t. \) We define
$$\begin{aligned} \phi ^*(h)=\tfrac{\min \left\{ c(h),\delta _{t}\right\} }{2}. \end{aligned}$$
(6.6)
Notice that \(0<\phi ^*(h)<\delta _t\).
We summarize the properties of the tolerance function \(\phi ^*\):
1.
For every history \(h\in {\mathcal {H}}\), we have \(\phi ^*(h)>0\).
 
2.
For every play \(p\in {\mathcal {P}}\), we have \(\sum _{t=0}^{\infty } \phi ^*(p_{\vert t})\le \sum _{t=0}^{\infty } \delta _t<\infty \).
 
3.
For every play \(p\in {\mathcal {P}}\), \(\displaystyle \lim _{t\rightarrow \infty }\phi ^*(p_{\vert t})=0\).
 
4.
If the set \(D_h^{\delta _t}\) is non-empty, then by the choice of c(h), for every \(m_1\in D_h^{\delta _t}\) there exists \(b\in {\mathcal {B}}\) such that
$$\begin{aligned} {\mathbb {E}}_{h,m_1,b}\left[ {\underline{V}}^{t+1}\right] \;\le \;{\underline{v}}(h)-c(h) \;<\;{\underline{v}}(h)-\phi ^*(h). \end{aligned}$$
 
The importance of \(\sum _{t=0}^{\infty } \delta _t<\infty \) is underlined by the following lemma. Recall that \( M=\sup _{p\in {\mathcal {P}}} \vert u(p)\vert . \)
Lemma 6.6
Let the strategies \( \sigma , \sigma ' \in {{{\mathcal {S}}}}_{1} \) be such that, for every \(t\in {\mathbb {N}}, \) for every history \(h\in {\mathcal {H}}^t,\) \( \Vert \sigma (h)-\sigma '(h)\Vert _{\mathrm{TV}}\le \delta _{t}. \) Then, for every strategy \(\tau \in \mathcal{S}_{2}, \) for every \(t\in {\mathbb {N}}, \) and for every history \(h\in {\mathcal {H}}^t,\)
$$\begin{aligned}\left| {\mathbb {E}}_{h,\sigma ,\tau }[u]-{\mathbb {E}}_{h,\sigma ',\tau }[u]\right| \le M\cdot \sum _{n=t}^{\infty }\delta _n.\end{aligned}$$
Proof
Let \(\tau \in {{{\mathcal {S}}}}_{2} \) be given. It follows from a more general result in Abate, Redig, and Tkachev ([1], Theorem 1) that, for every \(t\in {\mathbb {N}},\) for every history \(h\in {\mathcal {H}}^t,\)
$$\begin{aligned} \Vert {\mathbb {P}}_{h,\sigma ,\tau }-{\mathbb {P}}_{h,\sigma ',\tau }\Vert _{\mathrm{TV}}\le \sum _{n=t}^{\infty }\delta _n. \end{aligned}$$
(6.7)
For completeness, we provide a direct proof of this inequality in Lemma B.1 in Appendix B. The claim of Lemma 6.6 follows directly. \(\square \)
Lemma 6.6 says the following. Consider two arbitrary strategies \(\sigma , \sigma ' \in {{{\mathcal {S}}}}_{1} \) such that the total variation distance between the mixed actions at every history in period t is at most \(\delta _t\). Now consider \( t \in {\mathbb {N}}, \) a history \( h \in {\mathcal {H}}^t, \) and a strategy \( \tau \in {{{\mathcal {S}}}}_{2} \) of player 2. Then, the expected payoffs under \((\sigma ,\tau )\) and \((\sigma ',\tau )\) in the subgame at h differ by at most the constant M times \(\sum _{n=t}^{\infty }\delta _n\). Note that this bound does not depend on the strategy \(\tau \) and depends on the history h only through its period t. Moreover, these bounds tend to 0 as t goes to infinity.
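A tiny two-period instance, with player 2 trivial and deterministic transitions so that a play reduces to a pair of player 1's actions, illustrates inequality (6.7); the per-stage mixed actions below are made up for the check.
```python
from itertools import product

delta = [0.10, 0.05]
# Stage-t mixed actions of the two strategies; per-stage TV distance <= delta[t].
sigma       = [{"L": 0.50, "R": 0.50}, {"L": 0.30, "R": 0.70}]
sigma_prime = [{"L": 0.55, "R": 0.45}, {"L": 0.28, "R": 0.72}]

def play_prob(strategy, play):
    prob = 1.0
    for t, action in enumerate(play):
        prob *= strategy[t][action]
    return prob

# Total variation distance between the induced distributions over plays.
tv = sum(abs(play_prob(sigma, p) - play_prob(sigma_prime, p))
         for p in product("LR", repeat=2))
assert tv <= sum(delta) + 1e-12   # inequality (6.7) with t = 0
```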
The importance of property 4 of the tolerance function is shown by the following lemma. It says that if \( \sigma \in {{{\mathcal {S}}}}_{1} \) is a subgame \(\phi ^*\)-maxmin strategy, then for every \( h \in {{{\mathcal {H}}}} \) the mixed action \( \sigma (h) \) is close to the set of optimal mixed actions \( O_{h} \) in the one-shot game \(\Upsilon _h\).
Lemma 6.7
Let \(\sigma \in {{{\mathcal {S}}}}_{1} \) be a subgame \(\phi ^*\)-maxmin strategy. Then, for every \( t \in {\mathbb {N}}, \) for every history \( h \in {\mathcal {H}}^t, \) we have \(\sigma (h)\notin D_h^{\delta _t}, \) so \( \Vert \sigma (h)-O_h\Vert _{\mathrm{TV}} < \delta _t. \)
Proof
Because \(\sigma \) is a subgame \(\phi ^*\)-maxmin strategy, it follows from condition 1 of Theorem 4.6 that for every mixed action \(m_2\in \Delta ({\mathcal {B}})\) of player 2
$$\begin{aligned}{\mathbb {E}}_{h,\sigma (h),m_2}\left[ {\underline{V}}^{t+1}\right] \ge {\underline{v}}(h)-\phi ^*(h).\end{aligned}$$
If \(D_h^{\delta _t}\) is empty, then there is nothing to prove. If \(D_h^{\delta _t}\) is non-empty, then property 4 of \(\phi ^*\) shows that \(\sigma (h)\notin D_h^{\delta _t}\). \(\square \)

6.4 The Proof of Theorem 6.1

Proof
Let \( \Gamma _{x_{0}}(u) \) be a game and take the tolerance function \(\phi ^*\) as defined in Subsection 6.3. We only need to show that statement 2 implies statement 1. Let \( \sigma \in \mathcal{S}_{1} \) be a subgame \(\phi ^*\)-maxmin strategy of \( \Gamma _{x_{0}}(u). \)
With the help of \( \sigma , \) we define a strategy \( \sigma ^{*} \in {{{\mathcal {S}}}}_{1} \) in Step 1 of the proof. Then, it is shown that \( \sigma ^{*} \) is a subgame maxmin strategy of \( \Gamma _{x_{0}}(u) \) in Steps 2 and 3 of the proof by verifying that \( \sigma ^{*} \) satisfies the conditions of Corollary 4.7.
Step 1: Definition of \( \sigma ^{*} \in {{{\mathcal {S}}}}_{1}. \)
Take \( t \in {\mathbb {N}} \) and a history \(h\in {\mathcal {H}}^t. \) In view of Lemma 6.7, it holds that \(\Vert \sigma (h)-O_h\Vert _{\mathrm{TV}}<\delta _{t}\). Therefore, there exists \( m^{*}(h) \in O_h \) such that
$$\begin{aligned} \Vert \sigma (h)-m^{*}(h)\Vert _{\mathrm{TV}} <\delta _t. \end{aligned}$$
(6.8)
Now define \(\sigma ^{*}(h)=m^{*}(h)\).
Step 2: \(\sigma ^{*}\) is 1-day maxmin.
Consider some \( \tau \in {{{\mathcal {S}}}}_{2}. \) For every \( t \in {\mathbb {N}}, \) for every \( h\in {\mathcal {H}}^t, \) since \(\sigma ^*(h)\in O_h\) we have that
$$\begin{aligned} {\mathbb {E}}_{h,\sigma ^{*},\tau }\left[ {\underline{V}}^{t+1}\right] \ge {\underline{v}}(h). \end{aligned}$$
Step 3: \(\sigma ^{*}\) is equalizing.
Consider some \( \tau \in {{{\mathcal {S}}}}_{2}. \) In view of (6.8), we can apply Lemma 6.6 to conclude that, for every \( t \in {\mathbb {N}}, \) for every \( h \in {\mathcal {H}}^t, \)
$$\begin{aligned} \left| {\mathbb {E}}_{h,\sigma ,\tau }[u]-{\mathbb {E}}_{h,\sigma ^*,\tau }[u]\right| \le M \cdot \sum _{n=t}^{\infty } \delta _n. \end{aligned}$$
Hence, for every \( t \in {\mathbb {N}}, \) for every history \(h\in {\mathcal {H}}^t\), and for every \(n\ge t,\)
$$\begin{aligned} \left| U^n_{\sigma ,\tau }-U^n_{\sigma ^*,\tau }\right| \le M\cdot \sum _{i=n}^{\infty }\delta _i, \quad {\mathbb {P}}_{h,\sigma ^{*},\tau }\text {-almost surely.} \end{aligned}$$
(6.9)
Because \(\sigma \) is subgame \(\phi ^*\)-maxmin, for every history \(h\in {\mathcal {H}}, \) we have
$$\begin{aligned} {\mathbb {E}}_{h,\sigma ,\tau }[u]\ge {\underline{v}}(h)-\phi ^*(h). \end{aligned}$$
(6.10)
Thus, for every history \(h\in {\mathcal {H}}\) it holds that
$$\begin{aligned} u=\lim _{t\rightarrow \infty }U_{\sigma ^{*},\tau }^t=\lim _{t\rightarrow \infty }U_{\sigma ,\tau }^t\ge \limsup _{t\rightarrow \infty }\left( {\underline{V}}^t - \Phi ^{*t}\right) =\limsup _{t\rightarrow \infty } {\underline{V}}^t, \quad {\mathbb {P}}_{h,\sigma ^{*},\tau }\text {-almost surely}, \end{aligned}$$
where the first equality is due to Lemma A.1 in Appendix A, the second equality follows from (6.9), the inequality is by (6.10), and the last equality is a consequence of property 3 of \(\phi ^*\) from Subsection 6.3. \(\square \)
Note that it is not necessary that the initial \(\delta _t\) are small, or that for initial histories the tolerance level \(\phi ^*(h)\) is small. The conditions put on the sequence \((\delta _t)_{t\in {\mathbb {N}}}\) are there to ensure that the equalizing condition for subgame maxmin strategies is satisfied. The equalizing condition essentially only cares about what happens to the payoff at deep subgames. The requirement that the sequence \((\delta _t)_{t\in {\mathbb {N}}}\) decreases to zero very fast guarantees that \(\lim _{t\rightarrow \infty }\Vert {\mathbb {P}}_{p_{\vert t}, \sigma ,\tau }-{\mathbb {P}}_{p_{\vert t}, \sigma ^{*},\tau }\Vert _{\mathrm{TV}}=0\). This allows us to use the \(\phi ^*\)-equalizing property of the subgame \(\phi ^*\)-maxmin strategy to show that the strategy \(\sigma ^*\) satisfies the equalizing condition.

7 Discussion

Countable action sets. We have assumed throughout the paper that the action sets are finite. If one player has countably many actions, several of our arguments and results are no longer valid. For example, consider the following game. In the initial state, player 1's action set is \({\mathbb {N}}\), player 2 has only one action, and if player 1 chooses action \(n\in {\mathbb {N}}\), then regardless of how the play continues, the payoff is \(1-\frac{1}{n+1}\). This is essentially a one-shot game. Clearly, for every tolerance function \(\phi >0\), player 1 has a pure subgame \(\phi \)-maxmin strategy: he can play action n with n sufficiently large. Yet, player 1 has no subgame maxmin strategy. So the statements of Theorems 6.1 and 6.3 and Corollary 6.2 are not valid for this game.
One-player games. Suppose that player 2 has a trivial action set, i.e., \({\mathcal {B}}\) is a singleton. Even though this is essentially a game with only one player, as shown in Example 3.2, it may happen that player 1 has a subgame \(\phi \)-maxmin strategy for a certain tolerance function \(\phi \), but no pure strategy that is subgame \(\phi \)-maxmin. Nevertheless, if \({\mathcal {B}}\) is a singleton and the conditions of Theorem 5.7 are satisfied, then player 1 has a pure subgame \(\phi \)-maxmin strategy. Indeed, player 1 has a pure \(\delta \)-maxmin strategy for each \(\delta >0\). Hence, the switching strategy used in the proof of Theorem 5.7 will also be pure.
Discounted games. Our model encompasses discounted stochastic games with finite action and countable state spaces, provided that the immediate payoffs are bounded. Since these games admit a value and the discounted payoffs are continuous, it follows by Theorem 5.7 that player 1 has a subgame \(\phi \)-optimal strategy for every tolerance function \(\phi >0\). Therefore, Corollary 6.2 implies the well-known fact that player 1 has a subgame 0-optimal strategy. Our proof does not directly imply that this strategy can be chosen to be stationary.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Appendices

Appendix A: Universal Measurability

This appendix contains a review of the definitions of universally measurable sets, integrals of universally measurable functions, stopping times, and conditional expectations, as well as a technical lemma that is used in the paper.
Universally measurable sets Let \(({\mathcal {P}},{\mathcal {F}}^\infty )\) be a measurable space, where \({\mathcal {F}}^\infty \) denotes the Borel sigma-algebra and let \({\mathcal {M}}\) denote the collection of all probability measures over this measurable space. For each probability measure \({\mathbb {P}}\in {\mathcal {M}}, \) we can extend the probability space \(({\mathcal {P}},{\mathcal {F}}^\infty ,{\mathbb {P}})\) to a complete probability space \(({\mathcal {P}},{\mathcal {F}}_{{\mathbb {P}}},{\mathbb {P}}^{\mathrm{c}})\) by including all \({\mathbb {P}}\)-negligible sets. To be more precise, let
$$\begin{aligned} {\mathcal {F}}_{\mathbb {P}}^0 =\lbrace Q\subset {\mathcal {P}} \mid \exists Q'\in {\mathcal {F}}^\infty \text{ such } \text{ that } {\mathbb {P}}(Q')=0 \text { and } Q\subseteq Q'\rbrace \end{aligned}$$
be the set of all subsets of \({\mathbb {P}}\)-negligible sets of \({\mathcal {F}}^\infty \). We define
$$\begin{aligned} {\mathcal {F}}_{\mathbb {P}} = \lbrace Q \cup Q^0 \subseteq {\mathcal {P}} \vert Q\in {\mathcal {F}}^\infty \text { and } Q^0\in {\mathcal {F}}_{\mathbb {P}}^0\rbrace \end{aligned}$$
and we define \({\mathbb {P}}^{\mathrm{c}}:{\mathcal {F}}_{\mathbb {P}}\rightarrow [0,1]\) by
$$\begin{aligned} {\mathbb {P}}^{\mathrm{c}}(Q\cup Q^0 )={\mathbb {P}}(Q), \quad Q \in \mathcal{F}^{\infty },~Q^{0} \in {\mathcal {F}}_{\mathbb {P}}^0. \end{aligned}$$
It can be shown that \(({\mathcal {P}},{\mathcal {F}}_{\mathbb {P}},{\mathbb {P}}^{\mathrm{c}})\) is a probability space. Now define
$$\begin{aligned} {\mathcal {F}}=\bigcap _{{\mathbb {P}}\in {\mathcal {M}}}{\mathcal {F}}_{\mathbb {P}}. \end{aligned}$$
It can be shown that \({\mathcal {F}}\) is a sigma-algebra and that it contains the Borel sigma-algebra \({\mathcal {F}}^\infty \) as a proper subset. The collection \( {\mathcal {F}} \) is called the universally measurable sigma-algebra, and the elements of \({\mathcal {F}}\) are called universally measurable sets.
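A standard example of the gap between the two sigma-algebras: on a Polish space, every analytic set is universally measurable, whereas not every analytic set is Borel, so the inclusion \({\mathcal {F}}^\infty \subseteq {\mathcal {F}}\) is in general strict.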
Integrals of universally measurable functions A function \(u:{\mathcal {P}}\rightarrow {\mathbb {R}}\) is called universally measurable if \(u^{-1}([a,b])\in {\mathcal {F}}\) for every interval \([a,b]\subseteq {\mathbb {R}}\). The class of universally measurable functions contains the class of Borel measurable functions. Furthermore, for every universally measurable function and every probability measure \({\mathbb {P}}\in {\mathcal {M}}\), there exists a Borel measurable function that coincides with it \({\mathbb {P}}\)-almost everywhere.
A function \( g:{\mathcal {P}}\rightarrow {\mathbb {R}}\) is called a simple universally measurable function if it is of the form \(g(p)=\sum _{i=1}^n c_i I(p\in Z_i),\) where \(\lbrace Z_1,\ldots , Z_n \rbrace \) is a partition of \({\mathcal {P}}\) with \(Z_i\in {\mathcal {F}}\) for all \(i=1,\ldots ,n\). The expected value of a simple universally measurable function with respect to a probability measure \({\mathbb {P}}\) is defined as
$$\begin{aligned} \int _{p\in {\mathcal {P}}} g(p) {\mathbb {P}}(dp)= \sum _{i=1}^n c_i {\mathbb {P}}^{\mathrm{c}}(Z_i). \end{aligned}$$
(A.1)
Let \({\mathcal {G}}\) denote the set of simple universally measurable functions. The expected value of a bounded universally measurable function \(u:{\mathcal {P}}\rightarrow {\mathbb {R}}\) is then given by
$$\begin{aligned} \int _{p\in {\mathcal {P}}} u(p) {\mathbb {P}}(dp)=\sup \limits _{\begin{array}{c} g\in {\mathcal {G}}\\ g \le u \end{array}} \int _{p\in {\mathcal {P}}} g(p) {\mathbb {P}}(dp)=\inf \limits _{\begin{array}{c} g\in {\mathcal {G}}\\ g \ge u \end{array}} \int _{p\in {\mathcal {P}}} g(p) {\mathbb {P}}(dp). \end{aligned}$$
(A.2)
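As an illustration of the sandwich formula (A.2), the following sketch (a toy finite stand-in for the play space, entirely our own construction) computes the integral of a bounded function as a supremum over simple functions lying below it.

```python
import math

# Toy stand-in for the play space: ten points with the uniform measure.
points = range(10)
prob = {p: 0.1 for p in points}
u = {p: math.sin(p) for p in points}        # a bounded "payoff" function

def lower_simple_integral(levels):
    """Integral of the largest simple function below u whose values lie
    on the grid `levels`, i.e. one term of the supremum in (A.2)."""
    return sum(max(c for c in levels if c <= u[p]) * prob[p] for p in points)

grid = [k / 8 for k in range(-8, 9)]        # value grid covering [-1, 1]
print(lower_simple_integral(grid))          # lower bound, gap < mesh 1/8
print(sum(u[p] * prob[p] for p in points))  # the exact integral
```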
Stopping times A stopping time is a function \(T : {\mathcal {P}} \rightarrow {\mathbb {N}} \cup \{\infty \}\) such that for each \(t \in {\mathbb {N}}\) the set \(\{p \in {\mathcal {P}} \mid T(p) = t\}\) is an element of \({\mathcal {F}}^{t}\). Given a stopping time T, let \({\mathcal {F}}^{T}\) denote the sigma-algebra of sets \(A \in {\mathcal {F}}^{\infty }\) such that \(A \cap \{p \in {\mathcal {P}} \mid T(p) = t\} \in {\mathcal {F}}^{t}\) for each \(t \in {\mathbb {N}}\).
For every \(t \in {\mathbb {N}},\) let \( X^{t} \) be an \({\mathcal {F}}^{t}\) measurable stochastic variable, and let \( X^{\infty } \) be an \({\mathcal {F}}^{\infty }\) measurable stochastic variable. The stochastic variable \(X^{T}\) is defined by letting it coincide with \(X^{t}\) on \(\{p \in {\mathcal {P}} \mid T(p) = t\}\) for each \(t \in {\mathbb {N}}\) and with \(X^{\infty }\) on \(\{p \in {\mathcal {P}} \mid T(p) = \infty \}\). Following Yeh ([22], Theorem 3.28), it holds that \( X^{T} \) is \( {\mathcal {F}}^{T}\) measurable.
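A canonical example of a stopping time is the first hitting time of a set of states \({\mathcal {X}}^{*}\subseteq {\mathcal {X}}\). Writing, in notation of our own, \(x_{t}(p)\) for the state at period t along the play p, it is given by
$$\begin{aligned} T(p)=\min \lbrace t\in {\mathbb {N}}\mid x_{t}(p)\in {\mathcal {X}}^{*}\rbrace , \end{aligned}$$
with the convention \(\min \emptyset =\infty \). Whether \(T(p)=t\) holds is determined by the first t coordinates of p, so indeed \(\lbrace p\in {\mathcal {P}}\mid T(p)=t\rbrace \in {\mathcal {F}}^{t}\).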
Conditional expectations Consider a bounded stochastic variable \(F : {\mathcal {P}} \rightarrow {\mathbb {R}}, \) a strategy profile \( (\sigma ,\tau ) \in {{{\mathcal {S}}}}_{1} \times \mathcal{S}_{2}, \) and a history \(h \in {\mathcal {H}}^{\ell }\) of length \(\ell \). Let some \(t \ge \ell \) be given. The conditional expectation of F with respect to the sigma-algebra \({\mathcal {F}}^{t}\) and the measurable space \(({\mathcal {P}}, {\mathcal {F}},{\mathbb {P}}_{h,\sigma ,\tau })\) is denoted by \({\mathbb {E}}_{h,\sigma ,\tau }[F|{\mathcal {F}}^{t}]\). The conditional expectation \({\mathbb {E}}_{h,\sigma ,\tau }[F|{\mathcal {F}}^{t}]\) can be identified with the stochastic variable \(p \mapsto {\mathbb {E}}_{p_{\vert t},\sigma ,\tau }[F]\).4
Lemma A.1 states a version of Levy’s zero-one law for universally measurable functions. It relies on the fact that a universally measurable function coincides almost surely with some Borel measurable function.
Lemma A.1
(Levy’s zero-one law for universally measurable functions) For every strategy profile \((\sigma ,\tau ) \in {{{\mathcal {S}}}}_{1} \times {{{\mathcal {S}}}}_{2}, \) for every history \(h\in {\mathcal {H}}, \) we have
$$\begin{aligned} \lim _{t\rightarrow \infty } U^t_{\sigma ,\tau }= u, \quad {\mathbb {P}}_{h,\sigma ,\tau }\text {-almost surely}. \end{aligned}$$
(A.3)
Proof
Fix a strategy profile \((\sigma ,\tau ) \in {{{\mathcal {S}}}}_{1} \times \mathcal{S}_{2} \) and a history \(h\in {\mathcal {H}}\). Since u is universally measurable, there exists a Borel measurable function \({\bar{u}}\) such that \(u={\bar{u}}\), \({\mathbb {P}}_{h,\sigma ,\tau }\)-almost surely. Then,
$$\begin{aligned} \lim _{t\rightarrow \infty } U^t_{\sigma ,\tau }=\lim _{t\rightarrow \infty } {\mathbb {E}}_{h,\sigma ,\tau }\left[ u\vert {\mathcal {F}}^t\right] =\lim _{t\rightarrow \infty } {\mathbb {E}}_{h,\sigma ,\tau }\left[ {\bar{u}}\vert {\mathcal {F}}^t\right] ={\mathbb {E}}_{h,\sigma ,\tau }\left[ {\bar{u}}\vert {\mathcal {F}}^\infty \right] ={\bar{u}}, \end{aligned}$$
\({\mathbb {P}}_{h,\sigma ,\tau }\)-almost surely. In the first equality, we use the definition of the stochastic variable \(U^t_{\sigma ,\tau }\). The second equality follows from the fact that \(u={\bar{u}}\) \({\mathbb {P}}_{h,\sigma ,\tau }\)-almost surely. The third equality follows from Levy’s zero-one law (see e.g., Bogachev, [3], Example 10.3.15). The last equality follows from the fact that \({\bar{u}}\) is \({\mathcal {F}}^\infty \) measurable. This is because \({\bar{u}}\) is Borel measurable and \({\mathcal {F}}^\infty \) is the Borel sigma-algebra on \( {{{\mathcal {P}}}}. \) The fact that \(u={\bar{u}}\) \({\mathbb {P}}_{h,\sigma ,\tau }\)-almost surely concludes the proof. \(\square \)
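To see (A.3) at work, the following sketch (our own toy illustration, with a discounted coin-flip payoff of our choosing rather than the paper's model) tracks the conditional expectations of the payoff along a sampled play; they converge to the realized payoff, as Lemma A.1 asserts.

```python
import random

random.seed(1)
T = 30
flips = [random.randint(0, 1) for _ in range(T)]     # a sampled play
u = sum(0.5 ** t * x for t, x in enumerate(flips))   # realized payoff

for t in (0, 5, 10, 20, T):
    realized = sum(0.5 ** s * flips[s] for s in range(t))
    expected_tail = sum(0.5 ** s * 0.5 for s in range(t, T))
    # the conditional expectation of u given the first t coordinates
    print(t, realized + expected_tail)
print("u =", u)  # the conditional expectations converge to u
```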

Appendix B: The Proof of Inequality (6.7)

The following statement follows from the more general result of Theorem 1 in Abate, Redig, and Tkachev ([1]). For completeness, we provide a direct proof in this section.
Lemma B.1
Let \( \sigma , \sigma ' \in {{{\mathcal {S}}}}_{1} \) be such that, for every \(t\in {\mathbb {N}}, \) for every \( h \in {\mathcal {H}}^t, \) \( \Vert \sigma (h)-\sigma '(h)\Vert _{\mathrm{TV}}\le \delta _{t}. \) Then, for every strategy \( \tau \in {{{\mathcal {S}}}}_{2}, \) for every \(t\in {\mathbb {N}}, \) and for every history \(h\in {\mathcal {H}}^t, \)
$$\begin{aligned}\Vert {\mathbb {P}}_{h,\sigma ,\tau }-{\mathbb {P}}_{h,\sigma ',\tau }\Vert _{\mathrm{TV}}\le \sum _{i=t}^{\infty }\delta _i.\end{aligned}$$
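Before the formal proof, here is a toy numerical check of the lemma (a one-state, two-action, finite-horizon game of our own making, with player 2 trivial): perturbing player 1's mixture by \(\delta _t\) at stage t moves the induced measure by at most \(\sum _t \delta _t\) in total variation.

```python
from itertools import product

T = 6
delta = [0.3 * 0.5 ** t for t in range(T)]  # summable stage perturbations

def prob(seq, shifted):
    """Probability of an action sequence under sigma (shifted=False) or
    under the perturbed sigma' (shifted=True).  At stage t the mixtures
    (1/2, 1/2) and (1/2 + delta[t]/2, 1/2 - delta[t]/2) differ by exactly
    delta[t] in the total variation norm on actions."""
    p = 1.0
    for t, a in enumerate(seq):
        base = 0.5
        if shifted:
            base += delta[t] / 2 if a == 0 else -delta[t] / 2
        p *= base
    return p

# For the induced measures on length-T histories, sup_W |P(W) - P'(W)|
# equals half of the L1 distance between the two probability vectors.
tv = 0.5 * sum(abs(prob(s, False) - prob(s, True))
               for s in product((0, 1), repeat=T))
print(tv, "<=", sum(delta))  # the bound of Lemma B.1 holds
```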
The class \({\mathcal {V}}\subseteq {\mathcal {F}}\) is called an inner (outer) approximating class for the class \({\mathcal {W}}\subseteq {\mathcal {F}}\) if \({\mathcal {V}}\) is closed under unions (intersections) and if for every \(\epsilon >0\), every probability measure \({\mathbb {P}}\in {\mathcal {M}}\), and every set \(W\in {\mathcal {W}}\), there exists a set \(V\in {\mathcal {V}}\) such that \(V\subseteq W\) \((V\supseteq W)\) and \(\vert {\mathbb {P}}(W)-{\mathbb {P}}(V)\vert \le \epsilon \).
Lemma B.2
If \({\mathcal {V}}\) is an inner (outer) approximating class for the class \({\mathcal {W}}\) and \({\mathcal {V}}\subseteq {\mathcal {W}}, \) then it holds for every two probability measures \({\mathbb {P}}\in {\mathcal {M}}\) and \({\mathbb {P}}^{\prime }\in {\mathcal {M}}\) that
$$\begin{aligned} \sup _{W\in {\mathcal {W}}}\vert {\mathbb {P}}(W)-{\mathbb {P}}^{\prime }(W)\vert = \sup _{V\in {\mathcal {V}}}\vert {\mathbb {P}}(V)-{\mathbb {P}}^{\prime }(V)\vert . \end{aligned}$$
Proof
Fix \(\epsilon >0\). Assume that the class \({\mathcal {V}}\) is an inner approximating class for \({\mathcal {W}}\). The proof for an outer approximating class is similar. Fix two probability measures \({\mathbb {P}}\) and \({\mathbb {P}}^{\prime }\) and a set \(W\in {\mathcal {W}}\). Then there exist sets \(V\in {\mathcal {V}}\) and \(V^{\prime }\in {\mathcal {V}}\) such that \(V\subseteq W\), \(V^{\prime }\subseteq W, \) \(\vert {\mathbb {P}}(W)-{\mathbb {P}}(V)\vert \le \epsilon \), and \(\vert {\mathbb {P}}^{\prime }(W)-{\mathbb {P}}^{\prime }(V^{\prime })\vert \le \epsilon \). Let \({\tilde{V}}=V\cup V^{\prime }\). Because \({\mathcal {V}}\) is closed under unions we have that \({\tilde{V}}\in {\mathcal {V}}\). Furthermore, it follows trivially that \({\tilde{V}}\subseteq W, \) \(\vert {\mathbb {P}}(W)-{\mathbb {P}}({\tilde{V}})\vert \le \epsilon , \) and \(\vert {\mathbb {P}}^{\prime }(W)-{\mathbb {P}}^{\prime }({\tilde{V}})\vert \le \epsilon \). We find that
$$\begin{aligned} \vert {\mathbb {P}}(W)-{\mathbb {P}}^{\prime }(W)\vert&= \vert {\mathbb {P}}(W)-{\mathbb {P}}({\tilde{V}})+ {\mathbb {P}}({\tilde{V}})-{\mathbb {P}}^{\prime }({\tilde{V}}) +{\mathbb {P}}^{\prime }({\tilde{V}})-{\mathbb {P}}^{\prime }(W)\vert \\&\le \vert {\mathbb {P}}(W)-{\mathbb {P}}({\tilde{V}})\vert +\vert {\mathbb {P}}({\tilde{V}})-{\mathbb {P}}^{\prime }({\tilde{V}})\vert +\vert {\mathbb {P}}^{\prime }({\tilde{V}})-{\mathbb {P}}^{\prime }(W)\vert \\&\le \vert {\mathbb {P}}({\tilde{V}})-{\mathbb {P}}^{\prime }({\tilde{V}})\vert +2\epsilon . \end{aligned}$$
Because this holds for any \(\epsilon >0\) and any set \( W \in \mathcal{W}, \) we can conclude that
$$\begin{aligned} \sup _{W\in {\mathcal {W}}}\vert {\mathbb {P}}(W)-{\mathbb {P}}^{\prime }(W)\vert \le \sup _{V\in {\mathcal {V}}}\vert {\mathbb {P}}(V)-{\mathbb {P}}^{\prime }(V)\vert . \end{aligned}$$
Because \({\mathcal {V}}\subseteq {\mathcal {W}}\), it is clear that:
$$\begin{aligned} \sup _{W\in {\mathcal {W}}}\vert {\mathbb {P}}(W)-{\mathbb {P}}^{\prime }(W)\vert \ge \sup _{V\in {\mathcal {V}}}\vert {\mathbb {P}}(V)-{\mathbb {P}}^{\prime }(V)\vert . \end{aligned}$$
\(\square \)
In the following lemma, we use Lemma B.2 to simplify the computation of the total variation norm. Instead of having to compute the supremum over all sets of the universally measurable sigma-algebra, it will be sufficient to compute the supremum over the subclass of open sets \( {{{\mathcal {O}}}}^{t}, \) \( t \in {\mathbb {N}}, \) defined by
$$\begin{aligned} {{{\mathcal {O}}}}^t=\displaystyle \lbrace \cup _{h\in H} {\mathcal {P}}(h)\vert H\subseteq {\mathcal {H}}^t \rbrace \end{aligned}$$
(B.4)
The class \({{{\mathcal {O}}}}^t\) consists of the open sets with the property that, for every history at time t, either all plays extending that history are contained in the set or none of them are.
Lemma B.3
For every \(h\in {\mathcal {H}}\), it holds that
$$\begin{aligned} \Vert {\mathbb {P}}_{h,\sigma ,\tau }-{\mathbb {P}}_{h,\sigma ^{\prime },\tau }\Vert _{\mathrm{TV}}=\sup _{t \in {\mathbb {N}}, O\in {{{\mathcal {O}}}}^t} \vert {\mathbb {P}}_{h,\sigma ,\tau }(O)-{\mathbb {P}}_{h,\sigma ^{\prime },\tau }(O)\vert . \end{aligned}$$
Proof
We define \(\displaystyle {{{\mathcal {O}}}}=\cup _{t \in {\mathbb {N}}}{{{\mathcal {O}}}}^t. \) Since any set in \( {{{\mathcal {O}}}} \) is a union of open sets, it holds that \( {{{\mathcal {O}}}} \) is a subset of the class of all open sets, \( {\mathcal {O}}^{*}. \) We now show that \({{{\mathcal {O}}}}\) is an inner approximating class for \({\mathcal {O}}^{*}\). Fix an open set \(O^{*}\in {\mathcal {O}}^{*}. \) For every \(t\in {\mathbb {N}}, \) we define \(O^t=\lbrace p\in {\mathcal {P}}\vert {\mathcal {P}}(p_{\vert t})\subseteq O^{*} \rbrace \). It is clear that for every \(t\in {\mathbb {N}}\), we have \(O^t\in {{{\mathcal {O}}}}\) and \(O^t\subseteq O^{*}\). Furthermore, we have that \(O^1\subseteq O^2\subseteq \ldots \) and \(O^{*}=\cup _{t\in {\mathbb {N}}} O^t\). For every probability measure \({\mathbb {P}}\) it therefore holds that \(\displaystyle \lim _{t \rightarrow \infty }{\mathbb {P}}(O^t)={\mathbb {P}}(O^{*}), \) so for every \(\epsilon >0\) there exists \(t\in {\mathbb {N}}\) and \( O^t \in {{{\mathcal {O}}}}^t \) such that \(\vert {\mathbb {P}}(O^{*})-{\mathbb {P}}(O^t) \vert <\epsilon \). Hence, \({{{\mathcal {O}}}}\) is an inner approximating class of \({\mathcal {O}}^{*}\).
Because of the outer regularity of Borel measures on metric spaces, we have that the class of open sets is an outer approximating class for the class of Borel sets. In addition, we have that the class of Borel sets is an inner approximating class for the class of universally measurable sets. Indeed, for any probability measure and any universally measurable set, there exists a Borel set which is contained in the universally measurable set and has the same probability. Repeated application of Lemma B.2 concludes the proof. \(\square \)
Proof of Lemma B.1
Fix \( t \in {\mathbb {N}} \) and a history \( h_{t} \in {\mathcal {H}}^{t}. \) We prove by induction that for every \( n \ge t, \) for every \( O \in {{{\mathcal {O}}}}^{n+1},\)
$$\begin{aligned} \left| {\mathbb {P}}_{h_t,\sigma ,\tau }\left( O\right) -{\mathbb {P}}_{h_t,\sigma ^{\prime },\tau }\left( O\right) \right| \le \sum _{i=t}^n\delta _{i}. \end{aligned}$$
(B.5)
Induction basis \( (n=t). \)
We prove that for every \(O\in {{{\mathcal {O}}}}^{t+1}\), \( \left| {\mathbb {P}}_{h_t,\sigma ,\tau }\left( O\right) -{\mathbb {P}}_{h_t,\sigma ^{\prime },\tau }\left( O\right) \right| \le \delta _t. \) Fix a set \(O\in {{{\mathcal {O}}}}^{t+1}\). We define
$$\begin{aligned} {\mathcal {Z}}_{h_t} =\lbrace (a,b,x)\in {\mathcal {A}}\times {\mathcal {B}}\times {\mathcal {X}}\vert {\mathcal {P}}(h_tabx)\subseteq O \rbrace . \end{aligned}$$
Let \({\mathcal {A}}_{h_t}=\lbrace a\in {\mathcal {A}}\vert \exists (b,x) \in {\mathcal {B}} \times {\mathcal {X}}: (a,b,x) \in {\mathcal {Z}}_{h_t}\rbrace \) be the projection of the set \( \mathcal{Z}_{h_t} \) on the set \({\mathcal {A}}\). Let \(x_t\) denote the state at the history \(h_t\). We have that
$$\begin{aligned} {\mathbb {P}}_{h_t,\sigma ,\tau }\left( O\right)&=\sum _{(a,b,x)\in {\mathcal {Z}}_{h_t}} \sigma (h_t)(a)\cdot \tau (h_t)(b)\cdot q(x\vert a,b,x_t),\\ {\mathbb {P}}_{h_t,\sigma ^{\prime },\tau }\left( O\right)&= \sum _{(a,b,x)\in {\mathcal {Z}}_{h_t}} \sigma ^{\prime }(h_t)(a)\cdot \tau (h_t)(b)\cdot q(x\vert a,b,x_t). \end{aligned}$$
We find that
$$\begin{aligned}&\left| {\mathbb {P}}_{h_t,\sigma ,\tau }\left( O\right) -{\mathbb {P}}_{h_t,\sigma ^{\prime },\tau }\left( O\right) \right| = \left| \sum _{(a,b,x)\in {\mathcal {Z}}_{h_t}} \left( \sigma (h_t)(a)-\sigma ^{\prime }(h_t)(a)\right) \cdot \tau (h_t)(b) \cdot q(x\vert a,b,x_t)\right| \\&\quad \le \sum _{(a,b,x)\in {\mathcal {Z}}_{h_t}}\left| \sigma (h_t)(a)-\sigma ^{\prime }(h_t)(a)\right| \cdot \tau (h_t)(b) \cdot q(x \vert a,b,x_t)\\&\quad \le \sum _{(a,b,x)\in {\mathcal {A}}_{h_t}\times {\mathcal {B}}\times {\mathcal {X}}}\left| \sigma (h_t)(a)-\sigma ^{\prime }(h_t)(a)\right| \cdot \tau (h_t)(b) \cdot q(x \vert a,b,x_t)\\&\quad = \sum _{a\in {\mathcal {A}}_{h_t}} \left| \sigma (h_t)(a)-\sigma ^{\prime }(h_t)(a)\right| \cdot \left[ \sum _{(b,x)\in {\mathcal {B}}\times {\mathcal {X}} }\tau (h_t)(b) \cdot q(x \vert a,b,x_t)\right] \\&\quad \le \sum _{a\in {\mathcal {A}}}\left| \sigma (h_t)(a)-\sigma ^{\prime }(h_t)(a)\right| = \Vert \sigma (h_t)-\sigma ^{\prime }(h_t) \Vert _{\mathrm{TV}} \le \delta _t, \end{aligned}$$
where the last inequality of the chain uses that the bracketed sum equals 1 for every \(a\in {\mathcal {A}}_{h_t}\), together with \({\mathcal {A}}_{h_t}\subseteq {\mathcal {A}}\).
Induction step.
From the induction hypothesis, it follows that for every open n-level set \(O^n\in {{{\mathcal {O}}}}^n\) with \( n - 1 \ge t, \)
$$\begin{aligned} \vert {\mathbb {P}}_{h_t, \sigma ,\tau }(O^n)-{\mathbb {P}}_{h_t,\sigma ^{\prime },\tau }(O^n)\vert \le \sum _{i=t}^{n-1}\delta _i. \end{aligned}$$
(B.6)
Fix a set \(O\in {{{\mathcal {O}}}}^{n+1}\). We can assume without loss of generality that \( {\mathbb {P}}_{h_t,\sigma ,\tau }(O)-{\mathbb {P}}_{h_t,\sigma ^{\prime },\tau }(O)\ge 0\). Define
$$\begin{aligned} {\mathcal {H}}^n_{+}=\lbrace h_n\in {\mathcal {H}}^n \vert {\mathbb {P}}_{h_t,\sigma ,\tau }({\mathcal {P}}(h_n))-{\mathbb {P}}_{h_t,\sigma ^{\prime },\tau }({\mathcal {P}}(h_n))\ge 0 \rbrace . \end{aligned}$$
(B.7)
We have that
$$\begin{aligned}&{\mathbb {P}}_{h_t,\sigma ,\tau }(O)-{\mathbb {P}}_{h_t,\sigma ^{\prime },\tau }(O) \\&\quad = \sum _{h_n\in {\mathcal {H}}^n} {\mathbb {P}}_{h_t,\sigma ,\tau }(O\vert {\mathcal {P}}(h_n)){\mathbb {P}}_{h_t,\sigma ,\tau }({\mathcal {P}}(h_n))-\sum _{h_n\in {\mathcal {H}}^n} {\mathbb {P}}_{h_t,\sigma ^{\prime },\tau }(O\vert {\mathcal {P}}(h_n)){\mathbb {P}}_{h_t,\sigma ^{\prime },\tau }({\mathcal {P}}(h_n)) \\&\quad = \sum _{h_n\in {\mathcal {H}}^n} {\mathbb {P}}_{h_t,\sigma ,\tau }(O\vert {\mathcal {P}}(h_n)){\mathbb {P}}_{h_t,\sigma ,\tau }({\mathcal {P}}(h_n))-\sum _{h_n\in {\mathcal {H}}^n} {\mathbb {P}}_{h_t,\sigma ,\tau }(O\vert {\mathcal {P}}(h_n)){\mathbb {P}}_{h_t,\sigma ^{\prime },\tau }({\mathcal {P}}(h_n))\\&\qquad + \sum _{h_n\in {\mathcal {H}}^n} {\mathbb {P}}_{h_t,\sigma ,\tau }(O\vert {\mathcal {P}}(h_n)){\mathbb {P}}_{h_t,\sigma ^{\prime },\tau }({\mathcal {P}}(h_n))-\sum _{h_n\in {\mathcal {H}}^n} {\mathbb {P}}_{h_t,\sigma ^{\prime },\tau }(O\vert {\mathcal {P}}(h_n)){\mathbb {P}}_{h_t,\sigma ^{\prime },\tau }({\mathcal {P}}(h_n)) \\&\quad = \sum _{h_n\in {\mathcal {H}}^n} {\mathbb {P}}_{h_t,\sigma ,\tau }(O\vert {\mathcal {P}}(h_n))\cdot \left( {\mathbb {P}}_{h_t,\sigma ,\tau }({\mathcal {P}}(h_n))-{\mathbb {P}}_{h_t,\sigma ^{\prime },\tau }({\mathcal {P}}(h_n))\right) \\&\qquad +\sum _{h_n\in {\mathcal {H}}^n} {\mathbb {P}}_{h_t,\sigma ^{\prime },\tau }({\mathcal {P}}(h_n))\cdot \left( {\mathbb {P}}_{h_t,\sigma ,\tau }(O\vert {\mathcal {P}}(h_n))-{\mathbb {P}}_{h_t,\sigma ^{\prime },\tau }(O\vert {\mathcal {P}}(h_n)) \right) \\&\quad \le \sum _{h_n\in {\mathcal {H}}^n_{+}} {\mathbb {P}}_{h_t,\sigma ,\tau }(O\vert {\mathcal {P}}(h_n))\cdot \left( {\mathbb {P}}_{h_t,\sigma ,\tau }({\mathcal {P}}(h_n))-{\mathbb {P}}_{h_t,\sigma ^{\prime },\tau }({\mathcal {P}}(h_n))\right) \\&\qquad +\sum _{h_n\in {\mathcal {H}}^n} {\mathbb {P}}_{h_t,\sigma ^{\prime },\tau }({\mathcal {P}}(h_n))\cdot \left| \left( {\mathbb {P}}_{h_t,\sigma ,\tau }(O\vert {\mathcal {P}}(h_n))-{\mathbb {P}}_{h_t,\sigma ^{\prime },\tau }(O\vert {\mathcal {P}}(h_n)) \right) \right| \\&\quad \le \sum _{h_n\in {\mathcal {H}}^n_{+}} \left( {\mathbb {P}}_{h_t,\sigma ,\tau }({\mathcal {P}}(h_n))-{\mathbb {P}}_{h_t,\sigma ^{\prime },\tau }({\mathcal {P}}(h_n))\right) +\sum _{h_n\in {\mathcal {H}}^n} {\mathbb {P}}_{h_t,\sigma ^{\prime },\tau }({\mathcal {P}}(h_n)) \cdot \delta _n\\&\quad \le \vert {\mathbb {P}}_{h_t,\sigma ,\tau }(\displaystyle \cup _{h_n\in {\mathcal {H}}^n_{+}}{\mathcal {P}}(h_n))-{\mathbb {P}}_{h_t,\sigma ^{\prime },\tau }(\displaystyle \cup _{h_n\in {\mathcal {H}}^n_{+}}{\mathcal {P}}(h_n))\vert +\delta _n\\&\quad \le \sum _{i=t}^{n-1}\delta _i+\delta _n = \sum _{i=t}^{n}\delta _i, \end{aligned}$$
where the fact that \(\left| {\mathbb {P}}_{h_t,\sigma ,\tau }(O\vert {\mathcal {P}}(h_n))-{\mathbb {P}}_{h_t,\sigma ^{\prime },\tau }(O\vert {\mathcal {P}}(h_n)) \right| \le \delta _n \) follows from the induction basis applied at the history \(h_n\): conditional on \({\mathcal {P}}(h_n)\), the probability of O coincides with its probability in the game starting from \(h_n\).
Using Lemma B.3, we can conclude that
$$\begin{aligned} \Vert {\mathbb {P}}_{h_{t},\sigma ,\tau }-{\mathbb {P}}_{h_{t},\sigma ^{\prime },\tau }\Vert _{\mathrm{TV}}= & {} \sup _{n \in {\mathbb {N}}, O\in {{{\mathcal {O}}}}^{n}} \vert {\mathbb {P}}_{h_t,\sigma ,\tau }(O)-{\mathbb {P}}_{h_t,\sigma ^{\prime },\tau }(O)\vert \\= & {} \sup _{n \ge t, O\in {{{\mathcal {O}}}}^{n+1}} \vert {\mathbb {P}}_{h_t,\sigma ,\tau }(O)-{\mathbb {P}}_{h_t,\sigma ^{\prime },\tau }(O)\vert \\\le & {} \sup _{n \ge t} \sum _{i=t}^{n}\delta _i=\sum _{i=t}^{\infty }\delta _i. \end{aligned}$$
\(\square \)
Footnotes
1. See the discussion on page 1566 in Martin ([14]) for the specific case of perfect information games in which the payoff function is the characteristic function of an analytic set.
2. We verify that the conditions of Theorem 8.16 in Yeh ([22]) applied to the process \(({\underline{U}}_{\sigma }^{n})_{n \ge t}\) on the measurable space \(({\mathcal {P}},{\mathcal {F}},{\mathbb {P}}_{h,\sigma ,\tau })\) with filtration \(({\mathcal {F}}^{n})_{n \ge t}\) are satisfied. The process \(({\underline{U}}_{\sigma }^{n})_{n \ge t}\) is a \({\mathbb {P}}_{h,\sigma ,\tau }\)-submartingale with respect to the filtration \(({\mathcal {F}}^{n})_{n \ge t}\) by Lemma 3.3. Take \(\xi \) of Theorem 8.16 in Yeh ([22]) equal to u. Since \({\underline{U}}_{\sigma }^{\infty }\) is an \({\mathcal {F}}^{\infty }\)-measurable stochastic variable that \({\mathbb {P}}_{h,\sigma ,\tau }\)-almost surely coincides with u, it is a version of \({\mathbb {E}}_{h,\sigma ,\tau }[u|{\mathcal {F}}^{\infty }]\), as is required by the theorem. To verify condition (1) of Theorem 8.16 in Yeh ([22]), notice that, for every \(n \ge t, \) for every play \(p \in {\mathcal {P}}, \) we have \({\underline{U}}_{\sigma }^{n}(p) = {\underline{u}}(\sigma ,p_{\vert n}) \le {\mathbb {E}}_{p_{\vert n},\sigma ,\tau }[u]\). The right-hand side of this inequality is a version of \({\mathbb {E}}_{h,\sigma ,\tau }[u|{\mathcal {F}}^{n}]\). Consequently, \({\underline{U}}_{\sigma }^{n} \le {\mathbb {E}}_{h,\sigma ,\tau }[u|{\mathcal {F}}^{n}]\) holds \({\mathbb {P}}_{h,\sigma ,\tau }\)-almost surely, as desired.
3. The proof of this case is not symmetric to the proof of Case 1, because we consider the lower value. Imitating the proof of Case 1 for player 2 would yield results in terms of the upper value.
4. As usual, a conditional expectation is not defined uniquely, since some histories might not be reached with positive probability under the strategy profile \((\sigma ,\tau )\). Our particular choice is both convenient and inconsequential, since any two conditional probability systems coincide \({\mathbb {P}}_{h,\sigma ,\tau }\)-almost surely. We refer to Bogachev ([3], p. 350) for a careful discussion of conditional expectations.
References
1. Abate A, Redig F, Tkachev I (2014) On the effect of perturbation of conditional probabilities in total variation. Stat Probab Lett 88:1–8
4. Bruyère V (2017) Computer aided synthesis: a game-theoretic approach. In: Charlier E, Leroy J, Rigo M (eds) Developments in Language Theory. Lecture Notes in Computer Science, vol 10396. Springer, pp 3–35
5. Flesch J, Herings PJJ, Maes J, Predtetchinski A (2018) Subgame maxmin strategies in zero-sum stochastic games with tolerance levels. GSBE Research Memorandum 18/20, Maastricht University, Maastricht, pp 1–37
6. Flesch J, Predtetchinski A (2016) On refinements of subgame perfect \(\epsilon \)-equilibrium. Int J Game Theory 45:523–542
7. Flesch J, Predtetchinski A, Sudderth W (2018) Characterization and simplification of optimal strategies in positive stochastic games. J Appl Probab 55:728–741
8. Flesch J, Thuijsman F, Vrieze OJ (1998) Improving strategies in stochastic games. In: Proceedings of the 37th IEEE Conference on Decision and Control, vol 3. IEEE, pp 2674–2679
9. Gillette D (1957) Stochastic games with zero stop probabilities. In: Dresher M, Tucker AW, Wolfe P (eds) Contributions to the Theory of Games, vol III. Annals of Mathematics Studies, vol 39. Princeton University Press, Princeton, pp 179–187
10. Laraki R, Maitra A, Sudderth W (2013) Two-person zero-sum stochastic games with semicontinuous payoff. Dyn Games Appl 3:162–171
11. Mailath G, Postlewaite A, Samuelson L (2005) Contemporaneous perfect epsilon-equilibria. Games Econ Behav 53:126–140
12. Maitra A, Sudderth W (1996) Discrete gambling and stochastic games. Springer, New York
13. Maitra A, Sudderth W (1998) Finitely additive stochastic games with Borel measurable payoffs. Int J Game Theory 27:257–267
15. Mashiah-Yaakovi A (2015) Correlated equilibria in stochastic games with Borel measurable payoffs. Dyn Games Appl 5:120–135
16. Puterman ML (1994) Markov decision processes: discrete stochastic dynamic programming. Wiley, Hoboken
17. Radner R (1980) Collusive behavior in noncooperative epsilon-equilibria of oligopolies with long but finite lives. J Econ Theory 22:136–154
18. Rosenberg D, Solan E, Vieille N (2001) Stopping games with randomized strategies. Probab Theory Relat Fields 119:433–451
19. Selten R (1965) Spieltheoretische Behandlung eines Oligopolmodells mit Nachfrageträgheit: Teil I: Bestimmung des dynamischen Preisgleichgewichts. Zeitschrift für die gesamte Staatswissenschaft 121:301–324
20.
21.
22. Yeh J (1995) Martingales and stochastic analysis, vol 1. World Scientific, Singapore