1 Introduction

Perfect information games are routinely used in various areas of economics such as bargaining theory (e.g. Binmore et al. 1992) and coalition formation (e.g. Ray 2007), as well as in logic and computer science (e.g. Apt and Grädel 2011). We examine a rich class of perfect information games with deterministic transitions. Such a game can be given by a finite set of players and a directed tree with a root and no terminal nodes. Each node in the tree is associated with a player, who controls this node. Play of the game starts at the root. At every node that play visits, the player who controls this node has to choose one of the outgoing arcs, which brings play to the next node. This induces an infinite path, called a play, in the tree, and depending on this play, each player receives a payoff.

This setup encompasses all games of finite duration (as we can replace every terminal node with one infinite sequence of arcs). Another important special case is the situation when the players receive instantaneous payoffs at every period of the game and then aggregate them into one payoff, for example by taking the discounted sum.

In these games, subgame perfect equilibrium (cf. Selten 1965), SPE for short, is the most common refinement of Nash equilibrium. A strategy profile is called an SPE if it induces a Nash equilibrium in every subgame. More precisely, for every node in the tree, conditional on play reaching this node from the root, the continuation strategies form a Nash equilibrium of the subgame that starts at this node.

Various conditions have been found which guarantee the existence of an SPE (cf. for example Fudenberg and Levine 1983; Harris 1985; Maitra and Sudderth 2007). Yet, without strong conditions, an SPE can easily fail to exist, even in one-player games. Suppose for example that a player has to choose a natural number, and his payoff is \(1-\tfrac{1}{n}\) if he chooses \(n\in \mathbb {N}\). In this game, the player obviously has no optimal strategy, and therefore the game admits no Nash equilibrium and no SPE either. As a similar example, suppose that a player can choose to stop or to continue at time periods \(t\in \mathbb {N}\), and his payoff is \(1-\tfrac{1}{t}\) if he decides to stop the game at period \(t\) and his payoff is 0 if he never stops. A common feature of these two games is that, although the player has no optimal strategy, he does have approximately optimal strategies at his disposal. Indeed, for every error-term \(\epsilon >0\), if the player chooses an \(n\ge \tfrac{1}{\epsilon }\) in the first game or stops at a period \(t\ge \tfrac{1}{\epsilon }\) in the second game, then his payoff is at least \(1-\epsilon \). For a small error-term \(\epsilon \), these approximately optimal strategies are fairly satisfactory, and offer a remedy to the non-existence of optimal solutions.

It has been discovered in recent years that there are various classes of perfect information games in which, despite the non-existence of SPE, one can construct approximate SPEs with arbitrarily small errors. The main notion of approximate SPE is the subgame perfect \(\epsilon \)-equilibrium, \(\epsilon \)-SPE for short, where \(\epsilon >0\): a strategy profile that induces an \(\epsilon \)-equilibrium in every subgame. Here, an \(\epsilon \)-equilibrium is a strategy profile such that no player can gain more than \(\epsilon \) by a unilateral deviation. The concept of \(\epsilon \)-SPE has appealing properties. The definition is natural, and as mentioned above, it often provides a way out when an exact SPE fails to exist.

Existence of \(\epsilon \)-SPE has been shown in various classes of games with perfect information: e.g. in Carmona (2005) for games with bounded continuous at infinity payoffs, in Flesch et al. (2010) for games with bounded lower semicontinuous payoffs, in Purves and Sudderth (2011) for games with bounded upper semicontinuous payoffs, in Flesch et al. (2013) for free transition games, in Solan and Vieille (2001, 2003), Solan (2005) and Mashiah-Yaakovi (2009) for quitting games. Laraki et al. (2013) consider zero-sum games with semicontinuous payoffs, and prove that both players have \(\epsilon \)-optimal strategies and one of the players even has a subgame perfect \(\epsilon \)-optimal strategy.

Yet, there are reasons to impose additional requirements and look for refinements. Flesch et al. (2014) pointed out a drawback of \(\epsilon \)-SPE: it does not rule out that a player chooses, with small probability, an action that leads to a bad payoff. This causes a problem if one interprets a probability distribution, which a strategy prescribes on the actions, as a lottery that the player uses when choosing an action. Indeed, in the unlikely but possible event that the lottery picks an action with a low payoff, the player may become reluctant to execute this action. Flesch et al. (2014) proposed and examined a refinement, called strong \(\epsilon \)-SPE, which avoids this shortcoming. As another refinement, Brihaye et al. (2013) investigated so-called secure SPE, which can be applied to assume-guarantee synthesis and model checking. Here secure refers to the property that, in some sense, the players do not have to fear deviations in which an opponent switches to another strategy that gives this opponent the same payoff. An analog of this concept would be a secure \(\epsilon \)-SPE.

1.1 Our contribution

In this paper, we identify further properties that one may wish to require from a strategy profile in addition to being an \(\epsilon \)-SPE, and propose two refinements of the concept \(\epsilon \)-SPE: continuity \(\epsilon \)-SPE and \(\phi \)-tolerance equilibrium. We also examine their properties and their existence, mainly when the payoff functions are bounded and semicontinuous.

1.2 Continuity \(\epsilon \)-SPE

When the payoff functions are not fully continuous, it can happen that an \(\epsilon \)-SPE induces an infinite play which is not a continuity point of the payoff functions. This has the consequence that arbitrarily small perturbations in the strategies, even at arbitrarily far time periods, can lead to drastic changes in payoffs. This is particularly problematic if the players are not absolutely certain about their opponents' strategies, or when strategies cannot be executed with exact precision.

Therefore, we examine a refinement of \(\epsilon \)-SPE, called continuity \(\epsilon \)-SPE, which requires from the \(\epsilon \)-SPE that, in every subgame, the induced play is a continuity point of the payoff functions. This concept has an appealing finitistic nature, due to the additional continuity property: in any subgame, if such a profile is followed then eventually a time period is reached after which the payoffs are essentially fixed, i.e. any other continuation profile would give almost the same payoff. Roughly speaking, the game is strategically over in finite time.

Of course, not all games admit a continuity \(\epsilon \)-SPE for every \(\epsilon >0\). But we prove this to be the case when the payoff functions are bounded and lower semicontinuous. Bounded and upper semicontinuous payoffs are not sufficient, as a counter-example demonstrates.

1.3 \(\phi \)-tolerance equilibrium

The mistake of a strategy profile in a certain subgame is defined as the maximal improvement in this subgame that a player can gain by a unilateral deviation from his strategy. With this terminology, an \(\epsilon \)-SPE is a strategy profile for which the mistake is at most \(\epsilon \) in every subgame.

It may however be reasonable to also require that the mistakes in the ensuing subgames converge to 0 as play progresses. The need for this type of refinement was already recognized by Mailath et al. (2005), who introduced the concept of contemporaneous perfect \(\epsilon \)-equilibrium, in the more restricted context of games in which each player receives a payoff at every period of the game and aims at maximizing his total discounted payoff. In a subgame of such a discounted game, with a first node at a far period, the payoffs are discounted heavily, and therefore every strategy profile of the original game automatically induces an \(\epsilon \)-equilibrium in this subgame. Consequently, in such discounted games, \(\epsilon \)-SPE does not guarantee acceptable outcomes and realistic behavior in far subgames. The main point is that not only the payoffs but also the maximally allowed mistake \(\epsilon \) should be discounted.

There is in fact another motivation to assume that the mistakes tend to zero as time progresses. Since a subgame is smaller than the original game, one could argue that in certain situations it should be easier for the players to overview the strategic possibilities in a subgame and to reason about them than in the original game. Hence, it may be reasonable to assume that the players only tolerate smaller mistakes once play reaches later subgames.

With these motivations in mind, let \(\phi \) be a function that assigns a positive number \(\phi (h)\) to every history \(h\). Any such function is called a loss tolerance function, and the value \(\phi (h)\) is called a tolerance level at \(h\). A strategy profile is called a \(\phi \)-tolerance equilibrium if for every history \(h\) it induces a \(\phi (h)\)-equilibrium in the subgame that begins at \(h\). We obtain \(\epsilon \)-SPE as a special case by taking \(\phi \) to be a constant function assigning the tolerance level of \(\epsilon \) to each history. We are most interested in cases where tolerance levels converge to zero as the length of the history increases. We prove that, if the payoff functions are bounded and continuous, then a \(\phi \)-tolerance equilibrium exists for every loss tolerance function. Counter-examples demonstrate that continuity cannot be replaced with lower or upper semicontinuity.

1.4 Vanishing loss

At this point we have introduced two refinements of \(\epsilon \)-SPE that at first glance might appear to be unrelated. Yet, common to the two concepts is an intuitive property that we call vanishing loss.

A strategy profile is said to exhibit vanishing loss if the maximal improvement any player can gain by unilaterally deviating from the given strategy approaches zero as the play proceeds. We do not consider the property of vanishing loss as an independent refinement. However, we do think that this is a noteworthy property. In fact, much of the motivation we have given for our refinements stems from vanishing loss.

1.5 Structure of the paper

In Sect. 2, we define the model precisely. Then, we discuss continuity \(\epsilon \)-SPE in Sect. 3. We turn to loss tolerance equilibrium in Sect. 4, and prove the existence result in Sect. 5. Section 6 concludes with the discussion of vanishing loss.

2 The model

2.1 The game

Let \(N = \{1, \dots , n\}\) denote the set of players and let \(A\) be an arbitrary non-empty set. Let \(\mathbb {N} = \{0,1,2,\ldots \}\). We denote by \(H\) the set of all finite sequences of elements of \(A\), including the empty sequence \(\oslash \). We write \(|h|\) to denote the length of \(h \in H\). We denote by \(A^\mathbb {N}\) the set of all infinite sequences of elements of \(A\). The elements of \(A\) are called actions, the elements of \(H\) are called histories, and the elements of \(A^{\mathbb {N}}\) are called plays. There is a function \(\iota : H \rightarrow N\) which assigns an active player to each history. Further, each player \(i\in N\) is given a payoff function \(u_{i} : A^{\mathbb {N}} \rightarrow \mathbb {R}\).

The game is played as follows: At period \(0\), player \(\iota (\oslash )\) chooses an action \(a_{0}\). Suppose that up to period \(t \in \mathbb {N}\) of the game the sequence \(h = (a_{0}, \dots , a_{t})\) of actions has been chosen. Then player \(\iota (h)\) chooses an action \(a_{t + 1}\). The chosen action is observed by all players. Continuing this way the players generate a play \(p = (a_{0}, a_{1}, \dots )\), and finally each player \(i\in N\) receives payoff \(u_{i}(p)\).

2.2 Strategies

A pure strategy for player \(i\) is a function \(\sigma _{i} : \iota ^{-1}(i) \rightarrow A\), where \(\iota ^{-1}(i)\) is the set of histories where player \(i\) moves. As we only consider pure strategies, we omit the qualification “pure” in the sequel. A strategy profile is a tuple \((\sigma _{1}, \dots , \sigma _{n})\) where each \(\sigma _{i}\) is a strategy for player \(i\). Given a strategy profile \(\sigma = (\sigma _{1}, \dots , \sigma _{n})\) and a strategy \(\eta _{i}\) for player \(i\) we write \(\sigma /\eta _{i}\) to denote the strategy profile obtained from \(\sigma \) by replacing \(\sigma _{i}\) with \(\eta _{i}\).

For the concatenation of histories and actions we use the following notations. For a history \(h=(a_0,\ldots ,a_t) \in H\) and an action \(b\in A\), we denote the sequence \((a_0,\ldots ,a_t,b)\) by \((h,b)\) or simply \(hb\), and for a sequence of actions \((b_0,b_1,\ldots )\) in \(A\) let \((h,b_0,b_1,\ldots )=(a_0,\ldots ,a_t,b_0,b_1,\ldots )\).

We can identify a strategy profile \(\sigma \) with a function \(\sigma : H \rightarrow A\). Define the play induced by \(\sigma \) starting from the history \(h\), denoted by \(\pi (\sigma ;h)\), inductively as follows: Let \(h_{0} = h\). If \(h_{t}\) has been defined for some \(t \ge 0\), then let \(a_{t} = \sigma (h_{t})\) and set \(h_{t + 1} = (h_{t},a_{t})\). Then \(\pi (\sigma ;h) = (h,a_{0},a_{1},\dots )\). For the special case \(h = \oslash \), we simply write \(\pi (\sigma )\).
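The inductive definition of the induced play is easy to mechanize. The following Python sketch (our own illustration; the representation of strategies as one function from histories, encoded as tuples of actions, to actions is an assumption, not part of the model) computes a finite prefix of \(\pi (\sigma ;h)\):

```python
def play_prefix(sigma, h, length):
    """Return h followed by the first `length` actions of the play induced
    by the strategy profile `sigma`, viewed as one function from histories
    (tuples of actions) to actions, starting from the history `h`."""
    prefix = list(h)
    for _ in range(length):
        prefix.append(sigma(tuple(prefix)))  # a_t = sigma(h_t), h_{t+1} = (h_t, a_t)
    return tuple(prefix)

# A strategy that always chooses action 1:
sigma = lambda history: 1
print(play_prefix(sigma, (), 5))  # (1, 1, 1, 1, 1)
```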

2.3 Subgame-perfect \(\epsilon \)-equilibrium

Let \(\epsilon \ge 0\) be an error-term. A strategy profile \(\sigma \) is called an \(\epsilon \)-equilibrium if no player can gain more than \(\epsilon \) by a unilateral deviation, i.e. if for each player \(i \in N\) and for each strategy \(\sigma _{i}^{\prime }\) of player \(i\) it holds that

$$\begin{aligned} u_{i}(\pi (\sigma )) \ge u_{i}(\pi (\sigma /\sigma _{i}^{\prime })) - \epsilon . \end{aligned}$$

A stronger concept arises if we require that the strategy profile induces an \(\epsilon \)-equilibrium in every subgame, i.e. conditional on history \(h\in H\) being reached. A strategy profile \(\sigma \) is called a subgame perfect \(\epsilon \)-equilibrium, \(\epsilon \)-SPE for short, if for each history \(h \in H\), each player \(i \in N\), and each strategy \(\sigma _{i}^{\prime }\) of player \(i\) it holds that

$$\begin{aligned} u_{i}(\pi (\sigma ; h)) \ge u_{i}(\pi (\sigma /\sigma _{i}^{\prime }; h)) - \epsilon . \end{aligned}$$

A 0-equilibrium is simply called an equilibrium, and a 0-SPE is simply called an SPE.

2.4 The topological structure

Following, among others, Martin (1975), Fudenberg and Levine (1983), Flesch et al. (2010), Purves and Sudderth (2011), we endow the set \(A\) with the discrete topology and \(A^{\mathbb {N}}\) with the product topology. The topology on \(A^{\mathbb {N}}\) is completely metrizable, and a basis of this topology is formed by the cylinder sets \(O(h)= \{p \in A^{\mathbb {N}}: h < p\}\) for \(h \in H\), where for a history \(h \in H\) and a play \(p \in A^{\mathbb {N}}\) we write \(h < p\) if \(h\) is an initial segment of \(p\). Thus a sequence of plays \((p_n)_{n\in \mathbb {N}}\) converges to a play \(p\) precisely when for every \(k\in \mathbb {N}\) there exists an \(N_k\in \mathbb {N}\) such that \(p_n\) coincides with \(p\) on the first \(k\) coordinates for every \(n\ge N_k\).

A function \(f : A^{\mathbb {N}} \rightarrow \mathbb {R}\) is said to be continuous at a play \(p\in A^\mathbb {N}\) if, for every sequence of plays \((p_n)_{n\in \mathbb {N}}\) converging to \(p\), we have \(\lim _{n\rightarrow \infty } f(p_n) = f(p)\). Thus, \(f\) is continuous at \(p\) precisely when for every \(\delta >0\) there is an \(N_\delta \in \mathbb {N}\) such that if a play \(q\) coincides with \(p\) on the first \(N_\delta \) coordinates then \(|f(p)-f(q)|\le \delta \). Further, \(f\) is said to be continuous if it is continuous at each play in \(A^{\mathbb {N}}\).

Continuity of a payoff function with respect to the chosen topology on \(A^{\mathbb {N}}\) is implied by a condition known as continuity at infinity, see e.g. Fudenberg and Levine (1983). The condition simply says that the actions taken at distant stages of the game have little effect on the payoff. For example, a discounted sum of bounded instantaneous payoffs automatically satisfies continuity at infinity.
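As a hedged numerical illustration (the helper below is our own, and the instantaneous payoffs are assumed to lie in \([0,1]\)), two plays that agree on their first \(N\) coordinates yield discounted payoffs within \(\beta ^{N}\) of each other:

```python
import itertools

def discounted_payoff(actions, beta, horizon=10_000):
    """Truncated normalized discounted sum (1 - beta) * sum_t beta^t * a_t."""
    return (1 - beta) * sum(beta**t * a for t, a in zip(range(horizon), actions))

beta, N = 0.9, 50
p = itertools.chain([1] * N, itertools.repeat(1))  # agree on first N coordinates,
q = itertools.chain([1] * N, itertools.repeat(0))  # differ everywhere afterwards
gap = abs(discounted_payoff(p, beta) - discounted_payoff(q, beta))
print(gap <= beta**N)  # True: the tail after period N contributes at most beta^N
```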

A function \(f : A^{\mathbb {N}} \rightarrow \mathbb {R}\) is said to be lower semicontinuous, lsc for short, at a play \(p\in A^\mathbb {N}\) if, for every sequence of plays \((p_n)_{n\in \mathbb {N}}\) converging to \(p\), we have \(\liminf _{n\rightarrow \infty } f(p_n)\ge f(p)\). We say that \(f\) is lsc if it is lsc at each play in \(A^{\mathbb {N}}\). Equivalently, \(f\) is lsc if the set \(\{q \in A^{\mathbb {N}}: f(q) > \alpha \}\) is open in \(A^{\mathbb {N}}\) for every \(\alpha \in \mathbb {R}\).

A function \(f : A^{\mathbb {N}} \rightarrow \mathbb {R}\) is said to be upper semicontinuous, usc for short, at a play \(p\in A^\mathbb {N}\) if, for every sequence of plays \((p_n)_{n\in \mathbb {N}}\) converging to \(p\), we have \(\limsup _{n\rightarrow \infty } f(p_n)\le f(p)\). We say that \(f\) is usc if it is usc at each play in \(A^{\mathbb {N}}\). Equivalently, \(f\) is usc if the set \(\{q \in A^{\mathbb {N}}: f(q) < \alpha \}\) is open in \(A^{\mathbb {N}}\) for every \(\alpha \in \mathbb {R}\).

2.5 Existence results

Mertens and Neyman (cf. Mertens 1987) showed, by using the result of Martin (1975), that if each player’s payoff function is bounded and Borel measurable then an \(\epsilon \)-equilibrium exists for every \(\epsilon >0\). An \(\epsilon \)-SPE does not necessarily exist under these conditions, which was demonstrated by an example in Flesch et al. (2014). Nevertheless, semicontinuity proved to be a useful condition. Flesch et al. (2010) proved that an \(\epsilon \)-SPE exists for every \(\epsilon >0\) when the payoffs are bounded and lsc, whereas Purves and Sudderth (2011) proved the same when the payoffs are bounded and usc. Very general topological conditions for existence of SPE are given in Alós-Ferrer and Ritzberger (2013).

3 Continuity \(\epsilon \)-SPE

In this section, we examine a refinement of \(\epsilon \)-SPE, which requires that, in every subgame, the payoff functions are continuous at the induced play. We start with the following illustrative example.

Example 3.1

Consider a 1-player game where \(A = \{0, 1\}\) and the payoff \(u_{1}(p)\) is \(1\) if \(p = 1^{\infty } = (1, 1, 1, \dots )\) and is \(0\) otherwise. Any SPE (and in fact any equilibrium) induces the play \(1^{\infty }\) from the root. Such an SPE, however, does not guarantee within finite time that the payoff will be 1, or at least close to 1: the player has to play action 1 at every single period. To be more precise, for any error-term \(\delta \in [0,1)\), the player is not guaranteed a payoff within a distance of \(\delta \) from 1 by following the strategy for a finite number of periods. Consequently, any mistake, even at far periods, will be costly. The exact cause of this problem lies in the fact that the payoff function is discontinuous at the play \(1^{\infty }\) (even though the payoff function is usc).

Now consider the same game, but with a different payoff function. At every period of the game, the player receives a payoff of 1 if he chooses action 1 and receives a payoff of 0 if he chooses action 0. The payoff for a play \(q=(a_0,a_1,\ldots )\) is then the total discounted payoff with discount factor \(\beta \in (0,1)\), i.e.

$$\begin{aligned} u_1(q)=(1-\beta ) \sum _{t=0}^\infty \beta ^{t}a_t. \end{aligned}$$
(3.1)

Clearly, the unique SPE is to choose action 1 at every period, which again induces the play \(p=1^{\infty }\) from the root with payoff \(u_1(p)=1\). But now, in contrast with the previous payoff function, \(u_1\) is continuous at \(p\) (and in fact continuous everywhere). And indeed, for any error-term \(\delta >0\), the player can guarantee a payoff of at least \(1-\delta \) in finite time, by simply playing action 1 at the first \(N_\delta \) periods, for a large enough \(N_\delta \). A similar property holds in every subgame.
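To make this finitistic guarantee concrete, the number of periods needed for a given \(\delta \) can be computed explicitly (a small sketch with names of our own choosing):

```python
import math

def periods_needed(beta, delta):
    """Smallest N with beta**N <= delta: playing action 1 in the first N
    periods secures a payoff of at least 1 - delta, whatever comes later."""
    return math.ceil(math.log(delta) / math.log(beta))

beta, delta = 0.9, 0.01
N = periods_needed(beta, delta)
worst_case = 1 - beta**N  # payoff if action 0 is played from period N onwards
print(N, worst_case >= 1 - delta)  # 44 True
```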

We call a play \(p\) of the game a continuity play if the payoff function of every player is continuous at \(p\). Otherwise, the play is called a discontinuity play.

Definition 3.2

Let \(\epsilon \ge 0\). A strategy profile \(\sigma \) is called a continuity strategy profile if \(\sigma \) induces a continuity play in every subgame, i.e. \(\pi (\sigma ;h)\) is a continuity play for every history \(h \in H\). A continuity strategy profile \(\sigma \) that is an \(\epsilon \)-SPE is called a continuity \(\epsilon \)-SPE.

Suppose that \(\sigma \) is a continuity \(\epsilon \)-SPE in a game \(G\). Then \(\sigma \) enjoys the following property: for any \(\delta >0\) and any history \(h \in H\) there is a period \(N_{\delta ,h}\) such that if, starting at \(h\), the players follow \(\sigma \) until period \(N_{\delta ,h}\), then regardless of how they play after period \(N_{\delta ,h}\), their payoffs will be within a distance of \(\delta \) from \(u(\pi (\sigma ;h))\). This means that, in every subgame, the payoffs are essentially fixed after a finite number of periods (up to \(2\delta \)), and hence the game effectively ends in finite time. In contrast, when the induced play of the game fails to be a continuity play of the payoff functions, the game never settles, and the players may have to play the entire infinite sequence of actions with full precision to be sure about the payoffs that they receive.

Now we will examine the existence of continuity \(\epsilon \)-SPE when the payoff functions are semicontinuous. As Example 3.1 demonstrated, bounded and usc payoffs are not sufficient to guarantee the existence. It turns out however that bounded and lsc payoffs are sufficient, and we will prove it below.

Example 3.3

We construct a game with bounded and lsc payoffs having an SPE such that all payoff functions are discontinuous at the equilibrium play. There are two players, and the action set is \(A = \{L, R\}\). The players move alternatingly, starting from player 1. Player 1’s payoff function only depends on actions of player 2. Player 1’s payoff is \(0\) if player 2 plays \(L\) all the time, and it is \(1\) otherwise. Symmetrically, player 2’s payoff only depends on player 1’s actions:

$$\begin{aligned} u_{1}(a_{0}, a_{1}, a_{2}, \dots )= & {} {\left\{ \begin{array}{ll}0 &{}\quad \forall t \in \mathbb {N},\quad a_{2t + 1} = L\\ 1 &{}\quad \text {otherwise.}\end{array}\right. }\\ u_{2}(a_{0}, a_{1}, a_{2}, \dots )= & {} {\left\{ \begin{array}{ll}0 &{}\quad \forall t \in \mathbb {N},\quad a_{2t} = L\\ 1 &{}\quad \text {otherwise.}\end{array}\right. } \end{aligned}$$

Both payoff functions are lsc. Consider the strategy profile \(\sigma \) such that \(\sigma (h) = L\) for each \(h \in H\). It is an SPE, and it induces the play \((L, L, L, \dots )\), which is a discontinuity point of both payoff functions. Note that there is another SPE in which both players play \(R\) at every history. This is clearly a continuity SPE.

The result below strengthens the main theorem in Flesch et al. (2010). A set \(D \subset A^{\mathbb {N}}\) is dense if it intersects every non-empty open subset of \(A^{\mathbb {N}}\), or equivalently, if it intersects every cylinder set \(O(h)\).

Theorem 3.4

Suppose that all payoff functions are bounded and lsc. Let \(D\) be a subset of \(A^{\mathbb {N}}\) and let \(\epsilon > 0\). The game admits an \(\epsilon \)-SPE \(\sigma \) for which \(\pi (\sigma ;h) \in D\) for each \(h \in H\) if and only if \(D\) is dense in \(A^{\mathbb {N}}\).

Proof

\(\Longrightarrow \): Suppose that \(\sigma \) is an \(\epsilon \)-SPE (or in fact any strategy profile) such that \(\pi (\sigma ;h) \in D\) for each \(h \in H\). Take the cylinder set \(O(h)=\{p\in A^{\mathbb {N}}: h<p\}\) corresponding to an arbitrary history \(h\in H\). Since \(\pi (\sigma ;h) \in D\cap O(h)\), we have \(D\cap O(h)\ne \emptyset \). Because the cylinder sets form a basis of the topology of \(A^{\mathbb {N}}\), the set \(D\) is indeed dense in \(A^{\mathbb {N}}\).

\(\Longleftarrow \): Suppose that \(D\) is dense in \(A^{\mathbb {N}}\). The proof is nearly identical to the proof of Theorem 2.3 in Flesch et al. (2010). The only alteration we introduce is in the definition of \(P_{0}(h)\). We replace formula (2) in Flesch et al. (2010) by the following:

$$\begin{aligned} P_{0}(h) = \{p \in D: h < p\}. \end{aligned}$$

The rest of the proof remains the same, word for word. \(\square \)

If player \(i\)’s payoff function \(u_i\) is bounded and lsc, then the set of continuity points \(D_i\) of \(u_i\) is a residual, and in particular dense, subset of \(A^{\mathbb {N}}\) (see for example Fort 1951). By taking \(D=\cap _{i\in N}D_i\), which is again residual and hence dense because \(A^{\mathbb {N}}\) is completely metrizable and thus a Baire space, Theorem 3.4 yields the following stronger result.

Corollary 3.5

Suppose that all payoff functions are bounded and lsc. Then, for every \(\epsilon >0\), the game admits a continuity \(\epsilon \)-SPE.

4 Loss tolerance equilibrium

An \(\epsilon \)-SPE allows a player to play a suboptimal strategy in any subgame, provided that the payoff loss the player incurs does not exceed a fixed margin of \(\epsilon \). We could say that a payoff loss of \(\epsilon \) is tolerated in every subgame.

In games with discounted payoffs, \(\epsilon \)-SPE places no restrictions on behavior at remote decision nodes. As an illustration, consider the following example.

Example 4.1

Consider a 1-player game where \(A = [0, 1)\) and the payoff function is given for a play \(q=(a_0,a_1,\ldots )\) by the total discounted payoff with discount factor \(\beta \in (0,1)\), i.e.

$$\begin{aligned} u_1(q)=(1-\beta ) \sum _{t=0}^\infty \beta ^{t}a_t. \end{aligned}$$
(4.1)

The game admits no SPE, but it does admit an \(\epsilon \)-SPE for each \(0< \epsilon < 1\), for example playing \(1 - \epsilon \) in each period. However, there is also an arguably unnatural \(\epsilon \)-SPE where the player plays \(1 - \tfrac{1}{2}\epsilon \) in the first \(t\) periods and plays 0 thereafter, with \(t\) being large enough to satisfy \(\beta ^{t} < \tfrac{\epsilon }{2}\). To see that such a strategy is indeed an \(\epsilon \)-SPE, simply notice that after period \(t\) the distance between any two feasible payoffs is at most \(\epsilon \), and so any course of action is compatible with the concept of \(\epsilon \)-SPE.
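The following sketch verifies this for one choice of parameters (an illustration of our own; since \(A = [0,1)\), every deviation is bounded from above by playing the constant action 1, which we use as an upper bound):

```python
def payoff(actions, beta, horizon=10_000):
    """Truncated normalized discounted payoff of Example 4.1."""
    return (1 - beta) * sum(beta**t * a for t, a in enumerate(actions[:horizon]))

beta, eps = 0.9, 0.1
t = 1
while beta**t >= eps / 2:  # choose t with beta^t < eps/2
    t += 1

profile = [1 - eps / 2] * t + [0] * 10_000  # the "unnatural" epsilon-SPE
for s in range(t + 5):  # check the subgame starting at each period s
    on_path = payoff(profile, beta)
    upper_bound_dev = payoff(profile[:s] + [1] * 10_000, beta)  # play 1 from s on
    assert upper_bound_dev - on_path <= eps + 1e-12
print("no deviation in any subgame gains more than", eps)
```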

The same observations apply to all games with continuous payoffs. As the play of the game proceeds, the set of feasible payoffs shrinks, and so a period is eventually reached after which all feasible payoffs differ by no more than \(\epsilon \). In any such subgame, \(\epsilon \)-SPE places no restrictions.

These considerations motivate us to abandon constant loss tolerance and to introduce a loss tolerance function assigning a tolerance level to each decision node. Clearly we are most interested in the case where the tolerance level declines as the game proceeds.

Definition 4.2

A loss tolerance function is any function \(\phi : H \rightarrow (0, +\infty )\). The value \(\phi (h)\) is called the tolerance level at \(h\). Given a loss tolerance function \(\phi \), the strategy profile \(\sigma \) is said to be a \(\phi \)-tolerance equilibrium if for each player \(i \in N\), each strategy \(\eta _{i}\) of player \(i\) and each history \(h \in H\) it holds that

$$\begin{aligned} u_{i}(\pi (\sigma /\eta _{i}; h)) - u_{i}(\pi (\sigma ; h)) \le \phi (h). \end{aligned}$$
(4.2)

Equivalently, \(\sigma \) is a \(\phi \)-tolerance equilibrium if for each \(h \in H\) it induces a \(\phi (h)\)-equilibrium in the subgame that starts at \(h\). The most interesting case is the one where \(\phi \) is decreasing, in the sense of assigning smaller tolerance levels to longer histories. If \(\phi (h) \le \epsilon \) for each \(h\) then clearly each \(\phi \)-tolerance equilibrium is an \(\epsilon \)-SPE. With \(\phi (h) = \beta ^{|h|}\), where \(\beta \) is a discount factor, our \(\phi \)-tolerance equilibrium is closely related to the concept of contemporaneous equilibrium of Mailath et al. (2005).

The differences between the three concepts, SPE, \(\epsilon \)-SPE, and loss tolerance equilibrium, are illustrated by the following example.

Example 4.3

This is a 3-period game. Formally, the game is infinite, but only the actions \(a_{0}\), \(a_{1}\), and \(a_{2}\) matter for the payoffs. In period 0 player 1 can choose to play \(in\) or \(out\). If player 1 chooses \(out\) the game ends and both players receive payoff zero. If player 1 chooses \(in\), the game continues and in period 1 player 1 chooses a positive integer \(a_{1}\). In period 2 player 2 can play \(yes\) or \(no\). If player 2 plays \(yes\) the payoffs are \((1, 0)\), if \(no\) the payoffs are \((-\frac{1}{a_{1}}, \frac{1}{a_{1}})\). Thus formally \(\iota (\oslash ) = \iota (in) = 1\) and \(\iota (in, a_{1}) = 2\), and the payoffs are given by

$$\begin{aligned} (u_{1}(a_{0}, a_{1}, a_{2}),u_{2}(a_{0}, a_{1}, a_{2})) = {\left\{ \begin{array}{ll} (0, 0) &{}\quad a_{0} = out\\ (1, 0) &{}\quad a_{0} = in,\,\quad a_{2} = yes\\ \left( -\frac{1}{a_{1}}, \tfrac{1}{a_{1}}\right) &{}\quad a_{0} = in,\, \quad a_{2} = no,\end{array}\right. } \end{aligned}$$

where \(a_{1} \in \{1, 2, \dots \}\).

Intuitively we should expect that in this game player 2 always plays \(no\) and player 1 chooses \(out\). Turning to the formal analysis, notice that the game has no SPE. Indeed, if player 2 plays \(no\) at history \((in, a_{1})\) for each \(a_{1}\), then player 1 at history \((in)\) has no best response.

Fix an \(n \ge 1\). The game has a \(\tfrac{1}{n}\)–SPE. For example the strategy profile \(\sigma \) given by

$$\begin{aligned} \sigma _{1}(\oslash ) = out,\qquad \sigma _{1}(in) = n,\qquad \sigma _{2}(in, a_{1}) = no \quad \text {for each } a_{1} \end{aligned}$$

is a \(\tfrac{1}{n}\)–SPE. However, in a \(\tfrac{1}{n}\)–SPE it is not necessarily the case that player 2 always chooses \(no\). For example, the strategy profile \(\sigma ^{\prime }\) given by

$$\begin{aligned} \sigma ^{\prime }_{1}(\oslash ) = in,\qquad \sigma ^{\prime }_{1}(in) = n,\qquad \sigma ^{\prime }_{2}(in, a_{1}) = {\left\{ \begin{array}{ll} yes &{}\quad a_{1} \ge n\\ no &{}\quad a_{1} < n \end{array}\right. } \end{aligned}$$

is also a \(\tfrac{1}{n}\)–SPE. The reason is that player 2, even though he has an exact best response at each of his decision nodes, is allowed to play suboptimally provided that he incurs a payoff loss of not more than \(\tfrac{1}{n}\).

Now consider the loss tolerance function \(\phi \) given by \(\phi (\oslash ) = \phi (in) = \tfrac{1}{n}\) and \(\phi (in, a_{1}) = \tfrac{1}{2a_{1}}\). It is clear that \(\sigma \) is a \(\phi \)-tolerance equilibrium while \(\sigma ^{\prime }\) is not. In fact, in each \(\phi \)-tolerance equilibrium with \(\phi \) specified above, player 2 always plays \(no\) and player 1 chooses \(out\).
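The two claims about player 2 can be checked in exact arithmetic (a sketch of our own, with hypothetical function names):

```python
from fractions import Fraction

def loss_from_yes(a1):
    """At (in, a1): 'no' yields player 2 the payoff 1/a1 and 'yes' yields 0,
    so playing 'yes' forgoes exactly 1/a1."""
    return Fraction(1, a1)

n = 10
for a1 in range(1, 100):
    eps_tolerated = loss_from_yes(a1) <= Fraction(1, n)        # constant 1/n
    phi_tolerated = loss_from_yes(a1) <= Fraction(1, 2 * a1)   # phi(in, a1)
    assert eps_tolerated == (a1 >= n) and not phi_tolerated
print("'yes' is 1/n-tolerated once a1 >= n, but never phi-tolerated")
```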

The main result of this section is the following theorem.

Theorem 4.4

Suppose that all payoff functions are bounded and continuous. Then, for each loss tolerance function \(\phi \), the game admits a \(\phi \)-tolerance equilibrium.

The novelty of Theorem 4.4 is that it covers games where the action set \(A\) is infinite. For the special case when \(A\) is finite, the conclusion of the theorem follows from the fact that for such games an SPE exists, by the results of Fudenberg and Levine (1983) and Harris (1985). From the technical point of view, the main challenge is that the space of plays \(A^\mathbb {N}\) is not compact unless \(A\) is finite. It is for this reason that the truncation technique of Fudenberg and Levine (1983) does not suffice to prove Theorem 4.4. Instead, we carry out an iterative procedure similar to that in Flesch et al. (2010). It differs from the one in Harris (1985) in that the latter terminates in countably many steps, whereas ours is transfinite. For more discussion on our proof see Sect. 5.

Thus Theorem 4.4 is particularly useful in games where the payoffs are continuous at infinity but where the action set is not compact, as is the case in Example 4.3. Another important class of games covered by the theorem consists of discounted games in which the instantaneous reward function is discontinuous in the actions.

To conclude this section we argue, by means of two examples, that loss tolerance equilibrium may fail to exist if the payoff functions are only semicontinuous.

Example 4.5

Consider the following 1-player game with lsc payoffs. The action set is \(A = \{s, c\}\) where \(s\) stands for stop and \(c\) for continue. If player \(1\) plays \(s\) in period \(t\) the game stops with payoff \(\tfrac{t}{t + 1}\). If the player never plays action \(s\) his payoff is \(0\).


Define the loss tolerance function \(\phi \) by letting \(\phi (t) = \tfrac{1}{t+2}\), where we write \(t\) to denote the history reached if action \(c\) is played \(t\) times. We show that this game admits no \(\phi \)-tolerance equilibrium. Suppose on the contrary that \(\sigma \) is a \(\phi \)-tolerance equilibrium. First assume that \(\sigma (t) = s\) for some time \(t\). Then the payoff for the strategy \(\sigma \) at \(t\) is \(\tfrac{t}{t+1}\). However, because \(1-\tfrac{t}{t+1} > \phi (t)\), the player could improve this payoff at \(t\) by more than \(\phi (t)\) by playing \(c\) sufficiently many times and then playing \(s\). Hence we conclude that \(\sigma (t) = c\) for each \(t\). But then the payoff for \(\sigma \) is \(0\), and it could be improved by more than \(\phi (0)\) at the beginning of the game by playing \(c\) twice and then \(s\).
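The two inequalities used in this argument are easily verified in exact arithmetic (a small sketch of our own):

```python
from fractions import Fraction

stop_payoff = lambda t: Fraction(t, t + 1)  # payoff for stopping at period t
phi = lambda t: Fraction(1, t + 2)          # tolerance level at the t-th history

# Stopping at t is never tolerated: continuing and stopping later gets
# arbitrarily close to 1, and already 1 - t/(t+1) = 1/(t+1) > 1/(t+2) = phi(t).
for t in range(1000):
    assert 1 - stop_payoff(t) > phi(t)

# Never stopping yields 0, while stopping at period 2 gains 2/3 > 1/2 = phi(0).
assert stop_payoff(2) - 0 > phi(0)
print("no strategy respects all tolerance levels of Example 4.5")
```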

Example 4.6

The following is a game with usc payoffs having no \(\phi \)-tolerance equilibrium.


In this game, there are two players, who move in turn. The action set is \(A = \{s, c\}\) where \(s\) stands for stop and \(c\) stands for continue. There are three possible scenarios. If player 1 stops the game at period \(t \in \{0, 2, 4, \ldots \}\), then the payoffs are \((\tfrac{1}{t+1},\tfrac{1}{t+1})\). If player 2 stops the game at period \(t \in \{1, 3, 5, \ldots \}\), then the payoffs are \((\tfrac{1}{t+1},\tfrac{1}{t+3})\). Finally, if neither player stops, then the payoffs are \((2,0)\). Note that the payoff function of player 1 is usc, while the payoff function of player 2 is even continuous.

Define the tolerance level by

$$\begin{aligned} \phi (t)=\tfrac{1}{2}\min \left\{ \tfrac{1}{t+1}-\tfrac{1}{t+2}, \tfrac{1}{t+2}-\tfrac{1}{t+3}\right\} \end{aligned}$$

where we write \(t\) to denote the history reached once \(c\) has been played \(t\) times. We argue that there is no \(\phi \)-tolerance equilibrium. Suppose by way of contradiction that \(\sigma \) is a \(\phi \)-tolerance equilibrium. We distinguish two cases.

Suppose first that there is some period \(t_{0}\) after which \(\sigma _2\) never quits. Since the tolerance level is always smaller than \(1\), \(\sigma _1\) prescribes that player 1 always continue after \(t_{0}\), and this yields payoff \((2,0)\). Then, however, at any odd period \(t > t_{0}\), player 2 could improve his payoff by \(\tfrac{1}{t+3}\) by quitting, and this gain is greater than \(\phi (t)\).

Now suppose that \(\sigma _2\) quits at infinitely many odd periods, say at \(t_1<t_2<t_3<\cdots \). Then our choice of the loss tolerance function forces player 1 to quit at \(t_{2} - 1\). But then it forces player 2 to continue at \(t_{2} - 2\). At period \(t_{2} - 3\) player 1 has to stop, and at period \(t_{2} - 4\) player 2 must continue. Working backwards we find that player 1 quits at each even period \(t < t_{2}\) and that player 2 continues at each odd period \(t < t_{2}\). This contradicts our supposition that player 2 quits at \(t_{1}\). This completes the argument.
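The inequalities driving this backward induction can again be checked in exact arithmetic (our own sketch):

```python
from fractions import Fraction

def phi(t):
    g1 = Fraction(1, t + 1) - Fraction(1, t + 2)
    g2 = Fraction(1, t + 2) - Fraction(1, t + 3)
    return min(g1, g2) / 2

# If nobody ever quits after t0, player 2 gains 1/(t+3) > phi(t) by quitting
# at any odd period t:
for t in range(1, 500, 2):
    assert Fraction(1, t + 3) > phi(t)

# If player 2 quits at the odd period t2, player 1 gains
# 1/t2 - 1/(t2+1) > phi(t2 - 1) by quitting at t2 - 1:
for t2 in range(3, 500, 2):
    assert Fraction(1, t2) - Fraction(1, t2 + 1) > phi(t2 - 1)
print("the tolerance levels of Example 4.6 force the contradiction")
```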

5 The proof of Theorem 4.4

The proof of the theorem relies on Lemma 5.1, which warrants some attention in its own right. It is in essence a \(\phi \)-version of the well-known one-shot deviation principle. We say that the strategy profile \(\sigma \) is \(\phi \)-robust to one-shot deviations if by deviating from \(\sigma \) at any single history \(h\) the player \(\iota (h)\) cannot improve his payoff by more than \(\phi (h)\). More precisely, \(\sigma \) is \(\phi \)-robust to one-shot deviations if for every \(h \in H\)

$$\begin{aligned} u_{\iota (h)}(\pi (\sigma ; h)) \ge u_{\iota (h)}(\pi (\sigma ; ha)) - \phi (h)\quad \text { for each }a \in A. \end{aligned}$$
(5.1)

Clearly each \(\phi \)-tolerance equilibrium is \(\phi \)-robust to one-shot deviations. The converse is not true, but we have the following result.

Lemma 5.1

Suppose the functions \(u_{i}\) are bounded and lsc. Then the following conditions are equivalent:

[1] For each loss tolerance function \(\phi \) there exists a \(\phi \)-tolerance equilibrium.

[2] For each loss tolerance function \(\phi \) there exists a strategy profile that is \(\phi \)-robust to one-shot deviations.

Proof

That [1] implies [2] is immediate since each \(\phi \)-tolerance equilibrium is \(\phi \)-robust to one-shot deviations. We show the converse implication. Let the loss tolerance function \(\phi \) be given. Without loss of generality we can assume that it is a decreasing function in the sense that \(\phi (h) > \phi (h^{\prime })\) whenever \(h < h^{\prime }\). Define the loss tolerance function \(\phi _{0}\) by \(\phi _{0}(h) = 2^{-|h|-1}\phi (h)\). Let the strategy profile \(\sigma \) be \(\phi _{0}\)-robust to one-shot deviations. We show that \(\sigma \) is a \(\phi \)-tolerance equilibrium.

Let \(h\) be any history and \(\eta _{i}\) a pure strategy for player \(i\). We want to prove inequality (4.2). Write \(\eta = \sigma /\eta _{i}\). Let \(h_{0}, h_{1}, h_{2}, \dots \) be the successive enumeration of the initial segments of the play \(\pi (\eta ;h)\) starting from \(h\), that is \(h_{0} = h\) and \(h_{k + 1} = h_{k}\eta (h_{k})\) for each \(k\). We have

$$\begin{aligned} u_{i}(\pi (\sigma ;h_{k})) \ge u_{i}(\pi (\sigma ;h_{k}\eta (h_{k}))) - \phi _{0}(h_{k}). \end{aligned}$$
(5.2)

Indeed, at histories \(h_{k}\) that belong to player \(i\) this inequality is an instance of (5.1) with \(a = \eta (h_{k})\). At histories \(h_{k}\) that do not belong to player \(i\) the inequality holds as we have \(\eta (h_{k}) = \sigma (h_{k})\) and so \(\pi (\sigma ;h_{k}) = \pi (\sigma ;h_{k}\eta (h_{k}))\). Now we have

$$\begin{aligned} \phi _{0}(h_{k}) = 2^{-|h_{k}|-1}\phi (h_{k}) \le 2^{-|h_{k}|-1}\phi (h_{0}) \le 2^{-k-1}\phi (h_{0}), \end{aligned}$$
(5.3)

where we used the fact that \(|h_{k}| = |h_{0}| + k\). Combining (5.2) and (5.3) and using the fact that \(h_{k + 1} = h_{k}\eta (h_{k})\) we obtain

$$\begin{aligned} u_{i}(\pi (\sigma ;h_{k})) \ge u_{i}(\pi (\sigma ;h_{k + 1})) - 2^{-k-1}\phi (h_{0}). \end{aligned}$$

Unraveling this recursive relation we obtain

$$\begin{aligned} u_{i}(\pi (\sigma ;h_{0})) \ge u_{i}(\pi (\sigma ;h_{m + 1})) - \sum _{k = 0}^{m}2^{-k-1}\phi (h_{0}). \end{aligned}$$

The sequence \(\{\pi (\sigma ;h_{m})\}_{m \in \mathbb {N}}\) of plays converges to the play \(\pi (\eta ; h_{0})\) as \(m\) approaches infinity. To see this, simply notice that \(h_{m}\) is the initial segment of both plays \(\pi (\eta ; h_{0})\) and \(\pi (\sigma ;h_{m})\). Hence passing to the limit and using the lower semicontinuity of the payoff function, we obtain

$$\begin{aligned} u_{i}(\pi (\sigma ;h_{0})) \ge \liminf _{m\rightarrow \infty }u_{i}(\pi (\sigma ;h_{m + 1})) - \sum _{k = 0}^{\infty }2^{-k-1}\phi (h_{0}) \ge u_{i}(\pi (\eta ; h_{0})) - \phi (h_{0}), \end{aligned}$$

as desired. \(\square \)
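The role of the factor \(2^{-|h|-1}\) in the proof above is to make the accumulated one-shot tolerances along any play sum to at most \(\phi (h_{0})\); a minimal numerical check (our own illustration):

```python
from fractions import Fraction

phi_h0 = Fraction(1, 5)  # some tolerance level at the history h0
partial_sums = [sum(Fraction(1, 2**(k + 1)) * phi_h0 for k in range(m + 1))
                for m in range(50)]
assert all(s < phi_h0 for s in partial_sums)  # sum_k 2^(-k-1) phi(h0) = phi(h0)
print(float(partial_sums[-1]))  # approaches 0.2 from below
```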

We can now complete the proof of the theorem. We follow the approach of Flesch et al. (2010) with two crucial modifications. The first one is concerned with the discretization of the payoff function. In both cases a discretization is introduced because a player is allowed to ignore a small difference in the payoff. However, while Flesch et al. (2010) use one fixed discretization as determined by a fixed level of loss tolerance \(\epsilon \), we are forced to use a different discretization for each of the players’ decision nodes \(h\), as determined by the respective tolerance levels \(\phi (h)\).

The second modification is in the argument that the procedure always returns a non-empty set of plays. By making use of the continuity of the payoff functions we are able to avoid some of the more difficult steps of the proof of Flesch et al. (2010). For more on this point see the remarks at the end of the proof.

Proof of Theorem 4.4

In view of Lemma 5.1 it is enough to show that for each loss tolerance function \(\phi \) there exists a strategy profile that is \(\phi \)-robust to one-shot deviations. For each \(h \in H\) define the function \(v_{h}\) by letting

$$\begin{aligned} v_{h}(p) = \phi (h)\left\lfloor \frac{u_{\iota (h)}(p)}{\phi (h)}\right\rfloor \end{aligned}$$

where \(\lfloor r \rfloor \) is the largest integer that is not greater than \(r\). Since the functions \(u_{i}\) are bounded, the function \(v_{h}\) only takes finitely many values. Moreover \(\lfloor r \rfloor \) is usc as a function of \(r\), and so \(v_{h}\) is usc as a function of \(p\).
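In effect, \(v_{h}\) rounds the payoff down to a grid of mesh \(\phi (h)\), so that \(v_{h}(p) \le u_{\iota (h)}(p) < v_{h}(p) + \phi (h)\); a minimal sketch (names of our own choosing):

```python
from fractions import Fraction
import math

def v(u_value, tol):
    """Round u_value down to the grid {k * tol : k an integer}; on any
    bounded range of payoffs this takes only finitely many values."""
    return tol * math.floor(u_value / tol)

u, tol = Fraction(73, 100), Fraction(1, 10)
assert v(u, tol) <= u < v(u, tol) + tol
print(v(u, tol))  # 7/10
```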

We define by transfinite recursion the following sequences. For ordinal \(0\) let

$$\begin{aligned} P_{0}(h)= & {} \{p \in A^{\mathbb {N}}: h < p\}\\ \alpha _{0}(h)= & {} \min _{p \in P_{0}(h)}v_{h}(p). \end{aligned}$$

For each successor ordinal \(\xi + 1\) let

$$\begin{aligned}&\displaystyle \alpha _{\xi +1}(h)=\max _{a \in A}\min _{p \in P_{\xi }(h,a)}v_{h}(p)\end{aligned}$$
(5.4)
$$\begin{aligned}&\displaystyle P_{\xi +1}(h)=\left\{ p\in \bigcup _{a \in A}P_{\xi }(h,a):v_{h}(p) \ge \alpha _{\xi + 1}(h)\right\} . \end{aligned}$$
(5.5)

For each limit ordinal \(\xi \) we set

$$\begin{aligned} \begin{aligned}&P_{\xi }(h) = \bigcap _{\lambda < \xi }P_{\lambda }(h)\\&\alpha _{\xi }(h) = \min _{p \in P_{\xi }(h)}v_{h}(p). \end{aligned} \end{aligned}$$
(5.6)

By convention, a minimum over the empty set is \(+\infty \). These definitions are precisely the same as in Flesch et al. (2010) except that we use the function \(v_{h}\) in place of the function \(u_{\iota (h)}\). We state the following basic properties of these sequences without proof. The proof (which is straightforward) is exactly the same as the proof of Lemma 3.1, and of Properties Q2 and Q3 in Flesch et al. (2010).

Lemma 5.2

Let \(\xi \) be an ordinal and \(h \in H\). [1] Let \(a\) be an element of \(A\) that attains the maximum in (5.4). Then \(P_{\xi }(h,a) \subseteq P_{\xi + 1}(h)\). [2] If \(\xi \) is a limit ordinal or zero, \(p \in P_{\xi }(h)\) and \(h < h^{\prime } < p\), then \(p \in P_{\xi }(h^{\prime })\).

Lemma 5.3

(Monotonicity of the sequences) Let \(\eta \) and \(\lambda \) be ordinals and \(h \in H\). If \(\eta < \lambda \) then \(P_{\eta }(h) \supseteq P_{\lambda }(h)\) and \(\alpha _{\eta }(h) \le \alpha _{\lambda }(h)\).

We now turn to the key step of the proof. It is here at this step that we are able to use the continuity of the payoff functions to simplify the argument.

Lemma 5.4

The set \(P_{\xi }(h)\) is non-empty for each ordinal \(\xi \) and each \(h \in H\).

Proof

The proof is by transfinite induction on \(\xi \). Thus fix an ordinal \(\xi \) and suppose that \(P_{\nu }(h)\) is non-empty for each ordinal \(\nu < \xi \) and each history \(h \in H\). We have to prove that \(P_{\xi }(h) \ne \oslash \) for each \(h \in H\).

If \(\xi = 0\) the claim is clearly true. If \(\xi \) is a successor ordinal it follows from Lemma 5.2. Suppose \(\xi \) is a limit ordinal.

For each \(h \in H\) define

$$\begin{aligned} \tilde{\alpha }(h) = \max _{\lambda < \xi }\alpha _{\lambda }(h). \end{aligned}$$

Note that the maximum is well defined: \(v_{h}\) and hence also \(\alpha _{\lambda }(h)\) only take finitely many values. Moreover since the sequence \(\alpha _{\lambda }(h)\) is non-decreasing in \(\lambda \) we have \(\tilde{\alpha }(h) = \alpha _{\lambda }(h)\) for all sufficiently large ordinals \(\lambda < \xi \).

Now fix a history \(h \in H\). We recursively define a sequence of histories and ordinals as follows. Set \(h_{0} = h\). Suppose for some \(k \in \mathbb {N}\) the history \(h_{k}\) has been defined. Let \(\xi _{k}\) be a successor ordinal satisfying

$$\begin{aligned} \alpha _{\xi _{k}}(h_{k}) = \tilde{\alpha }(h_{k}). \end{aligned}$$

If \(k \ge 1\) we moreover require that \(\xi _{k - 1} - 1 \le \xi _{k}\). An ordinal \(\xi _{k}\) with these properties exists by the remarks above. By (5.4) there is an \(a_{k} \in A\) such that

$$\begin{aligned} \alpha _{\xi _{k}}(h_{k}) = \min _{p \in P_{\xi _{k} - 1}(h_{k},a_{k})}v_{h_{k}}(p). \end{aligned}$$

Set \(h_{k + 1} = (h_{k},a_{k})\). This completes the induction step.

We have the inclusions

$$\begin{aligned} P_{\xi _{k}}(h_{k}) \supseteq P_{\xi _{k} - 1}(h_{k + 1}) \supseteq P_{\xi _{k + 1}}(h_{k + 1}), \end{aligned}$$

where the first inclusion holds by the choice of the action \(a_{k}\) and by part [1] of Lemma 5.2, and the second inclusion holds because \(\xi _{k} - 1 \le \xi _{k + 1}\) and by Lemma 5.3. Combining these we obtain an infinite chain of inclusions

$$\begin{aligned} P_{\xi _{0}}(h_{0}) \supseteq P_{\xi _{1}}(h_{1}) \supseteq P_{\xi _{2}}(h_{2}) \supseteq \cdots \end{aligned}$$

For each \(\ell \in \mathbb {N}\) let \(p_{\ell }\) be an arbitrary element of the set \(P_{\xi _{\ell }}(h_{\ell })\). Let \(p = (h_{0},a_{0},a_{1},\ldots )\). Since \(h_{\ell }\) is an initial segment of \(p_{\ell }\), the sequence \(\{p_{\ell }\}_{\ell \in \mathbb {N}}\) of plays converges to the play \(p\). If \(k \le \ell \) then \(P_{\xi _{k}}(h_{k}) \supseteq P_{\xi _{\ell }}(h_{\ell })\) and so \(p_{\ell }\) is an element of \(P_{\xi _{k}}(h_{k})\). Thus we have

$$\begin{aligned} v_{h_{k}}(p_{\ell }) \ge \alpha _{\xi _{k}}(h_{k}) = \tilde{\alpha }(h_{k}) \end{aligned}$$

whenever \(k \le \ell \), where the inequality follows from (5.5). Taking the limit as \(\ell \) approaches infinity and making use of the upper semicontinuity of \(v_{h_{k}}\) we obtain

$$\begin{aligned} v_{h_{k}}(p) \ge \limsup _{\ell \rightarrow \infty }v_{h_{k}}(p_{\ell }) \ge \tilde{\alpha }(h_{k}). \end{aligned}$$

Consequently for each \(\lambda < \xi \) it holds that

$$\begin{aligned} v_{h_{k}}(p) \ge \alpha _{\lambda }(h_{k}). \end{aligned}$$
(5.7)

Now let \(\Phi _{\lambda }\) be the assertion that \(p \in P_{\lambda }(h_{k})\) for each \(k \in \mathbb {N}\). We prove that \(\Phi _{\lambda }\) holds for each \(\lambda < \xi \). The assertion \(\Phi _{0}\) is true since \(h_{k}\) is an initial segment of \(p\). Suppose \(\Phi _{\nu }\) is true for each \(\nu < \lambda \). If \(\lambda \) is a limit ordinal then \(\Phi _{\lambda }\) is true by definition (5.6). Suppose that \(\lambda \) is a successor ordinal. Take a \(k \in \mathbb {N}\). Then \(p \in P_{\lambda - 1}(h_{k + 1})\) by \(\Phi _{\lambda - 1}\). Together with (5.7) and definition (5.5) this implies that \(p \in P_{\lambda }(h_{k})\). This completes the induction step.

In particular we have shown that \(p \in P_{\lambda }(h)\) for each \(\lambda < \xi \). Hence \(p \in P_{\xi }(h)\). \(\square \)

The reader will notice that the construction above is similar to that of the odd iteration in Flesch et al. (2010). The difference is that our version of the odd iteration is infinite. The continuity of the payoff functions helps us to avoid having to switch between odd and even iterations as is done in Flesch et al. (2010). We remark that the proof of Lemma 3.3 in Flesch et al. (2010), asserting the finiteness of the odd iteration, cannot be carried over to our setup, because of our use of different discretizations at different histories.

To complete the proof one constructs a strategy profile \(\sigma \). This is done just as in the proof of Theorem 2.3 in Flesch et al. (2010) using the function \(v_{h}\) in place of \(u_{\iota (h)}\). By Lemma 5.3 there exists an ordinal \(\xi ^{*}\) such that \(P_{\xi ^{*}}(h) = P_{\xi ^{*} + 1}(h)\) for each \(h \in H\) and consequently \(\alpha _{\xi ^{*}}(h) = \alpha _{\xi ^{*} + 1}(h)\) for each \(h \in H\). By Lemma 5.4 the sets \(P_{\xi ^{*}}(h)\) are non-empty for each \(h\). We can assume without loss of generality that \(\xi ^{*}\) is a limit ordinal. Let \(p(\oslash )\) be an arbitrary element of \(P_{\xi ^{*}}(\oslash )\), and for each history of the form \(ha\) let \(p(ha) \in P_{\xi ^{*}}(ha)\) be a play that attains the minimum in the following problem

$$\begin{aligned} \min _{p \in P_{\xi ^{*}}(ha)}v_{h}(p). \end{aligned}$$

The strategy profile \(\sigma \) is defined as follows: Follow \(p(\oslash )\), as long as no one deviates from it. Suppose the first deviation from \(p(\oslash )\) occurs at history \(h_{0}\) where player \(i_{0}\) plays action \(a_{0}\). Then switch to the play \(p(h_{0}a_{0})\) and follow it as long as no one deviates. Suppose the first deviation from \(p(h_{0}a_{0})\) occurs at history \(h_{1}\) where player \(i_{1}\) plays action \(a_{1}\). Then switch to the play \(p(h_{1}a_{1})\). And so on.
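The structure of this profile, follow the current prescribed play and reset after any deviation, can be sketched as follows (a hypothetical illustration; `p_of` stands for the assignment \(h \mapsto p(h)\), where a play is represented by the function sending an offset \(k\) to the action it prescribes \(k\) periods after \(h\)):

```python
def build_profile(p_of):
    """Sketch of the constructed profile: follow the prescribed play of the
    most recent 'reset point'; after a deviation at history h to action a,
    reset to the play p_of(h + (a,)).  Here p_of(h)(k) is the action that
    the prescribed play p(h) takes k periods after h."""
    def sigma(history):
        base, plan = (), p_of(())            # start by following p(empty history)
        for t, action in enumerate(history):
            if action != plan(t - len(base)):          # deviation at period t
                base, plan = history[: t + 1], p_of(history[: t + 1])
        return plan(len(history) - len(base))          # next prescribed action
    return sigma

# Tiny check: if every prescribed play is "all zeros", the profile plays 0
# on-path and also re-prescribes 0 after any deviation.
sigma = build_profile(lambda h: (lambda k: 0))
print(sigma(()), sigma((0, 0)), sigma((0, 1)))  # 0 0 0
```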

We next argue that for every history \(h \in H\) and action \(a \in A\)

$$\begin{aligned} v_{h}(\pi (\sigma ;h)) \ge v_{h}(\pi (\sigma ;ha)). \end{aligned}$$

If \(\sigma (h) = a\) then \(\pi (\sigma ;h) = \pi (\sigma ;ha)\) so there is nothing to prove. Suppose otherwise. By construction there exists a history \(h^{\prime } \le h\) such that \(\pi (\sigma ;h) = p(h^{\prime })\). Moreover, \(\pi (\sigma ;ha) = p(ha)\). Hence it is sufficient to show that \(v_{h}(p(h^{\prime })) \ge v_{h}(p(ha))\). Now \(p(h^{\prime }) \in P_{\xi ^{*}}(h^{\prime })\). By claim [2] of Lemma 5.2 we have \(p(h^{\prime }) \in P_{\xi ^{*}}(h)\) and so \(p(h^{\prime }) \in P_{\xi ^{*} + 1}(h)\). Hence

$$\begin{aligned} v_{h}(p(h^{\prime })) \ge \alpha _{\xi ^{*} + 1}(h) \ge \min _{p \in P_{\xi ^{*}}(ha)}v_{h}(p) = v_{h}(p(ha)), \end{aligned}$$

as desired.

Finally we prove that \(\sigma \) is \(\phi \)-robust to one-shot deviations. We have

$$\begin{aligned} u_{\iota (h)}(\pi (\sigma ;h)) \ge v_{h}(\pi (\sigma ;h)) \ge v_{h}(\pi (\sigma ;ha)) \ge u_{\iota (h)}(\pi (\sigma ;ha)) - \phi (h). \end{aligned}$$

This completes the proof of Theorem 4.4.\(\square \)

Remark 5.5

For certain games the number of iterations in the proof of Theorem 4.4 may need to be transfinite. Indeed, let \(\tau \) be the smallest ordinal for which \(P_{\tau }(h) = P_{\tau + 1}(h)\) for each \(h \in H\). The following example shows that \(\tau \) can be arbitrarily large.

Let \(\xi \) be any ordinal. Slightly modifying Example 4.1.1 in Flesch et al. (2010) we obtain a game where \(\tau = \xi \). In this game players 1 and 2 choose a decreasing sequence \(a_{0} > a_{1} > \cdots \) of ordinals, with \(a_{0} = \xi \). The rules are as follows: If the current ordinal \(a_{t}\) is a successor ordinal, then player 1 plays \(a_{t+1}=a_t-1\) or he quits. If the current ordinal \(a_{t}\) is a limit ordinal, then player 2 chooses any ordinal \(a_{t + 1}\) smaller than \(a_{t}\). Finally, if \(a_{t} = 0\), then the play terminates. The payoff for player 1 equals 1 if the sequence \((a_{t})\) eventually reaches 0 and equals 0 otherwise. The payoff for player 2 equals 0 for every play. The payoffs are continuous and the proof that \(\tau = \xi \) is similar to that in Flesch et al. (2010).

6 Vanishing loss

In this section we discuss a property of vanishing loss, which is closely related to the two refinements introduced earlier. We deliberately keep our discussion informal. The point here is not to prove new results but rather to emphasize the connections between continuity \(\epsilon \)-SPE and loss tolerance equilibrium.

We are interested in strategy profiles such that the maximal improvement any player can gain by unilaterally deviating from the given strategy approaches zero as the play proceeds. To fix ideas, we define the loss \(\lambda (\sigma ; h)\) associated to a strategy profile \(\sigma \) at history \(h\) as the maximal improvement that a player could gain by unilaterally deviating from his strategy in the subgame that starts at \(h\). Formally \(\lambda (\sigma ; h)\) is the supremum of the left-hand side of Eq. (4.2) over all strategies \(\eta _{i}\) and all players \(i\). In particular \(\sigma \) is an \(\epsilon \)-SPE if \(\lambda (\sigma ; h) \le \epsilon \) for each \(h\), and it is a \(\phi \)-tolerance equilibrium if \(\lambda (\sigma ;h) \le \phi (h)\) for each \(h\).

Definition 6.1

A strategy profile \(\sigma \) exhibits vanishing loss along the play \(p = (a_{0}, a_{1}, \dots ) \in A^{\mathbb {N}}\) if

$$\begin{aligned} \lim _{t \rightarrow \infty }\lambda (\sigma ;(a_{0}, \dots , a_{t})) = 0. \end{aligned}$$

A strategy profile \(\sigma \) has vanishing loss if \(\sigma \) exhibits vanishing loss along the play \(\pi (\sigma ;h)\) for every history \(h \in H\).

The requirement of vanishing loss is intuitive: it might reflect the fact that the players become more experienced in the course of the game. It might also manifest the fact that a player thinks harder about a choice at a specific decision node once that node is reached than he does at the beginning of the game. Finally, it might simply reflect the fact that the possibilities for deviating profitably shrink as time goes by.

Vanishing loss is a property shared by the two refinements we introduced. More precisely:

[1] Each continuity \(\epsilon \)-SPE has vanishing loss.

[2] Any \(\phi \)-tolerance equilibrium has vanishing loss provided that the tolerance level \(\phi (h)\) converges to zero as the length of the history increases.

Item [2] follows immediately from the definitions. Item [1] is only slightly less trivial, and can be verified by using the fact that the induced plays are continuity points of all payoff functions. In fact [1] follows from the following somewhat more general observation:

[3] For each strategy profile \(\sigma \), if \(\pi (\sigma ; h)\) is a point of upper semicontinuity of all payoff functions, then \(\sigma \) exhibits vanishing loss along the play \(\pi (\sigma ;h)\).

Hence in games with usc payoff functions, any strategy profile \(\sigma \) has vanishing loss, and thus Definition 6.1 has no bite in such games. If the payoff functions are lsc rather than usc, it is no longer true that every strategy profile has vanishing loss. However, by item [1] above and Corollary 3.5, a game with bounded and lsc payoffs does have a strategy profile with vanishing loss.

Beyond games where the payoff functions are semicontinuous, the existence of strategy profiles with vanishing loss is difficult to guarantee. In the following example, for small \(\epsilon \) no \(\epsilon \)-SPE has vanishing loss along the induced play.

Example 6.2

In this game, there is only one player, who can choose at every period between playing down \((d)\) or moving to the right \((r)\). If in total he plays action right \(k\) times, then his payoff is \(\tfrac{k}{k+1}\), whereas if he moves to the right infinitely many times then his payoff is 0.


Formally \(A = \{d, r\}\), and the payoff \(u_{1}(a_{0}, a_{1}, \dots )\) is zero if the set

$$\begin{aligned} R(a_{0}, a_{1}, \dots ) = \{t \in \mathbb {N}:a_{t} = r\} \end{aligned}$$

is infinite, and it is \(\frac{k}{k + 1}\) if it has cardinality \(k < \infty \). It is easy to see that \(u_{1}\) is discontinuous at each play in \(A^{\mathbb {N}}\) and is consequently neither lsc nor usc.

For each \(\epsilon > 0\) there exists an \(\epsilon \)-SPE, for example: Let \(k\) be such that \(\frac{k}{k + 1} > 1 - \epsilon \). Play \(r\) if in the past action \(r\) has been played less than \(k\) times, play \(d\) otherwise.

As we argue below, there is no \(\epsilon \)-SPE \(\sigma \), with \(\epsilon <1\), that exhibits vanishing loss along the induced play. Let \(p = \pi (\sigma )\). Then \(R(p)\) is finite: otherwise \(u_{1}(p) = 0\), and moving to the right sufficiently many, but finitely many, times would improve the payoff by more than \(\epsilon \). Say \(R(p)\) consists of \(k\) elements, so that \(u_{1}(p) = \frac{k}{k + 1}\). As the action \(r\) is played exactly \(k\) times along \(p\), the play \(p\) is of the form \((h, d, d, d, \dots )\) for some finite history \(h\). Now take any history \(h^{\prime } < p\) that is longer than \(h\) and consider the play \(p^{\prime } = (h^{\prime }, r, d, d, d, \dots )\). The play \(p^{\prime }\) results in the payoff of \(\frac{k + 1}{k + 2}\). Hence for every \(h^{\prime }\) such that \(h < h^{\prime } < p\) it holds that

$$\begin{aligned} \lambda (\sigma ; h^{\prime }) \ge \tfrac{k + 1}{k + 2} - \tfrac{k}{k + 1}. \end{aligned}$$

This provides a lower bound on the loss along the play \(p\).
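Numerically (a sketch of our own, in exact fractions), this lower bound is positive for every \(k\), so the loss cannot vanish along \(p\):

```python
from fractions import Fraction

def loss_lower_bound(k):
    """One extra move to the right, followed by d forever, improves the
    payoff from k/(k+1) to (k+1)/(k+2)."""
    return Fraction(k + 1, k + 2) - Fraction(k, k + 1)

for k in range(5):
    print(k, loss_lower_bound(k))  # 0 1/2, 1 1/6, 2 1/12, 3 1/20, 4 1/30
```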

Interestingly, for \(\epsilon \in (0,1)\), each \(\epsilon \)-SPE exhibits vanishing loss along the play \(q = (r, r, r, \dots )\), even though \(q\) is not the induced equilibrium play.