
Introspection dynamics: a simple model of counterfactual learning in asymmetric games


Published 14 June 2022 • © 2022 The Author(s). Published by IOP Publishing Ltd on behalf of the Institute of Physics and Deutsche Physikalische Gesellschaft

Citation: M C Couto et al 2022 New J. Phys. 24 063010. DOI: 10.1088/1367-2630/ac6f76


Abstract

Social behavior in human and animal populations can be studied as an evolutionary process. Individuals often make decisions between different strategies, and those strategies that yield a fitness advantage tend to spread. Traditionally, much work in evolutionary game theory considers symmetric games: individuals are assumed to have access to the same set of strategies, and they experience the same payoff consequences. As a result, they can learn more profitable strategies by imitation. However, interactions are oftentimes asymmetric. In that case, imitation may be infeasible (because individuals differ in the strategies they are able to use), or it may be undesirable (because individuals differ in their incentives to use a strategy). Here, we consider an alternative learning process that applies to arbitrary asymmetric games: introspection dynamics. According to this dynamics, individuals regularly compare their present strategy to a randomly chosen alternative strategy. If the alternative strategy yields a payoff advantage, it is more likely to be adopted. In this work, we formalize introspection dynamics for pairwise games. We derive simple and explicit formulas for the abundance of each strategy over time and apply these results to several well-known social dilemmas. In particular, for the volunteer's timing dilemma, we show that the player with the lowest cooperation cost learns to cooperate without delay.


Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

Social behaviors often follow an evolutionary process. Behaviors that yield a high payoff proliferate, whereas inferior strategies go extinct [1, 2]. Providing a quantitative description of how individuals learn, make choices, and interact, however, is non-trivial. Actions of one group member may affect the behaviors of others; and, in a complex chain of interdependencies, this may, in turn, affect the group's surrounding environment [3–5]. Researchers aim to capture this complex dynamics with models of learning [6–13] and the tools of evolutionary game theory [14–16]. One key application of this theory is the study of cooperation in social dilemmas, where an individual's self-interest is at odds with the common good [17, 18]. For example, to be physically isolated during a pandemic is an action that is simultaneously costly for the individual and valuable for public health. Game theory is a standard mathematical framework to model decision-making in this context [19]. A game—defined by the players, their possible strategies, and their payoffs—captures how individuals interact and what the resulting consequences are.

One of the more subtle aspects of such a model is how it specifies the strategy updating process [20, 21]. In evolutionary game theory, individuals do not act optimally from the outset. Rather they dynamically adjust their strategies, based on their current payoffs. To capture this dynamics, the model needs to specify how individuals adopt new strategies, which is governed by the players' strategy update rule or selection rule. One example of such an update rule is imitation by pairwise comparison [22–24]: occasionally, individuals compare their own payoff to the payoff of a co-player. The larger the co-player's payoff, the more likely the focal individual imitates the co-player's strategy. Alternatively, there is also work that considers conformity-driven imitation [25–27]. Here, individuals tend to copy those strategies that are abundant among their neighbors, independent of the strategies' payoff consequences.

Yet 'not all learning is social learning' (i.e., learning from others) [2]. People can also adjust their strategies via introspective reasoning. For example, some models suggest that individuals may experiment with new strategies when their current payoff falls below a predefined aspiration level [28–30]. Other models assume that individuals adopt new strategies by figuring out the optimal response to their co-players' strategies, as in fictitious play or different variants of best reply dynamics [31–33]. Additionally, there is a body of work that studies the interplay and competition among different strategy updating rules [34–36].

Models of introspective reasoning are particularly relevant in asymmetric games [37, 38], where individuals are inherently unequal. Such inequalities naturally arise when individuals take different roles in an interaction (as in the ultimatum game [39, 40]), or when they differ in their available strategies [41, 42]. But even if individuals have access to the same set of strategies, they may differ in other relevant aspects, such as their initial endowments [43, 44], costs [45], productivities [46], or their location in a social network or hierarchy [47–49]. In the presence of such heterogeneities, strategies that yield a high payoff for one player may no longer be optimal for another. In this case, at least some degree of introspective reasoning seems to be required for effective learning.

In the following, we consider a learning model based on introspective reasoning that applies to all pairwise games. We assume that, from time to time, one individual is given the opportunity to change their strategy. In doing so, the individual thinks counterfactually: 'how much better off would a random alternative strategy have made me in the last interaction?' The larger the difference between the counterfactual payoff and the realized payoff, the more likely the individual is to abandon the current strategy in favor of the random alternative. We refer to this strategy update rule as introspection dynamics. Similar to previous studies [50–53], introspection dynamics assumes that players are 'myopic'. That is, when considering switching to a different strategy, players assume their co-player's strategy to be fixed. In particular, they do not anticipate that their new strategy may motivate their co-player to adapt later on. The rule implemented by introspection dynamics was previously used to simulate learning processes in (repeated) social dilemmas [52–55]. Here, we introduce an easy-to-implement model in the context of one-shot games, and we systematically analyze its mathematical properties. In particular, we derive an explicit formula for how likely players are to use any given strategy in the long run. This formula takes a particularly simple form when selection is weak (i.e., when realized payoffs only marginally affect a player's learning behavior).

Although symmetric games are more commonly studied, researchers have been interested in the evolutionary dynamics of asymmetric games for decades [56–66]. In comparison to these previous models, introspection dynamics has several properties that make it particularly convenient to work with. For example, in models based on best reply dynamics and fictitious play [7, 31], players need to be aware of the entire payoff matrix. Moreover, they need to be able to compute best responses at all updating steps. Instead, in introspection dynamics, individuals only compare two given strategies at any point in time. In this way, introspection dynamics makes weaker cognitive assumptions about the players, and it allows a more immediate characterization of the eventual strategy distribution. This advantage seems to be particularly valuable in games with many strategies [67]. Similarly, there are natural connections between introspection dynamics and classical evolutionary models of asymmetric games [56–64]. For instance, a common approach is to assume that each participant in an asymmetric game is a member of a distinct subpopulation. Evolution then only occurs within each subpopulation. This modeling approach, however, is difficult to apply when all players are unique (for example, when all players in a population differ in how much they benefit from a public good [54]), or when there are no clear boundaries between different subpopulations. In such cases, introspection dynamics may provide a more natural way to explore the resulting strategy dynamics.

The remainder of this paper is organized as follows. In section 2, we formalize introspection dynamics for two-player, one-shot games with finitely many strategies. Section 3 contains our main analytical results. Here, we compute the unique stationary distribution of strategies. In addition, we provide explicit formulas for two cases: when selection is weak, and when players can choose among two strategies only. To illustrate these results, section 4 discusses the effect of asymmetry in several well-known social dilemmas. In particular, for the volunteer's timing dilemma, we show that the player with the lower cooperation cost tends to cooperate without delay. We also look at symmetric games and compare introspection dynamics with an evolutionary model of imitation [24]. Finally, section 5 discusses the results and puts them into a broader context. All proofs are provided in the appendices.

2. Model

To model asymmetric interactions, we consider two individuals, player 1 and player 2, who interact in a normal-form game. Player 1 can choose among m pure strategies, ${\mathbf{S}}_{1}$ ,..., ${\mathbf{S}}_{m}$, and player 2 can choose among n strategies, ${\mathbf{S}}_{1}^{\prime }$, ..., ${\mathbf{S}}_{n}^{\prime }$. If player 1 chooses strategy ${\mathbf{S}}_{i}$ and player 2 chooses strategy ${\mathbf{S}}_{j}^{\prime }$, the resulting payoffs are πij and ${\pi }_{ij}^{\prime }$, respectively. We represent such a game by a bi-matrix,

$\begin{pmatrix} (\pi_{11},\ \pi'_{11}) & \cdots & (\pi_{1n},\ \pi'_{1n}) \\ \vdots & \ddots & \vdots \\ (\pi_{m1},\ \pi'_{m1}) & \cdots & (\pi_{mn},\ \pi'_{mn}) \end{pmatrix} \qquad (1)$

In this representation, player 1 can choose one of the m rows, whereas player 2 chooses one of the n columns. Thus, we can also refer to player 1 as the 'row player', and to player 2 as the 'column player'. The first entry in the bi-matrix is the payoff to player 1, whereas the second entry is the payoff to player 2. The game is called symmetric if the two players have the same set of strategies (that is, if m = n and ${\mathbf{S}}_{i}={\mathbf{S}}_{i}^{\prime }$ for all i), and if the corresponding strategies yield the same payoffs (if ${\pi }_{ij}={\pi }_{ji}^{\prime }$ for all i, j). In what follows we mainly focus on asymmetric games.

We assume that the two players interact over many time steps, engaging each time in the game defined by the above bi-matrix. In each time step, they can dynamically adjust their strategies. A possible learning dynamics may posit that players compare their current strategy with another one, assuming that the co-player's strategy remains fixed. The process, which we refer to as introspection dynamics, proceeds as illustrated in figure 1. At each time step, one player is randomly chosen to reconsider their strategy. To this end, the player randomly draws an alternative strategy from the set of all their possible strategies. The player then compares their realized payoff π with the payoff $\tilde{\pi }$ the player could have obtained by playing the alternative strategy instead (keeping the co-player's strategy fixed). Let ${\Delta}\pi {:=} \tilde{\pi }-\pi $ be the difference between the counterfactual payoff and the realized payoff. The player switches to the alternative strategy with probability $\varphi_\beta(\Delta\pi)$ given by the Fermi function [23, 24, 68],

$\varphi_\beta(\Delta\pi) = \dfrac{1}{1+\mathrm{e}^{-\beta\,\Delta\pi}}. \qquad (2)$

Here, β ⩾ 0 is a parameter that measures the intensity of selection or selection strength. In one limiting case, β → 0, payoffs are irrelevant for the learning process, and any alternative strategy is adopted with probability 1/2. In the other limiting case, β → ∞, players never adopt an alternative strategy that lowers their payoff: they switch with certainty if the alternative strategy is strictly better, and with probability 1/2 if it yields exactly the current payoff. We refer to these two limits as the case of weak selection and strong selection, respectively.


Figure 1. A schematic illustration of introspection dynamics. (a) We consider two players who interact in a matrix game over several time steps. Suppose in time step $\tau $, the orange row player chooses strategy S2, whereas the blue column player chooses ${\mathbf{S}}_{1}^{\prime }$ (as indicated by the arrows). Then the row player obtains a payoff of 4, whereas the column player obtains a payoff of 1. (b) After their interaction, the column player is randomly chosen to update their strategy. To this end, the player randomly picks an alternative strategy from their strategy set. In this case, the alternative strategy is ${\mathbf{S}}_{3}^{\prime }$. The column player then compares their previously realized payoff π = 1 with the hypothetical payoff $\tilde{\pi }=2$ that they could have obtained by playing ${\mathbf{S}}_{3}^{\prime }$ at time $\tau $. (c) Depending on the payoff difference ${\Delta}\pi =\tilde{\pi }-\pi $, the column player decides whether to switch to the alternative strategy ${\mathbf{S}}_{3}^{\prime }$. Throughout this article, we assume that the switching probability is parametrized by (2). In the example, the payoff difference is positive, and thus the switching probability is larger than the neutral probability 1/2. (d) If the column player switches to the alternative strategy, the outcome of the game at time $\tau $ + 1 changes accordingly.


As we iterate this elementary updating step over time, we obtain a stochastic process on the space of all strategy profiles $({\mathbf{S}}_{1},{\mathbf{S}}_{1}^{\prime })$, $({\mathbf{S}}_{1},{\mathbf{S}}_{2}^{\prime })$, ..., $({\mathbf{S}}_{m},{\mathbf{S}}_{n}^{\prime })$. Each state corresponds to the strategies that the two players currently adopt. Simulations of this process are straightforward [54, 55]. Here, our goal is to analyze the mathematical properties of this process.
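To make the process concrete, the following minimal Python sketch implements this elementary updating step (the code and its function names are our illustration, not part of the model definition; as an example, it uses the payoffs of the asymmetric prisoner's dilemma discussed in section 4.1):

```python
import numpy as np

rng = np.random.default_rng(1)

def fermi(delta_pi, beta):
    # Probability of switching to the alternative strategy, eq. (2).
    return 1.0 / (1.0 + np.exp(-beta * delta_pi))

def simulate(P1, P2, beta, steps):
    """Simulate introspection dynamics for a bi-matrix game.
    P1, P2: m-by-n payoff matrices of player 1 and player 2.
    Returns the sequence of visited states (i, j)."""
    m, n = P1.shape
    i, j = rng.integers(m), rng.integers(n)   # random initial strategies
    history = []
    for _ in range(steps):
        history.append((i, j))
        if rng.random() < 0.5:                # player 1 reconsiders
            k = rng.choice([a for a in range(m) if a != i])
            if rng.random() < fermi(P1[k, j] - P1[i, j], beta):
                i = k
        else:                                 # player 2 reconsiders
            l = rng.choice([a for a in range(n) if a != j])
            if rng.random() < fermi(P2[i, l] - P2[i, j], beta):
                j = l
    return history

# Example: the asymmetric prisoner's dilemma of section 4.1 (b = 1, c1 = 0.6, c2 = 0.1).
b, c1, c2 = 1.0, 0.6, 0.1
P1 = np.array([[b - c1, -c1], [b, 0.0]])
P2 = np.array([[b - c2, b], [-c2, 0.0]])
states = simulate(P1, P2, beta=5.0, steps=10_000)
coop1 = np.mean([s[0] == 0 for s in states])
coop2 = np.mean([s[1] == 0 for s in states])
print(f"cooperation rates: player 1 ~ {coop1:.2f}, player 2 ~ {coop2:.2f}")
```

Averaged over many iterations, the simulated cooperation rates approach the stationary values derived in section 4.1 (roughly 5% for player 1 and 38% for player 2).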

3. Analytical properties of introspection dynamics

3.1. An explicit formula for the stationary strategy distribution

To derive explicit results, we note that each player's updating behavior only depends on the players' current strategies (whereas it is independent of the previous strategies of the players). As a result, we can represent the dynamics by a Markov chain. Given the current state $({\mathbf{S}}_{i},{\mathbf{S}}_{j}^{\prime })$, we can compute the probability that the state changes to $({\mathbf{S}}_{k},{\mathbf{S}}_{l}^{\prime })$ in one time step. This transition probability is

$T_{ij,kl} = \begin{cases} \dfrac{1}{2}\,\dfrac{1}{m-1}\,\varphi_\beta\!\left(\pi_{kj}-\pi_{ij}\right) & \text{if } k\ne i \text{ and } l=j,\\[4pt] \dfrac{1}{2}\,\dfrac{1}{n-1}\,\varphi_\beta\!\left(\pi'_{il}-\pi'_{ij}\right) & \text{if } k=i \text{ and } l\ne j,\\[4pt] 1-\sum_{(k',l')\ne(i,j)} T_{ij,k'l'} & \text{if } k=i \text{ and } l=j,\\[4pt] 0 & \text{otherwise}. \end{cases} \qquad (3)$

In these expressions, the factor $\frac{1}{2}$ corresponds to randomly drawing one of the two players. Similarly, the factors 1/(m − 1) and 1/(n − 1) correspond to randomly drawing an alternative strategy. The expressions $\varphi_\beta(\Delta\pi)$, as defined in (2), give the probability that the alternative strategy is adopted. We can collect these transition probabilities in an mn × mn matrix T = (Tij,kl ). Here, the first double index denotes the row that corresponds to the previous state $({\mathbf{S}}_{i},{\mathbf{S}}_{j}^{\prime })$ of the Markov chain, whereas the second double index corresponds to the next state $({\mathbf{S}}_{k},{\mathbf{S}}_{l}^{\prime })$. By construction, T is nonnegative and row stochastic (see appendix A).
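As an illustrative sketch (with state (i, j) mapped to the flat index s = i·n + j), the transition matrix of equation (3) can be assembled directly from the two payoff matrices:

```python
import numpy as np

def fermi(delta_pi, beta):
    # Switching probability, eq. (2).
    return 1.0 / (1.0 + np.exp(-beta * delta_pi))

def transition_matrix(P1, P2, beta):
    """Transition matrix of introspection dynamics, eq. (3).
    State (i, j) is mapped to the flat index s = i * n + j."""
    m, n = P1.shape
    T = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            s = i * n + j
            for k in range(m):                 # player 1 revises: row i -> k
                if k != i:
                    T[s, k * n + j] = 0.5 / (m - 1) * fermi(P1[k, j] - P1[i, j], beta)
            for l in range(n):                 # player 2 revises: column j -> l
                if l != j:
                    T[s, i * n + l] = 0.5 / (n - 1) * fermi(P2[i, l] - P2[i, j], beta)
            T[s, s] = 1.0 - T[s].sum()         # remaining probability: state unchanged
    return T

# The strategy distribution after t steps, eq. (4) below, is then
#     v_t = v0 @ np.linalg.matrix_power(T, t)
```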

We can use the matrix T to describe how likely we are to find the process in a given state at some time t given the initial distribution of the process. Let vij (t) denote the probability that at time t, the process is in state $({\mathbf{S}}_{i},{\mathbf{S}}_{j}^{\prime })$. The mn-dimensional row-vector v(t) collects these probabilities in the same order of states as in the transition matrix. By the theory of Markov chains, the strategy distribution at time t is

$\mathbf{v}(t) = \mathbf{v}(0)\,T^{t}, \qquad (4)$

where v(0) is the initial strategy distribution.

To describe the long-run dynamics, we assume in the following that the selection strength β is finite (even if it may be arbitrarily large). In that case, the transition matrix T is primitive (see proposition 1 in appendix A). Thus, it follows from the Perron–Frobenius theorem [69] that v(t) converges in time to a unique and positive stationary distribution u, which is independent of the initial distribution v(0). The stationary distribution u solves the eigenvector problem

$\mathbf{u}\,T = \mathbf{u}, \qquad \mathbf{u}\,\mathbf{e}^{\top} = 1. \qquad (5)$

Here, e denotes the mn-dimensional row-vector where each entry is equal to 1 and the superscript ⊤ indicates transposition. Hence, the second equation in (5) is the usual normalization for a probability vector (requiring that the sum of all components of u is equal to 1). While (5) defines the stationary distribution implicitly, we can also obtain an explicit formula. To this end, we multiply the second equation in (5) by the row-vector e from the right, and add the result to the first equation. After some rearranging, this yields the relationship u (I + U − T) = e. Here, U = e⊤e is the mn × mn matrix with all entries being equal to 1, and I is the identity matrix. The matrix (I + U − T) is invertible (see proposition 2 in appendix A, with (I + U − T)−1 being a fundamental matrix of the ergodic chain [70]). Thus, we obtain the explicit representation

$\mathbf{u} = \mathbf{e}\,(I + U - T)^{-1}. \qquad (6)$

This equation allows us to numerically compute the stationary distribution u = (uij ) of introspection dynamics for any finite normal-form game.
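Numerically, equation (6) amounts to a single matrix inversion (or linear solve). A minimal, self-contained sketch, in which a random row-stochastic matrix stands in for a game-specific transition matrix:

```python
import numpy as np

def stationary_distribution(T):
    """Stationary distribution via the fundamental-matrix formula, eq. (6)."""
    p = T.shape[0]
    U = np.ones((p, p))
    return np.ones(p) @ np.linalg.inv(np.eye(p) + U - T)

# Self-check on a random primitive, row-stochastic matrix:
rng = np.random.default_rng(0)
T = rng.random((4, 4))
T /= T.sum(axis=1, keepdims=True)
u = stationary_distribution(T)
assert np.allclose(u @ T, u) and np.isclose(u.sum(), 1.0) and np.all(u > 0)
```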

To derive expressions that more immediately relate u to the payoffs of the game, we consider two important special cases in the following. First, for any number of strategies, we derive an approximation in the limit of weak selection (β ≪ 1). After that, we provide exact formulas in the case where each player can choose among two strategies (m = n = 2).

3.2. Weak selection

To directly relate the long-run abundance of strategies with the game's payoffs, we consider the limit of weak selection. For this, we first note that both the transition matrix (3) and its stationary distribution (6) depend on β; to make this dependence more explicit, we write T(β) for the transition matrix and u(β) for the stationary distribution. We then expand u(β) and T(β) as a Taylor series around β = 0,

$\mathbf{u}(\beta) = \mathbf{u}_0 + \mathbf{u}_1\,\beta + \mathcal{O}(\beta^2), \qquad T(\beta) = T_0 + T_1\,\beta + \mathcal{O}(\beta^2). \qquad (7)$

Here, T0 := T|β=0 and T1 := ∂T/∂β|β=0 are the constant and the linear term of the Taylor expansion of the transition matrix. Both terms can be computed explicitly, see appendix B. Similarly, u0 := u|β=0 and u1 := ∂u/∂β|β=0 are the constant and the linear term of the stationary distribution; those are the terms we wish to compute. To this end, we take the expressions (7) and plug them into the eigenvector problem (5). This yields the relation

$\left(\mathbf{u}_0 + \mathbf{u}_1\beta + \mathcal{O}(\beta^2)\right)\left(T_0 + T_1\beta + \mathcal{O}(\beta^2)\right) = \mathbf{u}_0 + \mathbf{u}_1\beta + \mathcal{O}(\beta^2), \qquad \left(\mathbf{u}_0 + \mathbf{u}_1\beta + \mathcal{O}(\beta^2)\right)\mathbf{e}^{\top} = 1. \qquad (8)$

By setting β = 0 in (8), we note that u0 needs to satisfy the linear system

$\mathbf{u}_0\,T_0 = \mathbf{u}_0, \qquad \mathbf{u}_0\,\mathbf{e}^{\top} = 1. \qquad (9)$

Similarly, by taking the first derivative of both sides of (8) with respect to β, and then setting β = 0, it follows that u1 needs to satisfy

$\mathbf{u}_0\,T_1 + \mathbf{u}_1\,T_0 = \mathbf{u}_1, \qquad \mathbf{u}_1\,\mathbf{e}^{\top} = 0. \qquad (10)$

As we show in proposition 3 in appendix B, both systems (9) and (10) can be solved explicitly.

In the special case that both players have the same number of strategies, m = n, the solution becomes particularly simple. In that case, we can approximate the abundance of strategy profile $({\mathbf{S}}_{i},{\mathbf{S}}_{j}^{\prime })$ in the stationary strategy distribution by

$u_{ij} = \dfrac{1}{n^2} + \dfrac{\beta}{2n^2}\left[\pi_{ij} - \dfrac{1}{n}\sum_{k=1}^{n}\pi_{kj} + \dfrac{1}{n}\sum_{l=1}^{n}\pi_{il} - \dfrac{1}{n^2}\sum_{k,l=1}^{n}\pi_{kl} + \pi'_{ij} - \dfrac{1}{n}\sum_{l=1}^{n}\pi'_{il} + \dfrac{1}{n}\sum_{k=1}^{n}\pi'_{kj} - \dfrac{1}{n^2}\sum_{k,l=1}^{n}\pi'_{kl}\right] + \mathcal{O}(\beta^2). \qquad (11)$

To interpret this formula, let us introduce the following shortcut notation for some payoff averages for player 1,

$\pi_{\bullet j} := \dfrac{1}{n}\sum_{k=1}^{n}\pi_{kj}, \qquad \pi_{i\circ} := \dfrac{1}{n}\sum_{l=1}^{n}\pi_{il}, \qquad \pi_{\bullet\circ} := \dfrac{1}{n^2}\sum_{k,l=1}^{n}\pi_{kl}. \qquad (12)$

Here, $\pi_{\bullet j}$ is the average payoff player 1 obtains when randomly sampling a strategy against a co-player with strategy ${\mathbf{S}}_{j}^{\prime }$. The next expression, $\pi_{i\circ}$, is player 1's average payoff when using strategy ${\mathbf{S}}_{i}$ against a co-player who samples their strategy at random. Finally, $\pi_{\bullet\circ}$ is player 1's average payoff if both players sample randomly. Analogous averages $\pi'_{i\bullet}$, $\pi'_{\circ j}$, and $\pi'_{\circ\bullet}$ can be defined with respect to player 2's payoffs $\pi'_{ij}$ (with $\bullet$ now averaging over player 2's own strategies and $\circ$ over those of player 1). Using this notation, we can rewrite the weak-selection formula (11) as

$u_{ij} = \dfrac{1}{n^2} + \dfrac{\beta}{2n^2}\left[\left(\pi_{ij}-\pi_{\bullet j}\right) + \left(\pi_{i\circ}-\pi_{\bullet\circ}\right) + \left(\pi'_{ij}-\pi'_{i\bullet}\right) + \left(\pi'_{\circ j}-\pi'_{\circ\bullet}\right)\right] + \mathcal{O}(\beta^2). \qquad (13)$

We say the strategy profile $({\mathbf{S}}_{i},{\mathbf{S}}_{j}^{\prime })$ is favored by selection if its abundance is larger than neutral, $u_{ij} > 1/n^2$. For weak selection, expression (13) suggests the following two mechanisms for a strategy profile to be favored: either (i) player 1's payoff from using strategy ${\mathbf{S}}_{i}$ against ${\mathbf{S}}_{j}^{\prime }$ is better than average, $\pi_{ij} > \pi_{\bullet j}$; or (ii) player 1's payoff from using strategy ${\mathbf{S}}_{i}$ against a random strategy of the co-player is better than average, $\pi_{i\circ} > \pi_{\bullet\circ}$. Two analogous mechanisms apply to player 2.

Interestingly, similar results have been previously derived for a birth–death model [60]. In that model, members of two separate populations engage in a bi-matrix game. The members of the first population act in the role of the row player, whereas the members of the second population act as the column player. In the special case that mutations are rare and selection is weak, the results of that model coincide with ours. In that case, the likelihood that population 1 uses strategy ${\mathbf{S}}_{i}$ and that population 2 uses strategy ${\mathbf{S}}_{j}^{\prime }$ simplifies to our Eq. (13). This agreement can be regarded as another instance of a more general observation: in the limit of weak selection, different selection rules often turn out to be equivalent [71, 72].

The previous expressions (11) and (13) tell us how often we should expect to observe a strategy profile $({\mathbf{S}}_{i},{\mathbf{S}}_{j}^{\prime })$ on average. In many applications, it is also relevant to know how often player 1 adopts the given strategy ${\mathbf{S}}_{i}$, irrespective of the co-player's strategy. Based on (11), we can approximate the corresponding marginal probability ${\xi }_{i} {:=} \sum _{j=1}^{n}{u}_{ij}$ as

$\xi_i = \dfrac{1}{n} + \dfrac{\beta}{n}\left[\dfrac{1}{n}\sum_{l=1}^{n}\pi_{il} - \dfrac{1}{n^2}\sum_{k,l=1}^{n}\pi_{kl}\right] + \mathcal{O}(\beta^2). \qquad (14)$

Using the notation (12), this expression simplifies to ${\xi }_{i}=1/n+\beta ({\pi }_{i{\circ}}-{\pi }_{{\bullet}{\circ}})/n+\mathcal{O}({\beta }^{2})$. In particular, we can use this formula to predict which of two strategies Si and Sk is more likely to be played over time. We obtain that ξi > ξk if and only if $\pi_{i\circ} > \pi_{k\circ}$ (i.e., if and only if Si performs better than Sk against a uniform sample of the co-player's strategies). Interestingly, this condition naturally depends on player 1's own payoffs πpq , but it is independent of the co-player's payoffs ${\pi }_{pq}^{\prime }$. While such a result may appear intuitive, we show that it only holds under weak selection. Once selection becomes stronger, the co-player's payoffs can have a major impact on how likely the focal player is to play a certain strategy (for an explicit example, see appendix B).
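The weak-selection formula can be checked numerically against the exact stationary distribution. The following sketch (our illustration) builds the transition matrix (3) for a random 3 × 3 game, computes the exact distribution via (6), and compares it with the first-order approximation (13); the printed discrepancy should be of order β²:

```python
import numpy as np

def fermi(x, beta):
    return 1.0 / (1.0 + np.exp(-beta * x))

def exact_u(P1, P2, beta):
    # Exact stationary distribution from eqs. (3) and (6); states indexed i*n + j.
    m, n = P1.shape
    T = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            s = i * n + j
            for k in range(m):
                if k != i:
                    T[s, k * n + j] = 0.5 / (m - 1) * fermi(P1[k, j] - P1[i, j], beta)
            for l in range(n):
                if l != j:
                    T[s, i * n + l] = 0.5 / (n - 1) * fermi(P2[i, l] - P2[i, j], beta)
            T[s, s] = 1.0 - T[s].sum()
    U = np.ones_like(T)
    return np.ones(m * n) @ np.linalg.inv(np.eye(m * n) + U - T)

def weak_selection_u(P1, P2, beta):
    # First-order approximation, eq. (13), for m = n.
    n = P1.shape[0]
    d1 = P1 - P1.mean(axis=0, keepdims=True) + P1.mean(axis=1, keepdims=True) - P1.mean()
    d2 = P2 - P2.mean(axis=1, keepdims=True) + P2.mean(axis=0, keepdims=True) - P2.mean()
    return (1.0 / n**2 + beta / (2 * n**2) * (d1 + d2)).ravel()

rng = np.random.default_rng(2)
P1, P2 = rng.random((3, 3)), rng.random((3, 3))
beta = 0.01
print(np.max(np.abs(exact_u(P1, P2, beta) - weak_selection_u(P1, P2, beta))))  # ~ O(beta^2)
```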

In this section, we have derived the first-order approximation of the stationary distribution as a function of the selection strength β. Using similar methods, we can recursively compute all higher-order terms in the Taylor expansion of u. The respective expressions are derived in appendix C.

3.3. Games with two strategies

We now apply our model of introspection dynamics to general asymmetric 2-player, 2-strategy games. In contrast to the previous section, the results here are valid for any intensity of selection. For m = n = 2, the payoff matrix (1) simplifies to

$\begin{pmatrix} (\pi_{11},\ \pi'_{11}) & (\pi_{12},\ \pi'_{12}) \\ (\pi_{21},\ \pi'_{21}) & (\pi_{22},\ \pi'_{22}) \end{pmatrix}. \qquad (15)$

Since the transitions in introspection dynamics only depend on payoff differences between strategies, we can further simplify this payoff matrix. Appendix A shows that the transition probabilities in (3) remain unchanged if we add a constant to all payoffs πij in a given column j or to all payoffs ${\pi }_{ij}^{\prime }$ in a given row i. We can therefore assume without loss of generality that the payoff matrix (15) takes the form

$\begin{pmatrix} (A,\ A') & (0,\ 0) \\ (0,\ 0) & (B,\ B') \end{pmatrix}, \qquad (16)$

where A := π11 − π21, ${A}^{\prime } {:=} {\pi }_{11}^{\prime }-{\pi }_{12}^{\prime }$, B := π22 − π12, and ${B}^{\prime } {:=} {\pi }_{22}^{\prime }-{\pi }_{21}^{\prime }$. Thus, the number of free parameters reduces from eight in (15) to four in (16). For general m × n payoff matrices, this approach reduces the number of free parameters from 2mn to 2mn − m − n.

Using the payoff matrix (16), we can write the transition matrix T according to (3) explicitly,

$T = \dfrac{1}{2}\begin{pmatrix} 2-\varphi_\beta(-A)-\varphi_\beta(-A') & \varphi_\beta(-A') & \varphi_\beta(-A) & 0 \\ \varphi_\beta(A') & 2-\varphi_\beta(A')-\varphi_\beta(B) & 0 & \varphi_\beta(B) \\ \varphi_\beta(A) & 0 & 2-\varphi_\beta(A)-\varphi_\beta(B') & \varphi_\beta(B') \\ 0 & \varphi_\beta(-B) & \varphi_\beta(-B') & 2-\varphi_\beta(-B)-\varphi_\beta(-B') \end{pmatrix},$

where the states are ordered as $(\mathbf{S}_1,\mathbf{S}_1^{\prime})$, $(\mathbf{S}_1,\mathbf{S}_2^{\prime})$, $(\mathbf{S}_2,\mathbf{S}_1^{\prime})$, $(\mathbf{S}_2,\mathbf{S}_2^{\prime})$.

We then obtain the stationary strategy distribution u = (u11, u12, u21, u22) by either solving (5) or using (6),

$u_{11} = \dfrac{1}{C}\left[2+2\,\mathrm{e}^{\beta B}+2\,\mathrm{e}^{\beta B'}+\mathrm{e}^{\beta(B-A')}+\mathrm{e}^{\beta(B'-A)}\right],$
$u_{12} = \dfrac{1}{C}\left[2\,\mathrm{e}^{-\beta A'}+2\,\mathrm{e}^{\beta(B'-A')}+2\,\mathrm{e}^{\beta(B'-A-A')}+\mathrm{e}^{\beta(B-A')}+\mathrm{e}^{\beta(B'-A)}\right],$
$u_{21} = \dfrac{1}{C}\left[2\,\mathrm{e}^{-\beta A}+2\,\mathrm{e}^{\beta(B-A)}+2\,\mathrm{e}^{\beta(B-A-A')}+\mathrm{e}^{\beta(B-A')}+\mathrm{e}^{\beta(B'-A)}\right],$
$u_{22} = \dfrac{1}{C}\left[2\,\mathrm{e}^{\beta(B+B'-A)}+2\,\mathrm{e}^{\beta(B+B'-A')}+2\,\mathrm{e}^{\beta(B+B'-A-A')}+\mathrm{e}^{\beta(B-A')}+\mathrm{e}^{\beta(B'-A)}\right]. \qquad (17)$

Here, C is a normalization constant ensuring that components of u add up to 1. Accordingly, the respective marginal probabilities ξ1 = u11 + u12 and ${\xi }_{1}^{\prime }={u}_{11}+{u}_{21}$ are

$\xi_1 = \dfrac{2}{C}\left(1+\mathrm{e}^{-\beta A'}\right)\left(1+\mathrm{e}^{\beta B}+\mathrm{e}^{\beta B'}+\mathrm{e}^{\beta(B'-A)}\right), \qquad \xi'_1 = \dfrac{2}{C}\left(1+\mathrm{e}^{-\beta A}\right)\left(1+\mathrm{e}^{\beta B}+\mathrm{e}^{\beta B'}+\mathrm{e}^{\beta(B-A')}\right).$

The remaining marginal probabilities are ξ2 = 1 − ξ1 and ${\xi }_{2}^{\prime }=1\!-\!{\xi }_{1}^{\prime }$. We note that, in general, the stationary distribution u does not factorize over these marginal distributions. That is, in general, ${u}_{ij}\ne {\xi }_{i}\cdot {\xi }_{j}^{\prime }$ for i, j ∈ {1, 2} and β > 0. However, one can verify that such a factorization holds when the game's payoffs satisfy A = −B and A' = −B'. These latter two conditions imply that each player's incentive to choose one strategy (rather than the other) is independent of the opponent's strategy. In such a case, this independence is also reflected in the game's stationary distribution.
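These closed-form results are easy to cross-check numerically. The sketch below writes out the 4 × 4 transition matrix for the reduced payoff matrix (16) and verifies the factorization property for a game with A = −B and A′ = −B′ (the asymmetric prisoner's dilemma of section 4.1, where this condition holds):

```python
import numpy as np

def fermi(x, beta):
    return 1.0 / (1.0 + np.exp(-beta * x))

def u_2x2(A, Ap, B, Bp, beta):
    """Stationary distribution over the states
    (S1,S1'), (S1,S2'), (S2,S1'), (S2,S2') for the reduced payoff matrix (16)."""
    f = lambda x: fermi(x, beta)
    T = 0.5 * np.array([
        [2 - f(-A) - f(-Ap), f(-Ap),           f(-A),            0.0],
        [f(Ap),              2 - f(Ap) - f(B), 0.0,              f(B)],
        [f(A),               0.0,              2 - f(A) - f(Bp), f(Bp)],
        [0.0,                f(-B),            f(-Bp),           2 - f(-B) - f(-Bp)],
    ])
    U = np.ones((4, 4))
    return np.ones(4) @ np.linalg.inv(np.eye(4) + U - T)

# With A = -B and A' = -B', the distribution factorizes over the marginals:
u = u_2x2(A=-0.6, Ap=-0.1, B=0.6, Bp=0.1, beta=5.0)
xi1, xi1p = u[0] + u[1], u[0] + u[2]
assert np.allclose(u, [xi1 * xi1p, xi1 * (1 - xi1p), (1 - xi1) * xi1p, (1 - xi1) * (1 - xi1p)])
```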

The previous results hold for arbitrary selection strength. We can also look at the weak selection approximation. Using (13), we obtain

$\mathbf{u} = \dfrac{1}{4}\,(1,1,1,1) + \dfrac{\beta}{32}\left(3A+3A'-B-B',\ \ A+B'-3A'-3B,\ \ A'+B-3A-3B',\ \ 3B+3B'-A-A'\right) + \mathcal{O}(\beta^2). \qquad (18)$

In this case, the resulting marginal probabilities become particularly simple, ξ1 = 1/2 + β(AB)/8 and ${\xi }_{1}^{\prime }=1/2+\beta ({A}^{\prime }\!-\!{B}^{\prime })/8$. Player 1 favors strategy ${\mathbf{S}}_{1}$ if and only if A > B, and the analogous result holds for player 2. In the following, we apply these formulas to discuss the introspection dynamics of some classical 2-strategy games.

4. Applications

4.1. Social dilemmas with two strategies

As a first application of our model, we explore the introspection dynamics of asymmetric social dilemmas. In social dilemmas, players can choose whether to cooperate (C) or to defect (D). When players choose the same action, they prefer mutual cooperation to mutual defection; yet when they choose different actions, a defector gets a higher payoff than a cooperator [17].

4.1.1. Prisoner's dilemma

To start with, we consider the most stringent form of a social dilemma, the prisoner's dilemma [18]. For a simple instantiation of this game, we assume that a cooperating player pays a cost in order for the co-player to get a benefit. We incorporate asymmetry by assuming that the cooperation costs ci > 0 may differ between players (whereas the benefit b of cooperation is the same for both). Therefore, the payoff matrix takes the form

$\begin{pmatrix} (b-c_1,\ b-c_2) & (-c_1,\ b) \\ (b,\ -c_2) & (0,\ 0) \end{pmatrix}, \qquad (19)$

where the first row and column correspond to cooperation, and the second to defection.

The unique Nash equilibrium of this game is for both players to defect, independent of the players' exact costs. For an easier interpretation, however, we assume without loss of generality that cooperation tends to be more costly for the first player, c1c2.

In the following, we ask how likely players are to cooperate when they update their strategies according to introspection dynamics. To this end, we first consider the case of a fixed benefit of b = 1. Moreover, we assume that the first player faces considerable cooperation costs, c1 = 0.6, whereas the second player's costs are negligible, c2 = 0.1. To illustrate the workings of introspection dynamics, we start by simulating the basic process described in section 2. Figure 2(a) shows a representative realization. Over time, the two players independently switch between cooperation and defection. As a result, they experience all possible outcomes: there are times in which both players defect, but also instances in which either one or both players cooperate. Overall, however, mutual defection appears to be most abundant, as one may expect.


Figure 2. Introspection dynamics of the prisoner's dilemma. (a) To start with, we simulate the strategies played by each player (C or D) over 50 game iterations. In the upper panel, the four possible states (C, C), (C, D), (D, C) and (D, D) are represented by different colors (using the same color scheme as in (c)). (b) In a next step, we compute the expected frequency of each state over time according to the Markov chain approach (4). The upper panel shows the respective frequencies; the lower panel shows each player's expected cooperation probability ξC and ${\xi }_{C}^{\prime }$. We compare these analytical results to the average cooperation rate obtained when simulating the process for 104 iterations (dashed line). (c) To further analyze the limiting behavior of the dynamics, we compute the stationary distribution in (20). As one may expect, mutual defection is the most abundant state, followed by the state in which player 1 defects and player 2 cooperates. (d) Here, we explore how asymmetry affects the abundance of each state in the stationary distribution. To this end, we vary the cost difference c1c2 while keeping the average cost constant, (c1 + c2)/2 = 0.35. The respective values of c1 and c2 are shown as full and dashed thin gray lines, respectively. The thick black line shows the overall likelihood of cooperation, which is a slightly increasing function of the cost difference. (e) Finally, we explore how the stationary distribution depends on the strength of selection. As selection becomes strong, most players defect. However, for intermediate selection strengths, the second player may cooperate for a considerable fraction of time. Parameters: b = 1, c1 = 0.6, c2 = 0.1, β = 5.


To obtain a more quantitative understanding, we compute how likely we are to observe each of the four possible outcomes over time. To this end, we assume that initially both players defect, such that $\mathbf{v}(0)=\left({v}_{\mathbf{CC}}(0),{v}_{\mathbf{CD}}(0),{v}_{\mathbf{DC}}(0),{v}_{\mathbf{DD}}(0)\right)=(0,0,0,1)$. Then we use (4) to compute v(t) for all future time steps. In figure 2(b), we show the resulting cooperation probability for each player, as defined by ξC (t) = vCC (t) + vCD (t) for the first player and ${\xi }_{C}^{\prime }(t)={v}_{\mathbf{CC}}(t)\!+\!{v}_{\mathbf{DC}}(t)$ for the second player. As before, we observe that both players are most likely to defect. However, while player 1's cooperation probability remains low (approximately 5%), player 2's cooperation probability quickly reaches a stable value of about 38%.

To further explore this limiting behavior, we compute the game's stationary distribution. Setting the four game parameters in the payoff matrix (16) as A = −c1, A' = −c2, B = c1, B' = c2, we can use (17) to compute the stationary distribution $\mathbf{u}=\left({u}_{\mathbf{CC}},{u}_{\mathbf{CD}},{u}_{\mathbf{DC}},{u}_{\mathbf{DD}}\right)$ as

$\mathbf{u} = \dfrac{\left(1,\ \mathrm{e}^{\beta c_2},\ \mathrm{e}^{\beta c_1},\ \mathrm{e}^{\beta(c_1+c_2)}\right)}{\left(1+\mathrm{e}^{\beta c_1}\right)\left(1+\mathrm{e}^{\beta c_2}\right)}. \qquad (20)$

In particular, we note that the invariant distribution does not depend on the benefit of cooperation (since b does not enter the payoff differences A, A', B, B'). Moreover, because β ⩾ 0 and c1c2 > 0, the abundances of the four possible states always obey the same relationship as in figure 2(c),

$u_{\mathbf{CC}} \leqslant u_{\mathbf{CD}} \leqslant u_{\mathbf{DC}} \leqslant u_{\mathbf{DD}}. \qquad (21)$

For β > 0 and c1 > c2 all inequalities in (21) are strict. That is, among the two players, the player with the lower cooperation cost can be expected to cooperate more often. However, mutual defection is always the most abundant state.
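As a quick numerical illustration (a sketch of ours, not code from the original study), evaluating (20) at the baseline parameters reproduces the long-run cooperation rates of roughly 5% and 38% observed in figure 2(b):

```python
import numpy as np

def pd_stationary(c1, c2, beta):
    # Stationary distribution (u_CC, u_CD, u_DC, u_DD), eq. (20).
    w = np.array([1.0, np.exp(beta * c2), np.exp(beta * c1), np.exp(beta * (c1 + c2))])
    return w / w.sum()

u = pd_stationary(c1=0.6, c2=0.1, beta=5.0)
print("P(cooperate): player 1 =", u[0] + u[1], ", player 2 =", u[0] + u[2])
# Prints roughly 0.047 and 0.378, matching figure 2(b).
```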

We can also use the stationary distribution (20) to study how parameter changes affect cooperation. In figure 2(d), we explore the effect of asymmetry. To this end, we increase the difference in the players' costs c1c2 while keeping the average cost (c1 + c2)/2 fixed. Interestingly, we observe a weakly positive effect. As we increase c1 and decrease c2, the first player's reduced cooperation is more than compensated by the second player.

Similarly, in figure 2(e) we explore the impact of selection strength. When selection is weak, such that payoff differences have a negligible impact, the stationary distribution simplifies to

$\mathbf{u} = \dfrac{1}{4}\,(1,1,1,1) + \dfrac{\beta}{8}\left(-c_1-c_2,\ \ c_2-c_1,\ \ c_1-c_2,\ \ c_1+c_2\right) + \mathcal{O}(\beta^2). \qquad (22)$

In particular, the last two states (D, C) and (D, D) are favored by selection, whereas the other two states are disfavored.

In contrast, for the strong selection regime, we take the limit of u as β becomes arbitrarily large. We obtain

$\lim_{\beta\to\infty} \mathbf{u} = (0,\ 0,\ 0,\ 1). \qquad (23)$

Thus, we recover the classical prediction that both players learn to defect at all times.

4.1.2. Stag-hunt game

As another instance of a social dilemma, we explore a version of the stag-hunt game [73]. In contrast to the prisoner's dilemma, we now assume that players only derive a benefit if they both cooperate (which could reflect the advantage of collective action in hunting expeditions). Under this assumption, the payoff matrix becomes

$\begin{pmatrix} (b-c_1,\ b-c_2) & (-c_1,\ 0) \\ (0,\ -c_2) & (0,\ 0) \end{pmatrix}. \qquad (24)$

As before, we assume the payoff parameters satisfy b > c1c2 > 0. Under this assumption, the stag-hunt game becomes a coordination game. There are two pure equilibria, according to which either both players cooperate, or both players defect. Mutual cooperation is always payoff-dominant (it is the equilibrium that gives a higher payoff to both players). However, when the benefit of cooperation is comparably small, b < c1 + c2, mutual defection is risk-dominant (loosely meaning that mutual defection is the safer option when there is uncertainty about the co-player's actions [74]).

We can analyze the introspection dynamics of the stag-hunt game in the same way as the prisoner's dilemma. This time, we set A = bc1, A' = bc2, B = c1, B' = c2. As a result, the stationary distribution according to (17) becomes

$\mathbf{u} = \dfrac{1}{C}\left(\mathrm{e}^{\beta b},\ \mathrm{e}^{\beta c_2},\ \mathrm{e}^{\beta c_1},\ \mathrm{e}^{\beta(c_1+c_2)}\right), \qquad (25)$

where $C = \mathrm{e}^{\beta b}+\mathrm{e}^{\beta c_2}+\mathrm{e}^{\beta c_1}+\mathrm{e}^{\beta(c_1+c_2)}$ is the normalization constant.

It follows that there are two possible orderings of the abundance of the four states. Which ordering applies depends on how the benefit b relates to the total costs of cooperation c1 + c2,

$u_{\mathbf{CD}} \leqslant u_{\mathbf{DC}} \leqslant u_{\mathbf{DD}} \leqslant u_{\mathbf{CC}} \ \ \text{if } b \geqslant c_1+c_2, \qquad u_{\mathbf{CD}} \leqslant u_{\mathbf{DC}} \leqslant u_{\mathbf{CC}} \leqslant u_{\mathbf{DD}} \ \ \text{if } b \leqslant c_1+c_2.$

Irrespective of the precise ordering, players are always most likely to settle at one of the two equilibrium outcomes. Moreover, introspection dynamics lends further support to the static notion of risk-dominance: players only coordinate on mutual cooperation if the benefit exceeds the sum of the costs. In figure 3, we illustrate this result for our baseline parameters, b = 1, c1 = 0.6, and c2 = 0.1. In particular, since c1 + c2 < b, mutual cooperation is by far the most abundant outcome.
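A short numerical sweep of (25) makes the risk-dominance threshold visible: mutual cooperation overtakes mutual defection exactly where b = c1 + c2 (illustrative sketch):

```python
import numpy as np

def stag_hunt_stationary(b, c1, c2, beta):
    # Stationary distribution (u_CC, u_CD, u_DC, u_DD), eq. (25).
    w = np.exp(beta * np.array([b, c2, c1, c1 + c2]))
    return w / w.sum()

for b in (0.5, 0.7, 0.9):   # below, at, and above the threshold c1 + c2 = 0.7
    u = stag_hunt_stationary(b, c1=0.6, c2=0.1, beta=5.0)
    print(f"b = {b}: u_CC = {u[0]:.3f}, u_DD = {u[3]:.3f}")
```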


Figure 3. Stationary distribution across games. For three different social dilemmas, we compare the frequency of the four possible states (C, C), (C, D), (D, C) and (D, D). In the upper panels (a), (c) and (e), we show the stationary distribution for our baseline parameters, b = 1, c1 = 0.6, c2 = 0.1, β = 5. In the bottom panels (b), (d) and (f), we vary selection strength. Here, solid lines give the exact abundance of each state, whereas dashed lines correspond to the weak selection (first order) approximation.


As in the prisoner's dilemma, we can also use the stationary distribution to discuss the impact of selection strength on cooperation, see figure 3. If selection is weak, we can approximate the stationary distribution by (18), yielding

$\mathbf{u} = \dfrac{1}{4}\,(1,1,1,1) + \dfrac{\beta}{16}\left(3b-2(c_1+c_2),\ \ -b-2(c_1-c_2),\ \ -b+2(c_1-c_2),\ \ -b+2(c_1+c_2)\right) + \mathcal{O}(\beta^2). \qquad (26)$

In particular, the second state (C, D) in which only the high-cost player cooperates is always disfavored by selection. The other three states may be favored or disfavored, depending on the exact values of the benefits and costs. For the strong selection limit, we obtain

$\lim_{\beta\to\infty} \mathbf{u} = \begin{cases} (1,\ 0,\ 0,\ 0) & \text{if } b > c_1+c_2,\\ \left(\frac{1}{2},\ 0,\ 0,\ \frac{1}{2}\right) & \text{if } b = c_1+c_2,\\ (0,\ 0,\ 0,\ 1) & \text{if } b < c_1+c_2. \end{cases} \qquad (27)$

That is, in the limit of strong selection, introspection dynamics always selects the risk-dominant equilibrium (even though mutual cooperation is payoff-dominant for all parameter values).

4.1.3. Volunteer's dilemma

As a final example of a social dilemma, we consider the volunteer's dilemma [75]. Here, already one cooperating player is sufficient for both players to get a benefit. The payoff matrix is

$\begin{pmatrix} (b-c_1,\ b-c_2) & (b-c_1,\ b) \\ (b,\ b-c_2) & (0,\ 0) \end{pmatrix}, \qquad (28)$

where b > c1c2 > 0. The game has two pure equilibria in which either only the first player or only the second player cooperates, (C, D) and (D, C). In both cases, the dilemma arises because players prefer the other player to volunteer as a cooperator.

For the volunteer's dilemma, we set A = −c1, A' = −c2, B = −(bc1), B' = −(bc2). By (17), the stationary distribution is

$\mathbf{u} = \dfrac{1}{C}\left(\mathrm{e}^{\beta b},\ \mathrm{e}^{\beta(b+c_2)},\ \mathrm{e}^{\beta(b+c_1)},\ \mathrm{e}^{\beta(c_1+c_2)}\right). \qquad (29)$

Similar to the stag-hunt game, there are two possible orderings of the four states, depending on how the benefit b relates to the total costs c1 + c2 of cooperation,

$u_{\mathbf{DD}} \leqslant u_{\mathbf{CC}} \leqslant u_{\mathbf{CD}} \leqslant u_{\mathbf{DC}} \ \ \text{if } b \geqslant c_1+c_2, \qquad u_{\mathbf{CC}} \leqslant u_{\mathbf{DD}} \leqslant u_{\mathbf{CD}} \leqslant u_{\mathbf{DC}} \ \ \text{if } b \leqslant c_1+c_2. \qquad (30)$

In either case, the most likely outcome is that the low-cost player 2 cooperates, whereas the high-cost player 1 defects, as illustrated in figure 3.

The impact of selection strength can be discussed analogously to the previous cases (figure 3). When selection is weak, the stationary distribution simplifies to

$\mathbf{u} = \dfrac{1}{4}\,(1,1,1,1) + \dfrac{\beta}{16}\left(b-2(c_1+c_2),\ \ b-2(c_1-c_2),\ \ b+2(c_1-c_2),\ \ -3b+2(c_1+c_2)\right) + \mathcal{O}(\beta^2). \qquad (31)$

Here, the third state (D, C) is always favored by selection. The other three states may be favored or disfavored, depending on the magnitudes of b, c1, c2. For strong selection and c1 > c2,

$\lim_{\beta\to\infty} \mathbf{u} = (0,\ 0,\ 1,\ 0). \qquad (32)$

That is, the low-cost player volunteers with certainty.

We briefly summarize the key results of this section in table 1. For each of the three social dilemmas we considered, we describe (i) the respective game parameters, (ii) the resulting stationary distribution, (iii) the respective weak selection limit, (iv) the strong selection limit, and (v) the game's Nash equilibria.

Table 1. Summary of our results on asymmetric social dilemmas with two strategies.

For each game, the four entries refer to the states (C, C), (C, D), (D, C), and (D, D), in this order.

Prisoner's dilemma
  Parameters (A, A′, B, B′): $(-c_1,\ -c_2,\ c_1,\ c_2)$
  Stationary distribution (up to a constant): $\left(1,\ \mathrm{e}^{c_2\beta},\ \mathrm{e}^{c_1\beta},\ \mathrm{e}^{(c_1+c_2)\beta}\right)$
  Weak selection (β ≪ 1): $\frac{1}{4}(1,1,1,1)+\frac{\beta}{8}\left(-c_1-c_2,\ c_2-c_1,\ c_1-c_2,\ c_1+c_2\right)$
  Strong selection (β → ∞): $(0,\ 0,\ 0,\ 1)$
  Pure Nash equilibria: (D, D)

Stag-hunt game
  Parameters (A, A′, B, B′): $(b-c_1,\ b-c_2,\ c_1,\ c_2)$
  Stationary distribution (up to a constant): $\left(\mathrm{e}^{b\beta},\ \mathrm{e}^{c_2\beta},\ \mathrm{e}^{c_1\beta},\ \mathrm{e}^{(c_1+c_2)\beta}\right)$
  Weak selection (β ≪ 1): $\frac{1}{4}(1,1,1,1)+\frac{\beta}{16}\left(3b-2(c_1+c_2),\ -b-2(c_1-c_2),\ -b+2(c_1-c_2),\ -b+2(c_1+c_2)\right)$
  Strong selection (β → ∞): $(1, 0, 0, 0)$ if b > c1 + c2;  $(\frac{1}{2}, 0, 0, \frac{1}{2})$ if b = c1 + c2;  $(0, 0, 0, 1)$ if b < c1 + c2
  Pure Nash equilibria: (C, C) and (D, D)

Volunteer's dilemma
  Parameters (A, A′, B, B′): $\left(-c_1,\ -c_2,\ -(b-c_1),\ -(b-c_2)\right)$
  Stationary distribution (up to a constant): $\left(\mathrm{e}^{b\beta},\ \mathrm{e}^{(b+c_2)\beta},\ \mathrm{e}^{(b+c_1)\beta},\ \mathrm{e}^{(c_1+c_2)\beta}\right)$
  Weak selection (β ≪ 1): $\frac{1}{4}(1,1,1,1)+\frac{\beta}{16}\left(b-2(c_1+c_2),\ b-2(c_1-c_2),\ b+2(c_1-c_2),\ -3b+2(c_1+c_2)\right)$
  Strong selection (β → ∞): $(0, 0, 1, 0)$ if c1 > c2;  $(0, \frac{1}{2}, \frac{1}{2}, 0)$ if c1 = c2;  $(0, 1, 0, 0)$ if c1 < c2
  Pure Nash equilibria: (C, D) and (D, C)

4.2. Comparing introspection and imitation in symmetric games

While the previous section focused on asymmetric social dilemmas, introspection dynamics is equally applicable to symmetric games. In the special case that the game has two strategies only, the assumption of symmetry implies A = A' and B = B', and the payoff matrix (16) simplifies to

$\begin{pmatrix} (A,\ A) & (0,\ 0) \\ (0,\ 0) & (B,\ B) \end{pmatrix}. \qquad (33)$

Similarly, the stationary distribution u = (uCC , uCD , uDC , uDD ) reduces to

$\mathbf{u} = \dfrac{\left(\mathrm{e}^{\beta A},\ 1,\ 1,\ \mathrm{e}^{\beta B}\right)}{2+\mathrm{e}^{\beta A}+\mathrm{e}^{\beta B}}. \qquad (34)$

In particular, we can immediately see that uCD = uDC for any A and B, as one would expect from a symmetric game. Moreover, the average cooperation probability of player 1 (and therefore also of player 2) becomes

$\xi_C = u_{\mathbf{CC}} + u_{\mathbf{CD}} = \dfrac{1+\mathrm{e}^{\beta A}}{2+\mathrm{e}^{\beta A}+\mathrm{e}^{\beta B}}. \qquad (35)$

For a given selection strength, this formula has only two free parameters. Hence, we can use this formula to explore the expected cooperation rate across all symmetric 2 × 2 games, by simultaneously varying both A and B as in figure 4(a). Depending on the signs of A and B, we recover the four classical symmetric games [76]: the prisoner's dilemma (A < 0 and B > 0), the stag-hunt game (A > 0 and B > 0), the snowdrift game (A < 0 and B < 0), and the harmony game (A > 0 and B < 0). For each possible combination of (A, B), we plot the resulting cooperation probability according to (35). We observe that there are three qualitative regions: (i) for B > 0 and A < B, defection is either a dominant strategy, or it is risk-dominant. In this parameter region, we therefore observe comparably little cooperation. (ii) Conversely, for A > 0 and B < A, cooperation is either dominant or risk-dominant. As a consequence, introspection dynamics leads to almost full cooperation. (iii) If both A < 0 and B < 0 (i.e., in the snowdrift game), players have an incentive to choose the opposite strategy of their opponent. In this parameter region, we therefore observe an approximately equal share of cooperators and defectors.
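Because (35) is an explicit function of A and B, the parameter plane of figure 4(a) can be evaluated in a few lines; the sketch below prints one representative point from each qualitative region (plotting is left out):

```python
import numpy as np

def coop_rate(A, B, beta=8.0):
    # Expected cooperation rate in a symmetric 2x2 game, eq. (35).
    return (1.0 + np.exp(beta * A)) / (2.0 + np.exp(beta * A) + np.exp(beta * B))

print(coop_rate(-0.5, 0.5))   # prisoner's dilemma regime: almost no cooperation
print(coop_rate(-0.5, -0.5))  # snowdrift regime: cooperation rate close to 1/2
print(coop_rate(0.5, -0.5))   # harmony regime: almost full cooperation

# The full heat map of figure 4(a) is a direct evaluation on a grid:
A, B = np.meshgrid(np.linspace(-1, 1, 201), np.linspace(-1, 1, 201))
heatmap = coop_rate(A, B)     # e.g., for plotting with matplotlib's imshow
```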


Figure 4. Frequency of cooperation across all symmetric 2 × 2 games. For this figure, we systematically varied the game parameters A and B. For each parameter combination, we measure the average cooperation rate if players either adopt strategies according to (a) introspection dynamics or (b) pairwise imitation. We observe that the two dynamics generally yield similar results unless the interaction takes the form of a snowdrift game. Here, introspection dynamics typically leads to a cooperation rate of approximately 50%, irrespective of the exact game parameters. In contrast, imitation dynamics depends more gradually on the game parameters. For this figure, we use a selection strength of β = 8 in each case. For the pairwise imitation dynamics, we additionally need to specify the population size (Z = 50) and the mutation rate (μ = 0.05). Moreover, the shown results for imitation dynamics assume well-mixed populations; structured populations tend to yield a different dynamics [51, 77].


For symmetric games, we can compare introspection dynamics to the classical pairwise imitation rule [24], see figure 4(b) (for details of the implemented imitation model, see appendix D). In most parameter regions, the corresponding results are strikingly similar. Only in the snowdrift game regime, imitation leads to a more gradual change from full cooperation (when B < 0 and A ≈ 0) to full defection (when A < 0 and B ≈ 0). To explain this difference, we note that the imitation process takes place in an entire population of players. As a result, individuals do not adapt their strategy to a specific opponent, but rather to the population average. The resulting imitation dynamics can lead to a mixed equilibrium in which cooperators and defectors coexist. The exact position of this equilibrium changes gradually in the game parameters, as observed in figure 4(b). Overall, these results suggest that the two dynamics are comparable when the game tends to converge toward a homogeneous population. However, if the game has at least one stable mixed equilibrium (as in the snowdrift game), the predictions of the two models may diverge.

4.3. The volunteer's timing dilemma

In the previous examples, we have considered social dilemmas with two strategies only. In this section, we illustrate how introspection dynamics can be applied to a game with arbitrarily many strategies. To this end, we take the volunteer's dilemma and turn it into a timing dilemma. Here, players no longer only determine whether or not to volunteer. Instead, the game takes place over time, and players determine how long they wait for the co-player to volunteer before they volunteer themselves. This kind of game was first studied by Weesie [78] as a model to explore the emergence of 'wait and see' behaviors in the context of voluntary action. Here we explore the game's introspection dynamics.

To formalize the game, we assume that within a given time interval [t0, tmax] := [0, 1], players need to decide whether, and when, to volunteer. For simplicity, we assume that time is discretized, such that there are n + 1 evenly spaced time points {t0, t1, ..., tn } = {0, 1/n, ..., 1}, at which players may volunteer. As before, we assume that players may have different costs to volunteer and that player 1's cost tends to be larger, c1 ⩾ c2. If at least one player volunteers during the time interval, both players derive some benefit. However, to add a component of time pressure, we assume that the benefit of cooperation decays linearly in time. That is, if one of the players cooperates immediately at time t0, both players get a benefit of b > ci . However, if the first player to volunteer cooperates at time tmax, the resulting benefit is zero.

A strategy for the volunteer's timing dilemma is now a rule that tells the player at which point to volunteer. To this end, we associate each strategy {${\mathbf{S}}_{0},\,{\mathbf{S}}_{1}$, ..., ${\mathbf{S}}_{n}$} with a waiting time {t0, t1, ..., tn }. For i < n, a player with strategy Si volunteers at time ti , unless the co-player has already volunteered earlier (in which case the focal player's cooperation is no longer required). For i = n, we associate the respective strategy Sn with not cooperating at all. If player 1 and player 2 adopt the strategies Si and ${\mathbf{S}}_{j}^{\prime }$, respectively, the resulting payoffs are

$\pi_{ij} = \begin{cases} b\,(1-t_i) - c_1 & \text{if } i \leqslant j \text{ and } i < n,\\ b\,(1-t_j) & \text{if } i > j,\\ 0 & \text{if } i = j = n, \end{cases} \qquad \pi'_{ij} = \begin{cases} b\,(1-t_j) - c_2 & \text{if } j \leqslant i \text{ and } j < n,\\ b\,(1-t_i) & \text{if } j > i,\\ 0 & \text{if } i = j = n. \end{cases} \qquad (36)$

Equivalently, the game can be represented by the payoff matrix

$\begin{pmatrix} (b-c_1,\ b-c_2) & (b-c_1,\ b) & \cdots & (b-c_1,\ b) \\ (b,\ b-c_2) & \left(b(1-t_1)-c_1,\ b(1-t_1)-c_2\right) & \cdots & \left(b(1-t_1)-c_1,\ b(1-t_1)\right) \\ \vdots & \vdots & \ddots & \vdots \\ (b,\ b-c_2) & \left(b(1-t_1),\ b(1-t_1)-c_2\right) & \cdots & (0,\ 0) \end{pmatrix} \qquad (37)$

Note that for n = 1, we recover the payoffs of the original volunteer's dilemma.

We explore the introspection dynamics of the volunteer's timing dilemma numerically, by computing the stationary distribution with (6). To this end, we first consider a case in which n = 4, such that the players' possible waiting times are $t\in \left\{0,\frac{1}{4},\frac{1}{2},\frac{3}{4},1\right\}$. Moreover, we consider a normalized benefit of b = 1, and the players' cooperation costs are c1 = 0.7 and c2 = 0.3, respectively. The resulting stationary distribution is displayed in figure 5(a). As one may expect, we observe that the low-cost player is more cooperative; however, more surprisingly, this player typically cooperates without any delay.
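The computation behind figure 5(a) can be sketched with the same machinery as in section 3. In the listing below, the payoff construction follows (36); as an additional assumption of this sketch, both players pay their cost if they volunteer simultaneously:

```python
import numpy as np

def fermi(x, beta):
    return 1.0 / (1.0 + np.exp(-beta * x))

def timing_payoffs(n, b, c1, c2):
    """Payoff matrices of the volunteer's timing dilemma, eq. (36).
    Strategy i means volunteering at time t_i = i/n (i = n: never).
    Assumption of this sketch: simultaneous volunteers both pay their cost."""
    t = np.arange(n + 1) / n
    P1, P2 = np.zeros((n + 1, n + 1)), np.zeros((n + 1, n + 1))
    for i in range(n + 1):
        for j in range(n + 1):
            if i == j == n:                  # nobody volunteers: zero payoffs
                continue
            P1[i, j] = b * (1 - t[i]) - c1 if i <= j else b * (1 - t[j])
            P2[i, j] = b * (1 - t[j]) - c2 if j <= i else b * (1 - t[i])
    return P1, P2

def stationary(P1, P2, beta):
    # Transition matrix (3) and stationary distribution (6).
    m, n = P1.shape
    T = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            s = i * n + j
            for k in range(m):
                if k != i:
                    T[s, k * n + j] = 0.5 / (m - 1) * fermi(P1[k, j] - P1[i, j], beta)
            for l in range(n):
                if l != j:
                    T[s, i * n + l] = 0.5 / (n - 1) * fermi(P2[i, l] - P2[i, j], beta)
            T[s, s] = 1.0 - T[s].sum()
    U = np.ones_like(T)
    return (np.ones(m * n) @ np.linalg.inv(np.eye(m * n) + U - T)).reshape(m, n)

P1, P2 = timing_payoffs(n=4, b=1.0, c1=0.7, c2=0.3)
u = stationary(P1, P2, beta=10.0)
print("low-cost player's waiting-time marginal:", u.sum(axis=0).round(3))
```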


Figure 5. Introspection dynamics of the volunteer's timing dilemma. (a) We first consider a game in which players can volunteer at n = 4 possible time points 0, 1/4, 2/4, 3/4, or do not cooperate at all (time 1). We assume that the benefit of immediate cooperation is b = 1, whereas the players' costs of volunteering are c1 = 0.7 and c2 = 0.3, respectively. Using a strength of selection of β = 10, we compute the stationary distribution with (6). We find that the most likely outcome is that the low-cost player cooperates without delay, whereas the high-cost player waits as long as possible. (b) Maintaining n = 4 and b = 1, we compute the average volunteer time for varying cost difference (c1 − c2) and intensity of selection (β = 5, 10 and 50). Note that the average cost is kept constant, (c1 + c2)/2 = 0.5. We verify that for an increasing cost asymmetry, the time to act decreases. The impact is stronger for high intensity of selection. (c) Finally, we have computed the stationary distribution for varying discretizations of time (dots with solid lines), the other parameters being the same as in (a). In addition, we have simulated the basic process described in section 2, for which alternative strategies are uniformly drawn from the entire interval [0, 1] (dashed lines). We observe that for large n the numerically computed cooperation probabilities approach the time average of the simulations.


A natural question, then, is whether the cost asymmetry helps to solve the timing dilemma. In figure 5(b), we show that the higher the difference in costs, the faster one of the players volunteers. This positive effect of asymmetry is particularly pronounced when selection is strong. To explore how these results depend on our discretization of the time interval, we have repeated the above analysis for different n ∈ {1, ..., 15}. In addition, we have run simulations in which players are able to volunteer at any time in [0, 1]. The results of this analysis are displayed in figure 5(c). Interestingly, the low-cost player is most likely to volunteer when n = 1 (the original volunteer's dilemma), in which case a 'wait and see' approach is ruled out by the design of the game. However, across all values of n considered, the low-cost player always cooperates with a probability of at least 80%, implying that this player remains the most reliable volunteer.

5. Discussion and conclusion

Herein, we present a simple model of learning in social interactions. The model considers individuals who can choose among several strategies using introspection, that is, by reasoning about their strategies' prospective consequences. Compared to imitation models [24], this approach has the advantage that it can be applied to symmetric and asymmetric games alike. As another advantage, we can derive explicit formulas that describe how the system evolves in time, and what the long-run abundance of each strategy is. These formulas become particularly simple when players can only choose among two strategies, or when selection is weak [79–81].

Mathematically, the model takes the form of a Markov chain, whose states are the possible combinations of the strategies of the two players. In particular, if the players can choose among m and n strategies, respectively, there are mn states, arguably the minimum number of states that any learning model for such asymmetric games must have. While a similar Markov approach can also be used to analyze imitation processes in populations of players, the two approaches differ in their computational complexity. Population models need to record how many players apply any given strategy at any point in time. As a result, if a symmetric game with n strategies is played in a population of size Z, there are $\binom{Z+n-1}{n-1}$ possible states [82]. Since this number of states increases exponentially in n, numerically exact results are only feasible in games with a few strategies [83], or when mutations are rare [84–87]. In contrast, the computational complexity of introspection dynamics only depends polynomially on the number of strategies. As a result, the stationary distribution can be easily computed even in complex games with many possible moves [67].

Despite these differences in terms of computational complexity, the results of introspection dynamics are often in remarkable agreement with other evolutionary processes. For example, in the limit of weak selection, the introspection dynamics of two adapting learners becomes equivalent to a birth–death model of two co-evolving populations [60] (section 3.2). Similarly, across all symmetric 2 × 2 games, we find that introspection dynamics often recovers the results of pairwise imitation [24] (figure 4). The only notable exception occurs for the snowdrift game (as in [10]). In the snowdrift game, pairwise imitation typically selects for the mixed symmetric equilibrium, in which cooperators and defectors coexist. In contrast, introspection dynamics selects pure but asymmetric equilibria in which one player cooperates and the other defects.

To illustrate our analytical results, we apply our framework to a number of asymmetric games. As one particular example, we study the dynamics of the volunteer's timing dilemma [78]. In this game, one player is required to volunteer as quickly as possible to create a benefit for the whole group; yet each player may be tempted to wait, hoping the other player would give in first. When players differ in their costs to volunteer, we observe not only that the player with the lower cost is more likely to volunteer; we also note that this player usually volunteers without any delay (figure 5). This game thus illustrates how players may sometimes benefit from asymmetry because it helps them to coordinate more efficiently. Herein, we have studied this advantage of asymmetry by assuming that players have different costs. However, similar results could be obtained for other sources of asymmetry, such as when people differ in their endowments [43, 44], their productivities [54], or their strategic options more generally [41].

Here, we consider a comparably simple setup of introspective learning: there are two individuals who continually interact with each other. However, one could also imagine an entire population of introspective learners who interact with one another in such pairwise games. The results of such a process are likely to depend on the population's network topology [51, 77, 88–90]. Alternatively, one may also imagine that introspective learners engage in interactions that involve more than two players at a time [91, 92]. The introspection dynamics of such multiplayer games can easily be studied with simulations [54], similar to the 2-player games considered in this paper. Especially for asymmetric games with many unequal players, introspection dynamics can serve as a simple model to study the resulting learning processes.

Acknowledgments

This work was supported by the European Research Council Starting Grant 850529 (E-DIRECT) and by the Max Planck Society. We would like to thank the members of the Research Group Dynamics of Social Behavior and the Thesis Advisory Committee of MC for valuable feedback.

Data availability statement

The data that support the findings of this study are available upon reasonable request from the authors.

Appendix A.: General properties of introspection dynamics

Here, we derive some properties of the transition matrix T defined in (3) in the main text and its stationary distribution u.

Proposition 1 (Properties of the transition matrix T).

  • (a)  
    For a fixed j, T is unchanged by adding an arbitrary constant dj to all payoffs πij .
  • (b)  
    For a fixed i, T is unchanged by adding an arbitrary constant ${d}_{i}^{\prime }$ to all payoffs ${\pi }_{ij}^{\prime }$.
  • (c)  
    If β is finite, T is primitive.

Proof. 

  • (a)  
    Transition probabilities in (3) in the main text only depend on payoff differences. If we fix j and replace all πij with ${\tilde{\pi }}_{ij}{:=}{\pi }_{ij}\!+\!{d}_{j}$, the transition probabilities are unaltered since, for all p ∈ {1, ..., m}, ${\varphi }_{\beta }({\tilde{\pi }}_{pj}-{\tilde{\pi }}_{ij})={\varphi }_{\beta }\left(({\pi }_{pj}\!+\!{d}_{j})-({\pi }_{ij}\!+\!{d}_{j})\right)={\varphi }_{\beta }({\pi }_{pj}-{\pi }_{ij}).$
  • (b)  
    A similar argument as in point (a) applies by fixing i and replacing all ${\pi }_{ij}^{\prime }$ with ${\pi }_{ij}^{\prime }+{d}_{i}^{\prime }$.
  • (c)  
    T is primitive if there is $a\in \mathbb{N}$ such that $T^a$ is positive. Let a = 2. For arbitrary i, j, k, l,
    $\left(T^{2}\right)_{ij,kl} \geqslant T_{ij,kj}\,T_{kj,kl}.$
    For finite β, the entries of T as defined in (3) in the main text imply that both factors on the right-hand side of the above inequality are positive. Therefore, $T^2$ is positive and T is primitive under finite selection intensity.

  □

The first two properties described in proposition 1 are useful as they allow us to simplify the payoff matrices we need to consider (as illustrated in section 3.3). The last property helps us to analyze the long-term abundances of each strategy. Because T is primitive for finite β, the Perron–Frobenius theorem implies that the likelihood uij to observe the two players in a given state $(\mathbf{S}_i, \mathbf{S}_j^{\prime})$ converges as a function of time. Moreover, this likelihood can be determined by finding the unique solution of (5) in the main text,

$\mathbf{u}\,T = \mathbf{u}, \qquad \mathbf{u}\,\mathbf{e}^{\top} = 1. \qquad (A1)$

While (A1) provides an implicit characterization of the stationary distribution u, we can also provide an explicit representation.

Proposition 2 (An explicit representation of the stationary distribution). For $p\in \mathbb{N}$, let T be a row-stochastic and primitive p × p matrix, I denote the p × p identity matrix, and U = e⊤e denote the p × p matrix whose entries are all equal to 1. Then (I + U − T) is invertible, and the unique solution of (A1) can be given by u = e (I + U − T)−1.

Proof. One proof of this result can be found in [70], where ${(I+U-T)}^{-1}$ is shown to be a fundamental matrix of the ergodic chain. In the following, we provide an independent proof.

Suppose u satisfies (A1). If we multiply the second equation in (A1) by e from the right, and subtract the result from the first equation, we obtain, after rearranging some of the terms

$\mathbf{u}\,(I+U-T)=\mathbf{e},\quad \text{and hence}\quad \mathbf{u}=\mathbf{e}\,{(I+U-T)}^{-1}.$   (A2)

Thus, for the proposition to hold, we only need to verify that (I + U − T) is invertible. To this end, let λ1, ..., λp be the eigenvalues of T and w1, ..., wp the corresponding right eigenvectors. Since T is row-stochastic, it has an eigenvalue λ1 = 1 with corresponding right eigenvector ${\mathbf{w}}_{1}={(1,1,\dots ,1)}^{\top }$. By T's primitivity, it follows that I − T has a unique eigenvalue equal to 0 with corresponding right eigenvector w1, while 1 − λk ≠ 0 for k = 2, ..., p. Because $U={\mathbf{w}}_{1}{\mathbf{w}}_{1}^{\top }$, it follows by Brauer's theorem (example 1.2.8, p 51 of [93]) that the eigenvalues of (U + I − T) are ${\mathbf{w}}_{1}^{\top }{\mathbf{w}}_{1},1-{\lambda }_{2},\dots ,1-{\lambda }_{p}$. That is, the eigenvalues are the same as for the matrix I − T; only the eigenvalue corresponding to w1 gets replaced by ${\mathbf{w}}_{1}^{\top }{\mathbf{w}}_{1}$. Since ${\mathbf{w}}_{1}^{\top }{\mathbf{w}}_{1}=p\! > \!0$ and 1 − λk ≠ 0 for k = 2, ..., p, the matrix (U + I − T) has no eigenvalue equal to 0. Therefore, it is invertible. □
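A quick numerical sanity check of proposition 2, for an arbitrary primitive row-stochastic matrix (randomly generated here; no model assumptions involved):

```python
import numpy as np

rng = np.random.default_rng(2)

# Random primitive row-stochastic matrix (all entries strictly positive).
p = 6
T = rng.random((p, p)) + 0.1
T /= T.sum(axis=1, keepdims=True)

e = np.ones(p)
U = np.outer(e, e)

# Proposition 2: u = e (I + U - T)^{-1}.
u = e @ np.linalg.inv(np.eye(p) + U - T)

# Cross-check against brute-force power iteration of the chain.
v = np.full(p, 1.0 / p)
for _ in range(1_000):
    v = v @ T
assert np.allclose(u, v, atol=1e-10)
assert np.isclose(u.sum(), 1.0)
print(u)
```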

While the stationary distribution u is well-defined for any finite β, in section 4.1 we also study which strategies are played as β approaches infinity. To this end, we first compute an expression for u(β) that is valid for finite β. Thereafter, we take the limit ${\mathrm{lim}}_{\beta \to \infty }\,\mathbf{u}(\beta )$ in ${\mathbb{R}}^{mn}$. For all considered 2 × 2 games, this yields a unique prediction of the strong selection limit—even for those games for which the respective limiting transition matrix ${\mathrm{lim}}_{\beta \to \infty }\,T(\beta )$ allows for several absorbing states. One example is the stag-hunt game (see table 1). Here, introspection dynamics predicts that if (C, C) is risk-dominant, players coordinate on mutual cooperation. Similarly, if (D, D) is risk-dominant, players coordinate on mutual defection. In comparison, according to the limiting transition matrix ${\mathrm{lim}}_{\beta \to \infty }\,T(\beta )$, both (C, C) and (D, D) are absorbing, irrespective of which one is risk-dominant. These results suggest that the strong selection limit of introspection dynamics can serve as an equilibrium selection device for 2 × 2 games, similar to other evolutionary dynamics [94, 95].
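The stag-hunt prediction can be illustrated numerically. The sketch below again uses our assumed form of the introspection transition matrix; the payoff values are made up so that (C, C) is risk-dominant (R + S > T + P). If our reading of the update rule matches (3), the stationary mass should concentrate on the state (C, C) as β grows.

```python
import numpy as np

def fermi(beta, d):
    return 1.0 / (1.0 + np.exp(-beta * d))

def introspection_T(pi1, pi2, beta):
    # Same assumed update rule as in the sketch after proposition 1.
    m, n = pi1.shape
    T = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            s = i * n + j
            for k in range(m):
                if k != i:
                    T[s, k * n + j] = 0.5 / (m - 1) * fermi(beta, pi1[k, j] - pi1[i, j])
            for l in range(n):
                if l != j:
                    T[s, i * n + l] = 0.5 / (n - 1) * fermi(beta, pi2[i, l] - pi2[i, j])
            T[s, s] = 1.0 - T[s].sum()
    return T

def stationary(T):
    # Proposition 2: u = e (I + U - T)^{-1}.
    p = T.shape[0]
    return np.ones(p) @ np.linalg.inv(np.eye(p) + np.ones((p, p)) - T)

# Stag hunt with made-up payoffs; (C, C) is risk-dominant since R + S > T + P.
R, S, Tt, P = 5.0, 1.0, 3.0, 2.0
pi1 = np.array([[R, S], [Tt, P]])
pi2 = pi1.T                # column player's payoffs in the symmetric game

for beta in [1.0, 5.0, 20.0, 80.0]:
    u = stationary(introspection_T(pi1, pi2, beta))
    print(beta, u.round(4))  # states ordered (CC, CD, DC, DD)
```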

Appendix B.: Introspection dynamics under weak selection

Here, we derive an explicit expression for the linear approximation of the stationary distribution when selection is weak, $\mathbf{u}\approx {\mathbf{u}}_{0}+{\mathbf{u}}_{1}\,\beta $. As we show in the main text, in (9) and (10), the constant term u0 of the stationary distribution obeys

${\mathbf{u}}_{0}\,{T}_{0}={\mathbf{u}}_{0},\qquad {\mathbf{u}}_{0}\,{\mathbf{e}}^{\top }=1,$   (B1)

whereas the linear u1 term satisfies

${\mathbf{u}}_{1}\,(I-{T}_{0})={\mathbf{u}}_{0}\,{T}_{1},\qquad {\mathbf{u}}_{1}\,{\mathbf{e}}^{\top }=0.$   (B2)

Both systems can be solved explicitly. For that, we first compute T0 := T|β=0 and T1 := ∂T/∂β|β=0, which are the constant and the linear term of the transition matrix. Based on (3), we obtain

${({T}_{0})}_{ij,kl}=\begin{cases}\frac{1}{4(m-1)} & \text{if}\ l=j\ \text{and}\ k\ne i,\\ \frac{1}{4(n-1)} & \text{if}\ k=i\ \text{and}\ l\ne j,\\ \frac{1}{2} & \text{if}\ (k,l)=(i,j),\\ 0 & \text{otherwise,}\end{cases}\qquad {({T}_{1})}_{ij,kl}=\begin{cases}\frac{{\pi }_{kj}-{\pi }_{ij}}{8(m-1)} & \text{if}\ l=j\ \text{and}\ k\ne i,\\ \frac{{\pi }_{il}^{\prime }-{\pi }_{ij}^{\prime }}{8(n-1)} & \text{if}\ k=i\ \text{and}\ l\ne j,\\ -\sum _{(p,q)\ne (i,j)}{({T}_{1})}_{ij,pq} & \text{if}\ (k,l)=(i,j),\\ 0 & \text{otherwise.}\end{cases}$   (B3)

In the special case that both players have the same number of strategies, the stationary distribution takes a particularly simple form, as the following result shows.

Proposition 3 (Weak selection approximation of the invariant distribution).

  • (a)  
    The system of equations (B1) has a unique solution given by
    ${({\hat{\mathbf{u}}}_{0})}_{ij}=\frac{1}{mn}\qquad \text{for all}\ i,j.$   (B4)
  • (b)  
    The system of equations (B2) has a unique solution. For m = n, it takes the simple form
    ${({\hat{\mathbf{u}}}_{1})}_{ij}=\frac{1}{2{n}^{2}}\left[{\pi }_{ij}+\frac{1}{n}\left(\sum _{q=1}^{n}{\pi }_{iq}-\sum _{p=1}^{n}{\pi }_{pj}\right)-\frac{1}{{n}^{2}}\sum _{p,q=1}^{n}{\pi }_{pq}+{\pi }_{ij}^{\prime }-\frac{1}{n}\left(\sum _{q=1}^{n}{\pi }_{iq}^{\prime }-\sum _{p=1}^{n}{\pi }_{pj}^{\prime }\right)-\frac{1}{{n}^{2}}\sum _{p,q=1}^{n}{\pi }_{pq}^{\prime }\right].$   (B5)

Proof. 

  • (a)  
    To prove the first part, we note that T0 is symmetric; indeed, swapping i with k and j with l in the expression for T0 in (B3) shows that ${({T}_{0})}_{ij,kl}={({T}_{0})}_{kl,ij}$. The stationary distribution of a Markov chain with a symmetric transition matrix is uniform [70]. Since T0 is of size mn × mn, the result in (B4) immediately follows.
  • (b)  
    To show that any solution to (B2) must be unique, we multiply the second equation in (B2) by the row-vector e from the right, and add the result to the first equation. This yields
    ${\mathbf{u}}_{1}\,(I+U-{T}_{0})={\mathbf{u}}_{0}\,{T}_{1},$   (B6)
    where again $U={\mathbf{e}}^{\top }\mathbf{e}$. By proposition 2, the matrix (I + U − T0) is invertible, and hence any solution to (B2) is uniquely determined by ${\mathbf{u}}_{1}={\mathbf{u}}_{0}\,{T}_{1}{(I+U-{T}_{0})}^{-1}$.

It remains to show that for m = n, the vector ${\hat{\mathbf{u}}}_{1}$ as defined by (B5) is in fact a solution of (B2). To this end, we first compute the right-hand side of the first condition in (B2) by plugging in the definitions of ${\hat{\mathbf{u}}}_{0}$ according to (B4) and of T1 according to (B3),

${\left({\hat{\mathbf{u}}}_{0}{T}_{1}\right)}_{kl}=\frac{1}{4mn}\left[\frac{1}{m-1}\left(m\,{\pi }_{kl}-\sum _{p=1}^{m}{\pi }_{pl}\right)+\frac{1}{n-1}\left(n\,{\pi }_{kl}^{\prime }-\sum _{q=1}^{n}{\pi }_{kq}^{\prime }\right)\right].$   (B7)

Similarly, we can compute the left-hand side of the first condition in (B2). By setting m = n and using the definition of T0 according to (B3), we obtain

${\left({\hat{\mathbf{u}}}_{1}(I-{T}_{0})\right)}_{kl}=\frac{n}{2(n-1)}\,{({\hat{\mathbf{u}}}_{1})}_{kl}-\frac{1}{4(n-1)}\left(\sum _{p=1}^{n}{({\hat{\mathbf{u}}}_{1})}_{pl}+\sum _{q=1}^{n}{({\hat{\mathbf{u}}}_{1})}_{kq}\right).$   (B8)

Next, we plug in the definition of ${\hat{\mathbf{u}}}_{1}$ according to (B5), yielding

Equation (B9)

We can further simplify this expression by rewriting the sums in the second and third line,

Equation (B10)

Now, the second and the third sum can be simplified further. Then, by collecting like terms across the three sums, we eventually obtain a simple result,

${\left({\hat{\mathbf{u}}}_{1}(I-{T}_{0})\right)}_{kl}=\frac{1}{4{n}^{2}(n-1)}\left(n\,{\pi }_{kl}-\sum _{p=1}^{n}{\pi }_{pl}+n\,{\pi }_{kl}^{\prime }-\sum _{q=1}^{n}{\pi }_{kq}^{\prime }\right).$   (B11)

This expression for ${\left({\hat{\mathbf{u}}}_{1}(I\!-\!{T}_{0})\right)}_{\!kl}$ coincides with the expression we obtained for ${\left({\hat{\mathbf{u}}}_{0}{T}_{1}\right)}_{\!kl}$ in (B7). Thus, the first condition in (B2) is satisfied. To verify the second condition in (B2), we compute

${\hat{\mathbf{u}}}_{1}\,{\mathbf{e}}^{\top }=\sum _{k,l}{({\hat{\mathbf{u}}}_{1})}_{kl}=0.$   (B12) □
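Proposition 3 can be verified numerically by comparing the exact stationary distribution at a small β with the linear approximation u0 + u1β, where T1 is obtained by a finite difference. As before, the transition-matrix builder encodes our assumed reading of (3).

```python
import numpy as np

def fermi(beta, d):
    return 1.0 / (1.0 + np.exp(-beta * d))

def introspection_T(pi1, pi2, beta):
    # Assumed update rule, as in the earlier sketches.
    m, n = pi1.shape
    T = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            s = i * n + j
            for k in range(m):
                if k != i:
                    T[s, k * n + j] = 0.5 / (m - 1) * fermi(beta, pi1[k, j] - pi1[i, j])
            for l in range(n):
                if l != j:
                    T[s, i * n + l] = 0.5 / (n - 1) * fermi(beta, pi2[i, l] - pi2[i, j])
            T[s, s] = 1.0 - T[s].sum()
    return T

rng = np.random.default_rng(3)
m = n = 3
pi1, pi2 = rng.normal(size=(m, n)), rng.normal(size=(m, n))

p = m * n
I, U, e = np.eye(p), np.ones((p, p)), np.ones(p)
T0 = introspection_T(pi1, pi2, 0.0)
h = 1e-6
T1 = (introspection_T(pi1, pi2, h) - T0) / h   # finite-difference T1

u0 = e / p                                     # proposition 3(a): uniform
u1 = u0 @ T1 @ np.linalg.inv(I + U - T0)       # proof of proposition 3(b)

beta = 0.01
u = e @ np.linalg.inv(I + U - introspection_T(pi1, pi2, beta))
print(np.abs(u - (u0 + beta * u1)).max())      # residual of order beta^2
```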

Next, we derive (14) in the main text. For a given stationary distribution u, we define the respective marginal distributions by ${\xi }_{i}{:=}{\sum }_{j=1}^{n}{(\mathbf{u})}_{ij}$ and ${\xi }_{j}^{\prime }{:=}{\sum }_{i=1}^{m}{(\mathbf{u})}_{ij}$. Using proposition 3 we can derive the following weak selection formulas for ξi and ${\xi }_{j}^{\prime }$ when the two players have the same number of strategies (m = n).

Corollary 1 (Marginal distributions for weak selection). For m = n and small β, the abundances of the strategies ${\mathbf{S}}_{i}$ and ${\mathbf{S}}_{j}^{\prime }$ are

${\xi }_{i}=\frac{1}{n}+\frac{\beta }{{n}^{2}}\left(\sum _{q=1}^{n}{\pi }_{iq}-\frac{1}{n}\sum _{p,q=1}^{n}{\pi }_{pq}\right),\qquad {\xi }_{j}^{\prime }=\frac{1}{n}+\frac{\beta }{{n}^{2}}\left(\sum _{p=1}^{n}{\pi }_{pj}^{\prime }-\frac{1}{n}\sum _{p,q=1}^{n}{\pi }_{pq}^{\prime }\right).$   (B13)

Proof. To obtain the first equation, we use the weak selection formula in proposition 3 and sum over all of the co-player's strategies ${\mathbf{S}}_{j}^{\prime }$,

${\xi }_{i}=\sum _{j=1}^{n}\left[{({\hat{\mathbf{u}}}_{0})}_{ij}+{({\hat{\mathbf{u}}}_{1})}_{ij}\,\beta \right]=\frac{1}{n}+\frac{\beta }{{n}^{2}}\left(\sum _{q=1}^{n}{\pi }_{iq}-\frac{1}{n}\sum _{p,q=1}^{n}{\pi }_{pq}\right).$   (B14)

The analogous formula for ${\xi }_{j}^{\prime }$ follows by symmetry. □

Three remarks are in order.

  • (a)  
    Independence of the co-player's payoffs. According to corollary 1, the abundance of a player's strategy in the limit of weak selection only depends on that player's payoffs. For instance, in (B13), the abundance ξi depends on the first player's payoffs πpq , but it is independent of the second player's payoffs ${\pi }_{pq}^{\prime }$. While such a result may appear intuitive, it is important to note that this result only holds in the limit of weak selection. As an example, consider the game with 2 × 2 payoff matrix
    Equation (B15)
    For this game, (B13) implies that for sufficiently small β, the marginal abundance of ${\mathbf{S}}_{1}$ is approximately given by ${\xi }_{1}=\frac{1}{2}+\frac{3}{8}\beta $, irrespective of the value of x. However, in the limit of strong selection, β → ∞, one can show that ξ1 = 1 if x is positive, but ξ1 = 0 if x is negative. Therefore, the co-player's payoffs do in general affect how likely a player is to adopt a certain strategy. Only when selection is weak (such that each of the co-player's strategies is played with approximately equal frequency) does this dependency disappear.
  • (b)  
    Comparing the abundance of different strategies. Corollary 1 also allows us to rank the different strategies of a player according to how often they are played in the stationary distribution. To this end, we say strategy Si is favored over Sk, and we write ${\mathbf{S}}_{i}\succ {\mathbf{S}}_{k}$, if ξi > ξk. An analogous notation can be defined for the column player. By corollary 1 we find that when selection is weak, ${\mathbf{S}}_{i}\succ {\mathbf{S}}_{k}$ if and only if $\sum _{q=1}^{n}{\pi }_{iq}\! > \!\sum _{q=1}^{n}{\pi }_{kq}$. Again, the respective condition only depends on the focal player's payoffs πpq and not on the co-player's payoffs ${\pi }_{pq}^{\prime }$. However, as before, this independence vanishes for stronger selection (which can be shown with the same example (B15)); a numerical sanity check of this ranking appears in the sketch after this list.
  • (c)  
    Comparative statics. Finally, we can use corollary 1 to compute how changes in the players' payoffs affect the resulting strategy abundances in the limit of weak selection. By (B13), we note that ∂ξi /∂πpq is positive if p = i, while it is negative if pi. Hence, increasing any one of the payoffs πiq has a positive effect on the long-run abundance of strategy Si , whereas increasing any one of the other payoffs πpq with pi has a negative effect.
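The following sketch checks remark (b) numerically: at a small β, the ranking of the row player's marginal abundances should coincide with the ranking of the row sums of that player's own payoff matrix. The transition matrix again encodes our assumed reading of (3).

```python
import numpy as np

def fermi(beta, d):
    return 1.0 / (1.0 + np.exp(-beta * d))

def introspection_T(pi1, pi2, beta):
    # Assumed update rule, as in the earlier sketches.
    m, n = pi1.shape
    T = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            s = i * n + j
            for k in range(m):
                if k != i:
                    T[s, k * n + j] = 0.5 / (m - 1) * fermi(beta, pi1[k, j] - pi1[i, j])
            for l in range(n):
                if l != j:
                    T[s, i * n + l] = 0.5 / (n - 1) * fermi(beta, pi2[i, l] - pi2[i, j])
            T[s, s] = 1.0 - T[s].sum()
    return T

rng = np.random.default_rng(7)
m = n = 4
pi1, pi2 = rng.normal(size=(m, n)), rng.normal(size=(m, n))

p = m * n
u = np.ones(p) @ np.linalg.inv(np.eye(p) + np.ones((p, p))
                               - introspection_T(pi1, pi2, 0.02))
xi = u.reshape(m, n).sum(axis=1)      # marginals of the row player's strategies

# Remark (b): under weak selection the ranking of strategies should follow
# the ranking of the row sums of the player's own payoff matrix.
print(np.argsort(-xi))
print(np.argsort(-pi1.sum(axis=1)))   # expected to coincide for small beta
```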

Appendix C.: Higher order approximations

In the previous section, we derived a linear approximation for the long-run abundance of each strategy profile $({\mathbf{S}}_{i},{\mathbf{S}}_{j}^{\prime })$ under weak selection. Using similar techniques, we can obtain arbitrary higher-order approximations. To this end, we define the kth-order Taylor coefficients of u and T, respectively, at β = 0,

${\mathbf{u}}_{k}{:=}\frac{1}{k!}\,\frac{{\partial }^{k}\mathbf{u}}{\partial {\beta }^{k}}{\bigg\vert }_{\beta =0},\qquad {T}_{k}{:=}\frac{1}{k!}\,\frac{{\partial }^{k}T}{\partial {\beta }^{k}}{\bigg\vert }_{\beta =0}.$   (C1)

By computing the respective higher-order expansion of (7) and comparing coefficients with respect to the power βk , we obtain for k ⩾ 1

${\mathbf{u}}_{k}=\sum _{j=0}^{k}{\mathbf{u}}_{j}\,{T}_{k-j},\qquad {\mathbf{u}}_{k}\,{\mathbf{e}}^{\top }=0.$   (C2)

By the first equation, it follows that we can compute uk recursively from all lower terms uj with j < k as

${\mathbf{u}}_{k}\,(I-{T}_{0})=\sum _{j=0}^{k-1}{\mathbf{u}}_{j}\,{T}_{k-j}.$   (C3)

By multiplying the second equation in (C2) by the row-vector e from the right, and adding the result to (C3), we obtain

${\mathbf{u}}_{k}\,(I+U-{T}_{0})=\sum _{j=0}^{k-1}{\mathbf{u}}_{j}\,{T}_{k-j}.$   (C4)

Since (I + U − T0) is invertible by proposition 2, we obtain the explicit solution

${\mathbf{u}}_{k}=\left(\sum _{j=0}^{k-1}{\mathbf{u}}_{j}\,{T}_{k-j}\right){(I+U-{T}_{0})}^{-1}.$   (C5)

For introspection dynamics, the first factor on the right-hand side can be further simplified. We note that Ti is the zero matrix for all even i ⩾ 2, as the following result shows.

Proposition 4. For Ti defined as in (C1), Ti = 0 for any even integer i ⩾ 2.

Proof. By its definition (3) in the main text, we can write all entries of the transition matrix T as a linear combination of constant terms and functions of the form ${\varphi }_{\beta }({\Delta}\pi )={(1+{\mathrm{e}}^{-\beta {\Delta}\pi })}^{-1}$. Since, by taking the kth derivative of (2) in the main text,

$\frac{{\partial }^{k}{\varphi }_{\beta }({\Delta}\pi )}{\partial {\beta }^{k}}={({\Delta}\pi )}^{k}\sum _{l=0}^{k-1}{(-1)}^{k-1-l}\,E(k,l)\,\frac{{\mathrm{e}}^{-(l+1)\beta {\Delta}\pi }}{{\left(1+{\mathrm{e}}^{-\beta {\Delta}\pi }\right)}^{k+1}},$   (C6)

the entries of Tk are proportional to

$\frac{{({\Delta}\pi )}^{k}}{{2}^{k+1}}\sum _{l=0}^{k-1}{(-1)}^{k-1-l}\,E(k,l),$   (C7)

where E(k, l) are the Eulerian numbers, which can be defined explicitly by

$E(k,l)=\sum _{j=0}^{l}{(-1)}^{j}\binom{k+1}{j}{(l+1-j)}^{k}.$
For 1 ⩽ k ⩽ 5 and 0 ⩽ l ⩽ 5, these numbers are tabulated in table 2. A useful property of Eulerian numbers is the symmetry E(k, l) = E(k, kl − 1). The sum in (C7) translates into adding the numbers of the kth line of the table with alternating signs. For even k, that sum is zero because the different Eulerian numbers exactly cancel each other out due to their symmetry. □
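The symmetry argument can be cross-checked with a few lines of code, using the explicit formula for the Eulerian numbers above (which reproduces table 2):

```python
from math import comb

def eulerian(k, l):
    # Explicit formula: E(k, l) = sum_{j=0}^{l} (-1)^j C(k+1, j) (l+1-j)^k.
    return sum((-1) ** j * comb(k + 1, j) * (l + 1 - j) ** k for j in range(l + 1))

for k in range(1, 8):
    row = [eulerian(k, l) for l in range(k)]
    alternating_sum = sum((-1) ** l * x for l, x in enumerate(row))
    print(k, row, alternating_sum)   # the alternating sum is 0 for every even k
```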

As a result, the first seven terms of the Taylor expansion of the stationary distribution u are

$\mathbf{u}(\beta )\approx {\mathbf{u}}_{0}+{\mathbf{u}}_{1}\,\beta +{\mathbf{u}}_{2}\,{\beta }^{2}+\dots +{\mathbf{u}}_{6}\,{\beta }^{6},\qquad \text{with}\quad {\mathbf{u}}_{k}=\left(\sum _{0\leqslant j< k,\,k-j\ \text{odd}}{\mathbf{u}}_{j}\,{T}_{k-j}\right){(I+U-{T}_{0})}^{-1}.$   (C8)

Table 2. The Eulerian numbers E(k, l) for 1 ⩽ k ⩽ 5 and 0 ⩽ l ⩽ 5.

k\l    0    1    2    3    4    5
1      1    0    0    0    0    0
2      1    1    0    0    0    0
3      1    4    1    0    0    0
4      1    11   11   1    0    0
5      1    26   66   26   1    0

Appendix D.: Pairwise imitation dynamics

Pairwise imitation dynamics is a frequency-dependent update rule according to which players can imitate other players' strategies [24]. The dynamics takes place in an entire population of size Z. For symmetric games with two strategies, as considered in section 4.2, the state of the population can be described by the number i of individuals who use strategy C (the number of individuals who use strategy D is then Z − i). Given the current population state i, one can calculate the expected payoffs of cooperators and defectors, respectively, as

${\pi }_{C}(i)=\frac{(i-1)\,{\pi }_{CC}+(Z-i)\,{\pi }_{CD}}{Z-1},\qquad {\pi }_{D}(i)=\frac{i\,{\pi }_{DC}+(Z-i-1)\,{\pi }_{DD}}{Z-1}.$   (D1)

Similar to introspection dynamics, pairwise imitation dynamics assumes that at regular time intervals, a randomly chosen player is given the opportunity to revise their strategy. With probability μ (the mutation rate), this player simply adopts a randomly chosen strategy. With probability 1 − μ, the player instead picks a randomly chosen role model from the population. If the focal player's payoff is π and the role model's payoff is $\tilde{\pi }$, the focal player adopts the role model's strategy with probability ${\varphi }_{\beta }(\tilde{\pi }-\pi )$, where φβ is again the Fermi function defined by (2).

This process can also be described by a Markov chain. Given that the current population state is i, the probability that a D-player changes to C in the next time step is

${T}_{i}^{+}=\frac{Z-i}{Z}\left[\frac{\mu }{2}+(1-\mu )\,\frac{i}{Z-1}\,{\varphi }_{\beta }\left({\pi }_{C}(i)-{\pi }_{D}(i)\right)\right].$   (D2a)

Similarly, the probability that a C-player changes to D is

${T}_{i}^{-}=\frac{i}{Z}\left[\frac{\mu }{2}+(1-\mu )\,\frac{Z-i}{Z-1}\,{\varphi }_{\beta }\left({\pi }_{D}(i)-{\pi }_{C}(i)\right)\right].$   (D2b)

Since only one player is allowed to update at each time step,

${T}_{i\to j}=0\qquad \text{for all}\ j\ \text{with}\ \vert j-i\vert \geqslant 2.$   (D2c)

Finally, for the normalization condition to hold, the probability of remaining in the same state is

${T}_{i\to i}=1-{T}_{i}^{+}-{T}_{i}^{-}.$   (D2d)

As a result, the transition matrix is the (Z + 1) × (Z + 1) tridiagonal matrix

$T=\begin{pmatrix}{T}_{0\to 0} & {T}_{0}^{+} & & & \\ {T}_{1}^{-} & {T}_{1\to 1} & {T}_{1}^{+} & & \\ & \ddots & \ddots & \ddots & \\ & & {T}_{Z-1}^{-} & {T}_{Z-1\to Z-1} & {T}_{Z-1}^{+}\\ & & & {T}_{Z}^{-} & {T}_{Z\to Z}\end{pmatrix}.$

As before, we can characterize the long-run dynamics of this process by computing the stationary distribution u = (u0, u1, ..., uZ ). In this case, however, ui denotes the prevalence of a population state with i cooperators and Zi defectors.

To construct a metric that is comparable to the one used for introspection dynamics, we compute the average fraction of cooperation,

$\bar{x}=\sum _{i=0}^{Z}\frac{i}{Z}\,{u}_{i},$   (D3)

which we plot in figure 4(b). Additionally, one can also obtain the average probability of drawing CC, CD, DC, and DD pairs from the population

${p}_{CC}=\sum _{i=0}^{Z}{u}_{i}\,\frac{i(i-1)}{Z(Z-1)},$   (D4a)

${p}_{CD}={p}_{DC}=\sum _{i=0}^{Z}{u}_{i}\,\frac{i(Z-i)}{Z(Z-1)},$   (D4b)

${p}_{DD}=\sum _{i=0}^{Z}{u}_{i}\,\frac{(Z-i)(Z-i-1)}{Z(Z-1)}.$   (D4c)

Again, pCD = pDC because players are indistinguishable in the symmetric case. These quantities are comparable to the state frequencies of our stationary distribution, u = (uCC , uCD , uDC , uDD ).
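A compact sketch of the full appendix D pipeline: payoffs (D1), transition probabilities (D2), stationary distribution (via proposition 2), average cooperation (D3), and pair probabilities (D4). It follows our reconstruction of these equations; in particular, the μ/2 term assumes the mutating player draws one of the two strategies uniformly, and the role model is drawn among the Z − 1 other individuals. Payoff values are an illustrative prisoner's dilemma.

```python
import numpy as np

def fermi(beta, d):
    return 1.0 / (1.0 + np.exp(-beta * d))

def imitation_chain(R, S, T, P, Z, beta, mu):
    """Tridiagonal transition matrix of pairwise imitation with mutation.

    States i = 0..Z count the cooperators. Payoffs follow (D1) (no
    self-interaction); transitions follow (D2), with mutation switching the
    strategy with probability mu/2.
    """
    M = np.zeros((Z + 1, Z + 1))
    for i in range(Z + 1):
        piC = ((i - 1) * R + (Z - i) * S) / (Z - 1) if i > 0 else 0.0
        piD = (i * T + (Z - i - 1) * P) / (Z - 1) if i < Z else 0.0
        up = (Z - i) / Z * (mu / 2 + (1 - mu) * i / (Z - 1)
                            * fermi(beta, piC - piD))
        down = i / Z * (mu / 2 + (1 - mu) * (Z - i) / (Z - 1)
                        * fermi(beta, piD - piC))
        if i < Z:
            M[i, i + 1] = up
        if i > 0:
            M[i, i - 1] = down
        M[i, i] = 1.0 - M[i].sum()   # normalization, as in (D2d)
    return M

Z, beta, mu = 50, 1.0, 0.01
R, S, T, P = 3.0, 0.0, 5.0, 1.0           # an illustrative prisoner's dilemma
M = imitation_chain(R, S, T, P, Z, beta, mu)

U = np.ones((Z + 1, Z + 1))
u = np.ones(Z + 1) @ np.linalg.inv(np.eye(Z + 1) + U - M)   # stationary distribution

i = np.arange(Z + 1)
x_bar = np.sum(i / Z * u)                          # average cooperation, (D3)
p_CC = np.sum(u * i * (i - 1)) / (Z * (Z - 1))     # pair probabilities, (D4)
p_CD = np.sum(u * i * (Z - i)) / (Z * (Z - 1))
p_DD = np.sum(u * (Z - i) * (Z - i - 1)) / (Z * (Z - 1))
print(x_bar, p_CC, p_CD, p_DD)
```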
