
Open Access 16.03.2018

Minmax via Replicator Dynamics

Author: Josef Hofbauer

Published in: Dynamic Games and Applications | Issue 3/2018



Abstract

I present a short proof of the minmax theorem using the replicator dynamics.

1 Introduction

The minmax theorem of von Neumann [10] says that
$$\begin{aligned} { \max _{x\in X} \min _{y \in Y} U(x,y)= \min _{y\in Y} \max _{x \in X} U(x,y) } \end{aligned}$$
where X, Y are the unit simplices in \({\mathbb {R}}^n, {\mathbb {R}}^m\) and \(U: X \times Y \rightarrow {\mathbb {R}}\) is a continuous function, quasi-concave in x and quasi-convex in y. The proof was by induction on the number of variables, see also [7]. An important special case is where U is a bilinear function \(U(x,y) = x \! \cdot \!Ay\), with A an \(n \times m \) matrix.
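For instance, in the bilinear case the theorem can be checked by hand for matching pennies:
$$\begin{aligned} A = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}, \end{aligned}$$
where \(x \! \cdot \!Ay = (2x_1-1)(2y_1-1)\), so both sides of the minmax identity equal 0, attained at \(x = y = (\tfrac{1}{2}, \tfrac{1}{2})\).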
The idea to use dynamics for proving the minmax theorem (and computing the equilibria) goes back to Brown [1, 2]: for symmetric zero-sum games, i.e., \(A = -A^{{\mathsf {T}}}\), he proved together with von Neumann [2] that the solutions of a certain differential equation converge to the set of equilibria. In [1], he showed that the (continuous time) fictitious play process approaches the set of equilibria in any finite zero-sum game, which implies the minmax theorem for any \(n \times m \) matrix A. Brown’s fictitious play process [1] is now often framed as best response dynamics and can be used to prove the minmax theorem for more general payoff functions U, which are continuous and concave/convex, see [6]. For the original version [10] for continuous quasi-concave/quasi-convex functions, a dynamic proof is still missing. Another proof based on differential inclusions can be found in [8].
In the present note, I give a short proof of the minmax theorem in the matrix case, based on the replicator dynamics.

2 Replicator Dynamics

The replicator dynamics [5] for an \(n \times m \) bimatrix game (AB) is given by
$$\begin{aligned} \dot{x}_i= & {} x_i \left( e^i \! \cdot \!Ay - x\! \cdot \!Ay \right) \qquad i=1, \ldots , n \\ \dot{y}_j= & {} y_j \left( e^j \! \cdot \!Bx - y\! \cdot \!Bx \right) \qquad j=1, \ldots , m \end{aligned}$$
Here \(x_i\) denotes the frequency of strategy i of player 1, hence \(x = (x_1, \dots , x_n)\) is in the probability simplex \( \Delta _n = \{x \in [0,1]^n: \sum x_i = 1\}\), \(y_j\) is the frequency of strategy j of player 2, \(y = (y_1, \dots , y_m) \in \Delta _m\), and \(e^i\) denotes the ith unit vector.
Besides its original derivation from evolution and natural selection, there are at least two economic motivations based on imitation and on reinforcement learning.
For a zero-sum game \(B = -A^{{\mathsf {T}}}\), in the interior of \(\Delta _n \times \Delta _m\) we obtain
$$\begin{aligned} \dot{x}_i/x_i= & {} e^i \! \cdot \!Ay - x\! \cdot \!Ay \quad i=1, \ldots , n \end{aligned}$$
(1)
$$\begin{aligned} \dot{y}_j/y_j= & {} - x \! \cdot \!A e^j + x\! \cdot \!Ay \quad j=1, \ldots , m \end{aligned}$$
(2)
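Indeed, with \(B = -A^{{\mathsf {T}}}\),
$$\begin{aligned} e^j \! \cdot \!Bx = - e^j \! \cdot \!A^{{\mathsf {T}}} x = - x \! \cdot \!A e^j, \qquad y \! \cdot \!Bx = - y \! \cdot \!A^{{\mathsf {T}}} x = - x \! \cdot \!Ay, \end{aligned}$$
which gives (2).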
Now add these equations
$$\begin{aligned} \frac{\dot{x}_i}{x_i} + \frac{\dot{y}_j}{y_j} = e^i \! \cdot \!Ay - x \! \cdot \!A e^j \quad \forall i,j \end{aligned}$$
and integrate
$$\begin{aligned} \frac{\log x_i(T) - \log x_i(0)+ \log y_j(T) - \log y_j(0)}{T} = e^i \! \cdot \!A{\bar{y}}(T) - {\bar{x}}(T) \! \cdot \!A e^j \end{aligned}$$
where
$$\begin{aligned} {\bar{x}}(T) = \frac{1}{T} \int _0^T x(t)\mathrm{d}t, \quad {\bar{y}}(T) = \frac{1}{T} \int _0^T y(t)\mathrm{d}t\quad \end{aligned}$$
denote the time averages of the solutions of (1, 2). Now consider limit points, i.e., choose a sequence \(T_k \rightarrow \infty \) s.t. \({\bar{x}}(T_k) \rightarrow {\bar{x}}\), \({\bar{y}}(T_k) \rightarrow {\bar{y}}\). Since \(\log x_i (T) \le 0\) and \(\log y_j (T) \le 0\), while \(\log x_i(0)\) and \(\log y_j(0)\) are fixed, the left-hand side has nonpositive limit superior as \(T \rightarrow \infty \), so we obtain \( 0 \ge e^i \! \cdot \!A{\bar{y}} - {\bar{x}} \! \cdot \!A e^j \quad \forall i,j \) or
$$\begin{aligned} e^i \! \cdot \!A{\bar{y}} \le {\bar{x}} \! \cdot \!A e^j \quad \forall i,j. \end{aligned}$$
(3)
Multiplying (3) by \(x_i y_j\) for arbitrary \(x \in \Delta _n\), \(y \in \Delta _m\) and summing over i and j yields \(x \! \cdot \!A{\bar{y}} \le {\bar{x}} \! \cdot \!A y\) for all such x, y, and hence
$$\begin{aligned} \max _{x \in \Delta _n} x \! \cdot \!A{\bar{y}} \le \min _{y\in \Delta _m} {\bar{x}} \! \cdot \!A y \end{aligned}$$
(4)
and
$$\begin{aligned} \min _y \max _x x \! \cdot \!A y \le \max _x x \! \cdot \!A{\bar{y}} \le \min _y {\bar{x}} \! \cdot \!A y \le \max _x \min _y x \! \cdot \!A y, \end{aligned}$$
and together with the obvious inequality \(\max _x \min _y x \! \cdot \!A y \le \min _y \max _x x \! \cdot \!A y\), we obtain
$$\begin{aligned} \min _{y \in \Delta _m} \max _{x \in \Delta _n} x \! \cdot \!A y = \max _{x \in \Delta _n} \min _{y \in \Delta _m} x \! \cdot \!A y. \end{aligned}$$
(5)
Additionally, (3) or (4) also imply
$$\begin{aligned} x \! \cdot \!A{\bar{y}} \le {\bar{x}} \! \cdot \!A{\bar{y}} \le {\bar{x}} \! \cdot \!A y \quad \forall x,y \end{aligned}$$
(6)
so \(({\bar{x}}, {\bar{y}})\) is a pair of optimal strategies for the zero-sum game. (In particular, this shows the existence of equilibria.)
Furthermore, if we integrate (1, 2) directly, then we obtain \( 0 \ge e^i \! \cdot \!A{\bar{y}} - {\bar{a}}\) and \(0 \ge - {\bar{x}} \! \cdot \!A e^j + {\bar{a}}\), with \({\bar{a}} = \lim _{k \rightarrow \infty } \frac{1}{T_k} \int _0^{T_k} x(t)\! \cdot \!A y(t)\, \mathrm{d}t\) (passing to a further subsequence of \((T_k)\) if necessary, so that this average converges), and hence
$$\begin{aligned} e^i \! \cdot \!A{\bar{y}} \le {\bar{a}} \le {\bar{x}} \! \cdot \!A e^j \quad \forall i,j. \end{aligned}$$
(7)
Comparing with (6), we get \({\bar{a}} = {\bar{x}} \! \cdot \!A{\bar{y}}\).
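Indeed, multiplying (7) by \({\bar{x}}_i {\bar{y}}_j\) and summing over i and j already gives
$$\begin{aligned} {\bar{x}} \! \cdot \!A{\bar{y}} \le {\bar{a}} \le {\bar{x}} \! \cdot \!A{\bar{y}}. \end{aligned}$$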
Summarizing, besides the minmax theorem (5), we have shown:
Theorem
Every limit point \(({\bar{x}}, {\bar{y}})\) of the time averages \(({\bar{x}}(T), {\bar{y}}(T))\) of positive solutions (x(t), y(t)) of the replicator dynamics is a pair of optimal strategies of the zero-sum game. And the time averages of the payoffs
$$\begin{aligned} \frac{1}{T} \int _0^T \sum _{i,j} a_{ij} x_i(t)y_j(t)\,\mathrm{d}t \end{aligned}$$
converge to the value \({\bar{x}} \! \cdot \!A {\bar{y}}\) of the game, as \(T \rightarrow \infty \).
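The following rough numerical sketch (not part of the paper) illustrates the theorem for the zero-sum game rock–paper–scissors; it assumes NumPy is available, and the integrator, step size, horizon and initial conditions are arbitrary illustrative choices. The time averages should come out close to the unique optimal strategy \((\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3})\) for both players, and the averaged payoff close to the value 0.

import numpy as np

# Rock-paper-scissors: payoff matrix of player 1; player 2's matrix is B = -A^T.
A = np.array([[0.0, -1.0, 1.0],
              [1.0, 0.0, -1.0],
              [-1.0, 1.0, 0.0]])

def replicator(x, y):
    # Right-hand side of (1), (2) for the zero-sum game (A, -A^T).
    Ay, ATx = A @ y, A.T @ x
    xdot = x * (Ay - x @ Ay)
    ydot = y * (-ATx + x @ Ay)
    return xdot, ydot

def rk4_step(x, y, h):
    # One classical Runge-Kutta step for the coupled system.
    k1x, k1y = replicator(x, y)
    k2x, k2y = replicator(x + h / 2 * k1x, y + h / 2 * k1y)
    k3x, k3y = replicator(x + h / 2 * k2x, y + h / 2 * k2y)
    k4x, k4y = replicator(x + h * k3x, y + h * k3y)
    return (x + h / 6 * (k1x + 2 * k2x + 2 * k3x + k4x),
            y + h / 6 * (k1y + 2 * k2y + 2 * k3y + k4y))

h, steps = 0.01, 100_000                      # time horizon T = 1000
x = np.array([0.6, 0.3, 0.1])                 # arbitrary interior initial conditions
y = np.array([0.2, 0.5, 0.3])
x_sum, y_sum, payoff_sum = np.zeros(3), np.zeros(3), 0.0
for _ in range(steps):
    x_sum += x
    y_sum += y
    payoff_sum += x @ A @ y
    x, y = rk4_step(x, y, h)

print("time-averaged x:", np.round(x_sum / steps, 3))         # approx. [1/3, 1/3, 1/3]
print("time-averaged y:", np.round(y_sum / steps, 3))
print("time-averaged payoff:", round(payoff_sum / steps, 4))  # approx. 0 (the value)

The Riemann sums over the integration grid play the role of the time averages \({\bar{x}}(T)\), \({\bar{y}}(T)\) and of the averaged payoff in the theorem.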

3 Remarks

1.
If \(\log x_i(T)\) and \(\log y_j(T)\) are bounded functions of T (i.e., the solution stays at a positive distance from the boundary of \(\Delta _n\) and \(\Delta _m\)), then we have equality in (3) for all i, j, and the existence of a fully mixed equilibrium follows. The converse holds as well, see [3, 5, 9]: If \((p,q) > 0\) is an equilibrium of the zero-sum game, then the relative entropy
$$\begin{aligned} H(x,y ) = - \sum _i p_i \log \frac{x_i}{p_i} - \sum _j q_j \log \frac{y_j}{q_j} \ge 0 \end{aligned}$$
(or Kullback–Leibler divergence) is a constant of motion for (1, 2): \(\dot{H} = 0\); a short verification is given after these remarks. The replicator dynamics is even a Hamiltonian system w.r.t. a suitable symplectic or Poisson structure [3], and hence, on each level set of H, by Poincaré’s recurrence theorem, almost every solution is recurrent. The behavior of the solutions might be chaotic, but by the above theorem, their time averages approach the set of equilibria.
 
2.
For nonzero-sum games, a similar argument shows that the time averages
$$\begin{aligned} \frac{1}{T} \int _0^T x_i(t)y_j(t)\mathrm{d}t \end{aligned}$$
(i.e., how often player 1 uses strategy i against strategy j of player 2 over a given period) converge (as \(T \rightarrow \infty \)) to the set of exact coarse correlated equilibria, see [4]. This also holds for N-player normal form games.
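For completeness, here is the short computation behind \(\dot{H} = 0\) in Remark 1. Along interior solutions of (1, 2),
$$\begin{aligned} \dot{H} = - \sum _i p_i \frac{\dot{x}_i}{x_i} - \sum _j q_j \frac{\dot{y}_j}{y_j} = -\left( p \! \cdot \!Ay - x \! \cdot \!Ay \right) - \left( - x \! \cdot \!Aq + x \! \cdot \!Ay \right) = x \! \cdot \!Aq - p \! \cdot \!Ay = 0, \end{aligned}$$
since the fully mixed equilibrium \((p,q)\) satisfies \(e^i \! \cdot \!Aq = p \! \cdot \!A e^j = v\) (the value) for all i, j, and hence \(x \! \cdot \!Aq = p \! \cdot \!Ay = v\) for all \(x \in \Delta _n\), \(y \in \Delta _m\).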
 

Acknowledgements

Open access funding provided by University of Vienna.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
References
1.
Brown GW (1951) Iterative solution of games by fictitious play. In: Koopmans TC (ed) Activity analysis of production and allocation. Wiley, New York, pp 374–376
2.
Brown GW, von Neumann J (1950) Solutions of games by differential equations. Ann Math Stud 24:73–79
3.
Hofbauer J (1996) Evolutionary dynamics for bimatrix games: a Hamiltonian system? J Math Biol 34:675–688
4.
Hofbauer J (2004) Time averages of the replicator dynamics and correlated equilibria. Preprint
5.
Hofbauer J, Sigmund K (1998) Evolutionary games and population dynamics. Cambridge University Press, Cambridge
6.
Hofbauer J, Sorin S (2006) Best response dynamics for continuous zero-sum games. Discrete Continuous Dyn Syst B 6:215–224
7.
Kjeldsen TH (2001) John von Neumann’s conception of the minimax theorem: a journey through different mathematical contexts. Arch Hist Exact Sci 56:39–68
8.
9.
Schuster P, Sigmund K, Hofbauer J, Wolff R (1981) Selfregulation of behaviour in animal societies. Part 2: games between two populations without selfinteraction. Biol Cybern 40:9–15
10.
von Neumann J (1928) Zur Theorie der Gesellschaftsspiele. Math Ann 100:295–320