
Publication date: 03-12-2018

# Dynamic Exploitation of Myopic Best Response

Author: Burkhard C. Schipper

Published in: Dynamic Games and Applications | Issue 4/2019


## Abstract

How can a rational player manipulate a myopic best response player in a repeated two-player game? We show that in games with strategic substitutes or strategic complements the optimal control strategy is monotone in the initial action of the opponent, in time periods, and in the discount rate. As an interesting example outside this class of games we present a repeated “textbook-like” Cournot duopoly with nonnegative prices and show that the optimal control strategy involves a cycle.
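The cycle result can be illustrated with the numbers reported in footnote 1. The sketch below is not the author's code; it assumes an inverse demand of `max(109 - Q, 0)` and a unit cost of 1, parameters that are not stated on this page and are chosen here only because they reproduce the payoffs quoted in the footnote. It also ignores the noise in the computer's responses. The function names are hypothetical.

```python
# Assumed game (not given on this page): price = max(109 - Q, 0), unit cost 1.
INTERCEPT = 109
UNIT_COST = 1
MAX_QUANTITY = 109


def price(total_quantity):
    """Nonnegative inverse demand."""
    return max(INTERCEPT - total_quantity, 0)


def profit(own, other):
    """One-period profit of a player producing `own` against `other`."""
    return (price(own + other) - UNIT_COST) * own


def best_response(other):
    """Myopic best response on the integer grid (ties go to the smaller quantity)."""
    return max(range(MAX_QUANTITY + 1), key=lambda q: profit(q, other))


def average_cycle_payoff(cycle):
    """Average payoff of a manipulator who repeats `cycle` forever.

    In every period the puppet best-responds to the manipulator's
    quantity from the previous period.
    """
    total = 0
    for t, quantity in enumerate(cycle):
        puppet = best_response(cycle[t - 1])  # index -1 wraps around the cycle
        total += profit(quantity, puppet)
    return total / len(cycle)


print(average_cycle_payoff((108, 70, 54, 42)))  # cycle played by the participant
print(profit(54, best_response(54)))            # Stackelberg leader payoff
print(profit(36, 36))                           # Cournot Nash equilibrium payoff
```

Under these assumptions the cycle averages 1520 per period, against 1458 for the Stackelberg leader and 1296 in the Cournot Nash equilibrium. The gain comes from the period after the manipulator floods the market with 108: the puppet responds with a quantity of zero, and the manipulator then sells 70 units as a monopolist.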


## Footnotes

1. The game was repeated over 40 rounds. The participant played the cycle of quantities (108, 70, 54, 42). This cycle yields an average payoff of 1520, which is well above the Stackelberg leader payoff of 1458. In this game, the Stackelberg leader’s quantity is 54, the follower’s quantity is 27 (payoff 728), and the Cournot Nash equilibrium quantity is 36 (payoff 1296). The computer is programmed to play a myopic best response with some noise. The x-axis in Fig. 1 indicates the rounds of play, the y-axis the quantities. The lower time series depicts the computer’s sequence of actions. The upper time series shows the participant’s quantities. See Duersch et al. [15] for details of the game and the experiment.

2. In fact, the average payoff of the optimal cycle is 1522, only a minor improvement over the average payoff (1520) of the cycle played by the participant.

3. As a reviewer pointed out, this literature is related to the literature on indirect evolution (e.g., [25, 29]). Yet, instead of the evolution of utility functions, the evolution of learning heuristics is featured.

4. In Sect. 4 we explain why we do not consider multi-dimensional strategy sets here.

5. Note that throughout the analysis we do not allow the manipulator to suitably choose the initial action of the puppet.

6. As a reviewer rightly points out, this would be problematic if the manipulator does not know the learning heuristic used by the puppet.

7. As a reviewer pointed out, we could have stated the model just in terms of assumptions on m and a continuous best-response function b. This might be even more realistic as the manipulator may observe the opponent’s best responses but not necessarily the opponent’s payoff function.

8. In the first four periods, the cyclic example of Sect. 3 coincides with the smooth problem that we discuss in Sect. 3. Proposition 1 applies to this smooth problem. The manipulator’s quantity in the last period is 41, which is the best response to the puppet’s Stackelberg follower quantity.

9. Amir ([1], Theorem 2 (ii)) does not state explicitly that the one-period value function is increasing and $$X_y$$ is expanding. Yet, this property is required in the proof.

10. This finding that an optimal control strategy involves strictly dominated actions is not restricted to games for which monotone differences differ among players.

11. Since we look at cycles (of finite length), we can neglect discounting in the calculations below.

12. To save space, we write out only the objective functions for $$n = 1, 2, 3$$.

13. Interestingly, the denominator in the linear factor in $$s_n$$ is identical to the numerator of the linear factor in $$s_{n+1}$$.

14. We would like to remark that the optimal control strategy of the manipulator does not involve a cycle in all zero-sum games. This is the case for some classes of zero-sum games studied in Duersch et al. [16, 17].

15. One reviewer suggested that if the puppet uses fictitious play rather than myopic best response, then it is much more difficult to manipulate with a cycle. Fictitious play is an uncoupled learning heuristic. Moreover, in our Cournot example, the Stackelberg outcome is unique. Thus, it follows from Schipper [40] that the payoff to the dynamic optimizer would be strictly above the Nash equilibrium payoff. So fictitious play can be exploited by a patient dynamic optimizer in our Cournot example, although the strategy may not be cyclic. At present, the form of the optimal manipulation strategy against a fictitious player is not clear to us and is left for future research.

16. A real-valued function f on a lattice X is supermodular on X if $$f(x'' \vee x') - f(x'') \ge f(x') - f(x'' \wedge x')$$ for all $$x'', x' \in X$$ (see [45], p. 43).
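The supermodularity inequality in footnote 16 can be checked numerically on a small finite lattice. The following is a minimal sketch, assuming a rectangular integer grid in two dimensions with the componentwise order, so that the join is the componentwise maximum and the meet the componentwise minimum. The grid, the test functions and all names are illustrative and do not come from the paper.

```python
from itertools import product


def join(x, y):
    """Least upper bound under the componentwise order."""
    return tuple(max(a, b) for a, b in zip(x, y))


def meet(x, y):
    """Greatest lower bound under the componentwise order."""
    return tuple(min(a, b) for a, b in zip(x, y))


def is_supermodular(f, points):
    """Check f(x v y) - f(x) >= f(y) - f(x ^ y) for all pairs in `points`.

    `points` must be closed under join and meet (a lattice);
    a full rectangular grid satisfies this.
    """
    return all(
        f(join(x, y)) - f(x) >= f(y) - f(meet(x, y))
        for x in points
        for y in points
    )


# Illustrative lattice: the 4 x 4 integer grid {0, ..., 3}^2.
grid = list(product(range(4), repeat=2))

print(is_supermodular(lambda x: x[0] * x[1], grid))   # complements
print(is_supermodular(lambda x: -x[0] * x[1], grid))  # substitutes
```

On this grid the product of the two coordinates passes the check and its negative fails it, which mirrors the distinction between strategic complements and strategic substitutes. Integer values are used so that the comparison is exact.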

## Literature

1. Amir R (1996a) Sensitivity analysis of multisector optimal economic dynamics. J Math Econ 25:123–141
2. Amir R (1996b) Cournot oligopoly and the theory of supermodular games. Games Econ Behav 15:132–148
3. Aoyagi M (1996) Evolution of beliefs and the Nash equilibrium of normal form games. J Econ Theory 70:444–469
4. Banerjee A, Weibull JW (1995) Evolutionary selection and rational behavior. In: Kirman A, Salmon M (eds) Learning and rationality in economics. Blackwell, Oxford, pp 343–363
5. Benhabib J, Nishimura K (1985) Competitive equilibrium cycles. J Econ Theory 35:284–306
6. Berge C (1963) Topological spaces, Dover edition, 1997. Dover Publications Inc, Mineola
7. Bertsekas DP (2005) Dynamic programming and optimal control, vol I & II, 3rd edn. Athena Scientific, Belmont
8. Boldrin M, Montrucchio L (1986) On the indeterminacy of capital accumulation paths. J Econ Theory 40:26–29
9. Bryant J (1983) A simple rational expectations Keynes-type coordination model. Q J Econ 98:525–528
10. Bulavsky VA, Kalashnikov VV (1996) Equilibria in generalized Cournot and Stackelberg markets. Z Angew Math Mech 76(S3):387–388
11. Camerer CF, Ho T-H, Chong J-K (2002) Sophisticated experience-weighted attraction learning and strategic teaching in repeated games. J Econ Theory 104:137–188
12. Chong J-K, Camerer CF, Ho T-H (2006) A learning-based model of repeated games with incomplete information. Games Econ Behav 55:340–371
13. Cournot A (1838) Researches into the mathematical principles of the theory of wealth. MacMillan, London
14. Dubey P, Haimanko O, Zapechelnyuk A (2006) Strategic substitutes and complements, and potential games. Games Econ Behav 54:77–94
15. Duersch P, Kolb A, Oechssler J, Schipper BC (2010) Rage against the machines: how subjects learn to play against computers. Econ Theory 43:407–430
16. Duersch P, Oechssler J, Schipper BC (2012) Unbeatable imitation. Games Econ Behav 76:88–96
17. Duersch P, Oechssler J, Schipper BC (2014) When is tit-for-tat unbeatable? Int J Game Theory 43:25–36
18. Dutta PK (1995) A folk theorem for stochastic games. J Econ Theory 66:1–32
19. Droste E, Hommes C, Tuinstra J (2002) Endogenous fluctuations under evolutionary pressure in Cournot competition. Games Econ Behav 40:232–269
20. Ellison G (1997) Learning from personal experience: one rational guy and the justification of myopia. Games Econ Behav 19:180–210
21. Fudenberg D, Kreps DM, Maskin ES (1990) Repeated games with long-run short-run players. Rev Econ Stud 57:555–573
22. Fudenberg D, Levine DK (1998) The theory of learning in games. The MIT Press, Cambridge
23. Fudenberg D, Levine DK (1994) Efficiency and observability with long-run and short-run players. J Econ Theory 62:103–135
24. Fudenberg D, Levine DK (1989) Reputation and equilibrium selection in games with a patient player. Econometrica 57:759–778
25. Güth W, Peleg B (2001) When will payoff maximization survive? An indirect evolutionary analysis. J Evol Econ 11:479–499
26. Hart S, Mas-Colell A (2013) Simple adaptive strategies: from regret-matching to uncoupled dynamics. World Scientific Publishing, Singapore
27. Hart S, Mas-Colell A (2006) Stochastic uncoupled dynamics and Nash equilibrium. Games Econ Behav 57:286–303
28. Hehenkamp B, Kaarbøe O (2006) Imitators and optimizers in a changing environment. J Econ Dyn Control 32:1357–1380
29. Heifetz A, Shannon C, Spiegel Y (2007) What to maximize if you must. J Econ Theory 133:31–57
30. Hyndman K, Ozbay EY, Schotter A, Ehrblatt WZ (2012) Convergence: an experimental study of teaching and learning in repeated games. J Eur Econ Assoc 10:573–604
31. Juang WT (2002) Rule evolution and equilibrium selection. Games Econ Behav 39:71–90
32. Kordonis I, Charalampidis AC, Papavassilopoulos GP (2018) Pretending in dynamic games: alternative outcomes and application to electricity markets. Dyn Games Appl 8:844–873
33. Kukushkin NS (2004) Best response dynamics in finite games with additive aggregation. Games Econ Behav 48:94–110
34. Milgrom P, Roberts J (1990) Rationalizability, learning, and equilibrium in games with strategic complementarities. Econometrica 58:1255–1277
35. Milgrom P, Shannon C (1994) Monotone comparative statics. Econometrica 62:157–180
36. Monderer D, Shapley LS (1996) Potential games. Games Econ Behav 14:124–143
37. Osborne M (2004) An introduction to game theory. Oxford University Press, Oxford
38. Puterman ML (1994) Markov decision processes. Discrete stochastic dynamic programming. Wiley, New York
39. Rand D (1978) Exotic phenomena in games and duopoly models. J Math Econ 5:173–184
40. Schipper BC (2017) Strategic teaching and learning in games. The University of California, Davis, Davis
41. Schipper BC (2009) Imitators and optimizers in Cournot oligopoly. J Econ Dyn Control 33:1981–1990
42. Stokey NL, Lucas RE, Prescott EC (1989) Recursive methods in economic dynamics. Harvard University Press, Cambridge
43. Terracol A, Vaksmann J (2009) Dumbing down rational players: learning and teaching in an experimental game. J Econ Behav Organ 70:54–71
44. Topkis D (1978) Minimizing a submodular function on a lattice. Oper Res 26:305–321
45. Topkis D (1998) Supermodularity and complementarity. Princeton University Press, Princeton
46. Van Huyck J, Battalio R, Beil R (1990) Tacit coordination games, strategic uncertainty and coordination failure. Am Econ Rev 80:234–248
47. Vives X (1999) Oligopoly pricing. Old ideas and new tools. Cambridge University Press, Cambridge
48. Walker JM, Gardner R, Ostrom E (1990) Rent dissipation in a limited access common-pool resource: experimental evidence. J Environ Econ Manag 19:203–211
49. Young P (2013) Strategic learning and its limits. Oxford University Press, Oxford

Title: Dynamic Exploitation of Myopic Best Response
Author: Burkhard C. Schipper
Publication date: 03-12-2018
Publisher: Springer US
Published in: Dynamic Games and Applications | Issue 4/2019
Print ISSN: 2153-0785
Electronic ISSN: 2153-0793
DOI: https://doi.org/10.1007/s13235-018-0289-z
