
A General Internal Regret-Free Strategy

Authors: Ehud Lehrer, Eilon Solan

Published in: Dynamic Games and Applications | Issue 1/2016 (1 March 2016)

Abstract

We study sequential decision problems where the decision maker does not observe the states of nature, but rather receives a noisy signal, whose distribution depends on the current state and on the action that she plays. We do not assume that the decision maker considers the worst-case scenario, but rather has a response correspondence, which maps distributions over signals to subjective best responses. We extend the concept of internal regret-free strategy to this setup and provide an algorithm that generates such a strategy.

For a finite set \(Z\), the set of probability distributions over \(Z\) is denoted by \(\Delta (Z)\).

Unless indicated otherwise, we use the supremum norm: for every two vectors \(x,x' \in {\mathbb {R}}^d\), the distance between \(x\) and \(x'\) is \(d(x,x') = \max \{ |x_i-x'_i|, 1 \le i \le d\}\), and for every subset \(D \subseteq {\mathbb {R}}^d\), the distance between \(x\) and \(D\) is \(d(x,D) := \inf _{x' \in D} d(x,x')\). Likewise, the distance between two subsets \(Y_1\) and \(Y_2\) of \(Y\) is given by \(d(Y_1,Y_2) := \max _{y_1 \in Y_1} \min _{y_2 \in Y_2}d(y_1,y_2)\).
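The three distances above can be illustrated with a minimal Python sketch. It assumes the sets involved are finite and non-empty, so the infimum is attained and can be computed as a minimum; the function names `sup_dist`, `dist_to_set` and `directed_set_dist` are hypothetical and are not taken from the paper. Note that the set-to-set distance, as defined, is one-sided and need not be symmetric.

```python
from typing import Iterable, Sequence

Vector = Sequence[float]


def sup_dist(x: Vector, y: Vector) -> float:
    """Supremum-norm distance d(x, x') = max_i |x_i - x'_i|."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    return max(abs(a - b) for a, b in zip(x, y))


def dist_to_set(x: Vector, D: Iterable[Vector]) -> float:
    """d(x, D) for a finite, non-empty set D (assumption: inf becomes min)."""
    return min(sup_dist(x, y) for y in D)


def directed_set_dist(Y1: Iterable[Vector], Y2: Iterable[Vector]) -> float:
    """d(Y1, Y2) = max over y1 in Y1 of min over y2 in Y2 of d(y1, y2)."""
    Y2 = list(Y2)  # Y2 is traversed once per element of Y1
    return max(dist_to_set(y1, Y2) for y1 in Y1)


if __name__ == "__main__":
    A = [(0, 0), (2, 1)]
    B = [(2, 1)]
    # The two directions differ: the point (0, 0) of A is far from B,
    # while every point of B also belongs to A.
    print(directed_set_dist(A, B))
    print(directed_set_dist(B, A))
```

In this example the distance from `A` to `B` is positive while the distance from `B` to `A` is zero, which shows the asymmetry of the definition.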

We say that event \(B\) holds on event \(A\) if \({\mathbb {P}}(A \cap B) = {\mathbb {P}}(A)\). Thus, a certain inequality holds on event \(A\) if the set of points in \(A\) that do not satisfy it has probability 0.

For convenience, in the examples we allow payoffs to be greater than \(1\).

The compactness of \(Y(\mu )\) and the concavity of the entropy function ensure that \(y^\mathrm{ENT}_{\mu }\) is well defined.

Because the range of \(\sigma \) is finite, we can omit the dependence of \(T'\) on \(x\).

Luce and Raiffa [19] refer to Blackwell’s invited address to the Institute of Mathematical Statistics, Seattle, August 1956, entitled “Controlled random walks”.

In most applications of Blackwell’s theory in game theory, the payoffs are uniquely determined by the actions of the players. In our proof, as in Blackwell’s original paper, the payoffs are random variables.

For any two vectors \(a,b\in {\mathbb {R}}^d\), we denote by \(a \cdot b\) the coordinate-wise product: \((a\cdot b)_k:=a_kb_k\) for every \(k\). Similarly, \(\frac{a}{b} \) denotes the coordinate-wise quotient of \(a\) and \(b\): \((\frac{a}{b})_k := \frac{a_k}{b_k}\) for every \(k\).
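As a small illustration of this notation, the following Python sketch computes the coordinate-wise product and quotient of two vectors. The function names `coord_product` and `coord_quotient` are hypothetical, and the sketch assumes that every coordinate of the divisor is non-zero, a condition the definition above leaves implicit.

```python
from typing import List, Sequence


def coord_product(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Coordinate-wise product: (a . b)_k = a_k * b_k for every k."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return [x * y for x, y in zip(a, b)]


def coord_quotient(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Coordinate-wise quotient: (a / b)_k = a_k / b_k for every k.

    Assumption: b_k != 0 for every k; otherwise Python raises
    ZeroDivisionError.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return [x / y for x, y in zip(a, b)]


if __name__ == "__main__":
    a = [1, 2, 3]
    b = [4, 5, 6]
    p = coord_product(a, b)
    print(p)
    # Dividing the product by b coordinate-wise recovers a (as floats).
    print(coord_quotient(p, b))
```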

1. Aumann RJ, Maschler M (1995) Repeated games with incomplete information. MIT Press, Cambridge

2. Bartók G, Foster D, Pál D, Rakhlin A, Szepesvári C (2013) Partial monitoring-classification, regret bounds, and algorithms, preprint

3. Blackwell D (1956) An analog of the minmax theorem for vector payoffs. Pac J Math 6:1–8

4. Blum A, Mansour Y (2007) From external to internal regret. J Mach Learn Res 8:1307–1324

5. Cesa-Bianchi N, Lugosi G (2006) Prediction, learning, and games. Cambridge University Press, Cambridge

6. Cesa-Bianchi N, Lugosi G, Stoltz G (2006) Regret minimization under partial monitoring. Math Oper Res 31:562–580

7. Foster DP (1999) A proof of calibration via Blackwell’s approachability theorem. Games Econ Behav 29:73–78

8. Foster DP, Vohra RV (1997) Calibrated learning and correlated equilibrium. Games Econ Behav 21:40–55

9. Foster DP, Vohra RV (1998) Asymptotic calibration. Biometrika 85:379–390

10. Foster DP, Vohra RV (1999) Regret in the on-line decision problem. Games Econ Behav 29:7–36

11. Fudenberg D, Levine DK (1999) Conditional universal consistency. Games Econ Behav 29:104–130

12. Hannan J (1957) Approximation to Bayes risk in repeated play. Contrib Theory Games 3:97–139

13. Hart S, Mas-Colell A (2000) A simple adaptive procedure leading to correlated equilibrium. Econometrica 68:1127–1150

14. Hart S, Mas-Colell A (2001) A general class of adaptive strategies. J Econ Theory 98:26–54

15. Hazan E, Kakade SM (2012) (weak) Calibration is computationally hard. In: Conference on learning theory (COLT) 2012

16. Lehrer E (2002) Approachability in infinitely dimensional spaces. Int J Game Theory 31:255–270

17. Lehrer E (2012) Partially specified probabilities: decisions and games. Am Econ J Microecon 4:70–100

18. Lehrer E, Solan E (2007) Learning to play partially specified equilibrium. Mimeo

19. Luce DR, Raiffa H (1958) Games and decisions. Wiley, NY

20. Lugosi G, Mannor S, Stoltz G (2008) Strategies for prediction under imperfect monitoring. Math Oper Res 33:513–528

21. Mannor S, Shimkin N (2003) On-line learning with imperfect monitoring. In: Proceedings of the 16th annual conference on learning theory. Springer, Berlin, pp 552–567

22. Perchet V (2009) Calibration and internal no-regret with random signals. In: ALT2009, pp 68–82

23. Perchet V (2011) Internal regret with partial monitoring: calibration-based optimal algorithms. J Mach Learn Res 12:1893–1921

24. Perchet V (2013) Approachability, regret and calibration; implications and equivalences, http://arxiv.org/abs/1301.2663

25. Piccolboni A, Schindelhauer C (2001) Discrete prediction games with arbitrary feedback and loss. In: COLT 2001, annual conference on computational learning theory #14, lecture notes in computer science, 2111, pp 208–223

26. Rockafellar RT, Wets RJB (2009) Variational analysis. Springer, Berlin

27. Rustichini A (1999) Minimizing regret: the general case. Games Econ Behav 29:224–243

28. Stoltz G, Lugosi G (2005) Internal regret in on-line portfolio selection. Mach Learn 59:125–159

Title: A General Internal Regret-Free Strategy
Authors: Ehud Lehrer, Eilon Solan
Publication date: 1 March 2016
Publisher: Springer US
Published in: Dynamic Games and Applications / Issue 1/2016
Print ISSN: 2153-0785
Electronic ISSN: 2153-0793
DOI: https://doi.org/10.1007/s13235-015-0143-5
