Abstract
The exponential weight algorithm has been introduced in the framework of discrete time on-line problems. Given an observed process \(\{X_m\}_{m=1,2,\ldots}\) the input at stage m + 1 is an exponential function of the sum \(S_m = \sum_{\ell = 1}^m X_{\ell}\) . We define the analog algorithm for a continuous time process X t and prove similar properties in terms of external or internal consistency. We then deduce results for discrete time from their counterpart in continuous time. Finally we compare this approach to another continuous time approximation of a discrete time exponential algorithm based on the average sum S m /m.
Similar content being viewed by others
References
Auer, P., Cesa-Bianchi, N., Freund, Y., Shapire, R.E.: Gambling in a rigged casino: the adversarial multi-armed bandit problem. Proceedings of the 36th annual IEEE symposium on foundations of computer science, pp. 322–331 (1995)
Auer P., Cesa-Bianchi N., Freund Y. and Shapire R.E. (2002). The non-stochastic multiarmed bandit problem. SIAM J. Comput. 32: 48–77
Benaim M., Hofbauer J. and Sorin S. (2005). Stochastic approximations and differential inclusions. SIAM J. Opt. Control 44: 328–348
Benaim M., Hofbauer J., Sorin S. (2006) Stochastic approximations and differential inclusions. Part II: applications. Math. Oper. Res. 31: 673–695
Blackwell D. (1956). An analog of the minmax theorem for vector payoffs. Pac. J. Math. 6: 1–8
Brezis H. and Lions P.-L. (1978). Produits infinis de résolvantes. Israel J. Math. 29: 329–345
Cesa-Bianchi N. and Lugosi G. (2003). Potential-based algorithms in on-line prediction and game theory. Mach. Learn. 51: 239–261
Foster D. and Vohra R. (1998). Asymptotic calibration. Biometrika 85: 379–390
Foster D. and Vohra R. (1999). Regret in the on-line decision problem. Games Econ. Behav. 29: 7–35
Freund Y. and Shapire R.E. (1999). Adaptive game playing using multiplicative weights. Games Econ. Behav. 29: 79–103
Fudenberg D. and Levine D.K. (1995). Consistency and cautious fictitious play. J. Econ. Dyn. Control 19: 1065–1089
Fudenberg D. and Levine D.K. (1999). Conditional universal consistency. Games Econ. Behav. 29: 104–130
Hall P., Heyde C. (1980) Martingale limit theory and its applications. Academic, London
Hart S. and Mas-Colell A. (2000). A simple adaptive procedure leading to correlated equilibria. Econometrica 68: 1127–1150
Hart S. and Mas-Colell A. (2001). A general class of adaptive strategies. J. Econ. Theory 98: 26–54
Hart S. and Mas-Colell A. (2003). Regret-based continuous time dynamics. Games Econ. Behav. 45: 375–394
Littlestone N. and Warmuth M.K. (1994). The weighted majority algorithm. Inform. Comput. 108: 212–261
Seneta E. (1981). Non-negative matrices and Markov chains. Springer, Heidelberg
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Sorin, S. Exponential weight algorithm in continuous time. Math. Program. 116, 513–528 (2009). https://doi.org/10.1007/s10107-007-0111-y
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10107-007-0111-y