Skip to main content
Erschienen in: International Journal of Data Science and Analytics 4/2017

30.03.2017 | Regular Paper

The non-stationary stochastic multi-armed bandit problem

verfasst von: Robin Allesiardo, Raphaël Féraud, Odalric-Ambrym Maillard

Erschienen in: International Journal of Data Science and Analytics | Ausgabe 4/2017

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

We consider a variant of the stochastic multi-armed bandit with K arms where the rewards are not assumed to be identically distributed, but are generated by a non-stationary stochastic process. We first study the unique best arm setting when there exists one unique best arm. Second, we study the general switching best arm setting when a best arm switches at some unknown steps. For both settings, we target problem-dependent bounds, instead of the more conservative problem-free bounds. We consider two classical problems: (1) identify a best arm with high probability (best arm identification), for which the performance measure by the sample complexity (number of samples before finding a near-optimal arm). To this end, we naturally extend the definition of sample complexity so that it makes sense in the switching best arm setting, which may be of independent interest. (2) Achieve the smallest cumulative regret (regret minimization) where the regret is measured with respect to the strategy pulling an arm with the best instantaneous mean at each step.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Anhänge
Nur mit Berechtigung zugänglich
Literatur
1.
Zurück zum Zitat Allesiardo, R., Féraud, R.: EXP3 with drift detection for the switching bandit problem. In: 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA 2015) Allesiardo, R., Féraud, R.: EXP3 with drift detection for the switching bandit problem. In: 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA 2015)
2.
Zurück zum Zitat Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multiarmed bandit problem. Mach. Learn. 47(2–3), 235–256 (2002a)CrossRefMATH Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multiarmed bandit problem. Mach. Learn. 47(2–3), 235–256 (2002a)CrossRefMATH
3.
Zurück zum Zitat Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.E.: The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32(1), 48–77 (2002b)MathSciNetCrossRefMATH Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.E.: The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32(1), 48–77 (2002b)MathSciNetCrossRefMATH
5.
Zurück zum Zitat Bubeck, S., Slivkins, A.: The best of both worlds: stochastic and adversarial bandits. In: COLT 2012—the 25th Annual Conference on Learning Theory, pp. 42.1–42.23. Edinburgh, Scotland, 25–27 June 2012 Bubeck, S., Slivkins, A.: The best of both worlds: stochastic and adversarial bandits. In: COLT 2012—the 25th Annual Conference on Learning Theory, pp. 42.1–42.23. Edinburgh, Scotland, 25–27 June 2012
6.
Zurück zum Zitat Even-Dar, E., Mannor, S., Mansour, Y.: Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems. J. Mach. Learn. Res. 7, 1079–1105 (2006) Even-Dar, E., Mannor, S., Mansour, Y.: Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems. J. Mach. Learn. Res. 7, 1079–1105 (2006)
7.
Zurück zum Zitat Féraud, R., Allesiardo, R., Urvoy, T., Clérot, F.: Random forest for the contextual bandit problem. In: AISTATS. (2016) Féraud, R., Allesiardo, R., Urvoy, T., Clérot, F.: Random forest for the contextual bandit problem. In: AISTATS. (2016)
10.
Zurück zum Zitat Hartland, C., Baskiotis, N., Gelly, S., Teytaud, O., Sebag, M.: Multi-armed bandit, dynamic environments and meta-bandits. In: Online Trading of Exploration and Exploitation Workshop, NIPS. (2006) Hartland, C., Baskiotis, N., Gelly, S., Teytaud, O., Sebag, M.: Multi-armed bandit, dynamic environments and meta-bandits. In: Online Trading of Exploration and Exploitation Workshop, NIPS. (2006)
11.
12.
Zurück zum Zitat Kaufmann, E., Cappé, O., Garivier, A.: On the complexity of best arm identification in multi-armed bandit models. J. Mach. Learn. Res. 17(1), 1–42 (2016)MathSciNetMATH Kaufmann, E., Cappé, O., Garivier, A.: On the complexity of best arm identification in multi-armed bandit models. J. Mach. Learn. Res. 17(1), 1–42 (2016)MathSciNetMATH
13.
Zurück zum Zitat Kocsis, L., Szepesvári, C.: Discounted UCB. In: 2nd PASCAL Challenges Workshop. (2006) Kocsis, L., Szepesvári, C.: Discounted UCB. In: 2nd PASCAL Challenges Workshop. (2006)
14.
Zurück zum Zitat Neu, G.: Explore no more: improved high-probability regret bounds for non-stochastic bandits. In: NIPS. pp. 3168–3176 (2015) Neu, G.: Explore no more: improved high-probability regret bounds for non-stochastic bandits. In: NIPS. pp. 3168–3176 (2015)
15.
Zurück zum Zitat Seldin, Y., Slivkins, A.: One practical algorithm for both stochastic and adversarial bandits. In: 31th International Conference on Machine Learning (ICML). (2014) Seldin, Y., Slivkins, A.: One practical algorithm for both stochastic and adversarial bandits. In: 31th International Conference on Machine Learning (ICML). (2014)
16.
17.
Zurück zum Zitat Thompson, W.R.: On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25, 285–294 (1933)CrossRefMATH Thompson, W.R.: On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25, 285–294 (1933)CrossRefMATH
18.
Zurück zum Zitat Yu, J.Y., Mannor, S.: Piecewise-stationary bandit problems with side observations. In: Proceedings of the 26th Annual International Conference on Machine Learning, ICML. (2009) Yu, J.Y., Mannor, S.: Piecewise-stationary bandit problems with side observations. In: Proceedings of the 26th Annual International Conference on Machine Learning, ICML. (2009)
Metadaten
Titel
The non-stationary stochastic multi-armed bandit problem
verfasst von
Robin Allesiardo
Raphaël Féraud
Odalric-Ambrym Maillard
Publikationsdatum
30.03.2017
Verlag
Springer International Publishing
Erschienen in
International Journal of Data Science and Analytics / Ausgabe 4/2017
Print ISSN: 2364-415X
Elektronische ISSN: 2364-4168
DOI
https://doi.org/10.1007/s41060-017-0050-5

Weitere Artikel der Ausgabe 4/2017

International Journal of Data Science and Analytics 4/2017 Zur Ausgabe

Premium Partner