nach oben

International Journal of Data Science and Analytics

Erschienen in:

30.03.2017 | Regular Paper

The non-stationary stochastic multi-armed bandit problem

verfasst von: Robin Allesiardo, Raphaël Féraud, Odalric-Ambrym Maillard

Erschienen in: International Journal of Data Science and Analytics | Ausgabe 4/2017

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

We consider a variant of the stochastic multi-armed bandit with K arms where the rewards are not assumed to be identically distributed, but are generated by a non-stationary stochastic process. We first study the unique best arm setting when there exists one unique best arm. Second, we study the general switching best arm setting when a best arm switches at some unknown steps. For both settings, we target problem-dependent bounds, instead of the more conservative problem-free bounds. We consider two classical problems: (1) identify a best arm with high probability (best arm identification), for which the performance measure by the sample complexity (number of samples before finding a near-optimal arm). To this end, we naturally extend the definition of sample complexity so that it makes sense in the switching best arm setting, which may be of independent interest. (2) Achieve the smallest cumulative regret (regret minimization) where the regret is measured with respect to the strategy pulling an arm with the best instantaneous mean at each step.

Vorheriger Artikel -norm transformation for improving k-means clustering

Nächster Artikel Meta-path-based link prediction in schema-rich heterogeneous information network

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Nur mit Berechtigung zugänglich

Allesiardo, R., Féraud, R.: EXP3 with drift detection for the switching bandit problem. In: 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA 2015)

Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multiarmed bandit problem. Mach. Learn. 47(2–3), 235–256 (2002a)CrossRefMATH

Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.E.: The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32(1), 48–77 (2002b)MathSciNetCrossRefMATH

Bubeck, S., Cesa-Bianchi, N.: Regret analysis of stochastic and nonstochastic multi-armed bandit problems. In: Foundations and Trends in Machine Learning. (2012) http://dblp.uni-trier.de/rec/bibtex/journals/ftml/BubeckC12

Bubeck, S., Slivkins, A.: The best of both worlds: stochastic and adversarial bandits. In: COLT 2012—the 25th Annual Conference on Learning Theory, pp. 42.1–42.23. Edinburgh, Scotland, 25–27 June 2012

Even-Dar, E., Mannor, S., Mansour, Y.: Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems. J. Mach. Learn. Res. 7, 1079–1105 (2006)

Féraud, R., Allesiardo, R., Urvoy, T., Clérot, F.: Random forest for the contextual bandit problem. In: AISTATS. (2016)

Garivier, A., Moulines, E.: On upper-confidence bound policies for non-stationary bandit problems. In: Algorithmic Learning Theory, pp. 174–188. (2011) http://dblp.uni-trier.de/rec/bibtex/conf/alt/GarivierM11

Garivier, A., Kaufmann, E., Lattimore, T.: On explore-then-commit strategies. In: NIPS, vol. 30. (2016) http://dblp.uni-trier.de/rec/bibtex/conf/nips/GarivierLK16

10.

Hartland, C., Baskiotis, N., Gelly, S., Teytaud, O., Sebag, M.: Multi-armed bandit, dynamic environments and meta-bandits. In: Online Trading of Exploration and Exploitation Workshop, NIPS. (2006)

11.

Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58(301), 13–30 (1963)MathSciNetCrossRefMATH

12.

Kaufmann, E., Cappé, O., Garivier, A.: On the complexity of best arm identification in multi-armed bandit models. J. Mach. Learn. Res. 17(1), 1–42 (2016)MathSciNetMATH

13.

Kocsis, L., Szepesvári, C.: Discounted UCB. In: 2nd PASCAL Challenges Workshop. (2006)

14.

Neu, G.: Explore no more: improved high-probability regret bounds for non-stochastic bandits. In: NIPS. pp. 3168–3176 (2015)

15.

Seldin, Y., Slivkins, A.: One practical algorithm for both stochastic and adversarial bandits. In: 31th International Conference on Machine Learning (ICML). (2014)

16.

Serfling, R.J.: Probability inequalities for the sum in sampling without replacement. Ann. Stat. 2(1), 39–48 (1974)MathSciNetCrossRefMATH

17.

Thompson, W.R.: On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25, 285–294 (1933)CrossRefMATH

18.

Yu, J.Y., Mannor, S.: Piecewise-stationary bandit problems with side observations. In: Proceedings of the 26th Annual International Conference on Machine Learning, ICML. (2009)

Titel: The non-stationary stochastic multi-armed bandit problem
verfasst von: Robin Allesiardo
Raphaël Féraud
Odalric-Ambrym Maillard
Publikationsdatum: 30.03.2017
Verlag: Springer International Publishing
Erschienen in: International Journal of Data Science and Analytics / Ausgabe 4/2017
Print ISSN: 2364-415X
Elektronische ISSN: 2364-4168
DOI: https://doi.org/10.1007/s41060-017-0050-5

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Weitere Artikel der Ausgabe 4/2017

Meta-path-based link prediction in schema-rich heterogeneous information network

Entropy-based time-varying window width selection for nonlinear-type time–frequency analysis

Collaborative filtering-based recommender systems by effective trust

-norm transformation for improving k-means clustering

Premium Partner