
2017 | Original Paper | Book Chapter

Variational Thompson Sampling for Relational Recurrent Bandits

Authors: Sylvain Lamprier, Thibault Gisselbrecht, Patrick Gallinari

Published in: Machine Learning and Knowledge Discovery in Databases

Publisher: Springer International Publishing


Abstract

In this paper, we introduce a novel non-stationary bandit setting, called the relational recurrent bandit, where the rewards of arms at successive time steps are interdependent. The aim is to discover temporal and structural dependencies between arms in order to maximize the cumulative collected reward. Two algorithms are proposed: the first directly models temporal dependencies between arms, while the second assumes the existence of hidden states of the system behind the observed rewards. For both approaches, we develop a Variational Thompson Sampling method, which approximates distributions via variational inference and uses the estimated distributions to sample reward expectations at each iteration of the process. Experiments conducted on both synthetic and real data demonstrate the effectiveness of our approaches.
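To make the sampling scheme concrete, the following minimal Python sketch illustrates a generic Thompson Sampling loop with a (here, diagonal Gaussian) posterior over per-arm reward expectations. The environment, names, and conjugate update are illustrative assumptions for the stationary, independent-arms case only, not the paper's relational recurrent algorithm, which additionally models temporal and structural dependencies between arms.

```python
# Minimal illustrative sketch of a Thompson Sampling loop with a Gaussian
# posterior over per-arm reward expectations. NOT the paper's algorithm:
# it assumes stationary, independent arms, where the conjugate update
# below coincides with the variational solution.
import numpy as np

rng = np.random.default_rng(0)
K, T, sigma2 = 5, 1000, 1.0          # arms, horizon, assumed noise variance
m = np.zeros(K)                      # posterior means
s2 = np.full(K, 10.0)                # posterior variances (broad prior)

def pull(arm: int) -> float:
    """Stand-in environment with fixed Gaussian rewards (illustrative only)."""
    true_means = np.linspace(0.0, 1.0, K)
    return true_means[arm] + rng.normal(0.0, np.sqrt(sigma2))

for t in range(T):
    theta = rng.normal(m, np.sqrt(s2))   # sample reward expectations
    arm = int(np.argmax(theta))          # act greedily w.r.t. the sample
    r = pull(arm)
    prec = 1.0 / s2[arm] + 1.0 / sigma2  # Gaussian conjugate update
    m[arm] = (m[arm] / s2[arm] + r / sigma2) / prec
    s2[arm] = 1.0 / prec
```

In the relational recurrent setting, the posterior over the dependency parameters is no longer tractable in closed form, which is precisely where a variational approximation replaces the exact conjugate update used above.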


Footnotes
1
Here, the regret corresponds to the expected loss incurred by a given policy compared with an optimal strategy that knows the exact distribution parameters.
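A standard formalization of this notion reads as follows, where the notation is assumed for illustration (\(\mu_{a,t}\) denotes the expected reward of arm \(a\) at step \(t\), and \(a_t\) the arm selected by the policy):

\[
R(T) = \mathbb{E}\left[\sum_{t=1}^{T}\Bigl(\max_{a}\,\mu_{a,t} - \mu_{a_t,t}\Bigr)\right].
\]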
 
2
When only the state of the selected arm changes at each iteration, the problem is called a rested bandit.
 
4
Note, however, that to ensure a non-divergent model, \(\varTheta\) must be chosen such that \(\lambda_{\max}(\varTheta^{\top}\varTheta) \le 1\), where \(\lambda_{\max}(A)\) denotes the maximal eigenvalue of a matrix \(A\) (see the supplementary material for more details).
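As a quick illustration, this condition can be checked numerically; the sketch below assumes \(\varTheta\) is available as a square numpy array and simply tests whether the largest eigenvalue of \(\varTheta^{\top}\varTheta\) (equivalently, the squared spectral norm of \(\varTheta\)) stays below one.

```python
import numpy as np

def is_non_divergent(theta: np.ndarray, tol: float = 1e-12) -> bool:
    """Check lambda_max(Theta^T Theta) <= 1 (Theta^T Theta is symmetric PSD)."""
    lam_max = np.linalg.eigvalsh(theta.T @ theta).max()
    return lam_max <= 1.0 + tol

theta = np.array([[0.5, 0.2],
                  [0.1, 0.6]])
print(is_non_divergent(theta))  # True: spectral norm of theta is below 1
```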
 
Metadata
Title
Variational Thompson Sampling for Relational Recurrent Bandits
Authors
Sylvain Lamprier
Thibault Gisselbrecht
Patrick Gallinari
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-71246-8_25