
2017 | Original Paper | Book Chapter

Variational Thompson Sampling for Relational Recurrent Bandits

Authors: Sylvain Lamprier, Thibault Gisselbrecht, Patrick Gallinari

Published in: Machine Learning and Knowledge Discovery in Databases

Publisher: Springer International Publishing


Abstract

In this paper, we introduce a novel non-stationary bandit setting, called the relational recurrent bandit, where the rewards of arms at successive time steps are interdependent. The aim is to discover temporal and structural dependencies between arms in order to maximize the cumulative collected reward. Two algorithms are proposed: the first directly models temporal dependencies between arms, while the second assumes the existence of hidden states of the system behind the observed rewards. For both approaches, we develop a Variational Thompson Sampling method, which approximates distributions via variational inference and uses the estimated distributions to sample reward expectations at each iteration of the process. Experiments conducted on both synthetic and real data demonstrate the effectiveness of our approaches.
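To make the sampling scheme concrete, the following minimal Python sketch illustrates a generic Thompson Sampling loop with a (here, diagonal Gaussian) posterior over per-arm reward expectations. The environment, names, and conjugate update are illustrative assumptions for the stationary, independent-arms case only, not the paper's relational recurrent algorithm, which additionally models temporal and structural dependencies between arms.

```python
# Minimal illustrative sketch of a Thompson Sampling loop with a Gaussian
# posterior over per-arm reward expectations. NOT the paper's algorithm:
# it assumes stationary, independent arms, where the conjugate update
# below coincides with the variational solution.
import numpy as np

rng = np.random.default_rng(0)
K, T, sigma2 = 5, 1000, 1.0          # arms, horizon, assumed noise variance
m = np.zeros(K)                      # posterior means
s2 = np.full(K, 10.0)                # posterior variances (broad prior)

def pull(arm: int) -> float:
    """Stand-in environment with fixed Gaussian rewards (illustrative only)."""
    true_means = np.linspace(0.0, 1.0, K)
    return true_means[arm] + rng.normal(0.0, np.sqrt(sigma2))

for t in range(T):
    theta = rng.normal(m, np.sqrt(s2))   # sample reward expectations
    arm = int(np.argmax(theta))          # act greedily w.r.t. the sample
    r = pull(arm)
    prec = 1.0 / s2[arm] + 1.0 / sigma2  # Gaussian conjugate update
    m[arm] = (m[arm] / s2[arm] + r / sigma2) / prec
    s2[arm] = 1.0 / prec
```

In the relational recurrent setting, the posterior over the dependency parameters is no longer tractable in closed form, which is precisely where a variational approximation replaces the exact conjugate update used above.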


Footnotes
1
Here, the regret corresponds to the expected loss incurred by a given policy compared with an optimal strategy that knows the exact distribution parameters.
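A standard formalization of this notion reads as follows, where the notation is assumed for illustration (\(\mu_{a,t}\) denotes the expected reward of arm \(a\) at step \(t\), and \(a_t\) the arm selected by the policy):

\[
R(T) = \mathbb{E}\left[\sum_{t=1}^{T}\Bigl(\max_{a}\,\mu_{a,t} - \mu_{a_t,t}\Bigr)\right].
\]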
 
2
When only the state of the selected arm changes at each iteration, the problem is called a rested bandit.
 
4
Note, however, that to ensure a non-divergent model, \(\varTheta\) must be chosen such that \(\lambda_{\max}(\varTheta^{\top}\varTheta) \le 1\), where \(\lambda_{\max}(A)\) denotes the maximal eigenvalue of a matrix \(A\) (see the supplementary material for more details).
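As a quick illustration, this condition can be checked numerically; the sketch below assumes \(\varTheta\) is available as a square numpy array and simply tests whether the largest eigenvalue of \(\varTheta^{\top}\varTheta\) (equivalently, the squared spectral norm of \(\varTheta\)) stays below one.

```python
import numpy as np

def is_non_divergent(theta: np.ndarray, tol: float = 1e-12) -> bool:
    """Check lambda_max(Theta^T Theta) <= 1 (Theta^T Theta is symmetric PSD)."""
    lam_max = np.linalg.eigvalsh(theta.T @ theta).max()
    return lam_max <= 1.0 + tol

theta = np.array([[0.5, 0.2],
                  [0.1, 0.6]])
print(is_non_divergent(theta))  # True: spectral norm of theta is below 1
```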
 
Metadata
Title
Variational Thompson Sampling for Relational Recurrent Bandits
Authors
Sylvain Lamprier
Thibault Gisselbrecht
Patrick Gallinari
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-71246-8_25