Skip to main content

2023 | OriginalPaper | Buchkapitel

On the Complexity of All \(\varepsilon \)-Best Arms Identification

verfasst von : Aymen al Marjani, Tomas Kocak, Aurélien Garivier

Erschienen in: Machine Learning and Knowledge Discovery in Databases

Verlag: Springer Nature Switzerland

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

We consider the question introduced by [16] of identifying all the \(\varepsilon \)-optimal arms in a finite stochastic multi-armed bandit with Gaussian rewards. We give two lower bounds on the sample complexity of any algorithm solving the problem with a confidence at least \(1-\delta \). The first, unimprovable in the asymptotic regime, motivates the design of a Track-and-Stop strategy whose average sample complexity is asymptotically optimal when the risk \(\delta \) goes to zero. Notably, we provide an efficient numerical method to solve the convex max-min program that appears in the lower bound. Our method is based on a complete characterization of the alternative bandit instances that the optimal sampling strategy needs to rule out, thus making our bound tighter than the one provided by [16]. The second lower bound deals with the regime of high and moderate values of the risk \(\delta \), and characterizes the behavior of any algorithm in the initial phase. It emphasizes the linear dependency of the sample complexity in the number of arms. Finally, we report on numerical simulations demonstrating our algorithm’s advantage over state-of-the-art methods, even for moderate risks.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Anhänge
Nur mit Berechtigung zugänglich
Fußnoten
1
For \(\sigma ^2\)-subgaussian distributions, we only need to multiply our bounds by \(\sigma ^2\). For bandits coming from another single-parameter exponential family, we lose the closed-form expression of the best response oracle that we have in the Gaussian case, but one can use binary search to solve the best response problem.
 
2
or a subset of arms, as in our case.
 
3
The phenomenon discussed above is essentially already discussed in [16], a very rich study of the problem. However, we do not fully understand the proof of Theorem 4.1. Define a sub-instance to be a bandit \(\widetilde{\nu }\) with fewer arms \(m \le K\) such that \(\{\widetilde{\nu }_1,\ldots , \widetilde{\nu }_{m}\} \subset \{\nu _1, \ldots , \nu _K\}\). Lemma D.5 in [16] actually shows that there exists some sub-instance of \(\nu \) on which the algorithm must pay \(\varOmega (\sum _{b=2}^{m} 1/(\mu _1-\mu _b)^2)\) samples. But this does not imply that such cost must be paid for the instance of interest \(\nu \) instead of some sub-instance with very few arms.
 
4
\(\overline{{\boldsymbol{\mu }}}_{\varepsilon }^{k,\ell }({\boldsymbol{\omega }})\) has a different definition depending on k being a good or a bad arm.
 
5
percent control is a metric expressing the efficiency of the compound as an inhibitor against the target Kinaze.
 
6
F1 score is the harmonic mean of precision (the proportion of arms in \(\widehat{G}\) that are actually good) and recall (the proportion of arms in \(G_{\varepsilon }({\boldsymbol{\mu }})\) that were correctly returned in \(\widehat{G}\)).
 
Literatur
2.
Zurück zum Zitat Bubeck, S.: Convex optimization: algorithms and complexity. Foundations and Trends in Machine Learning (2015) Bubeck, S.: Convex optimization: algorithms and complexity. Foundations and Trends in Machine Learning (2015)
6.
Zurück zum Zitat Garivier, A., Kaufmann, E.: Non-asymptotic sequential tests for overlapping hypotheses and application to near optimal arm identification in bandit models. Sequential Anal. 40, 61–96 (2021)MathSciNetCrossRefMATH Garivier, A., Kaufmann, E.: Non-asymptotic sequential tests for overlapping hypotheses and application to near optimal arm identification in bandit models. Sequential Anal. 40, 61–96 (2021)MathSciNetCrossRefMATH
8.
Zurück zum Zitat Garivier, A., Kaufmann, E.: Optimal best arm identification with fixed confidence. In: Proceedings of the 29th Conference On Learning Theory, pp. 998–1027 (2016) Garivier, A., Kaufmann, E.: Optimal best arm identification with fixed confidence. In: Proceedings of the 29th Conference On Learning Theory, pp. 998–1027 (2016)
10.
Zurück zum Zitat Jourdan, M., Mutn’y, M., Kirschner, J., Krause, A.: Efficient pure exploration for combinatorial bandits with semi-bandit feedback. In: ALT (2021) Jourdan, M., Mutn’y, M., Kirschner, J., Krause, A.: Efficient pure exploration for combinatorial bandits with semi-bandit feedback. In: ALT (2021)
11.
Zurück zum Zitat Kaufmann, E., Cappé, O., Garivier, A.: On the complexity of best arm identification in multi-armed bandit models. J. Mach. Learn. Res. (2015) Kaufmann, E., Cappé, O., Garivier, A.: On the complexity of best arm identification in multi-armed bandit models. J. Mach. Learn. Res. (2015)
12.
Zurück zum Zitat Kaufmann, E., Koolen, W.M.: Mixture martingales revisited with applications to sequential tests and confidence intervals. arXiv preprint arXiv:1811.11419 (2018) Kaufmann, E., Koolen, W.M.: Mixture martingales revisited with applications to sequential tests and confidence intervals. arXiv preprint arXiv:​1811.​11419 (2018)
14.
Zurück zum Zitat Lattimore, T., Szepesvári, C.: Bandit Algorithms. Cambridge University Press, Cambridge (2019)MATH Lattimore, T., Szepesvári, C.: Bandit Algorithms. Cambridge University Press, Cambridge (2019)MATH
15.
Zurück zum Zitat Magureanu, S., Combes, R., Proutiere, A.: Lipschitz bandits: regret lower bounds and optimal algorithms. In: Conference on Learning Theory (2014) Magureanu, S., Combes, R., Proutiere, A.: Lipschitz bandits: regret lower bounds and optimal algorithms. In: Conference on Learning Theory (2014)
17.
18.
Zurück zum Zitat Simchowitz, M., Jamieson, K., Recht, B.: The simulator: understanding adaptive sampling in the moderate-confidence regime. In: Kale, S., Shamir, O. (eds.) Proceedings of the 2017 Conference on Learning Theory. Proceedings of Machine Learning Research, vol. 65, pp. 1794–1834. PMLR, Amsterdam, Netherlands (07–10 Jul 2017), http://proceedings.mlr.press/v65/simchowitz17a.html Simchowitz, M., Jamieson, K., Recht, B.: The simulator: understanding adaptive sampling in the moderate-confidence regime. In: Kale, S., Shamir, O. (eds.) Proceedings of the 2017 Conference on Learning Theory. Proceedings of Machine Learning Research, vol. 65, pp. 1794–1834. PMLR, Amsterdam, Netherlands (07–10 Jul 2017), http://​proceedings.​mlr.​press/​v65/​simchowitz17a.​html
19.
Zurück zum Zitat Wang, P.A., Tzeng, R.C., Proutiere, A.: Fast pure exploration via frank-wolfe. In: Advances in Neural Information Processing Systems, vol. 34 (2021) Wang, P.A., Tzeng, R.C., Proutiere, A.: Fast pure exploration via frank-wolfe. In: Advances in Neural Information Processing Systems, vol. 34 (2021)
Metadaten
Titel
On the Complexity of All -Best Arms Identification
verfasst von
Aymen al Marjani
Tomas Kocak
Aurélien Garivier
Copyright-Jahr
2023
DOI
https://doi.org/10.1007/978-3-031-26412-2_20

Premium Partner