2023 | OriginalPaper | Book chapter

On the Complexity of All \(\varepsilon \)-Best Arms Identification

Written by: Aymen al Marjani, Tomas Kocak, Aurélien Garivier

Published in: Machine Learning and Knowledge Discovery in Databases

Publisher: Springer Nature Switzerland

Abstract

We consider the question introduced by [16] of identifying all the \(\varepsilon \)-optimal arms in a finite stochastic multi-armed bandit with Gaussian rewards. We give two lower bounds on the sample complexity of any algorithm solving the problem with confidence at least \(1-\delta \). The first, unimprovable in the asymptotic regime, motivates the design of a Track-and-Stop strategy whose average sample complexity is asymptotically optimal when the risk \(\delta \) goes to zero. Notably, we provide an efficient numerical method to solve the convex max-min program that appears in the lower bound. Our method is based on a complete characterization of the alternative bandit instances that the optimal sampling strategy needs to rule out, thus making our bound tighter than the one provided by [16]. The second lower bound deals with the regime of high and moderate values of the risk \(\delta \), and characterizes the behavior of any algorithm in the initial phase. It emphasizes the linear dependency of the sample complexity on the number of arms. Finally, we report on numerical simulations demonstrating our algorithm’s advantage over state-of-the-art methods, even for moderate risks.
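To make the identification target concrete, here is a minimal sketch, assuming the additive notion of \(\varepsilon \)-optimality (an arm counts as \(\varepsilon \)-optimal when its mean is within \(\varepsilon \) of the largest mean). The function name `epsilon_good_arms` and the example means are hypothetical and only serve as an illustration; they are not taken from the chapter.

```python
def epsilon_good_arms(means, epsilon):
    """Return the indices of arms whose mean is within epsilon of the best mean.

    Assumption: additive epsilon-optimality, i.e. arm k is good
    when means[k] >= max(means) - epsilon.
    """
    best = max(means)
    return {k for k, mu in enumerate(means) if mu >= best - epsilon}


# Hypothetical 4-armed instance: only the first two arms are 0.1-optimal.
example_means = [1.0, 0.95, 0.7, 0.4]
good_set = epsilon_good_arms(example_means, epsilon=0.1)
print(sorted(good_set))
```

An algorithm for this problem has to return exactly this set with probability at least \(1-\delta \), without knowing the means in advance.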

For \(\sigma ^2\)-subgaussian distributions, we only need to multiply our bounds by \(\sigma ^2\). For bandits coming from another single-parameter exponential family, we lose the closed-form expression of the best response oracle that we have in the Gaussian case, but one can use binary search to solve the best response problem.
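As a rough illustration of the binary-search idea, the sketch below treats a hypothetical one-dimensional subproblem for Bernoulli arms: find the level \(x\) minimizing \(w_k\,\mathrm{kl}(\mu _k, x-\varepsilon ) + w_\ell \,\mathrm{kl}(\mu _\ell , x)\). This objective, the function names and the parameters are assumptions made for the example and are not the chapter's best response oracle. Since the Bernoulli KL divergence is convex in its second argument, the derivative in \(x\) is increasing, so bisection on its sign locates the minimizer.

```python
import math


def bernoulli_kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), for 0 < q < 1."""
    total = 0.0
    if p > 0:
        total += p * math.log(p / q)
    if p < 1:
        total += (1 - p) * math.log((1 - p) / (1 - q))
    return total


def kl_derivative_in_q(p, q):
    """Derivative of bernoulli_kl(p, q) with respect to q."""
    return (q - p) / (q * (1 - q))


def shifted_best_response(mu_k, mu_l, w_k, w_l, eps, tol=1e-10):
    """Minimise w_k*kl(mu_k, x - eps) + w_l*kl(mu_l, x) over x in (eps, 1).

    Hypothetical subproblem: assumes means strictly inside (0, 1),
    positive weights and 0 <= eps < 1. Bisection is run on the sign
    of the (increasing) derivative of the convex objective.
    """
    lo, hi = eps, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        slope = (w_k * kl_derivative_in_q(mu_k, mid - eps)
                 + w_l * kl_derivative_in_q(mu_l, mid))
        if slope > 0:
            hi = mid  # minimiser lies to the left of mid
        else:
            lo = mid  # minimiser lies to the right of mid
    return (lo + hi) / 2


x_star = shifted_best_response(mu_k=0.6, mu_l=0.5, w_k=2.0, w_l=1.0, eps=0.1)
print(x_star)
```

With `eps=0` the minimizer reduces to the weighted average of the two means, which gives a quick sanity check of the routine.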

or a subset of arms, as in our case.

The phenomenon discussed above was essentially already discussed in [16], a very rich study of the problem. However, we do not fully understand the proof of Theorem 4.1. Define a sub-instance to be a bandit \(\widetilde{\nu }\) with fewer arms \(m \le K\) such that \(\{\widetilde{\nu }_1,\ldots , \widetilde{\nu }_{m}\} \subset \{\nu _1, \ldots , \nu _K\}\). Lemma D.5 in [16] actually shows that there exists some sub-instance of \(\nu \) on which the algorithm must pay \(\varOmega (\sum _{b=2}^{m} 1/(\mu _1-\mu _b)^2)\) samples. But this does not imply that such a cost must be paid for the instance of interest \(\nu \) rather than for some sub-instance with very few arms.
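A small numerical illustration of why the distinction matters, under stated assumptions: the helper `gap_cost` and the 10-armed instance below are hypothetical, the best arm is assumed unique, and the constant hidden in \(\varOmega (\cdot )\) is ignored. The sum of inverse squared gaps on a sub-instance with few arms can be much smaller than on the full instance.

```python
def gap_cost(means):
    """Sum over non-best arms of 1 / (mu_1 - mu_b)^2.

    Assumption: the best arm is unique, so every gap is positive.
    """
    ordered = sorted(means, reverse=True)
    best = ordered[0]
    return sum(1.0 / (best - mu) ** 2 for mu in ordered[1:])


# Hypothetical instance: one arm with mean 1.0 and nine arms with mean 0.5.
full_instance = [1.0] + [0.5] * 9
# A sub-instance keeping only the best arm and one other arm.
sub_instance = full_instance[:2]

print(gap_cost(full_instance))  # nine gaps of 0.5 each
print(gap_cost(sub_instance))   # a single gap of 0.5
```

A lower bound that only holds for some sub-instance could therefore be as small as the second quantity, which does not grow with the number of arms \(K\).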

\(\overline{{\boldsymbol{\mu }}}_{\varepsilon }^{k,\ell }({\boldsymbol{\omega }})\) has a different definition depending on whether k is a good or a bad arm.

Percent control is a metric expressing the efficiency of the compound as an inhibitor against the target kinase.

F1 score is the harmonic mean of precision (the proportion of arms in \(\widehat{G}\) that are actually good) and recall (the proportion of arms in \(G_{\varepsilon }({\boldsymbol{\mu }})\) that were correctly returned in \(\widehat{G}\)).
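The definition above can be sketched as follows. The function name `f1_score` and the convention of returning 0 when no good arm is returned are assumptions made for this illustration.

```python
def f1_score(returned, good):
    """F1 score of a returned arm set against the true set of good arms.

    precision = |returned & good| / |returned|
    recall    = |returned & good| / |good|
    Convention (assumed here): the score is 0.0 when there is no overlap.
    """
    returned, good = set(returned), set(good)
    true_positives = len(returned & good)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(returned)
    recall = true_positives / len(good)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: both good arms are found, plus one bad arm.
score = f1_score(returned={0, 1, 2}, good={0, 1})
print(score)
```

In this example precision is 2/3 and recall is 1, so the harmonic mean penalizes the extra arm even though no good arm was missed.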

1. Bocci, M., et al.: Activin receptor-like kinase 1 is associated with immune cell infiltration and regulates CLEC14A transcription in cancer. Angiogenesis 22(1), 117–131 (2018). https://doi.org/10.1007/s10456-018-9642-5

2. Bubeck, S.: Convex optimization: algorithms and complexity. Foundations and Trends in Machine Learning (2015)

3. Chernoff, H.: Sequential design of experiments. Ann. Math. Stat. 30(3), 755–770 (1959)

4. Danskin, J.M.: The theory of max-min, with applications. SIAM J. Appl. Math. 14, 641–664 (1966)

5. Degenne, R., Koolen, W.M., Ménard, P.: Non-asymptotic pure exploration by solving games. In: Wallach, H., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 32. Curran Associates, Inc. (2019). https://proceedings.neurips.cc/paper/2019/file/8d1de7457fa769ece8d93a13a59c8552-Paper.pdf

6. Garivier, A., Kaufmann, E.: Non-asymptotic sequential tests for overlapping hypotheses and application to near optimal arm identification in bandit models. Sequential Anal. 40, 61–96 (2021)

7. Garivier, A.: Informational confidence bounds for self-normalized averages and applications. In: 2013 IEEE Information Theory Workshop (ITW) (Sep 2013). https://doi.org/10.1109/itw.2013.6691311

8. Garivier, A., Kaufmann, E.: Optimal best arm identification with fixed confidence. In: Proceedings of the 29th Conference On Learning Theory, pp. 998–1027 (2016)

9. Jedra, Y., Proutiere, A.: Optimal best-arm identification in linear bandits. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 10007–10017. Curran Associates, Inc. (2020). https://proceedings.neurips.cc/paper/2020/file/7212a6567c8a6c513f33b858d868ff80-Paper.pdf

10. Jourdan, M., Mutný, M., Kirschner, J., Krause, A.: Efficient pure exploration for combinatorial bandits with semi-bandit feedback. In: ALT (2021)

11. Kaufmann, E., Cappé, O., Garivier, A.: On the complexity of best arm identification in multi-armed bandit models. J. Mach. Learn. Res. (2015)

12. Kaufmann, E., Koolen, W.M.: Mixture martingales revisited with applications to sequential tests and confidence intervals. arXiv preprint arXiv:1811.11419 (2018)

13. Lai, T., Robbins, H.: Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1), 4–22 (1985)

14. Lattimore, T., Szepesvári, C.: Bandit Algorithms. Cambridge University Press, Cambridge (2019)

15. Magureanu, S., Combes, R., Proutiere, A.: Lipschitz bandits: regret lower bounds and optimal algorithms. In: Conference on Learning Theory (2014)

16. Mason, B., Jain, L., Tripathy, A., Nowak, R.: Finding all \(\epsilon \)-good arms in stochastic bandits. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 20707–20718. Curran Associates, Inc. (2020). https://proceedings.neurips.cc/paper/2020/file/edf0320adc8658b25ca26be5351b6c4a-Paper.pdf

17. Ménard, P.: Gradient ascent for active exploration in bandit problems. arXiv preprint arXiv:1905.08165 (2019)

18. Simchowitz, M., Jamieson, K., Recht, B.: The simulator: understanding adaptive sampling in the moderate-confidence regime. In: Kale, S., Shamir, O. (eds.) Proceedings of the 2017 Conference on Learning Theory. Proceedings of Machine Learning Research, vol. 65, pp. 1794–1834. PMLR, Amsterdam, Netherlands (07–10 Jul 2017). http://proceedings.mlr.press/v65/simchowitz17a.html

19. Wang, P.A., Tzeng, R.C., Proutiere, A.: Fast pure exploration via Frank-Wolfe. In: Advances in Neural Information Processing Systems, vol. 34 (2021)

Title: On the Complexity of All \(\varepsilon \)-Best Arms Identification
Written by: Aymen al Marjani, Tomas Kocak, Aurélien Garivier
Publisher: Springer Nature Switzerland
Book: Machine Learning and Knowledge Discovery in Databases
Print ISBN: 978-3-031-26411-5

Electronic ISBN: 978-3-031-26412-2

Copyright year: 2023
DOI: https://doi.org/10.1007/978-3-031-26412-2_20
