
2023 | Original Paper | Book Chapter

Hierarchical Unimodal Bandits

Authors: Tianchi Zhao, Chicheng Zhang, Ming Li

Published in: Machine Learning and Knowledge Discovery in Databases

Publisher: Springer Nature Switzerland


Abstract

We study a multi-armed bandit problem with clustered arms and a unimodal reward structure, which has applications in millimeter wave (mmWave) communication, road navigation, etc. More specifically, a set of N arms is grouped into C clusters, and the expected rewards of arms belonging to the same cluster form a unimodal function (a function is unimodal if it has only one peak, e.g., a parabola). First, in the setting where \(C=1\), we propose an algorithm, SGSD (Stochastic Golden Search for Discrete Arm), that has better guarantees than the prior unimodal bandit algorithm [Yu and Mannor 2011]. Second, in the setting where \(C \ge 2\), we develop HUUCB (Hierarchical Unimodal Upper Confidence Bound (UCB) algorithm), which exploits both the clustering structure of the arms and the unimodal structure of the rewards. We show that the regret upper bound of our algorithm grows as \(O(\sqrt{CT\log (T)})\), which can be significantly smaller than UCB's \(O(\sqrt{NT\log (T)})\) regret guarantee. We perform a multi-channel mmWave communication simulation to evaluate our algorithm. Our simulation results confirm the advantage of our algorithm over the UCB algorithm [Auer et al. 2002] and a two-level policy (TLP) proposed in prior work [Pandey et al. 2007].
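The chapter's SGSD and HUUCB procedures are defined in the full text; as a rough illustration of the two-level idea summarized in the abstract, the following minimal Python sketch selects a cluster with a UCB index and then performs a simple local search over the discrete arms within the chosen cluster (unimodality of the per-cluster rewards is what makes local moves sufficient). All names, parameters, and update rules below are illustrative assumptions, not the paper's algorithm.

```python
import math
import random

# Illustrative sketch only, in the spirit of a hierarchical UCB:
# level 1 chooses a cluster by a UCB index, level 2 exploits unimodality
# within the chosen cluster via a local neighbor search. The paper's exact
# SGSD/HUUCB rules are not reproduced here.
def run_hierarchical_ucb(cluster_means, horizon=10000, seed=0):
    rng = random.Random(seed)
    C = len(cluster_means)
    pulls = [0] * C                                   # times each cluster was chosen
    totals = [0.0] * C                                # cumulative reward per cluster
    best_arm = [len(m) // 2 for m in cluster_means]   # current arm guess per cluster

    for t in range(1, horizon + 1):
        # Level 1: play each cluster once, then pick by UCB index.
        if t <= C:
            c = t - 1
        else:
            c = max(range(C), key=lambda i: totals[i] / pulls[i]
                    + math.sqrt(2 * math.log(t) / pulls[i]))

        # Level 2: compare the current arm with its neighbors using noisy
        # samples; under a unimodal mean profile, local moves climb the peak.
        arms = cluster_means[c]
        a = best_arm[c]
        candidates = [x for x in (a - 1, a, a + 1) if 0 <= x < len(arms)]
        a = max(candidates, key=lambda x: arms[x] + rng.gauss(0, 0.1))
        best_arm[c] = a

        reward = arms[a] + rng.gauss(0, 0.1)          # noisy reward of the played arm
        pulls[c] += 1
        totals[c] += reward
    return best_arm, pulls

# Example: 3 clusters, each with a single-peaked mean-reward profile.
clusters = [[0.1, 0.4, 0.7, 0.5, 0.2],
            [0.2, 0.3, 0.35, 0.3],
            [0.05, 0.1, 0.2, 0.6, 0.9, 0.7]]
print(run_hierarchical_ucb(clusters, horizon=5000))
```

The key design point the sketch mirrors is that exploration cost at the top level scales with the number of clusters C rather than the total number of arms N, which is what drives the \(O(\sqrt{CT\log (T)})\) versus \(O(\sqrt{NT\log (T)})\) comparison stated in the abstract.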


Footnotes
1
Strong unimodality means that there is only one optimal arm in the arm set.
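For concreteness, one common way to formalize this (a sketch assuming the arms within a cluster are indexed \(1,\dots,K\) along the line, not quoted from the chapter) is:

```latex
% Unimodality of the mean rewards \mu_1,\dots,\mu_K within a cluster:
% there exists a peak index k^* such that
\mu_1 \le \dots \le \mu_{k^*} \ge \dots \ge \mu_K .
% Strong unimodality additionally requires the maximizer to be unique:
\mu_k < \mu_{k^*} \quad \text{for all } k \neq k^* .
```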
 
References
Yu, J.Y., Mannor, S.: Unimodal bandits (2011)
Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multiarmed bandit problem. Mach. Learn. 47(2–3), 235–256 (2002)
Pandey, S., Chakrabarti, D., Agarwal, D.: Multi-armed bandit problems with dependent arms. In: Proceedings of the 24th International Conference on Machine Learning, pp. 721–728 (2007)
Thompson, W.R.: On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25(3–4), 285–294 (1933)
Sun, M., Li, M., Gerdes, R.: Truth-aware optimal decision-making framework with driver preferences for V2V communications. In: 2018 IEEE Conference on Communications and Network Security (CNS), pp. 1–9. IEEE (2018)
Wen, W., Cheng, N., Zhang, N., Yang, P., Zhuang, W., Shen, X.: Fast mmWave beam alignment via correlated bandit learning. IEEE Trans. Wirel. Commun. 18(12), 5894–5908 (2019)
Hashemi, M., Sabharwal, A., Koksal, C.E., Shroff, N.B.: Efficient beam alignment in millimeter wave systems using contextual bandits. In: IEEE INFOCOM 2018, pp. 2393–2401. IEEE (2018)
Lopez, A.V., Chervyakov, A., Chance, G., Verma, S., Tang, Y.: Opportunities and challenges of mmWave NR. IEEE Wirel. Commun. 26(2), 4–6 (2019)
Nguyen, T.T., Lauw, H.W.: Dynamic clustering of contextual multi-armed bandits. In: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 1959–1962 (2014)
Jedor, M., Perchet, V., Louedec, J.: Categorized bandits. Adv. Neural Inf. Process. Syst. 32, 1–11 (2019)
Bouneffouf, D., Parthasarathy, S., Samulowitz, H., Wistub, M.: Optimal exploitation of clustering and history information in multi-armed bandit. arXiv preprint arXiv:1906.03979 (2019)
Carlsson, E., Dubhashi, D., Johansson, F.D.: Thompson sampling for bandits with clustered arms. arXiv preprint arXiv:2109.01656 (2021)
Zhao, T., Li, M., Poloczek, M.: Fast reconfigurable antenna state selection with hierarchical Thompson sampling. In: ICC 2019 - 2019 IEEE International Conference on Communications (ICC), pp. 1–6. IEEE (2019)
Kumar, S., Gao, H., Wang, C., Chang, K.C.C., Sundaram, H.: Hierarchical multi-armed bandits for discovering hidden populations. In: Proceedings of the 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pp. 145–153 (2019)
Combes, R., Proutiere, A.: Unimodal bandits: regret lower bounds and optimal algorithms. In: International Conference on Machine Learning, pp. 521–529. PMLR (2014)
Zhang, Y., Basu, S., Shakkottai, S., Heath Jr., R.W.: mmWave codebook selection in rapidly-varying channels via multinomial Thompson sampling. In: Proceedings of the Twenty-Second International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing, pp. 151–160 (2021)
Blinn, N., Boerger, J., Bloch, M.: mmWave beam steering with hierarchical optimal sampling for unimodal bandits. In: ICC 2021 - IEEE International Conference on Communications, pp. 1–6. IEEE (2021)
Bubeck, S., Cesa-Bianchi, N.: Regret analysis of stochastic and nonstochastic multi-armed bandit problems. arXiv preprint arXiv:1204.5721 (2012)
Abbasi-Yadkori, Y., Pacchiano, A., Phan, M.: Regret balancing for bandit and RL model selection. arXiv preprint arXiv:2006.05491 (2020)
Cutkosky, A., Dann, C., Das, A., Gentile, C., Pacchiano, A., Purohit, M.: Dynamic balancing for model selection in bandits and RL. In: International Conference on Machine Learning, pp. 2276–2285. PMLR (2021)
Molisch, A.F.: Wireless Communications, vol. 34. John Wiley & Sons, Hoboken (2012)
Samimi, M.K., MacCartney, G.R., Sun, S., Rappaport, T.S.: 28 GHz millimeter-wave ultrawideband small-scale fading models in wireless channels. In: 2016 IEEE 83rd Vehicular Technology Conference (VTC Spring), pp. 1–6. IEEE (2016)
Metadata
Title
Hierarchical Unimodal Bandits
Authors
Tianchi Zhao
Chicheng Zhang
Ming Li
Copyright Year
2023
DOI
https://doi.org/10.1007/978-3-031-26412-2_17
