
2023 | Original Paper | Book Chapter

Hierarchical Unimodal Bandits

Authors: Tianchi Zhao, Chicheng Zhang, Ming Li

Published in: Machine Learning and Knowledge Discovery in Databases

Publisher: Springer Nature Switzerland


Abstract

We study a multi-armed bandit problem with clustered arms and a unimodal reward structure, which has applications in millimeter wave (mmWave) communication, road navigation, etc. More specifically, a set of N arms is grouped into C clusters, and the expected rewards of arms belonging to the same cluster form a unimodal function (a function is unimodal if it has only one peak, e.g., a parabola). First, in the setting where \(C=1\), we propose an algorithm, SGSD (Stochastic Golden Search for Discrete Arm), that has better guarantees than the prior unimodal bandit algorithm [Yu and Mannor 2011]. Second, in the setting where \(C \ge 2\), we develop HUUCB (Hierarchical Unimodal Upper Confidence Bound (UCB) algorithm), which exploits both the clustering structure of the arms and the unimodal structure of the rewards. We show that the regret upper bound of our algorithm grows as \(O(\sqrt{CT\log (T)})\), which can be significantly smaller than UCB's \(O(\sqrt{NT\log (T)})\) regret guarantee. We perform a multi-channel mmWave communication simulation to evaluate our algorithm. Our simulation results confirm the advantage of our algorithm over the UCB algorithm [Auer et al. 2002] and a two-level policy (TLP) proposed in prior work [Pandey et al. 2007].
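The chapter's SGSD and HUUCB procedures are defined in the full text; as a rough illustration of the two-level idea summarized in the abstract, the following minimal Python sketch selects a cluster with a UCB index and then performs a simple local search over the discrete arms within the chosen cluster (unimodality of the per-cluster rewards is what makes local moves sufficient). All names, parameters, and update rules below are illustrative assumptions, not the paper's algorithm.

```python
import math
import random

# Illustrative sketch only, in the spirit of a hierarchical UCB:
# level 1 chooses a cluster by a UCB index, level 2 exploits unimodality
# within the chosen cluster via a local neighbor search. The paper's exact
# SGSD/HUUCB rules are not reproduced here.
def run_hierarchical_ucb(cluster_means, horizon=10000, seed=0):
    rng = random.Random(seed)
    C = len(cluster_means)
    pulls = [0] * C                                   # times each cluster was chosen
    totals = [0.0] * C                                # cumulative reward per cluster
    best_arm = [len(m) // 2 for m in cluster_means]   # current arm guess per cluster

    for t in range(1, horizon + 1):
        # Level 1: play each cluster once, then pick by UCB index.
        if t <= C:
            c = t - 1
        else:
            c = max(range(C), key=lambda i: totals[i] / pulls[i]
                    + math.sqrt(2 * math.log(t) / pulls[i]))

        # Level 2: compare the current arm with its neighbors using noisy
        # samples; under a unimodal mean profile, local moves climb the peak.
        arms = cluster_means[c]
        a = best_arm[c]
        candidates = [x for x in (a - 1, a, a + 1) if 0 <= x < len(arms)]
        a = max(candidates, key=lambda x: arms[x] + rng.gauss(0, 0.1))
        best_arm[c] = a

        reward = arms[a] + rng.gauss(0, 0.1)          # noisy reward of the played arm
        pulls[c] += 1
        totals[c] += reward
    return best_arm, pulls

# Example: 3 clusters, each with a single-peaked mean-reward profile.
clusters = [[0.1, 0.4, 0.7, 0.5, 0.2],
            [0.2, 0.3, 0.35, 0.3],
            [0.05, 0.1, 0.2, 0.6, 0.9, 0.7]]
print(run_hierarchical_ucb(clusters, horizon=5000))
```

The key design point the sketch mirrors is that exploration cost at the top level scales with the number of clusters C rather than the total number of arms N, which is what drives the \(O(\sqrt{CT\log (T)})\) versus \(O(\sqrt{NT\log (T)})\) comparison stated in the abstract.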


Footnotes
1
Strong unimodality means that there is only one optimal arm in the arm set.
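For concreteness, one common way to formalize this (a sketch assuming the arms within a cluster are indexed \(1,\dots,K\) along the line, not quoted from the chapter) is:

```latex
% Unimodality of the mean rewards \mu_1,\dots,\mu_K within a cluster:
% there exists a peak index k^* such that
\mu_1 \le \dots \le \mu_{k^*} \ge \dots \ge \mu_K .
% Strong unimodality additionally requires the maximizer to be unique:
\mu_k < \mu_{k^*} \quad \text{for all } k \neq k^* .
```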
 
References
Yu, J.Y., Mannor, S.: Unimodal bandits (2011)
Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multiarmed bandit problem. Mach. Learn. 47(2–3), 235–256 (2002)
Pandey, S., Chakrabarti, D., Agarwal, D.: Multi-armed bandit problems with dependent arms. In: Proceedings of the 24th International Conference on Machine Learning, pp. 721–728 (2007)
Thompson, W.R.: On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25(3–4), 285–294 (1933)
Sun, M., Li, M., Gerdes, R.: Truth-aware optimal decision-making framework with driver preferences for V2V communications. In: 2018 IEEE Conference on Communications and Network Security (CNS), pp. 1–9. IEEE (2018)
Wen, W., Cheng, N., Zhang, N., Yang, P., Zhuang, W., Shen, X.: Fast mmWave beam alignment via correlated bandit learning. IEEE Trans. Wirel. Commun. 18(12), 5894–5908 (2019)
Hashemi, M., Sabharwal, A., Koksal, C.E., Shroff, N.B.: Efficient beam alignment in millimeter wave systems using contextual bandits. In: IEEE INFOCOM 2018, pp. 2393–2401. IEEE (2018)
Lopez, A.V., Chervyakov, A., Chance, G., Verma, S., Tang, Y.: Opportunities and challenges of mmWave NR. IEEE Wirel. Commun. 26(2), 4–6 (2019)
Nguyen, T.T., Lauw, H.W.: Dynamic clustering of contextual multi-armed bandits. In: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 1959–1962 (2014)
Jedor, M., Perchet, V., Louedec, J.: Categorized bandits. Adv. Neural Inf. Process. Syst. 32, 1–11 (2019)
Bouneffouf, D., Parthasarathy, S., Samulowitz, H., Wistub, M.: Optimal exploitation of clustering and history information in multi-armed bandit. arXiv preprint arXiv:1906.03979 (2019)
Carlsson, E., Dubhashi, D., Johansson, F.D.: Thompson sampling for bandits with clustered arms. arXiv preprint arXiv:2109.01656 (2021)
Zhao, T., Li, M., Poloczek, M.: Fast reconfigurable antenna state selection with hierarchical Thompson sampling. In: ICC 2019 - 2019 IEEE International Conference on Communications (ICC), pp. 1–6. IEEE (2019)
Kumar, S., Gao, H., Wang, C., Chang, K.C.C., Sundaram, H.: Hierarchical multi-armed bandits for discovering hidden populations. In: Proceedings of the 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pp. 145–153 (2019)
Combes, R., Proutiere, A.: Unimodal bandits: regret lower bounds and optimal algorithms. In: International Conference on Machine Learning, pp. 521–529. PMLR (2014)
Zhang, Y., Basu, S., Shakkottai, S., Heath Jr., R.W.: mmWave codebook selection in rapidly-varying channels via multinomial Thompson sampling. In: Proceedings of the Twenty-Second International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing, pp. 151–160 (2021)
Blinn, N., Boerger, J., Bloch, M.: mmWave beam steering with hierarchical optimal sampling for unimodal bandits. In: ICC 2021 - IEEE International Conference on Communications, pp. 1–6. IEEE (2021)
Bubeck, S., Cesa-Bianchi, N.: Regret analysis of stochastic and nonstochastic multi-armed bandit problems. arXiv preprint arXiv:1204.5721 (2012)
Abbasi-Yadkori, Y., Pacchiano, A., Phan, M.: Regret balancing for bandit and RL model selection. arXiv preprint arXiv:2006.05491 (2020)
Cutkosky, A., Dann, C., Das, A., Gentile, C., Pacchiano, A., Purohit, M.: Dynamic balancing for model selection in bandits and RL. In: International Conference on Machine Learning, pp. 2276–2285. PMLR (2021)
Molisch, A.F.: Wireless Communications, vol. 34. John Wiley & Sons, Hoboken (2012)
Samimi, M.K., MacCartney, G.R., Sun, S., Rappaport, T.S.: 28 GHz millimeter-wave ultrawideband small-scale fading models in wireless channels. In: 2016 IEEE 83rd Vehicular Technology Conference (VTC Spring), pp. 1–6. IEEE (2016)
Metadata
Title
Hierarchical Unimodal Bandits
Authors
Tianchi Zhao
Chicheng Zhang
Ming Li
Copyright Year
2023
DOI
https://doi.org/10.1007/978-3-031-26412-2_17
