Skip to main content

2024 | OriginalPaper | Buchkapitel

Handling Concept Drift in Non-stationary Bandit Through Predicting Future Rewards

verfasst von : Yun-Da Tsai, Shou-De Lin

Erschienen in: Trends and Applications in Knowledge Discovery and Data Mining

Verlag: Springer Nature Singapore

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

We present a study on the non-stationary stochastic multi-armed bandit (MAB) problem, which is relevant for addressing real-world challenges related to sequential decision-making. Our work involves a thorough analysis of state-of-the-art algorithms in dynamically changing environments. To address the limitations of existing methods, we propose the Concept Drift Adaptive Bandit (CDAB) framework, which aims to capture and predict potential future concept drift patterns in reward distribution, allowing for better adaptation in non-stationary environments. We conduct extensive numerical experiments to evaluate the effectiveness of the CDAB approach in comparison to both stationary and non-stationary state-of-the-art baselines. Our experiments involve testing on both artificial datasets and real-world data under different types of changing environments. The results show that the CDAB approach exhibits strong empirical performance, outperforming existing methods in all versions tested.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Anhänge
Nur mit Berechtigung zugänglich
Literatur
1.
Zurück zum Zitat Auer, P., Gajane, P., Ortner, R.: Adaptively tracking the best bandit arm with an unknown number of distribution changes. In: Conference on Learning Theory, pp. 138–158. PMLR (2019) Auer, P., Gajane, P., Ortner, R.: Adaptively tracking the best bandit arm with an unknown number of distribution changes. In: Conference on Learning Theory, pp. 138–158. PMLR (2019)
2.
Zurück zum Zitat Awerbuch, B., Kleinberg, R.D.: Adaptive routing with end-to-end feedback: distributed learning and geometric approaches. In: Proceedings of the Thirty-Sixth Annual ACM Symposium on Theory of Computing, pp. 45–53 (2004) Awerbuch, B., Kleinberg, R.D.: Adaptive routing with end-to-end feedback: distributed learning and geometric approaches. In: Proceedings of the Thirty-Sixth Annual ACM Symposium on Theory of Computing, pp. 45–53 (2004)
3.
Zurück zum Zitat Bergemann, D., Hege, U.: The financing of innovation: learning and stopping. RAND J. Econ. 36(4), 719–752 (2005) Bergemann, D., Hege, U.: The financing of innovation: learning and stopping. RAND J. Econ. 36(4), 719–752 (2005)
4.
Zurück zum Zitat Bergemann, D., Välimäki, J.: Learning and strategic pricing. Econometrica: J. Econometric Soc. 64(5), 1125–1149 (1996)CrossRef Bergemann, D., Välimäki, J.: Learning and strategic pricing. Econometrica: J. Econometric Soc. 64(5), 1125–1149 (1996)CrossRef
5.
Zurück zum Zitat Besbes, O., Gur, Y., Zeevi, A.: Stochastic multi-armed-bandit problem with non-stationary rewards. In: Advances in Neural Information Processing Systems, vol. 27 (2014) Besbes, O., Gur, Y., Zeevi, A.: Stochastic multi-armed-bandit problem with non-stationary rewards. In: Advances in Neural Information Processing Systems, vol. 27 (2014)
6.
Zurück zum Zitat Bifet, A., Gavalda, R.: Learning from time-changing data with adaptive windowing. In: Proceedings of the 2007 SIAM International Conference on Data Mining, pp. 443–448. SIAM (2007) Bifet, A., Gavalda, R.: Learning from time-changing data with adaptive windowing. In: Proceedings of the 2007 SIAM International Conference on Data Mining, pp. 443–448. SIAM (2007)
7.
Zurück zum Zitat Cao, Y., Wen, Z., Kveton, B., Xie, Y.: Nearly optimal adaptive procedure with change detection for piecewise-stationary bandit. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 418–427. PMLR (2019) Cao, Y., Wen, Z., Kveton, B., Xie, Y.: Nearly optimal adaptive procedure with change detection for piecewise-stationary bandit. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 418–427. PMLR (2019)
8.
Zurück zum Zitat Carpentier, A., Valko, M.: Revealing graph bandits for maximizing local influence. In: Artificial Intelligence and Statistics, pp. 10–18. PMLR (2016) Carpentier, A., Valko, M.: Revealing graph bandits for maximizing local influence. In: Artificial Intelligence and Statistics, pp. 10–18. PMLR (2016)
9.
Zurück zum Zitat Cavenaghi, E., Sottocornola, G., Stella, F., Zanker, M.: Non stationary multi-armed bandit: empirical evaluation of a new concept drift-aware algorithm. Entropy 23(3), 380 (2021)MathSciNetCrossRef Cavenaghi, E., Sottocornola, G., Stella, F., Zanker, M.: Non stationary multi-armed bandit: empirical evaluation of a new concept drift-aware algorithm. Entropy 23(3), 380 (2021)MathSciNetCrossRef
10.
Zurück zum Zitat Chen, C., Petty, K., Skabardonis, A., Varaiya, P., Jia, Z.: Freeway performance measurement system: mining loop detector data. Transp. Res. Rec. 1748(1), 96–102 (2001)CrossRef Chen, C., Petty, K., Skabardonis, A., Varaiya, P., Jia, Z.: Freeway performance measurement system: mining loop detector data. Transp. Res. Rec. 1748(1), 96–102 (2001)CrossRef
11.
Zurück zum Zitat Combes, R., Magureanu, S., Proutiere, A., Laroche, C.: Learning to rank: regret lower bounds and efficient algorithms. In: Proceedings of the 2015 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, pp. 231–244 (2015) Combes, R., Magureanu, S., Proutiere, A., Laroche, C.: Learning to rank: regret lower bounds and efficient algorithms. In: Proceedings of the 2015 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, pp. 231–244 (2015)
12.
Zurück zum Zitat Dries, A., Rückert, U.: Adaptive concept drift detection. Stat. Anal. Data Min. ASA Data Sci. J. 2(5–6), 311–327 (2009)MathSciNetCrossRef Dries, A., Rückert, U.: Adaptive concept drift detection. Stat. Anal. Data Min. ASA Data Sci. J. 2(5–6), 311–327 (2009)MathSciNetCrossRef
13.
14.
Zurück zum Zitat Gama, J., Žliobaitė, I., Bifet, A., Pechenizkiy, M., Bouchachia, A.: A survey on concept drift adaptation. ACM Comput. Surv. (CSUR) 46(4), 1–37 (2014)CrossRef Gama, J., Žliobaitė, I., Bifet, A., Pechenizkiy, M., Bouchachia, A.: A survey on concept drift adaptation. ACM Comput. Surv. (CSUR) 46(4), 1–37 (2014)CrossRef
15.
Zurück zum Zitat Garivier, A., Moulines, E.: On upper-confidence bound policies for non-stationary bandit problems. arXiv preprint arXiv:0805.3415 (2008) Garivier, A., Moulines, E.: On upper-confidence bound policies for non-stationary bandit problems. arXiv preprint arXiv:​0805.​3415 (2008)
16.
Zurück zum Zitat Guo, D., et al.: Deep Bayesian bandits: exploring in online personalized recommendations. In: Fourteenth ACM Conference on Recommender Systems, pp. 456–461 (2020) Guo, D., et al.: Deep Bayesian bandits: exploring in online personalized recommendations. In: Fourteenth ACM Conference on Recommender Systems, pp. 456–461 (2020)
17.
Zurück zum Zitat Harper, F.M., Konstan, J.A.: The movielens datasets: history and context. ACM Trans. Interact. Intell. Syst. (TIIS) 5(4), 1–19 (2015) Harper, F.M., Konstan, J.A.: The movielens datasets: history and context. ACM Trans. Interact. Intell. Syst. (TIIS) 5(4), 1–19 (2015)
18.
Zurück zum Zitat Hartland, C., Gelly, S., Baskiotis, N., Teytaud, O., Sebag, M.: Multi-armed bandit, dynamic environments and meta-bandits (2006) Hartland, C., Gelly, S., Baskiotis, N., Teytaud, O., Sebag, M.: Multi-armed bandit, dynamic environments and meta-bandits (2006)
19.
Zurück zum Zitat Hernandez-Leal, P., Kaisers, M., Baarslag, T., de Cote, E.M.: A survey of learning in multiagent environments: dealing with non-stationarity. arXiv preprint arXiv:1707.09183 (2017) Hernandez-Leal, P., Kaisers, M., Baarslag, T., de Cote, E.M.: A survey of learning in multiagent environments: dealing with non-stationarity. arXiv preprint arXiv:​1707.​09183 (2017)
21.
Zurück zum Zitat Kleinberg, R., Leighton, T.: The value of knowing a demand curve: Bounds on regret for online posted-price auctions. In: 44th Annual IEEE Symposium on Foundations of Computer Science, Proceedings, pp. 594–605. IEEE (2003) Kleinberg, R., Leighton, T.: The value of knowing a demand curve: Bounds on regret for online posted-price auctions. In: 44th Annual IEEE Symposium on Foundations of Computer Science, Proceedings, pp. 594–605. IEEE (2003)
22.
Zurück zum Zitat Klinkenberg, R., Joachims, T.: Detecting concept drift with support vector machines. In: ICML, pp. 487–494 (2000) Klinkenberg, R., Joachims, T.: Detecting concept drift with support vector machines. In: ICML, pp. 487–494 (2000)
23.
Zurück zum Zitat Kveton, B., Szepesvari, C., Wen, Z., Ashkan, A.: Cascading bandits: learning to rank in the cascade model. In: International Conference on Machine Learning, pp. 767–776. PMLR (2015) Kveton, B., Szepesvari, C., Wen, Z., Ashkan, A.: Cascading bandits: learning to rank in the cascade model. In: International Conference on Machine Learning, pp. 767–776. PMLR (2015)
24.
Zurück zum Zitat Liang, X., Li, S., Zhang, S., Huang, H., Chen, S.X.: PM\(_{2.5}\) data reliability, consistency, and air quality assessment in five Chinese cities. J. Geophys. Res. Atmos. 121(17), 10–220 (2016)CrossRef Liang, X., Li, S., Zhang, S., Huang, H., Chen, S.X.: PM\(_{2.5}\) data reliability, consistency, and air quality assessment in five Chinese cities. J. Geophys. Res. Atmos. 121(17), 10–220 (2016)CrossRef
25.
Zurück zum Zitat Liu, F., Lee, J., Shroff, N.: A change-detection based framework for piecewise-stationary multi-armed bandit problem. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32 (2018) Liu, F., Lee, J., Shroff, N.: A change-detection based framework for piecewise-stationary multi-armed bandit problem. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32 (2018)
26.
Zurück zum Zitat Mellor, J., Shapiro, J.: Thompson sampling in switching environments with Bayesian online change detection. In: Artificial Intelligence and Statistics, pp. 442–450. PMLR (2013) Mellor, J., Shapiro, J.: Thompson sampling in switching environments with Bayesian online change detection. In: Artificial Intelligence and Statistics, pp. 442–450. PMLR (2013)
28.
Zurück zum Zitat Pandey, A., Singh, P., Iyengar, L.: Bacterial decolorization and degradation of azo dyes. Int. Biodeterior. Biodegradation 59(2), 73–84 (2007)CrossRef Pandey, A., Singh, P., Iyengar, L.: Bacterial decolorization and degradation of azo dyes. Int. Biodeterior. Biodegradation 59(2), 73–84 (2007)CrossRef
30.
Zurück zum Zitat Saito, Y., Aihara, S.: Large-scale open dataset, pipeline, and benchmark for bandit algorithms. arXiv preprint arXiv:2008.07146 (2020) Saito, Y., Aihara, S.: Large-scale open dataset, pipeline, and benchmark for bandit algorithms. arXiv preprint arXiv:​2008.​07146 (2020)
31.
Zurück zum Zitat Tóth, B., Sachidanandan, S., Jørgensen, E.S.: Balancing relevance and discovery to inspire customers in the IKEA app. In: Fourteenth ACM Conference on Recommender Systems, pp. 563–563 (2020) Tóth, B., Sachidanandan, S., Jørgensen, E.S.: Balancing relevance and discovery to inspire customers in the IKEA app. In: Fourteenth ACM Conference on Recommender Systems, pp. 563–563 (2020)
32.
Zurück zum Zitat Trovo, F., Paladino, S., Restelli, M., Gatti, N.: Sliding-window thompson sampling for non-stationary settings. J. Artif. Intell. Res. 68, 311–364 (2020)MathSciNetCrossRef Trovo, F., Paladino, S., Restelli, M., Gatti, N.: Sliding-window thompson sampling for non-stationary settings. J. Artif. Intell. Res. 68, 311–364 (2020)MathSciNetCrossRef
33.
Zurück zum Zitat Tsai, T.H., Tsai, Y.D., Lin, S.D.: lil’HDoC: an algorithm for good arm identification under small threshold gap. arXiv preprint arXiv:2401.15879 (2024) Tsai, T.H., Tsai, Y.D., Lin, S.D.: lil’HDoC: an algorithm for good arm identification under small threshold gap. arXiv preprint arXiv:​2401.​15879 (2024)
34.
Zurück zum Zitat Tsai, Y.D., Lin, S.D., Lin, S.D.: Fast online inference for nonlinear contextual bandit based on generative adversarial network. arXiv preprint arXiv:2202.08867 (2022) Tsai, Y.D., Lin, S.D., Lin, S.D.: Fast online inference for nonlinear contextual bandit based on generative adversarial network. arXiv preprint arXiv:​2202.​08867 (2022)
36.
Zurück zum Zitat Zelen, M.: Play the winner rule and the controlled clinical trial. J. Am. Stat. Assoc. 64(325), 131–146 (1969)MathSciNetCrossRef Zelen, M.: Play the winner rule and the controlled clinical trial. J. Am. Stat. Assoc. 64(325), 131–146 (1969)MathSciNetCrossRef
Metadaten
Titel
Handling Concept Drift in Non-stationary Bandit Through Predicting Future Rewards
verfasst von
Yun-Da Tsai
Shou-De Lin
Copyright-Jahr
2024
Verlag
Springer Nature Singapore
DOI
https://doi.org/10.1007/978-981-97-2650-9_13

Premium Partner