Top

Published in:

2024 | OriginalPaper | Chapter

Handling Concept Drift in Non-stationary Bandit Through Predicting Future Rewards

Authors : Yun-Da Tsai, Shou-De Lin

Published in: Trends and Applications in Knowledge Discovery and Data Mining

Publisher: Springer Nature Singapore

Activate our intelligent search to find suitable subject content or patents.

search-config

AI-assisted search

Off

Abstract

We present a study on the non-stationary stochastic multi-armed bandit (MAB) problem, which is relevant for addressing real-world challenges related to sequential decision-making. Our work involves a thorough analysis of state-of-the-art algorithms in dynamically changing environments. To address the limitations of existing methods, we propose the Concept Drift Adaptive Bandit (CDAB) framework, which aims to capture and predict potential future concept drift patterns in reward distribution, allowing for better adaptation in non-stationary environments. We conduct extensive numerical experiments to evaluate the effectiveness of the CDAB approach in comparison to both stationary and non-stationary state-of-the-art baselines. Our experiments involve testing on both artificial datasets and real-world data under different types of changing environments. The results show that the CDAB approach exhibits strong empirical performance, outperforming existing methods in all versions tested.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

previous chapter Adaptive Knowledge Sharing in Multi-Task Learning: Insights from Electricity Data Analysis

Available only for authorised users

Auer, P., Gajane, P., Ortner, R.: Adaptively tracking the best bandit arm with an unknown number of distribution changes. In: Conference on Learning Theory, pp. 138–158. PMLR (2019)

Awerbuch, B., Kleinberg, R.D.: Adaptive routing with end-to-end feedback: distributed learning and geometric approaches. In: Proceedings of the Thirty-Sixth Annual ACM Symposium on Theory of Computing, pp. 45–53 (2004)

Bergemann, D., Hege, U.: The financing of innovation: learning and stopping. RAND J. Econ. 36(4), 719–752 (2005)

Bergemann, D., Välimäki, J.: Learning and strategic pricing. Econometrica: J. Econometric Soc. 64(5), 1125–1149 (1996)CrossRef

Besbes, O., Gur, Y., Zeevi, A.: Stochastic multi-armed-bandit problem with non-stationary rewards. In: Advances in Neural Information Processing Systems, vol. 27 (2014)

Bifet, A., Gavalda, R.: Learning from time-changing data with adaptive windowing. In: Proceedings of the 2007 SIAM International Conference on Data Mining, pp. 443–448. SIAM (2007)

Cao, Y., Wen, Z., Kveton, B., Xie, Y.: Nearly optimal adaptive procedure with change detection for piecewise-stationary bandit. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 418–427. PMLR (2019)

Carpentier, A., Valko, M.: Revealing graph bandits for maximizing local influence. In: Artificial Intelligence and Statistics, pp. 10–18. PMLR (2016)

Cavenaghi, E., Sottocornola, G., Stella, F., Zanker, M.: Non stationary multi-armed bandit: empirical evaluation of a new concept drift-aware algorithm. Entropy 23(3), 380 (2021)MathSciNetCrossRef

10.

Chen, C., Petty, K., Skabardonis, A., Varaiya, P., Jia, Z.: Freeway performance measurement system: mining loop detector data. Transp. Res. Rec. 1748(1), 96–102 (2001)CrossRef

11.

Combes, R., Magureanu, S., Proutiere, A., Laroche, C.: Learning to rank: regret lower bounds and efficient algorithms. In: Proceedings of the 2015 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, pp. 231–244 (2015)

12.

Dries, A., Rückert, U.: Adaptive concept drift detection. Stat. Anal. Data Min. ASA Data Sci. J. 2(5–6), 311–327 (2009)MathSciNetCrossRef

13.

Dudík, M., Langford, J., Li, L.: Doubly robust policy evaluation and learning. arXiv preprint arXiv:1103.4601 (2011)

14.

Gama, J., Žliobaitė, I., Bifet, A., Pechenizkiy, M., Bouchachia, A.: A survey on concept drift adaptation. ACM Comput. Surv. (CSUR) 46(4), 1–37 (2014)CrossRef

15.

Garivier, A., Moulines, E.: On upper-confidence bound policies for non-stationary bandit problems. arXiv preprint arXiv:0805.3415 (2008)

16.

Guo, D., et al.: Deep Bayesian bandits: exploring in online personalized recommendations. In: Fourteenth ACM Conference on Recommender Systems, pp. 456–461 (2020)

17.

Harper, F.M., Konstan, J.A.: The movielens datasets: history and context. ACM Trans. Interact. Intell. Syst. (TIIS) 5(4), 1–19 (2015)

18.

Hartland, C., Gelly, S., Baskiotis, N., Teytaud, O., Sebag, M.: Multi-armed bandit, dynamic environments and meta-bandits (2006)

19.

Hernandez-Leal, P., Kaisers, M., Baarslag, T., de Cote, E.M.: A survey of learning in multiagent environments: dealing with non-stationarity. arXiv preprint arXiv:1707.09183 (2017)

20.

Huang, K.H., Lin, H.T.: Linear upper confidence bound algorithm for contextual bandit problem with piled rewards. In: Bailey, J., Khan, L., Washio, T., Dobbie, G., Huang, J., Wang, R. (eds.) PAKDD 2016. LNCS, vol. 9652, pp. 143–155. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-31750-2_12CrossRef

21.

Kleinberg, R., Leighton, T.: The value of knowing a demand curve: Bounds on regret for online posted-price auctions. In: 44th Annual IEEE Symposium on Foundations of Computer Science, Proceedings, pp. 594–605. IEEE (2003)

22.

Klinkenberg, R., Joachims, T.: Detecting concept drift with support vector machines. In: ICML, pp. 487–494 (2000)

23.

Kveton, B., Szepesvari, C., Wen, Z., Ashkan, A.: Cascading bandits: learning to rank in the cascade model. In: International Conference on Machine Learning, pp. 767–776. PMLR (2015)

24.

Liang, X., Li, S., Zhang, S., Huang, H., Chen, S.X.: PM\(_{2.5}\) data reliability, consistency, and air quality assessment in five Chinese cities. J. Geophys. Res. Atmos. 121(17), 10–220 (2016)CrossRef

25.

Liu, F., Lee, J., Shroff, N.: A change-detection based framework for piecewise-stationary multi-armed bandit problem. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32 (2018)

26.

Mellor, J., Shapiro, J.: Thompson sampling in switching environments with Bayesian online change detection. In: Artificial Intelligence and Statistics, pp. 442–450. PMLR (2013)

27.

Nishida, K., Yamauchi, K.: Detecting concept drift using statistical testing. In: Corruble, V., Takeda, M., Suzuki, E. (eds.) DS 2007. LNCS, vol. 4755, pp. 264–269. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-75488-6_27CrossRef

28.

Pandey, A., Singh, P., Iyengar, L.: Bacterial decolorization and degradation of azo dyes. Int. Biodeterior. Biodegradation 59(2), 73–84 (2007)CrossRef

29.

Raj, V., Kalyani, S.: Taming non-stationary bandits: a Bayesian approach. arXiv preprint arXiv:1707.09727 (2017)

30.

Saito, Y., Aihara, S.: Large-scale open dataset, pipeline, and benchmark for bandit algorithms. arXiv preprint arXiv:2008.07146 (2020)

31.

Tóth, B., Sachidanandan, S., Jørgensen, E.S.: Balancing relevance and discovery to inspire customers in the IKEA app. In: Fourteenth ACM Conference on Recommender Systems, pp. 563–563 (2020)

32.

Trovo, F., Paladino, S., Restelli, M., Gatti, N.: Sliding-window thompson sampling for non-stationary settings. J. Artif. Intell. Res. 68, 311–364 (2020)MathSciNetCrossRef

33.

Tsai, T.H., Tsai, Y.D., Lin, S.D.: lil’HDoC: an algorithm for good arm identification under small threshold gap. arXiv preprint arXiv:2401.15879 (2024)

34.

Tsai, Y.D., Lin, S.D., Lin, S.D.: Fast online inference for nonlinear contextual bandit based on generative adversarial network. arXiv preprint arXiv:2202.08867 (2022)

35.

Tsai, Y.D., Tsai, T.H., Lin, S.D.: Differential good arm identification. arXiv preprint arXiv:2303.07154 (2023)

36.

Zelen, M.: Play the winner rule and the controlled clinical trial. J. Am. Stat. Assoc. 64(325), 131–146 (1969)MathSciNetCrossRef

37.

Žliobaitė, I.: Learning under concept drift: an overview. arXiv preprint arXiv:1010.4784 (2010)

Title: Handling Concept Drift in Non-stationary Bandit Through Predicting Future Rewards
Authors: Yun-Da Tsai
Shou-De Lin
Publisher: Springer Nature Singapore
Book: Trends and Applications in Knowledge Discovery and Data Mining
Print ISBN: 978-981-9726-49-3

Electronic ISBN: 978-981-9726-50-9

Copyright Year: 2024
DOI: https://doi.org/10.1007/978-981-97-2650-9_13

Springer Professional

Abstract

Please log in to get access to your license.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Premium Partner