
2024 | OriginalPaper | Book chapter

Handling Concept Drift in Non-stationary Bandit Through Predicting Future Rewards

Authors: Yun-Da Tsai, Shou-De Lin

Published in: Trends and Applications in Knowledge Discovery and Data Mining

Publisher: Springer Nature Singapore


Abstract

We study the non-stationary stochastic multi-armed bandit (MAB) problem, which models many real-world sequential decision-making tasks in which the reward distributions change over time. We analyze state-of-the-art algorithms in dynamically changing environments and, to address their limitations, propose the Concept Drift Adaptive Bandit (CDAB) framework. CDAB aims to capture and predict future concept drift patterns in the reward distribution, so that the learner can adapt better to non-stationary environments. We conduct extensive numerical experiments comparing CDAB with both stationary and non-stationary state-of-the-art baselines, on artificial and real-world datasets under different types of changing environments. The results show that CDAB performs strongly in practice, outperforming the existing methods in all tested variants.
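The abstract describes CDAB only at a high level, so the sketch below is not the authors' algorithm. It illustrates, under stated assumptions, the general idea of acting on a *predicted future* reward rather than a historical average: it takes sliding-window UCB (a standard non-stationary baseline due to Garivier and Moulines) and swaps each arm's window mean for a one-step-ahead least-squares linear forecast. The names `linear_forecast`, `ForecastingSlidingWindowUCB` and `simulate`, the window size, the exploration constant `c`, the linear trend model and the drifting two-arm environment are all hypothetical choices made for illustration; rewards are assumed to lie in [0, 1].

```python
import math
import random
from collections import deque


def linear_forecast(times, rewards, t_next):
    """Fit r = a + b*t by least squares and evaluate the line at t_next.

    Falls back to the sample mean when all times are equal (no slope information).
    """
    n = len(times)
    mean_t = sum(times) / n
    mean_r = sum(rewards) / n
    var_t = sum((t - mean_t) ** 2 for t in times)
    if var_t == 0.0:
        return mean_r
    cov = sum((t - mean_t) * (r - mean_r) for t, r in zip(times, rewards))
    return mean_r + (cov / var_t) * (t_next - mean_t)


class ForecastingSlidingWindowUCB:
    """Hypothetical drift-aware policy, NOT the chapter's CDAB algorithm.

    Sliding-window UCB in which each arm's window mean is replaced by a
    one-step-ahead linear forecast of its reward. Rewards are assumed in [0, 1].
    """

    def __init__(self, n_arms, window=100, c=0.5):
        self.n_arms = n_arms
        self.window = window
        self.c = c
        # (time, arm, reward) for the last `window` rounds only; older rounds
        # are discarded automatically, which is what makes the policy forget.
        self.history = deque(maxlen=window)
        self.t = 0

    def select_arm(self):
        self.t += 1
        per_arm = [[] for _ in range(self.n_arms)]
        for t, arm, r in self.history:
            per_arm[arm].append((t, r))
        best_arm, best_score = 0, -math.inf
        for arm, obs in enumerate(per_arm):
            if not obs:
                return arm  # arm unseen in the current window: explore it
            times, rewards = zip(*obs)
            # Predicted reward at the current round, clipped to the reward range.
            forecast = min(max(linear_forecast(times, rewards, self.t), 0.0), 1.0)
            bonus = self.c * math.sqrt(math.log(min(self.t, self.window)) / len(obs))
            if forecast + bonus > best_score:
                best_arm, best_score = arm, forecast + bonus
        return best_arm

    def update(self, arm, reward):
        # Must be called once after each select_arm() with the observed reward.
        self.history.append((self.t, arm, reward))


def simulate(horizon=2000, seed=0):
    """Two Bernoulli arms whose means drift linearly and cross halfway.

    Returns the fraction of rounds in which the currently better arm was played.
    """
    rng = random.Random(seed)
    policy = ForecastingSlidingWindowUCB(n_arms=2, window=100, c=0.5)
    hits = 0
    for step in range(horizon):
        frac = step / (horizon - 1)
        means = [0.8 - 0.6 * frac, 0.2 + 0.6 * frac]
        arm = policy.select_arm()
        reward = 1.0 if rng.random() < means[arm] else 0.0
        policy.update(arm, reward)
        hits += means[arm] >= max(means)
    return hits / horizon


if __name__ == "__main__":
    print(simulate())
```

Running the script prints the fraction of rounds in which the policy played the arm with the higher current mean; the value depends on `window`, `c` and the seed, and no particular number is claimed here. The design choice is deliberately minimal: the sliding window handles forgetting exactly as in plain sliding-window UCB, and the only change is that the point estimate extrapolates the recent trend instead of averaging it, which is one simple way to anticipate drift rather than merely react to it.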



Print ISBN: 978-981-9726-49-3

Electronic ISBN: 978-981-9726-50-9

Copyright year: 2024
DOI: https://doi.org/10.1007/978-981-97-2650-9_13
