
2015 | OriginalPaper | Chapter

A Bayesian Sarsa Learning Algorithm with Bandit-Based Method

Authors : Shuhua You, Quan Liu, Qiming Fu, Shan Zhong, Fei Zhu

Published in: Neural Information Processing

Publisher: Springer International Publishing


Abstract

We propose an efficient algorithm, Bayesian Sarsa (BS), designed to balance the tradeoff between exploration and exploitation in reinforcement learning. We model Q-values with probability distributions and compute their posteriors by Bayesian inference, which improves the accuracy of Q-value estimation. During learning, a bandit-based method handles the exploration/exploitation problem: actions are chosen according to the current mean estimate of the Q-values plus an additional reward bonus for state-action pairs that have been observed relatively little. We demonstrate that Bayesian Sarsa performs favorably compared to state-of-the-art reinforcement learning approaches.
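
The action-selection rule sketched in the abstract (mean Q-value estimate plus a bonus that shrinks as a state-action pair is visited more often) can be illustrated in a few lines of Python. The sketch below is an assumption-laden illustration, not the authors' exact BS algorithm: the incremental mean update stands in for the full Bayesian posterior computation, and the 1/sqrt(n+1) bonus form is chosen only for concreteness.

    import math
    from collections import defaultdict

    class BayesianSarsaSketch:
        """Sarsa with a count-based exploration bonus (illustrative sketch only)."""

        def __init__(self, actions, alpha=0.1, gamma=0.95, bonus_scale=1.0):
            self.actions = list(actions)       # available actions
            self.alpha = alpha                 # step size for the mean update
            self.gamma = gamma                 # discount factor
            self.bonus_scale = bonus_scale     # weight of the exploration bonus
            self.q_mean = defaultdict(float)   # current mean estimate of Q(s, a)
            self.counts = defaultdict(int)     # visit count n(s, a)

        def select_action(self, state):
            # Pick the action maximizing mean Q plus a bonus that decays as the
            # state-action pair is observed more often (bandit-style optimism).
            def score(a):
                n = self.counts[(state, a)]
                return self.q_mean[(state, a)] + self.bonus_scale / math.sqrt(n + 1)
            return max(self.actions, key=score)

        def update(self, s, a, reward, s_next, a_next):
            # On-policy Sarsa target; the incremental mean update is a stand-in
            # for a full Bayesian posterior update over Q(s, a).
            target = reward + self.gamma * self.q_mean[(s_next, a_next)]
            self.q_mean[(s, a)] += self.alpha * (target - self.q_mean[(s, a)])
            self.counts[(s, a)] += 1

In an episodic loop, the agent would call select_action for both the current and the next state and pass the resulting pair to update, exactly as in standard Sarsa.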

Metadata
Title
A Bayesian Sarsa Learning Algorithm with Bandit-Based Method
Authors
Shuhua You
Quan Liu
Qiming Fu
Shan Zhong
Fei Zhu
Copyright Year
2015
DOI
https://doi.org/10.1007/978-3-319-26532-2_13
