
2015 | OriginalPaper | Chapter

A Bayesian Sarsa Learning Algorithm with Bandit-Based Method

Authors : Shuhua You, Quan Liu, Qiming Fu, Shan Zhong, Fei Zhu

Published in: Neural Information Processing

Publisher: Springer International Publishing


Abstract

We propose an efficient algorithm, Bayesian Sarsa (BS), designed to balance the tradeoff between exploration and exploitation in reinforcement learning. We model Q-values with probability distributions and compute their posteriors by Bayesian inference, which improves the accuracy of Q-value estimation. During learning, a bandit-based method handles the exploration/exploitation problem: actions are chosen according to the current mean estimate of the Q-values plus an additional reward bonus for state-action pairs that have been observed relatively little. We demonstrate that Bayesian Sarsa performs favorably compared to state-of-the-art reinforcement learning approaches.
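
The action-selection rule sketched in the abstract (mean Q-value estimate plus a bonus that shrinks as a state-action pair is visited more often) can be illustrated in a few lines of Python. The sketch below is an assumption-laden illustration, not the authors' exact BS algorithm: the incremental mean update stands in for the full Bayesian posterior computation, and the 1/sqrt(n+1) bonus form is chosen only for concreteness.

    import math
    from collections import defaultdict

    class BayesianSarsaSketch:
        """Sarsa with a count-based exploration bonus (illustrative sketch only)."""

        def __init__(self, actions, alpha=0.1, gamma=0.95, bonus_scale=1.0):
            self.actions = list(actions)       # available actions
            self.alpha = alpha                 # step size for the mean update
            self.gamma = gamma                 # discount factor
            self.bonus_scale = bonus_scale     # weight of the exploration bonus
            self.q_mean = defaultdict(float)   # current mean estimate of Q(s, a)
            self.counts = defaultdict(int)     # visit count n(s, a)

        def select_action(self, state):
            # Pick the action maximizing mean Q plus a bonus that decays as the
            # state-action pair is observed more often (bandit-style optimism).
            def score(a):
                n = self.counts[(state, a)]
                return self.q_mean[(state, a)] + self.bonus_scale / math.sqrt(n + 1)
            return max(self.actions, key=score)

        def update(self, s, a, reward, s_next, a_next):
            # On-policy Sarsa target; the incremental mean update is a stand-in
            # for a full Bayesian posterior update over Q(s, a).
            target = reward + self.gamma * self.q_mean[(s_next, a_next)]
            self.q_mean[(s, a)] += self.alpha * (target - self.q_mean[(s, a)])
            self.counts[(s, a)] += 1

In an episodic loop, the agent would call select_action for both the current and the next state and pass the resulting pair to update, exactly as in standard Sarsa.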

Metadata
Title
A Bayesian Sarsa Learning Algorithm with Bandit-Based Method
Authors
Shuhua You
Quan Liu
Qiming Fu
Shan Zhong
Fei Zhu
Copyright Year
2015
DOI
https://doi.org/10.1007/978-3-319-26532-2_13
