2016 | OriginalPaper | Chapter

Sparse Kernel-Based Least Squares Temporal Difference with Prioritized Sweeping

Authors: Cijia Sun, Xinghong Ling, Yuchen Fu, Quan Liu, Haijun Zhu, Jianwei Zhai, Peng Zhang

Published in: Neural Information Processing

Publisher: Springer International Publishing

Abstract

Improving the efficiency of algorithms for large-scale or continuous-space reinforcement learning (RL) problems is an active research topic. The kernel-based least squares temporal difference (KLSTD) algorithm can solve continuous-space RL problems, but it suffers from high computational complexity because of its kernel representation and the matrix computations it requires. To address this problem, this paper proposes an algorithm named sparse kernel-based least squares temporal difference with prioritized sweeping (PS-SKLSTD). PS-SKLSTD consists of two parts: learning and planning. In the learning process, we exploit an ALD-based sparse kernel function to represent the value function and update the parameter vectors via the Sherman-Morrison formula. In the planning process, we use prioritized sweeping to select the state-action pair to update next. The experimental results demonstrate that PS-SKLSTD outperforms KLSTD in both convergence and computational efficiency.
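
To make the learning step concrete: ALD (approximate linear dependence) sparsification keeps only those states whose kernel feature vector is not well approximated by the dictionary collected so far, and the Sherman-Morrison formula maintains the inverse of the LSTD matrix across rank-one updates instead of re-inverting it. The following Python sketch illustrates that learning step only; it is not the authors' implementation. The class name SparseKLSTD, the parameters nu, gamma, sigma, eps, the block-diagonal extension of the inverse when the dictionary grows, and the toy usage at the end are all assumptions made for illustration, and the planning step with prioritized sweeping is omitted.

    import numpy as np

    def rbf(x, y, sigma=1.0):
        # Gaussian kernel between two NumPy state vectors.
        return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

    class SparseKLSTD:
        # Sketch of the learning step: ALD-sparsified kernel features
        # plus a Sherman-Morrison update of the inverse LSTD matrix.

        def __init__(self, nu=0.1, gamma=0.9, sigma=1.0, eps=1.0):
            self.nu = nu          # ALD novelty threshold
            self.gamma = gamma    # discount factor
            self.sigma = sigma    # kernel width
            self.eps = eps        # regularizer for new dictionary entries
            self.dictionary = []  # retained states (sparse dictionary)
            self.A_inv = None     # running inverse of the LSTD matrix A
            self.b = None         # running LSTD vector b

        def features(self, s):
            # Kernel feature vector of s against the dictionary.
            return np.array([rbf(s, d, self.sigma) for d in self.dictionary])

        def _maybe_grow(self, s):
            # ALD test: admit s only if phi(s) is not approximately a
            # linear combination of the dictionary's feature vectors.
            if not self.dictionary:
                delta = rbf(s, s, self.sigma)
            else:
                K = np.array([[rbf(a, c, self.sigma) for c in self.dictionary]
                              for a in self.dictionary])
                k = self.features(s)
                coeffs = np.linalg.solve(K + 1e-8 * np.eye(len(K)), k)
                delta = rbf(s, s, self.sigma) - k @ coeffs
            if delta > self.nu:
                self.dictionary.append(s)
                m = len(self.dictionary)
                # Extend A^{-1} and b with a fresh regularized block
                # (an approximation chosen for this sketch).
                A_inv = np.eye(m) / self.eps
                b = np.zeros(m)
                if m > 1:
                    A_inv[:m - 1, :m - 1] = self.A_inv
                    b[:m - 1] = self.b
                self.A_inv, self.b = A_inv, b

        def update(self, s, r, s_next):
            # One sampled transition (s, r, s'). LSTD accumulates
            # A += phi (phi - gamma*phi')^T and b += r*phi; here A^{-1}
            # is maintained directly via the Sherman-Morrison formula.
            self._maybe_grow(s)
            self._maybe_grow(s_next)
            phi = self.features(s)
            v = phi - self.gamma * self.features(s_next)
            Au = self.A_inv @ phi
            self.A_inv -= np.outer(Au, v @ self.A_inv) / (1.0 + v @ Au)
            self.b += r * phi

        def value(self, s):
            # V(s) = phi(s)^T alpha with alpha = A^{-1} b.
            if not self.dictionary:
                return 0.0
            return self.features(s) @ (self.A_inv @ self.b)

    # Hypothetical usage on a toy one-dimensional transition:
    agent = SparseKLSTD(nu=0.1, gamma=0.95, sigma=0.5)
    agent.update(np.array([0.0]), 1.0, np.array([0.2]))
    print(agent.value(np.array([0.0])))

The design point is the cost profile: with a dictionary of size m, the Sherman-Morrison step refreshes A^{-1} in O(m^2) per transition, whereas recomputing the inverse from scratch costs O(m^3). Together with the ALD test keeping m small, this is a plausible source of the efficiency gain the abstract claims over plain KLSTD.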

Literature
1. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
2. Busoniu, L., Babuska, R., Schutter, B.D., Ernst, D.: Reinforcement Learning and Dynamic Programming Using Function Approximators. CRC Press, Boca Raton (2010)
3. Wiering, M., van Otterlo, M. (eds.): Reinforcement Learning: State-of-the-Art. Springer, Heidelberg (2012)
4. van Hasselt, H.: Reinforcement learning in continuous state and action spaces. In: Wiering, M., van Otterlo, M. (eds.) Reinforcement Learning. ALO, vol. 12, pp. 205–248. Springer, Heidelberg (2012)
5. Xu, X., Xie, X., Hu, D.: Kernel least-squares temporal difference learning. Int. J. Inf. Technol. 11(9), 54–63 (2005)
6. Xu, X., Hu, D.: Kernel-based least squares policy iteration for reinforcement learning. IEEE Trans. Neural Netw. 18(4), 973–992 (2007)
7. Moore, A.W., Atkeson, C.G.: Prioritized sweeping: reinforcement learning with less data and less time. Mach. Learn. 13(1), 103–130 (1993)
8. Sutton, R.S., Szepesvari, C., Geramifard, A., Bowling, M.P.: Dyna-style planning with linear function approximation and prioritized sweeping. In: Conference on Uncertainty in Artificial Intelligence (2008)
9. Lagoudakis, M.G., Parr, R.: Least-squares policy iteration. J. Mach. Learn. Res. 4(6), 1107–1149 (2003)
10. Liu, Q., Zhou, X., Zhu, F., Fu, Q., Fu, Y.: Experience replay for least-squares policy iteration. IEEE/CAA J. Autom. Sin. 1(3), 274–281 (2014)
11. Xu, X.: A sparse kernel-based least-squares temporal difference algorithm for reinforcement learning. In: Jiao, L., Wang, L., Gao, X., Liu, J., Wu, F. (eds.) ICNC 2006. LNCS, vol. 4221, pp. 47–56. Springer, Heidelberg (2006). doi:10.1007/11881070_8
12. Jong, N., Stone, P.: Kernel-based models for reinforcement learning. In: ICML Workshop on Kernel Machines and Reinforcement Learning (2006)
13. Engel, Y., Mannor, S., Meir, R.: Bayes meets Bellman: the Gaussian process approach to temporal difference learning. In: 20th International Conference on Machine Learning, pp. 154–161. American Association for Artificial Intelligence (2003)
14. Lazaric, A., Ghavamzadeh, M., Munos, R.: Finite-sample analysis of least-squares policy iteration. J. Mach. Learn. Res. 13(1), 3041–3074 (2012)
Metadata
Title
Sparse Kernel-Based Least Squares Temporal Difference with Prioritized Sweeping
Authors
Cijia Sun
Xinghong Ling
Yuchen Fu
Quan Liu
Haijun Zhu
Jianwei Zhai
Peng Zhang
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-46675-0_25
