Published in: Cluster Computing | Special Issue 1/2019

10-11-2017

Residual Sarsa algorithm with function approximation

Authors: Fu Qiming, Hu Wen, Liu Quan, Luo Heng, Hu Lingyao, Chen Jianping


Abstract

In this work, we propose an efficient algorithm, the residual Sarsa algorithm with function approximation (FARS), to improve the performance of the traditional Sarsa algorithm, using the gradient-descent method to update the function parameter vector. In the learning process, the Bellman residual method is adopted to guarantee the convergence of the algorithm, and a new rule for updating the parameter vector of the action-value function is adopted to address unstable and slow convergence. To accelerate the convergence rate, we introduce a new factor, the forgetting factor, which helps improve the robustness of the algorithm's performance. Experimental results on two classical reinforcement learning benchmark problems show that the FARS algorithm performs better than other related reinforcement learning algorithms.
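
As a rough illustration of the Bellman-residual idea the abstract describes, the Python sketch below shows a residual-gradient Sarsa update with linear function approximation. The feature construction, step sizes, and the exact update and forgetting-factor rules used in the paper are not given here, so every concrete choice in this snippet (function name, parameters, random features) is an assumption for illustration only, not the authors' method.

```python
import numpy as np

def residual_sarsa_update(theta, phi_sa, phi_next_sa, reward,
                          alpha=0.1, gamma=0.99):
    """One gradient-descent step on the squared Bellman residual
    for a linear action-value function Q(s, a) = theta . phi(s, a).

    theta       : parameter vector of the linear approximator
    phi_sa      : feature vector of the current state-action pair
    phi_next_sa : feature vector of the next state-action pair (on-policy)
    """
    q_sa = theta @ phi_sa
    q_next = theta @ phi_next_sa
    residual = reward + gamma * q_next - q_sa          # Bellman residual
    # Gradient of 0.5 * residual^2 w.r.t. theta is residual * (gamma*phi' - phi),
    # so descending it pushes the residual toward zero (Baird-style update).
    grad = gamma * phi_next_sa - phi_sa
    return theta - alpha * residual * grad

# Tiny usage example with random features (illustrative only).
rng = np.random.default_rng(0)
theta = np.zeros(8)
for _ in range(100):
    phi, phi_next = rng.random(8), rng.random(8)
    theta = residual_sarsa_update(theta, phi, phi_next, reward=1.0)
```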


Metadata
Title: Residual Sarsa algorithm with function approximation
Authors: Fu Qiming, Hu Wen, Liu Quan, Luo Heng, Hu Lingyao, Chen Jianping
Publication date: 10-11-2017
Publisher: Springer US
Published in: Cluster Computing, Special Issue 1/2019
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI: https://doi.org/10.1007/s10586-017-1303-8
