Published in: Cluster Computing | Special Issue 1/2019

10-11-2017

Residual Sarsa algorithm with function approximation

Authors: Fu Qiming, Hu Wen, Liu Quan, Luo Heng, Hu Lingyao, Chen Jianping


Abstract

In this work, we propose an efficient algorithm, the residual Sarsa algorithm with function approximation (FARS), to improve the performance of the traditional Sarsa algorithm, using the gradient-descent method to update the function parameter vector. In the learning process, the Bellman residual method is adopted to guarantee the convergence of the algorithm, and a new rule for updating the parameter vector of the action-value function is adopted to address unstable and slow convergence. To accelerate the convergence rate, we introduce a new factor, the forgetting factor, which helps improve the robustness of the algorithm's performance. Experimental results on two classical reinforcement learning benchmark problems show that the FARS algorithm performs better than other related reinforcement learning algorithms.
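
As a rough illustration of the Bellman-residual idea the abstract describes, the Python sketch below shows a residual-gradient Sarsa update with linear function approximation. The feature construction, step sizes, and the exact update and forgetting-factor rules used in the paper are not given here, so every concrete choice in this snippet (function name, parameters, random features) is an assumption for illustration only, not the authors' method.

```python
import numpy as np

def residual_sarsa_update(theta, phi_sa, phi_next_sa, reward,
                          alpha=0.1, gamma=0.99):
    """One gradient-descent step on the squared Bellman residual
    for a linear action-value function Q(s, a) = theta . phi(s, a).

    theta       : parameter vector of the linear approximator
    phi_sa      : feature vector of the current state-action pair
    phi_next_sa : feature vector of the next state-action pair (on-policy)
    """
    q_sa = theta @ phi_sa
    q_next = theta @ phi_next_sa
    residual = reward + gamma * q_next - q_sa          # Bellman residual
    # Gradient of 0.5 * residual^2 w.r.t. theta is residual * (gamma*phi' - phi),
    # so descending it pushes the residual toward zero (Baird-style update).
    grad = gamma * phi_next_sa - phi_sa
    return theta - alpha * residual * grad

# Tiny usage example with random features (illustrative only).
rng = np.random.default_rng(0)
theta = np.zeros(8)
for _ in range(100):
    phi, phi_next = rng.random(8), rng.random(8)
    theta = residual_sarsa_update(theta, phi, phi_next, reward=1.0)
```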


Metadata
Title: Residual Sarsa algorithm with function approximation
Authors: Fu Qiming, Hu Wen, Liu Quan, Luo Heng, Hu Lingyao, Chen Jianping
Publication date: 10-11-2017
Publisher: Springer US
Published in: Cluster Computing, Special Issue 1/2019
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI: https://doi.org/10.1007/s10586-017-1303-8
