Abstract
This paper presents a novel incremental algorithm that combines Q-learning, a well-known dynamic-programming-based reinforcement learning method, with the TD(λ) return estimation process typically used in actor-critic learning, another well-known dynamic-programming-based reinforcement learning method. The parameter λ distributes credit over sequences of actions, which speeds learning and also helps alleviate the non-Markovian effect of coarse state-space quantization. The resulting algorithm, Q(λ)-learning, thus combines some of the best features of the Q-learning and actor-critic learning paradigms. The algorithm's behavior is demonstrated through computer simulations.
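To make the role of λ concrete, here is a minimal tabular sketch of Q(λ)-style learning with eligibility traces. The environment, hyperparameters, and all names (ChainEnv, ALPHA, GAMMA, LAM, EPSILON) are hypothetical illustration choices, not from the paper, and for brevity the trace handling on exploratory actions follows Watkins's variant of Q(λ) rather than the paper's exact formulation.

```python
import numpy as np

# Hypothetical toy environment and hyperparameters, chosen only to
# illustrate the update; none of these values come from the paper.
N_STATES, N_ACTIONS = 16, 2
ALPHA, GAMMA, LAM, EPSILON = 0.1, 0.95, 0.9, 0.1

class ChainEnv:
    """Deterministic chain: action 0 moves right, action 1 moves left.
    Reaching the rightmost state ends the episode with reward 1."""
    def __init__(self, n=N_STATES):
        self.n, self.s = n, 0

    def reset(self):
        self.s = 0
        return self.s

    def step(self, a):
        self.s = min(self.s + 1, self.n - 1) if a == 0 else max(self.s - 1, 0)
        done = self.s == self.n - 1
        return self.s, (1.0 if done else 0.0), done

def epsilon_greedy(Q, s, rng):
    # Explore with probability EPSILON, otherwise act greedily on Q.
    if rng.random() < EPSILON:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(Q[s]))

def run_episode(env, Q, rng):
    E = np.zeros_like(Q)        # eligibility traces, one per (state, action)
    s = env.reset()
    a = epsilon_greedy(Q, s, rng)
    done = False
    while not done:
        s2, r, done = env.step(a)
        a2 = epsilon_greedy(Q, s2, rng)
        greedy = int(np.argmax(Q[s2]))
        # One-step TD error toward the greedy successor value.
        delta = r + (0.0 if done else GAMMA * Q[s2, greedy]) - Q[s, a]
        E[s, a] += 1.0          # bump the trace for the visited pair
        Q += ALPHA * delta * E  # lambda spreads the error backward in time
        if a2 == greedy:
            E *= GAMMA * LAM    # decay all traces
        else:
            E[:] = 0.0          # Watkins-style trace cut after exploration
        s, a = s2, a2

rng = np.random.default_rng(0)
env, Q = ChainEnv(), np.zeros((N_STATES, N_ACTIONS))
for _ in range(200):
    run_episode(env, Q, rng)
```

With λ = 0 this reduces to ordinary one-step Q-learning; larger λ propagates each TD error further back along the trajectory, which is the credit-distribution effect described in the abstract.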
Cite this article
Peng, J., Williams, R.J. Incremental Multi-Step Q-Learning. Machine Learning 22, 283–290 (1996). https://doi.org/10.1023/A:1018076709321