2012 | OriginalPaper | Chapter
Recursive Least-Squares Learning with Eligibility Traces
Authors: Bruno Scherrer, Matthieu Geist
Published in: Recent Advances in Reinforcement Learning
Publisher: Springer Berlin Heidelberg
In the framework of Markov Decision Processes, we consider the problem of learning a linear approximation of the value function of some fixed policy from a single trajectory, possibly generated by some other policy. We describe a systematic approach for adapting on-policy least-squares learning algorithms from the literature (LSTD [5], LSPE [15], FPKF [7] and GPTD [8]/KTD [10]) to off-policy learning with eligibility traces. This leads to two known algorithms, LSTD(λ)/LSPE(λ) [21], and suggests new extensions of FPKF and GPTD/KTD. We describe their recursive implementation, discuss their convergence properties, and illustrate their behavior experimentally. Overall, our study suggests that the state-of-the-art LSTD(λ) [21] remains the best least-squares algorithm.
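To make the object of study concrete, the following is a minimal batch sketch of on-policy LSTD(λ) with linear features, written for illustration only: it is a generic textbook formulation, not the authors' recursive off-policy implementation (the off-policy variants additionally weight the updates by importance-sampling ratios, and the function name and interface here are hypothetical).

```python
import numpy as np

def lstd_lambda(features, rewards, next_features, gamma=1.0, lam=0.0):
    """Batch on-policy LSTD(lambda) for linear value-function estimation.

    Given a trajectory with feature vectors phi_t, rewards r_t and
    successor features phi_{t+1} (all-zero for terminal states), it
    accumulates the eligibility trace
        z_t = gamma * lam * z_{t-1} + phi_t
    and solves A theta = b, where
        A = sum_t z_t (phi_t - gamma * phi_{t+1})^T,  b = sum_t z_t r_t.
    """
    k = features.shape[1]
    A = np.zeros((k, k))
    b = np.zeros(k)
    z = np.zeros(k)  # eligibility trace, reset at the start of the trajectory
    for phi, r, phi_next in zip(features, rewards, next_features):
        z = gamma * lam * z + phi
        A += np.outer(z, phi - gamma * phi_next)
        b += z * r
    return np.linalg.solve(A, b)

# Sanity check on a deterministic 3-state chain with reward 1 per step
# and gamma = 1: the true values are V = (3, 2, 1), and with tabular
# features a single episode suffices to recover them exactly.
phis = np.eye(3)
next_phis = np.vstack([np.eye(3)[1:], np.zeros(3)])  # last step is terminal
theta = lstd_lambda(phis, np.ones(3), next_phis, gamma=1.0, lam=0.5)
```

A recursive implementation, as discussed in the chapter, instead maintains the inverse of A incrementally (e.g. via the Sherman-Morrison formula) so that the estimate can be updated after every transition.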