In the framework of Markov Decision Processes, we consider the problem of learning a linear approximation of the value function of some fixed policy from one trajectory possibly generated by some other policy. We describe a systematic approach for adapting the on-policy least-squares learning algorithms of the literature (LSTD, LSPE, FPKF and GPTD/KTD) to off-policy learning with eligibility traces.
This leads to two known algorithms, LSTD(λ) and LSPE(λ), and suggests new extensions of FPKF and GPTD/KTD. We describe their recursive implementations, discuss their convergence properties, and illustrate their behavior experimentally. Overall, our study suggests that the state-of-the-art LSTD(λ) remains the best least-squares algorithm.
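To make the kind of recursive implementation referred to above concrete, the following is a minimal sketch of off-policy LSTD(λ) under one standard formulation, in which the eligibility trace is reweighted at each step by the importance ratio ρ between the target and behavior policies, and the inverse of the A matrix is maintained incrementally via the Sherman-Morrison identity. The function names, the `eps` regularizer, and the `phi`/`rho` callables are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def off_policy_lstd_lambda(trajectory, phi, rho, gamma=0.99, lam=0.9, eps=1e-3):
    """Sketch of recursive off-policy LSTD(lambda).

    trajectory: iterable of (s, a, r, s_next) transitions from the behavior policy
    phi:  feature map, phi(s) -> np.ndarray of shape (d,)
    rho:  importance ratio, rho(s, a) = pi(a|s) / mu(a|s)  (assumed available)
    Maintains C ~ A^{-1} with Sherman-Morrison instead of inverting A each step.
    """
    d = phi(trajectory[0][0]).shape[0]
    C = np.eye(d) / eps              # C approximates A^{-1}; A_0 = eps * I regularizer
    b = np.zeros(d)
    z = np.zeros(d)                  # eligibility trace
    for (s, a, r, s_next) in trajectory:
        w = rho(s, a)
        f, f_next = phi(s), phi(s_next)
        z = z + f                    # trace was decayed/reweighted at the end of the previous step
        u = f - gamma * w * f_next   # importance-weighted temporal-difference features
        # Sherman-Morrison rank-one update of C = A^{-1} for A <- A + z u^T
        Cz = C @ z
        C = C - np.outer(Cz, u @ C) / (1.0 + u @ Cz)
        b = b + w * r * z
        z = gamma * lam * w * z      # decay and reweight the trace for the next step
    return C @ b                     # theta such that V(s) is approximated by phi(s) . theta
```

The rank-one update makes each step O(d²) in the number of features d rather than the O(d³) cost of a matrix inversion, which is what makes a fully recursive implementation practical along a single trajectory.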