Abstract
Transfer learning seeks to leverage previously learned tasks to achieve faster learning in a new task. In this paper, we consider transfer learning in the context of related but distinct Reinforcement Learning (RL) problems. In particular, our RL problems are derived from Semi-Markov Decision Processes (SMDPs) that share the same transition dynamics but have different reward functions that are linear in a set of reward features. We formally define the transfer learning problem in the context of RL as learning an efficient algorithm to solve any SMDP drawn from a fixed distribution after experiencing a finite number of them. Furthermore, we introduce an online algorithm to solve this problem, Variable-Reward Reinforcement Learning (VRRL), that compactly stores the optimal value functions for several SMDPs, and uses them to optimally initialize the value function for a new SMDP. We generalize our method to a hierarchical RL setting where the different SMDPs share the same task hierarchy. Our experimental results in a simplified real-time strategy domain show that significant transfer learning occurs in both flat and hierarchical settings. Transfer is especially effective in the hierarchical setting where the overall value functions are decomposed into subtask value functions which are more widely amenable to transfer across different SMDPs.
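The key structural fact behind VRRL is that, because all SMDPs share the same dynamics and the reward is linear in a set of reward features, the value of any fixed policy is likewise linear in the reward weights; storing per-policy feature-value vectors therefore lets a new task's value function be initialized by taking the best stored policy pointwise. The sketch below is a minimal illustration of that idea only, not the paper's algorithm: it assumes a flat, discounted, tabular MDP (the paper works with average-reward SMDPs and a shared task hierarchy), and the function names (`policy_reward_features`, `initialize_value`), shapes, and the toy data are all hypothetical.

```python
# Illustrative sketch (not the paper's algorithm): value-function transfer
# across tasks that share dynamics but differ in linear reward weights.
import numpy as np

def policy_reward_features(P, phi, policy, gamma=0.95):
    """Expected discounted sums of reward features under a fixed policy.

    P:      shared transition model, shape (S, A, S)
    phi:    shared reward features, shape (S, A, K)
    policy: deterministic policy, shape (S,) of action indices
    Returns psi of shape (S, K), where psi[s] = E[sum_t gamma^t phi(s_t, a_t) | s_0 = s],
    so that for reward weights w the policy's value is V(s) = psi[s] @ w.
    """
    S, A, K = phi.shape
    P_pi = P[np.arange(S), policy]      # (S, S) transitions under the policy
    phi_pi = phi[np.arange(S), policy]  # (S, K) features under the policy
    # Bellman identity psi = phi_pi + gamma * P_pi @ psi, solved as a linear system
    # (one right-hand-side column per reward feature).
    return np.linalg.solve(np.eye(S) - gamma * P_pi, phi_pi)

def initialize_value(w_new, stored_psis):
    """Initialize the new task's value function with the best value any
    previously learned policy achieves under the new reward weights."""
    return np.max([psi @ w_new for psi in stored_psis], axis=0)

if __name__ == "__main__":
    # Toy usage with random stand-ins for previously learned optimal policies.
    rng = np.random.default_rng(0)
    S, A, K = 5, 3, 2
    P = rng.dirichlet(np.ones(S), size=(S, A))   # shared dynamics
    phi = rng.random((S, A, K))                  # shared reward features
    policies = [rng.integers(A, size=S) for _ in range(3)]
    psis = [policy_reward_features(P, phi, pi) for pi in policies]
    w_new = rng.random(K)                        # new task's reward weights
    print(initialize_value(w_new, psis))         # optimistic starting values
```

Because each stored policy's value is exact under the new weights, the pointwise maximum is a sound, optimistic starting point for learning in the new SMDP; in the hierarchical setting the same trick applies per subtask, which is why the abstract reports broader transfer there.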
Additional information
Editors: Daniel L. Silver, Kristin Bennett, Richard Caruana.
Cite this article
Mehta, N., Natarajan, S., Tadepalli, P. et al. Transfer in variable-reward hierarchical reinforcement learning. Mach Learn 73, 289–312 (2008). https://doi.org/10.1007/s10994-008-5061-y