Learning potential functions and their representations for multi-task reinforcement learning

Authors: Matthijs Snel, Shimon Whiteson

Published in: Autonomous Agents and Multi-Agent Systems, Issue 4/2014 (published 01-07-2014)

Abstract

In multi-task learning, there are roughly two approaches to discovering representations. The first is to discover task relevant representations, i.e., those that compactly represent solutions to particular tasks. The second is to discover domain relevant representations, i.e., those that compactly represent knowledge that remains invariant across many tasks. In this article, we propose a new approach to multi-task learning that captures domain-relevant knowledge by learning potential-based shaping functions, which augment a task’s reward function with artificial rewards. We address two key issues that arise when deriving potential functions. The first is what kind of target function the potential function should approximate; we propose three such targets and show empirically that which one is best depends critically on the domain and learning parameters. The second issue is the representation for the potential function. This article introduces the notion of \(k\)-relevance, the expected relevance of a representation on a sample sequence of \(k\) tasks, and argues that this is a unifying definition of relevance of which both task and domain relevance are special cases. We prove formally that, under certain assumptions, \(k\)-relevance converges monotonically to a fixed point as \(k\) increases, and use this property to derive Feature Selection Through Extrapolation of k-relevance (FS-TEK), a novel feature-selection algorithm. We demonstrate empirically the benefit of FS-TEK on artificial domains.
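
As a brief reminder of the shaping mechanism referred to above (the specific target functions proposed in the article are defined in its body, which is not reproduced here), the artificial rewards take the standard potential-based form of Ng et al. [53], together with the state-action ("look-ahead advice") variant of Wiewiora et al. [88] discussed in footnote 1:

\[
F(s, a, s') = \gamma \Phi(s') - \Phi(s),
\qquad
F(s, a, s', a') = \gamma \Phi(s', a') - \Phi(s, a),
\]

where \(\Phi\) is the (learned) potential function, \(\gamma\) the discount factor, and the agent learns from the augmented reward \(R(s, a, s') + F(\cdot)\).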

Footnotes
1. The authors termed these potential-based advice; the formula introduced here corresponds specifically to look-ahead advice. We use the term “shaping” for both methods, and let function arguments resolve any ambiguity.
 
2. Relevance is not a measure in the strict mathematical sense: because of dependence between feature sets, \(\rho(\mathsf{F} \cup \mathsf{G}) \ne \rho(\mathsf{F}) + \rho(\mathsf{G})\) for some disjoint feature sets \(\mathsf{F}\) and \(\mathsf{G}\) and relevance \(\rho\).
 
3. We employ a standard real-valued GA with population size 100, no crossover, and mutation with probability \(p=0.5\); mutation adds a random value \(\delta \in [-0.05, 0.05]\). Policies are constructed by a softmax distribution over the chromosome values. (A minimal sketch of this setup follows.)
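
The sketch below illustrates the mutation operator and softmax policy construction described in footnote 3. It is only a sketch under assumptions not stated in the footnote: the mutation probability is treated as per-gene, the chromosome is laid out with one value per action for a given state, and fitness evaluation and selection are omitted.

import numpy as np

def mutate(chromosome, p=0.5, delta=0.05, rng=None):
    """Real-valued mutation, no crossover: each gene is perturbed with
    probability p by a value drawn uniformly from [-delta, delta]."""
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(chromosome.shape) < p
    noise = rng.uniform(-delta, delta, size=chromosome.shape)
    return chromosome + mask * noise

def softmax_policy(values):
    """Action probabilities as a softmax over the chromosome values
    associated with one state (one value per action)."""
    z = values - np.max(values)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Population of 100 real-valued chromosomes; here 4 actions for one state.
rng = np.random.default_rng(0)
population = rng.normal(size=(100, 4))
offspring = np.array([mutate(c, rng=rng) for c in population])
print(softmax_policy(offspring[0]))   # 4 action probabilities summing to 1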
 
4. Note that the addition of this sensor is not the same as the manual separation of state features for the value and potential function as done in [34, 63]; see related work (Sect. 6). In the experiments reported in this section, both functions use the exact same set of features.
 
5. In the policy improvement step, the policy is made only \(\varepsilon\)-greedy w.r.t. the value function.
 
Literature
1. Albus, J. S. (1971). A theory of cerebellar function. Mathematical Biosciences, 10, 25–61.
2. Argyriou, A., Evgeniou, T., & Pontil, M. (2008). Convex multi-task feature learning. Machine Learning, 73(3), 243–272.
3. Asmuth, J., Littman, M., & Zinkov, R. (2008). Potential-based shaping in model-based reinforcement learning. In Proceedings of the 23rd AAAI Conference on Artificial Intelligence (pp. 604–609). Cambridge: The AAAI Press.
4. Babes, M., de Cote, E. M., & Littman, M. L. (2008). Social reward shaping in the prisoner’s dilemma. In 7th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 2008) (pp. 1389–1392).
5. Baxter, J. (2000). A model of inductive bias learning. Journal of Artificial Intelligence Research (JAIR), 12, 149–198.
6. Bertsekas, D. P. (1995). Dynamic programming and optimal control. Belmont: Athena.
7. Boutilier, C., Dearden, R., & Goldszmidt, M. (2000). Stochastic dynamic programming with factored representations. Artificial Intelligence, 121(1–2), 49–107.
9. Caruana, R. (2005). Inductive transfer retrospective and review. In NIPS 2005 Workshop on Inductive Transfer: 10 Years Later.
10. Devlin, S., Grześ, M., & Kudenko, D. (2011). Multi-agent reward shaping for RoboCup KeepAway. In AAMAS (pp. 1227–1228).
11. Devlin, S., & Kudenko, D. (2011). Theoretical considerations of potential-based reward shaping for multi-agent systems. In AAMAS ’11 (pp. 225–232).
12. Devlin, S., & Kudenko, D. (2012). Dynamic potential-based reward shaping. In AAMAS (pp. 433–440).
13. Diuk, C., Li, L., & Leffler, B. R. (2009). The adaptive k-meteorologists problem and its application to structure learning and feature selection in reinforcement learning. In ICML (p. 32).
14. Dorigo, M., & Colombetti, M. (1994). Robot shaping: Developing autonomous agents through learning. Artificial Intelligence, 71(2), 321–370.
15. Elfwing, S., Uchibe, E., Doya, K., & Christensen, H. (2008). Co-evolution of shaping rewards and meta-parameters in reinforcement learning. Adaptive Behavior, 16(6), 400–412.
16. Elfwing, S., Uchibe, E., Doya, K., & Christensen, H. I. (2011). Darwinian embodied evolution of the learning ability for survival. Adaptive Behavior, 19(2), 101–120.
17. Erez, T., & Smart, W. (2008). What does shaping mean for computational reinforcement learning? In 7th IEEE International Conference on Development and Learning (ICDL 2008) (pp. 215–219).
18. Ferguson, K., & Mahadevan, S. (2006). Proto-transfer learning in Markov decision processes using spectral methods. In ICML Workshop on Structural Knowledge Transfer for Machine Learning.
19. Ferrante, E., Lazaric, A., & Restelli, M. (2008). Transfer of task representation in reinforcement learning using policy-based proto-value functions. In AAMAS (pp. 1329–1332).
20. Foster, D. J., & Dayan, P. (2002). Structure in the space of value functions. Machine Learning, 49(2–3), 325–346.
21. Frommberger, L. (2011). Task space tile coding: In-task and cross-task generalization in reinforcement learning. In Proceedings of the 9th European Workshop on Reinforcement Learning (EWRL9).
22. Frommberger, L., & Wolter, D. (2010). Structural knowledge transfer by spatial abstraction for reinforcement learning agents. Adaptive Behavior, 18(6), 507–525.
23. Geramifard, A., Doshi, F., Redding, J., Roy, N., & How, J. P. (2011). Online discovery of feature dependencies. In ICML (pp. 881–888).
24. Grześ, M., & Kudenko, D. (2009). Learning shaping rewards in model-based reinforcement learning. In Proceedings of the AAMAS 2009 Workshop on Adaptive Learning Agents.
25. Grześ, M., & Kudenko, D. (2009). Theoretical and empirical analysis of reward shaping in reinforcement learning. In ICMLA (pp. 337–344).
26. Grześ, M., & Kudenko, D. (2010). Online learning of shaping rewards in reinforcement learning. Neural Networks, 23(4), 541–550.
27. Gullapalli, V., & Barto, A. G. (1992). Shaping as a method for accelerating reinforcement learning. In Proceedings of the IEEE International Symposium on Intelligent Control (pp. 554–559).
28. Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of Machine Learning Research, 3, 1157–1182.
29. Hachiya, H., & Sugiyama, M. (2010). Feature selection for reinforcement learning: Evaluating implicit state-reward dependency via conditional mutual information. In ECML/PKDD (pp. 474–489).
30. Jong, N. K., & Stone, P. (2005). State abstraction discovery from irrelevant state variables. In IJCAI-05.
31. Kakade, S. M. (2003). On the sample complexity of reinforcement learning. Ph.D. thesis, University College London, London.
32. Koller, D., & Sahami, M. (1996). Toward optimal feature selection. In ICML (pp. 284–292).
33. Kolter, J. Z., & Ng, A. Y. (2009). Regularization and feature selection in least-squares temporal difference learning. In ICML (p. 66).
34. Konidaris, G., & Barto, A. (2006). Autonomous shaping: Knowledge transfer in reinforcement learning. In Proceedings of the 23rd International Conference on Machine Learning (pp. 489–496).
35. Konidaris, G., Scheidwasser, I., & Barto, A. G. (2012). Transfer in reinforcement learning via shared features. Journal of Machine Learning Research, 13, 1333–1371.
36. Koren, Y., & Borenstein, J. (1991). Potential field methods and their inherent limitations for mobile robot navigation. In Proceedings of the IEEE Conference on Robotics and Automation (pp. 1398–1404).
37. Kroon, M., & Whiteson, S. (2009). Automatic feature selection for model-based reinforcement learning in factored MDPs. In ICMLA 2009: Proceedings of the Eighth International Conference on Machine Learning and Applications (pp. 324–330).
38. Laud, A., & DeJong, G. (2002). Reinforcement learning and shaping: Encouraging intended behaviors. In Proceedings of the 19th International Conference on Machine Learning (pp. 355–362).
39. Laud, A., & DeJong, G. (2003). The influence of reward on the speed of reinforcement learning: An analysis of shaping. In ICML (pp. 440–447).
40. Lazaric, A. (2008). Knowledge transfer in reinforcement learning. Ph.D. thesis, Politecnico di Milano, Milan.
41. Lazaric, A., & Ghavamzadeh, M. (2010). Bayesian multi-task reinforcement learning. In ICML (pp. 599–606).
42. Lazaric, A., Restelli, M., & Bonarini, A. (2008). Transfer of samples in batch reinforcement learning. In ICML (pp. 544–551).
43. Li, L., Walsh, T. J., & Littman, M. L. (2006). Towards a unified theory of state abstraction for MDPs. In Artificial Intelligence and Mathematics.
44. Lu, X., Schwartz, H. M., & Givigi, S. N. (2011). Policy invariance under reward transformations for general-sum stochastic games. Journal of Artificial Intelligence Research (JAIR), 41, 397–406.
45. Maclin, R., & Shavlik, J. W. (1996). Creating advice-taking reinforcement learners. Machine Learning, 22(1–3), 251–281.
46. Mahadevan, S. (2010). Representation discovery in sequential decision making. In AAAI.
47. Manoonpong, P., Wörgötter, F., & Morimoto, J. (2010). Extraction of reward-related feature space using correlation-based and reward-based learning methods. In ICONIP (Vol. 1, pp. 414–421).
48. Marquardt, D. (1963). An algorithm for least-squares estimation of nonlinear parameters. SIAM Journal of Applied Mathematics, 11, 431–441.
49. Marthi, B. (2007). Automatic shaping and decomposition of reward functions. In Proceedings of the 24th International Conference on Machine Learning (pp. 601–608).
50. Matarić, M. J. (1994). Reward functions for accelerated learning. In Proceedings of the 11th International Conference on Machine Learning.
51. Mehta, N., Natarajan, S., Tadepalli, P., & Fern, A. (2008). Transfer in variable-reward hierarchical reinforcement learning. Machine Learning, 73(3), 289–312.
52. Midtgaard, M., Vinther, L., Christiansen, J. R., Christensen, A. M., & Zeng, Y. (2010). Time-based reward shaping in real-time strategy games. In Proceedings of the 6th International Conference on Agents and Data Mining Interaction (ADMI’10) (pp. 115–125). Berlin, Heidelberg: Springer-Verlag.
53. Ng, A., Harada, D., & Russell, S. (1999). Policy invariance under reward transformations: Theory and application to reward shaping. In Proceedings of the 16th International Conference on Machine Learning.
54. Parr, R., Li, L., Taylor, G., Painter-Wakefield, C., & Littman, M. L. (2008). An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning. In ICML (pp. 752–759).
55. Petrik, M., Taylor, G., Parr, R., & Zilberstein, S. (2010). Feature selection using regularization in approximate linear programs for Markov decision processes. In ICML (pp. 871–878).
56. Proper, S., & Tumer, K. (2012). Modeling difference rewards for multiagent learning (extended abstract). In AAMAS, Valencia, Spain.
57. Randløv, J., & Alstrøm, P. (1998). Learning to drive a bicycle using reinforcement learning and shaping. In Proceedings of the 15th International Conference on Machine Learning.
58. Rummery, G., & Niranjan, M. (1994). On-line Q-learning using connectionist systems. Technical Report CUED/F-INFENG-RT 116, Engineering Department, Cambridge University, Cambridge.
59. Saksida, L. M., Raymond, S. M., & Touretzky, D. S. (1997). Shaping robot behavior using principles from instrumental conditioning. Robotics and Autonomous Systems, 22(3–4), 231–249.
60. van Seijen, H., Whiteson, S., & Kester, L. (2010). Switching between representations in reinforcement learning. In Interactive Collaborative Information Systems (pp. 65–84).
61. Selfridge, O., Sutton, R. S., & Barto, A. G. (1985). Training and tracking in robotics. In Proceedings of the Ninth International Joint Conference on Artificial Intelligence.
62. Sherstov, A. A., & Stone, P. (2005). Improving action selection in MDP’s via knowledge transfer. In Proceedings of the Twentieth National Conference on Artificial Intelligence.
63. Singh, S., Lewis, R., & Barto, A. (2009). Where do rewards come from? In Proceedings of the 31st Annual Conference of the Cognitive Science Society (pp. 2601–2606).
64. Singh, S., & Sutton, R. (1996). Reinforcement learning with replacing eligibility traces. Machine Learning, 22(1), 123–158.
65. Singh, S. P. (1992). Transfer of learning by composing solutions of elemental sequential tasks. Machine Learning, 8(3), 323–339.
66. Singh, S. P., Jaakkola, T., & Jordan, M. I. (1994). Learning without state-estimation in partially observable Markovian decision processes. In ICML (pp. 284–292).
67. Skinner, B. F. (1938). The behavior of organisms: An experimental analysis. New York: Appleton-Century-Crofts.
68. Snel, M., & Whiteson, S. (2010). Multi-task evolutionary shaping without pre-specified representations. In Genetic and Evolutionary Computation Conference (GECCO’10).
69. Snel, M., & Whiteson, S. (2011). Multi-task reinforcement learning: Shaping and feature selection. In Proceedings of the European Workshop on Reinforcement Learning (EWRL).
70. Sorg, J., & Singh, S. (2009). Transfer via soft homomorphisms. In Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2009) (pp. 741–748).
71. Strehl, A. L., Diuk, C., & Littman, M. L. (2007). Efficient structure learning in factored-state MDPs. In AAAI (pp. 645–650).
72. Sutton, R. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3, 9–44.
73. Sutton, R., & Barto, A. (1998). Reinforcement learning: An introduction. Cambridge: The MIT Press.
74. Tanaka, F., & Yamamura, M. (2003). Multitask reinforcement learning on the distribution of MDPs. In Proceedings of the 2003 IEEE International Symposium on Computational Intelligence in Robotics and Automation (CIRA 2003) (pp. 1108–1113).
75. Taylor, J., Precup, D., & Panangaden, P. (2009). Bounding performance loss in approximate MDP homomorphisms. In D. Koller, D. Schuurmans, Y. Bengio, & L. Bottou (Eds.), Advances in Neural Information Processing Systems (Vol. 21, pp. 1649–1656).
76. Taylor, M., & Stone, P. (2009). Transfer learning for reinforcement learning domains: A survey. Journal of Machine Learning Research, 10(1), 1633–1685.
77. Taylor, M., Stone, P., & Liu, Y. (2007). Transfer learning via inter-task mappings for temporal difference learning. Journal of Machine Learning Research, 8(1), 2125–2167.
78. Taylor, M. E., Whiteson, S., & Stone, P. (2007). Transfer via inter-task mappings in policy search reinforcement learning. In AAMAS (p. 37).
79. Thrun, S. (1995). Is learning the n-th thing any easier than learning the first? In Advances in Neural Information Processing Systems (pp. 640–646).
80. Torrey, L., Shavlik, J. W., Walker, T., & Maclin, R. (2010). Transfer learning via advice taking. In Advances in Machine Learning I (pp. 147–170). New York: Springer.
81. Torrey, L., Walker, T., Shavlik, J. W., & Maclin, R. (2005). Using advice to transfer knowledge acquired in one reinforcement learning task to another. In Proceedings of the Sixteenth European Conference on Machine Learning (ECML 2005) (pp. 412–424).
82. Vlassis, N., Littman, M. L., & Barber, D. (2011). On the computational complexity of stochastic controller optimization in POMDPs. CoRR, abs/1107.3090.
83. Walsh, T. J., Li, L., & Littman, M. L. (2006). Transferring state abstractions between MDPs. In ICML-06 Workshop on Structural Knowledge Transfer for Machine Learning.
84. Watkins, C. J. C. H., & Dayan, P. (1992). Q-learning. Machine Learning, 8, 279–292.
85. Whitehead, S. D. (1991). A complexity analysis of cooperative mechanisms in reinforcement learning. In Proceedings of AAAI-91 (pp. 607–613).
86. Whiteson, S., Tanner, B., Taylor, M. E., & Stone, P. (2011). Protecting against evaluation overfitting in empirical reinforcement learning. In ADPRL 2011: Proceedings of the IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (pp. 120–127).
87. Wiewiora, E. (2003). Potential-based shaping and Q-value initialization are equivalent. Journal of Artificial Intelligence Research, 19, 205–208.
88. Wiewiora, E., Cottrell, G., & Elkan, C. (2003). Principled methods for advising reinforcement learning agents. In Proceedings of the 20th International Conference on Machine Learning (pp. 792–799).
89. Wilson, A., Fern, A., Ray, S., & Tadepalli, P. (2007). Multi-task reinforcement learning: A hierarchical Bayesian approach. In ICML (pp. 1015–1022).
Metadata
Title: Learning potential functions and their representations for multi-task reinforcement learning
Authors: Matthijs Snel, Shimon Whiteson
Publication date: 01-07-2014
Publisher: Springer US
Published in: Autonomous Agents and Multi-Agent Systems, Issue 4/2014
Print ISSN: 1387-2532
Electronic ISSN: 1573-7454
DOI: https://doi.org/10.1007/s10458-013-9235-z
