
Agents teaching agents: a survey on inter-agent transfer learning

Authors: Felipe Leno Da Silva, Garrett Warnell, Anna Helena Reali Costa, Peter Stone

Published in: Autonomous Agents and Multi-Agent Systems | Issue 1/2020


Abstract

While recent work in reinforcement learning (RL) has led to agents capable of solving increasingly complex tasks, the issue of high sample complexity is still a major concern. This issue has motivated the development of additional techniques that augment RL methods in an attempt to increase task learning speed. In particular, inter-agent teaching—endowing agents with the ability to respond to instructions from others—has been responsible for many of these developments. RL agents that can leverage instruction from a more competent teacher have been shown to be able to learn tasks significantly faster than agents that cannot take advantage of such instruction. That said, the inter-agent teaching paradigm presents many new challenges due to, among other factors, differences between the agents involved in the teaching interaction. As a result, many inter-agent teaching methods work only in restricted settings and have proven difficult to generalize to new domains or scenarios. In this article, we propose two frameworks that provide a comprehensive view of the challenges associated with inter-agent teaching. We highlight state-of-the-art solutions, open problems, prospective applications, and argue that new research in this area should be developed in the context of the proposed frameworks.
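To make the teacher-student paradigm described above concrete, the sketch below shows one common instantiation: a Q-learning student that accepts action advice from a more competent teacher, where the teacher intervenes only while an advice budget remains and only in states its own value estimates mark as important. This is a minimal illustration, not the authors' method; the environment interface (a Gym-style `reset()`/`step()` returning `(next_state, reward, done)`) and all names are assumptions.

```python
import random
from collections import defaultdict

class QAgent:
    """Tabular epsilon-greedy Q-learner; serves as both student and teacher."""

    def __init__(self, actions, alpha=0.1, gamma=0.99, epsilon=0.1):
        self.q = defaultdict(float)          # Q[(state, action)] -> value
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def greedy(self, s):
        return max(self.actions, key=lambda a: self.q[(s, a)])

    def act(self, s):
        if random.random() < self.epsilon:   # explore
            return random.choice(self.actions)
        return self.greedy(s)                # exploit

    def importance(self, s):
        # Gap between best and worst Q-values in s: a common heuristic for
        # deciding whether advice in this state is worth spending budget on.
        vals = [self.q[(s, a)] for a in self.actions]
        return max(vals) - min(vals)

    def update(self, s, a, r, s2):
        # Standard one-step Q-learning update.
        target = r + self.gamma * max(self.q[(s2, a2)] for a2 in self.actions)
        self.q[(s, a)] += self.alpha * (target - self.q[(s, a)])


def run_episode(env, student, teacher, budget, threshold=0.5):
    """One episode: the teacher advises in important states while budget lasts;
    otherwise the student acts (and always learns) on its own."""
    s, done = env.reset(), False
    while not done:
        if budget > 0 and teacher.importance(s) > threshold:
            a = teacher.greedy(s)            # follow the teacher's advice
            budget -= 1
        else:
            a = student.act(s)               # autonomous exploration
        s2, r, done = env.step(a)
        student.update(s, a, r, s2)          # the student learns either way
        s = s2
    return budget
```

Limiting advice to a fixed budget reflects a recurring concern in this literature: attention (human or machine) is a scarce resource, so the interesting design question is not whether to advise but when each unit of advice yields the largest speedup in the student's learning.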

Footnotes
1
Notice that the initial policy definition might be implicit (e.g., assuming the agent starts with a random policy), effectively enabling the inter-agent interaction to start immediately at the beginning of the learning process.
 
Metadata
Title
Agents teaching agents: a survey on inter-agent transfer learning
Authors
Felipe Leno Da Silva
Garrett Warnell
Anna Helena Reali Costa
Peter Stone
Publication date
01-04-2020
Publisher
Springer US
Published in
Autonomous Agents and Multi-Agent Systems / Issue 1/2020
Print ISSN: 1387-2532
Electronic ISSN: 1573-7454
DOI
https://doi.org/10.1007/s10458-019-09430-0
