
Agents teaching agents: a survey on inter-agent transfer learning

Authors: Felipe Leno Da Silva, Garrett Warnell, Anna Helena Reali Costa, Peter Stone

Published in: Autonomous Agents and Multi-Agent Systems | Issue 1/2020


Abstract

While recent work in reinforcement learning (RL) has led to agents capable of solving increasingly complex tasks, the issue of high sample complexity is still a major concern. This issue has motivated the development of additional techniques that augment RL methods in an attempt to increase task learning speed. In particular, inter-agent teaching—endowing agents with the ability to respond to instructions from others—has been responsible for many of these developments. RL agents that can leverage instruction from a more competent teacher have been shown to be able to learn tasks significantly faster than agents that cannot take advantage of such instruction. That said, the inter-agent teaching paradigm presents many new challenges due to, among other factors, differences between the agents involved in the teaching interaction. As a result, many inter-agent teaching methods work only in restricted settings and have proven difficult to generalize to new domains or scenarios. In this article, we propose two frameworks that provide a comprehensive view of the challenges associated with inter-agent teaching. We highlight state-of-the-art solutions, open problems, prospective applications, and argue that new research in this area should be developed in the context of the proposed frameworks.
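To make the teacher-student paradigm described above concrete, the sketch below shows one common instantiation: a Q-learning student that accepts action advice from a more competent teacher, where the teacher intervenes only while an advice budget remains and only in states its own value estimates mark as important. This is a minimal illustration, not the authors' method; the environment interface (a Gym-style `reset()`/`step()` returning `(next_state, reward, done)`) and all names are assumptions.

```python
import random
from collections import defaultdict

class QAgent:
    """Tabular epsilon-greedy Q-learner; serves as both student and teacher."""

    def __init__(self, actions, alpha=0.1, gamma=0.99, epsilon=0.1):
        self.q = defaultdict(float)          # Q[(state, action)] -> value
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def greedy(self, s):
        return max(self.actions, key=lambda a: self.q[(s, a)])

    def act(self, s):
        if random.random() < self.epsilon:   # explore
            return random.choice(self.actions)
        return self.greedy(s)                # exploit

    def importance(self, s):
        # Gap between best and worst Q-values in s: a common heuristic for
        # deciding whether advice in this state is worth spending budget on.
        vals = [self.q[(s, a)] for a in self.actions]
        return max(vals) - min(vals)

    def update(self, s, a, r, s2):
        # Standard one-step Q-learning update.
        target = r + self.gamma * max(self.q[(s2, a2)] for a2 in self.actions)
        self.q[(s, a)] += self.alpha * (target - self.q[(s, a)])


def run_episode(env, student, teacher, budget, threshold=0.5):
    """One episode: the teacher advises in important states while budget lasts;
    otherwise the student acts (and always learns) on its own."""
    s, done = env.reset(), False
    while not done:
        if budget > 0 and teacher.importance(s) > threshold:
            a = teacher.greedy(s)            # follow the teacher's advice
            budget -= 1
        else:
            a = student.act(s)               # autonomous exploration
        s2, r, done = env.step(a)
        student.update(s, a, r, s2)          # the student learns either way
        s = s2
    return budget
```

Limiting advice to a fixed budget reflects a recurring concern in this literature: attention (human or machine) is a scarce resource, so the interesting design question is not whether to advise but when each unit of advice yields the largest speedup in the student's learning.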

Footnotes
1
Notice that the initial policy definition might be implicit (e.g., assuming the agent starts with a random policy), effectively enabling the inter-agent interaction to start immediately at the beginning of the learning process.
 
Metadata
Title
Agents teaching agents: a survey on inter-agent transfer learning
Authors
Felipe Leno Da Silva
Garrett Warnell
Anna Helena Reali Costa
Peter Stone
Publication date
01-04-2020
Publisher
Springer US
Published in
Autonomous Agents and Multi-Agent Systems / Issue 1/2020
Print ISSN: 1387-2532
Electronic ISSN: 1573-7454
DOI
https://doi.org/10.1007/s10458-019-09430-0
