Published in: Artificial Intelligence Review 5/2021

24 November 2020

A survey on multi-agent deep reinforcement learning: from the perspective of challenges and applications

Authors: Wei Du, Shifei Ding


Abstract

Deep reinforcement learning has proved to be a fruitful method for a variety of artificial intelligence tasks over the last several years. Recent work has moved beyond single-agent scenarios and given more consideration to multi-agent settings. The main goal of this paper is to provide a detailed and systematic overview of multi-agent deep reinforcement learning methods from the perspective of challenges and applications. Specifically, preliminary knowledge is introduced first to support a better understanding of the field. Then, a taxonomy of challenges is proposed, and the corresponding structures and representative methods are introduced. Finally, some applications and interesting future opportunities for multi-agent deep reinforcement learning are presented.

Zurück zum Zitat Son K, Kim D, Kang WJ, Hostallero DE, Yi Y (2019) Qtran: Learning to factorize with transformation for cooperative multi-agent reinforcement learning. arXiv preprint Son K, Kim D, Kang WJ, Hostallero DE, Yi Y (2019) Qtran: Learning to factorize with transformation for cooperative multi-agent reinforcement learning. arXiv preprint
Zurück zum Zitat Song J, Ren H, Sadigh D, Ermon S (2018) Multi-agent generative adversarial imitation learning. In Advances in Neural Information Processing Systems pp 7461–7472 Song J, Ren H, Sadigh D, Ermon S (2018) Multi-agent generative adversarial imitation learning. In Advances in Neural Information Processing Systems pp 7461–7472
Zurück zum Zitat Song Y, Wang J, Lukasiewicz T, Xu Z, Xu M, Ding Z, Wu L (2019) Arena: a general evaluation platform and building toolkit for multi-agent intelligence. arXiv preprint Song Y, Wang J, Lukasiewicz T, Xu Z, Xu M, Ding Z, Wu L (2019) Arena: a general evaluation platform and building toolkit for multi-agent intelligence. arXiv preprint
Zurück zum Zitat Stone P, Veloso M (2000) Multiagent systems: a survey from a machine learning perspective. Auton Robots 8(3):345–383CrossRef Stone P, Veloso M (2000) Multiagent systems: a survey from a machine learning perspective. Auton Robots 8(3):345–383CrossRef
Zurück zum Zitat Suarez J, Du Y, Isola P, Mordatch I, MMO N (1903) A massively multiagent game environment for training and evaluating intelligent agents. arXiv preprint Suarez J, Du Y, Isola P, Mordatch I, MMO N (1903) A massively multiagent game environment for training and evaluating intelligent agents. arXiv preprint
Zurück zum Zitat Sukhbaatar S, Fergus R (2016) Learning multiagent communication with backpropagation. In Advances in neural information processing systems pp 2244–2252 Sukhbaatar S, Fergus R (2016) Learning multiagent communication with backpropagation. In Advances in neural information processing systems pp 2244–2252
Zurück zum Zitat Sunehag P, Lever G, Gruslys A, Czarnecki WM, Zambaldi V, Jaderberg M, Graepel T (2017) Value-decomposition networks for cooperative multi-agent learning. arXiv preprint Sunehag P, Lever G, Gruslys A, Czarnecki WM, Zambaldi V, Jaderberg M, Graepel T (2017) Value-decomposition networks for cooperative multi-agent learning. arXiv preprint
Zurück zum Zitat Tampuu A, Matiisen T, Kodelja D, Kuzovkin I, Korjus K, Aru J, Vicente R (2017) Multiagent cooperation and competition with deep reinforcement learning. PLoS ONE 12(4):e0172395CrossRef Tampuu A, Matiisen T, Kodelja D, Kuzovkin I, Korjus K, Aru J, Vicente R (2017) Multiagent cooperation and competition with deep reinforcement learning. PLoS ONE 12(4):e0172395CrossRef
Zurück zum Zitat Tan M (1993) Multi-agent reinforcement learning: Independent vs. cooperative agents. In Proceedings of the tenth international conference on machine learning pp 330–337 Tan M (1993) Multi-agent reinforcement learning: Independent vs. cooperative agents. In Proceedings of the tenth international conference on machine learning pp 330–337
Zurück zum Zitat Tumer K, Agogino A (2007) Distributed agent-based air traffic flow management. In Proceedings of the 6th international joint conference on Autonomous agents and multiagent systems pp 1–8 Tumer K, Agogino A (2007) Distributed agent-based air traffic flow management. In Proceedings of the 6th international joint conference on Autonomous agents and multiagent systems pp 1–8
Zurück zum Zitat Van Hasselt H, Guez A, Silver D (2016) Deep reinforcement learning with double q-learning. In Thirtieth AAAI conference on artificial intelligence Van Hasselt H, Guez A, Silver D (2016) Deep reinforcement learning with double q-learning. In Thirtieth AAAI conference on artificial intelligence
Zurück zum Zitat Vidhate DA, Kulkarni P (2017) Cooperative multi-agent reinforcement learning models (CMRLM) for intelligent traffic control. In 2017 1st International Conference on Intelligent Systems and Information Management (ICISIM) pp 325–331. IEEE Vidhate DA, Kulkarni P (2017) Cooperative multi-agent reinforcement learning models (CMRLM) for intelligent traffic control. In 2017 1st International Conference on Intelligent Systems and Information Management (ICISIM) pp 325–331. IEEE
Zurück zum Zitat Wai HT, Yang Z, Wang PZ, Hong M (2018) Multi-agent reinforcement learning via double averaging primal-dual optimization. In Advances in Neural Information Processing Systems pp 9649–9660 Wai HT, Yang Z, Wang PZ, Hong M (2018) Multi-agent reinforcement learning via double averaging primal-dual optimization. In Advances in Neural Information Processing Systems pp 9649–9660
Zurück zum Zitat Wang Z, Schaul T, Hessel M, Van Hasselt H, Lanctot M, De Freitas N (2015) Dueling network architectures for deep reinforcement learning. arXiv preprint Wang Z, Schaul T, Hessel M, Van Hasselt H, Lanctot M, De Freitas N (2015) Dueling network architectures for deep reinforcement learning. arXiv preprint
Zurück zum Zitat Wang W, Yang T, Liu Y, Hao J, Hao X, Hu Y, Gao Y (2019) From Few to More: Large-scale Dynamic Multiagent Curriculum Learning. arXiv preprint Wang W, Yang T, Liu Y, Hao J, Hao X, Hu Y, Gao Y (2019) From Few to More: Large-scale Dynamic Multiagent Curriculum Learning. arXiv preprint
Zurück zum Zitat Wang W, Liu TYY, Hao J, Hao X, Hu Y, Chen Y, Gao Y (2019) Action semantics network: Considering the Effects of Actions in Multiagent Systems. arXiv preprint Wang W, Liu TYY, Hao J, Hao X, Hu Y, Chen Y, Gao Y (2019) Action semantics network: Considering the Effects of Actions in Multiagent Systems. arXiv preprint
Zurück zum Zitat Wei E, Wicke D, Freelan D, Luke S (2018) Multiagent soft q-learning. In 2018 AAAI Spring Symposium Series Wei E, Wicke D, Freelan D, Luke S (2018) Multiagent soft q-learning. In 2018 AAAI Spring Symposium Series
Zurück zum Zitat Xi L, Yu T, Yang B, Zhang X (2015) A novel multi-agent decentralized win or learn fast policy hill-climbing with eligibility trace algorithm for smart generation control of interconnected complex power grids. Energy Convers Manage 103:82–93CrossRef Xi L, Yu T, Yang B, Zhang X (2015) A novel multi-agent decentralized win or learn fast policy hill-climbing with eligibility trace algorithm for smart generation control of interconnected complex power grids. Energy Convers Manage 103:82–93CrossRef
Zurück zum Zitat Xi L, Chen J, Huang Y, Xu Y, Liu L, Zhou Y, Li Y (2018) Smart generation control based on multi-agent reinforcement learning with the idea of the time tunnel. Energy 153:977–987CrossRef Xi L, Chen J, Huang Y, Xu Y, Liu L, Zhou Y, Li Y (2018) Smart generation control based on multi-agent reinforcement learning with the idea of the time tunnel. Energy 153:977–987CrossRef
Zurück zum Zitat Xi L, Yu L, Xu Y, Wang S, Chen X (2019) A novel multi-agent DDQN-AD method-based distributed strategy for automatic generation control of integrated energy systems. IEEE Transactions on Sustainable Energy Xi L, Yu L, Xu Y, Wang S, Chen X (2019) A novel multi-agent DDQN-AD method-based distributed strategy for automatic generation control of integrated energy systems. IEEE Transactions on Sustainable Energy
Zurück zum Zitat Xu D, Si J, Bian W (2016) Fingerprint orientation field extraction using gradient-based weighted averaging. International Journal of collaborative intelligence 1(4):287–297CrossRef Xu D, Si J, Bian W (2016) Fingerprint orientation field extraction using gradient-based weighted averaging. International Journal of collaborative intelligence 1(4):287–297CrossRef
Zurück zum Zitat Yang T, Hao J, Meng Z, Zhang C, Zheng YZZ, Zheng Z (2019) Towards efficient detection and optimal response against sophisticated opponents. In Proceedings of the 28th International Joint Conference on Artificial Intelligence pp 623–629. AAAI Press Yang T, Hao J, Meng Z, Zhang C, Zheng YZZ, Zheng Z (2019) Towards efficient detection and optimal response against sophisticated opponents. In Proceedings of the 28th International Joint Conference on Artificial Intelligence pp 623–629. AAAI Press
Zurück zum Zitat Yang Y, Hao J, Liao B, Shao K, Chen G, Liu W, Tang H (2020) Qatten: a general framework for cooperative multiagent reinforcement learning. arXiv preprint . Yang Y, Hao J, Liao B, Shao K, Chen G, Liu W, Tang H (2020) Qatten: a general framework for cooperative multiagent reinforcement learning. arXiv preprint .
Zurück zum Zitat Yang Y, Hao J, Chen G, Tang H, Chen Y, Hu Y, Wei Z (2020) Q-value path decomposition for deep multiagent reinforcement learning. In International Joint Conference on Artificial Intelligence (IJCAI) Yang Y, Hao J, Chen G, Tang H, Chen Y, Hu Y, Wei Z (2020) Q-value path decomposition for deep multiagent reinforcement learning. In International Joint Conference on Artificial Intelligence (IJCAI)
Zurück zum Zitat Yin H, Pan SJ (2017) Knowledge transfer for deep reinforcement learning with hierarchical experience replay. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence Yin H, Pan SJ (2017) Knowledge transfer for deep reinforcement learning with hierarchical experience replay. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence
Zurück zum Zitat Zhang P, Hao J, Wang W, Tang H, Ma Y, Duan Y, Zheng Y (2020) KoGuN: accelerating deep reinforcement learning via integrating human suboptimal knowledge. In Thirty-seventh International Conference on Machine Learning (ICML)s Zhang P, Hao J, Wang W, Tang H, Ma Y, Duan Y, Zheng Y (2020) KoGuN: accelerating deep reinforcement learning via integrating human suboptimal knowledge. In Thirty-seventh International Conference on Machine Learning (ICML)s
Zurück zum Zitat Zhao Z, Gao Y, Luo B et al (2004) Reinforcement learning technology in multi-agent system. Comput Sci 31(3):23–27 Zhao Z, Gao Y, Luo B et al (2004) Reinforcement learning technology in multi-agent system. Comput Sci 31(3):23–27
Zurück zum Zitat Zhao X, Ding S, An Y, Jia W (2018) Asynchronous reinforcement learning algorithms for solving discrete space path planning problems. Appl Intell 48(12):4889–4904CrossRef Zhao X, Ding S, An Y, Jia W (2018) Asynchronous reinforcement learning algorithms for solving discrete space path planning problems. Appl Intell 48(12):4889–4904CrossRef
Zurück zum Zitat Zhao X, Ding S, An Y, Jia W (2019) Applications of asynchronous deep reinforcement learning based on dynamic updating weights. Appl Intell 49(2):581–591CrossRef Zhao X, Ding S, An Y, Jia W (2019) Applications of asynchronous deep reinforcement learning based on dynamic updating weights. Appl Intell 49(2):581–591CrossRef
Zurück zum Zitat Zheng L, Yang J, Cai H, Zhang W, Wang J, Yu Y (2017)s Magent: a many-agent reinforcement learning platform for artificial collective intelligence Zheng L, Yang J, Cai H, Zhang W, Wang J, Yu Y (2017)s Magent: a many-agent reinforcement learning platform for artificial collective intelligence
Zurück zum Zitat Zheng Y, Meng Z, Hao J, Zhang Z, Yang T, Fan C (2018) A deep bayesian policy reuse approach against non-stationary agents. In Advances in Neural Information Processing Systems pp 954–964 Zheng Y, Meng Z, Hao J, Zhang Z, Yang T, Fan C (2018) A deep bayesian policy reuse approach against non-stationary agents. In Advances in Neural Information Processing Systems pp 954–964
Metadata
Title
A survey on multi-agent deep reinforcement learning: from the perspective of challenges and applications
Authors
Wei Du
Shifei Ding
Publication date
24 November 2020
Publisher
Springer Netherlands
Published in
Artificial Intelligence Review / Issue 5/2021
Print ISSN: 0269-2821
Electronic ISSN: 1573-7462
DOI
https://doi.org/10.1007/s10462-020-09938-y
