Published in: International Journal of Machine Learning and Cybernetics 5/2024

28-10-2023 | Original Article

Consistent epistemic planning for multiagent deep reinforcement learning

Authors: Peiliang Wu, Shicheng Luo, Liqiang Tian, Bingyi Mao, Wenbai Chen

Abstract

Multiagent cooperation in a partially observable environment without communication is difficult because of the agents' uncertainty. Traditional multiagent deep reinforcement learning (MADRL) algorithms fail to address this uncertainty. To resolve this issue, we propose a MADRL-based policy network architecture called shared mental model-multiagent epistemic planning policy (SMM-MEPP). First, the architecture combines multiagent epistemic planning with MADRL to create a “perception–planning–action” multiagent epistemic planning framework, which helps multiple agents handle uncertainty better in the absence of coordination. Second, it introduces mental models represented as neural networks and uses a parameter-sharing mechanism to create shared mental models, which keeps multiagent planning consistent without communication and improves the efficiency of cooperation. Finally, we apply the SMM-MEPP architecture to three advanced MADRL algorithms (MAAC, MADDPG, and MAPPO) and conduct comparative experiments on multiagent cooperation tasks. The results show that the proposed method can provide consistent planning for multiple agents and improve convergence speed or training performance in a partially observable environment without communication.
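
The parameter-sharing idea in the abstract can be illustrated with a rough sketch. This is not the authors' SMM-MEPP implementation: the class names `MentalModel` and `Agent`, the one-layer softmax model, the dimensions, and the assumption that all agents see the same observation vector are hypothetical choices made only for illustration. The sketch shows that agents holding a reference to one set of mental-model parameters derive the same plan from the same input without exchanging messages.

```python
import math
import random


class MentalModel:
    """Hypothetical one-layer network: observation -> distribution over plans."""

    def __init__(self, obs_dim, n_plans, seed=0):
        rng = random.Random(seed)
        # weights[p][d]: contribution of observation feature d to the score of plan p
        self.weights = [[rng.uniform(-1.0, 1.0) for _ in range(obs_dim)]
                        for _ in range(n_plans)]

    def predict(self, obs):
        # Linear scores followed by a numerically stable softmax.
        scores = [sum(w * x for w, x in zip(row, obs)) for row in self.weights]
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]


class Agent:
    """Hypothetical agent that plans by querying its mental model."""

    def __init__(self, name, mental_model):
        self.name = name
        self.mental_model = mental_model  # a reference, not a copy

    def plan(self, obs):
        probs = self.mental_model.predict(obs)
        # Pick the index of the most probable plan.
        return max(range(len(probs)), key=probs.__getitem__)


# Parameter sharing: every agent points at the same MentalModel object.
shared = MentalModel(obs_dim=3, n_plans=4, seed=0)
shared_team = [Agent(f"agent_{i}", shared) for i in range(3)]

# Assumed common observation (an illustrative simplification).
obs = [0.5, -0.2, 0.9]
shared_plans = [agent.plan(obs) for agent in shared_team]
plan_probs = shared.predict(obs)
```

Because the three agents share one parameter set, `shared_plans` contains a single repeated plan index. Agents with separately initialised models would carry no such guarantee.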

Metadata
Title
Consistent epistemic planning for multiagent deep reinforcement learning
Authors
Peiliang Wu
Shicheng Luo
Liqiang Tian
Bingyi Mao
Wenbai Chen
Publication date
28-10-2023
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Machine Learning and Cybernetics / Issue 5/2024
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-023-01989-1
