Published in: International Journal of Machine Learning and Cybernetics 4/2024

30.09.2023 | Original Article

Multi-agent cooperation policy gradient method based on enhanced exploration for cooperative tasks

Authors: Li-yang Zhao, Tian-qing Chang, Lei Zhang, Xin-lu Zhang, Jiang-feng Wang

Abstract

Multi-agent cooperation and coordination are often essential for task fulfillment. Multi-agent deep reinforcement learning (MADRL) can learn effective solutions to such problems, but its application is still largely limited by the exploration–exploitation trade-off. Much MADRL research therefore focuses on how to explore the environment effectively and collect informative, high-quality experience that strengthens cooperative behaviors and improves policy learning. To address this problem, we propose a novel multi-agent cooperation policy gradient method, multi-agent proximal policy optimization based on self-imitation learning and random network distillation (MAPPOSR). MAPPOSR adds two policy-gradient-based components: (1) a random network distillation (RND) exploration bonus component that produces intrinsic rewards and encourages agents to visit new states and actions, helping them discover better trajectories and preventing the algorithm from converging prematurely or getting stuck in local optima; and (2) a self-imitation learning (SIL) policy update component that stores and reuses high-return trajectory samples generated by the agents themselves, strengthening their cooperation and boosting learning efficiency. Experimental results show that, in addition to effectively solving the hard-exploration problem, the proposed method significantly outperforms other state-of-the-art MADRL algorithms in learning efficiency and in escaping local optima. Moreover, the effect of different value-function inputs on algorithm performance is investigated under the centralized training and decentralized execution (CTDE) framework, and an individual-based joint-observation encoding method is developed on this basis. By encouraging each agent to focus on the local observations of the agents relevant to it and to discard the global state information provided by the environment, this encoding method mitigates the impact of excessive value-function input dimensionality and redundant features on algorithm performance.
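
To make the two components concrete, the following is a minimal sketch, not the authors' implementation: an RND bonus that scores novelty as the prediction error of a trained predictor against a fixed random target network, and a SIL-style loss that imitates stored transitions only when their return exceeds the current value estimate. The PyTorch framework, the network sizes, and the names RNDBonus and sil_loss are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn


class RNDBonus(nn.Module):
    """Novelty bonus: prediction error of a trained predictor network
    against a fixed, randomly initialised target network (assumed sizes)."""

    def __init__(self, obs_dim: int, feat_dim: int = 64):
        super().__init__()
        self.target = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                    nn.Linear(128, feat_dim))
        self.predictor = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                       nn.Linear(128, feat_dim))
        for p in self.target.parameters():      # the target is never trained
            p.requires_grad_(False)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            tgt = self.target(obs)
        # per-sample squared error; serves both as the intrinsic reward and,
        # averaged over the batch, as the predictor's training loss
        return (self.predictor(obs) - tgt).pow(2).mean(dim=-1)


def sil_loss(log_probs: torch.Tensor,
             values: torch.Tensor,
             returns: torch.Tensor) -> torch.Tensor:
    """Self-imitation term: imitate a stored action only if its return
    exceeded the current value estimate (clipped advantage)."""
    adv = torch.clamp(returns - values, min=0.0)
    policy_term = -(log_probs * adv.detach()).mean()   # policy imitates good returns
    value_term = 0.5 * adv.pow(2).mean()               # value pulled up toward them
    return policy_term + value_term
```

In an actor-critic loop of this kind, the predictor error would typically be minimized alongside the PPO objective while the extrinsic reward is augmented with the (optionally normalized) bonus, and the SIL term would be computed on samples drawn from a buffer that retains only high-return trajectories.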

Metadata
Title
Multi-agent cooperation policy gradient method based on enhanced exploration for cooperative tasks
Authors
Li-yang Zhao
Tian-qing Chang
Lei Zhang
Xin-lu Zhang
Jiang-feng Wang
Publication date
30.09.2023
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Machine Learning and Cybernetics / Issue 4/2024
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-023-01976-6
