Skip to main content
Top

26-04-2024 | Original Article

Air combat maneuver decision based on deep reinforcement learning with auxiliary reward

Authors: Tingyu Zhang, Yongshuai Wang, Mingwei Sun, Zengqiang Chen

Published in: Neural Computing and Applications

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

For air combat maneuvering decision, the sparse reward during the application of deep reinforcement learning limits the exploration efficiency of the agents. To address this challenge, we propose an auxiliary reward function considering the impact of angle, range, and altitude. Furthermore, we investigate the influences of the network nodes, layers, and the learning rate on decision system, and reasonable parameter ranges are provided, which can serve as a guideline. Finally, four typical air combat scenarios demonstrate good adaptability and effectiveness of the proposed scheme, and the auxiliary reward significantly improves the learning ability of deep Q network (DQN) by leading the agents to explore more intently. Compared with the original deep deterministic policy gradient and soft actor critic algorithm, the proposed method exhibits superior exploration capability with higher reward, indicating that the trained agent can adapt to different air combats with good performance.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Appendix
Available only for authorised users
Literature
1.
go back to reference Alpdemir MN (2022) Tactical UAV path optimization under radar threat using deep reinforcement learning. Neural Comput Appl 34:5649–5664CrossRef Alpdemir MN (2022) Tactical UAV path optimization under radar threat using deep reinforcement learning. Neural Comput Appl 34:5649–5664CrossRef
2.
go back to reference Liu H, Meng Q, Peng F, Lewis FL (2020) Heterogeneous formation control of multiple UAVs with limited-input leader via reinforcement learning. Neurocomputing 412:63–71CrossRef Liu H, Meng Q, Peng F, Lewis FL (2020) Heterogeneous formation control of multiple UAVs with limited-input leader via reinforcement learning. Neurocomputing 412:63–71CrossRef
3.
go back to reference Zhou K, Wei R, Xu Z (2020) An air combat decision learning system based on a brain-like cognitive mechanism. Cogn Comput 12:128–139CrossRef Zhou K, Wei R, Xu Z (2020) An air combat decision learning system based on a brain-like cognitive mechanism. Cogn Comput 12:128–139CrossRef
4.
go back to reference Trotta A, Felice MD, Montori F, Chowdhury KR, Bononi L (2018) Joint coverage, connectivity, and charging strategies for distributed UAV networks. IEEE Trans Robot 34:883–900CrossRef Trotta A, Felice MD, Montori F, Chowdhury KR, Bononi L (2018) Joint coverage, connectivity, and charging strategies for distributed UAV networks. IEEE Trans Robot 34:883–900CrossRef
5.
go back to reference Sun Z, Wu H, Shi Y, Yu X, Gao Y, Pei W, Yang Z, Piao H, Hou Y (2023) Multi-agent air combat with two-stage graph-attention communication. Neural Comput Appl 35:19765–19781CrossRef Sun Z, Wu H, Shi Y, Yu X, Gao Y, Pei W, Yang Z, Piao H, Hou Y (2023) Multi-agent air combat with two-stage graph-attention communication. Neural Comput Appl 35:19765–19781CrossRef
6.
go back to reference Shin H, Lee J, Kim H, Hyunchul Shim D (2018) An autonomous aerial combat framework for two-on-two engagements based on basic fighter maneuvers. Aerosp Sci Technol 72:305–315CrossRef Shin H, Lee J, Kim H, Hyunchul Shim D (2018) An autonomous aerial combat framework for two-on-two engagements based on basic fighter maneuvers. Aerosp Sci Technol 72:305–315CrossRef
7.
go back to reference Maravall Lope J, Fuentes JP (2015) Vision-based anticipatory controller for the autonomous navigation of an UAV using artificial neural networks. Neurocomputing 151:101–107CrossRef Maravall Lope J, Fuentes JP (2015) Vision-based anticipatory controller for the autonomous navigation of an UAV using artificial neural networks. Neurocomputing 151:101–107CrossRef
8.
go back to reference Dai X, Mao Y, Huang T (2020) Automatic obstacle avoidance of quadrotor UAV via CNN-based learning. Neurocomputing 402:346–358CrossRef Dai X, Mao Y, Huang T (2020) Automatic obstacle avoidance of quadrotor UAV via CNN-based learning. Neurocomputing 402:346–358CrossRef
9.
go back to reference Wang M, Wang L, Yue T, Liu H (2020) Influence of unmanned combat aerial vehicle agility on short-range aerial combat effectiveness. Aerosp Sci Technol 96:105534CrossRef Wang M, Wang L, Yue T, Liu H (2020) Influence of unmanned combat aerial vehicle agility on short-range aerial combat effectiveness. Aerosp Sci Technol 96:105534CrossRef
10.
go back to reference Zhou K, Wei R, Xu Z, Zhang Q (2018) (2018) A brain like air combat learning system inspired by human learning mechanism. In: Proceedings of IEEE CSAA guidance, navigation and control conference (CGNCC). IEEE, Xiamen, pp 1–6 Zhou K, Wei R, Xu Z, Zhang Q (2018) (2018) A brain like air combat learning system inspired by human learning mechanism. In: Proceedings of IEEE CSAA guidance, navigation and control conference (CGNCC). IEEE, Xiamen, pp 1–6
11.
go back to reference Wang X, Guo K, Chao T, Wang S (2022) Design of differential game guidance law for dual defense aircrafts. In: Proceedings of 2022 5th international symposium on autonomous systems (ISAS). IEEE, Hangzhou, pp 1–6 Wang X, Guo K, Chao T, Wang S (2022) Design of differential game guidance law for dual defense aircrafts. In: Proceedings of 2022 5th international symposium on autonomous systems (ISAS). IEEE, Hangzhou, pp 1–6
12.
go back to reference Weintraub IE, Pachter M, Garcia E (2020) (2020) An introduction to pursuit-evasion differential games. In: Proceedings of American control conference (ACC). IEEE, Denver, pp 1049–1066 Weintraub IE, Pachter M, Garcia E (2020) (2020) An introduction to pursuit-evasion differential games. In: Proceedings of American control conference (ACC). IEEE, Denver, pp 1049–1066
13.
go back to reference Ruan W, Sun Y, Deng Y, Duan H (2023) Hawk-pigeon game tactics for unmanned aerial vehicle swarm target defense. IEEE Trans Ind Inform 19:11619–11629CrossRef Ruan W, Sun Y, Deng Y, Duan H (2023) Hawk-pigeon game tactics for unmanned aerial vehicle swarm target defense. IEEE Trans Ind Inform 19:11619–11629CrossRef
14.
go back to reference Ma Y, Wang G, Hu X, Luo H, Lei X (2020) Cooperative occupancy decision making of multi-UAV in beyond-visual-range air combat: a game theory approach. IEEE Access 8:11624–11634CrossRef Ma Y, Wang G, Hu X, Luo H, Lei X (2020) Cooperative occupancy decision making of multi-UAV in beyond-visual-range air combat: a game theory approach. IEEE Access 8:11624–11634CrossRef
15.
go back to reference Kang Y, Pu Z, Liu Z (2020) (2020) Air-to-air combat tactical decision method based on SIRMs fuzzy logic and improved genetic algorithm. In: Proceedings of international conference on guidance, navigation and control (ICGNC). Springer, Tianjin, pp 3699–3709 Kang Y, Pu Z, Liu Z (2020) (2020) Air-to-air combat tactical decision method based on SIRMs fuzzy logic and improved genetic algorithm. In: Proceedings of international conference on guidance, navigation and control (ICGNC). Springer, Tianjin, pp 3699–3709
16.
go back to reference Crumpacker JB, Robbins MJ, Jenkins PR (2022) An approximate dynamic programming approach for solving an air combat maneuvering problem. Expert Syst Appl 203:117448CrossRef Crumpacker JB, Robbins MJ, Jenkins PR (2022) An approximate dynamic programming approach for solving an air combat maneuvering problem. Expert Syst Appl 203:117448CrossRef
17.
go back to reference Sharma R (2014) (2014) Fuzzy Q learning based UAV autopilot. In: Proceedings of innovative applications of computational intelligence on power, energy and controls with their impact on humanity (CIPECH). IEEE, Ghaziabad, pp 29–33 Sharma R (2014) (2014) Fuzzy Q learning based UAV autopilot. In: Proceedings of innovative applications of computational intelligence on power, energy and controls with their impact on humanity (CIPECH). IEEE, Ghaziabad, pp 29–33
18.
go back to reference Liu Y, Liu W, Obaid MA, Abbas IA (2016) Exponential stability of Markovian jumping Cohen–Grossberg neural networks with mixed mode-dependent time-delays. Neurocomputing 177:409–415CrossRef Liu Y, Liu W, Obaid MA, Abbas IA (2016) Exponential stability of Markovian jumping Cohen–Grossberg neural networks with mixed mode-dependent time-delays. Neurocomputing 177:409–415CrossRef
19.
go back to reference Du B, Liu Y, Atiatallah Abbas I (2016) Existence and asymptotic behavior results of periodic solution for discrete-time neutral-type neural networks. J Frankl Inst 353:448–461MathSciNetCrossRef Du B, Liu Y, Atiatallah Abbas I (2016) Existence and asymptotic behavior results of periodic solution for discrete-time neutral-type neural networks. J Frankl Inst 353:448–461MathSciNetCrossRef
20.
go back to reference Emuna R, Duffney R, Borowsky A, Biess A (2022) Example-guided learning of stochastic human driving policies using deep reinforcement learning. Neural Comput Appl 35:16791–16804CrossRef Emuna R, Duffney R, Borowsky A, Biess A (2022) Example-guided learning of stochastic human driving policies using deep reinforcement learning. Neural Comput Appl 35:16791–16804CrossRef
21.
go back to reference Kiani F, Saraç ÖF (2023) A novel intelligent traffic recovery model for emergency vehicles based on context-aware reinforcement learning. Inf Sci 619:288–309CrossRef Kiani F, Saraç ÖF (2023) A novel intelligent traffic recovery model for emergency vehicles based on context-aware reinforcement learning. Inf Sci 619:288–309CrossRef
22.
go back to reference Damadam S, Zourbakhsh M, Javidan R, Faroughi A (2022) An intelligent IoT based traffic light management system: deep reinforcement learning. Smart Cities 5:1293–1311CrossRef Damadam S, Zourbakhsh M, Javidan R, Faroughi A (2022) An intelligent IoT based traffic light management system: deep reinforcement learning. Smart Cities 5:1293–1311CrossRef
23.
go back to reference Zhu R, Li L, Wu S, Lv P, Li Y, Xu M (2023) Multi-agent broad reinforcement learning for intelligent traffic light control. Inf Sci 619:509–525CrossRef Zhu R, Li L, Wu S, Lv P, Li Y, Xu M (2023) Multi-agent broad reinforcement learning for intelligent traffic light control. Inf Sci 619:509–525CrossRef
24.
go back to reference Du G, Zou Y, Zhang X, Liu T, Wu J, He D (2020) Deep reinforcement learning based energy management for a hybrid electric vehicle. Energy 201:117591CrossRef Du G, Zou Y, Zhang X, Liu T, Wu J, He D (2020) Deep reinforcement learning based energy management for a hybrid electric vehicle. Energy 201:117591CrossRef
25.
go back to reference Yang D, Karimi HR, Pawelczyk M (2023) A new intelligent fault diagnosis framework for rotating machinery based on deep transfer reinforcement learning. Control Eng Pract 134:105475CrossRef Yang D, Karimi HR, Pawelczyk M (2023) A new intelligent fault diagnosis framework for rotating machinery based on deep transfer reinforcement learning. Control Eng Pract 134:105475CrossRef
26.
go back to reference Liu Q, Shi L, Sun L, Li J, Ding M, Shu FS (2020) Path planning for UAV-mounted mobile edge computing with deep reinforcement learning. IEEE Trans Veh Technol 69:5723–5728CrossRef Liu Q, Shi L, Sun L, Li J, Ding M, Shu FS (2020) Path planning for UAV-mounted mobile edge computing with deep reinforcement learning. IEEE Trans Veh Technol 69:5723–5728CrossRef
27.
go back to reference Hoel C-J, Driggs-Campbell K, Wolff K, Laine L, Kochenderfer MJ (2020) Combining planning and deep reinforcement learning in tactical decision making for autonomous driving. IEEE Trans Intell Veh 5:294–305CrossRef Hoel C-J, Driggs-Campbell K, Wolff K, Laine L, Kochenderfer MJ (2020) Combining planning and deep reinforcement learning in tactical decision making for autonomous driving. IEEE Trans Intell Veh 5:294–305CrossRef
28.
go back to reference Leong AS, Ramaswamy A, Quevedo DE, Karl H (2020) Deep reinforcement learning for wireless sensor scheduling in cyber-physical system. Automatic 113:108759MathSciNetCrossRef Leong AS, Ramaswamy A, Quevedo DE, Karl H (2020) Deep reinforcement learning for wireless sensor scheduling in cyber-physical system. Automatic 113:108759MathSciNetCrossRef
29.
go back to reference Liessner R, Schmitt J, Dietermann A, Bäker B (2019) Hyperparameter optimization for deep reinforcement learning in vehicle energy management. In: Proceedings of 11th international conference on agents artificial intelligence SCITEPRESS—science and technology publications, Prague, pp 134–144 Liessner R, Schmitt J, Dietermann A, Bäker B (2019) Hyperparameter optimization for deep reinforcement learning in vehicle energy management. In: Proceedings of 11th international conference on agents artificial intelligence SCITEPRESS—science and technology publications, Prague, pp 134–144
30.
go back to reference Chen Y, Zhang J, Yang Q, Zhou Y, Shi G, Wu Y (2020) Design and verification of UAV maneuver decision Simulation system based on deep Q-learning network. In: Proceedings of 2020 16th international conference on control, automation, robotics and vision (ICARCV). IEEE, Shenzhen, pp 817–823 Chen Y, Zhang J, Yang Q, Zhou Y, Shi G, Wu Y (2020) Design and verification of UAV maneuver decision Simulation system based on deep Q-learning network. In: Proceedings of 2020 16th international conference on control, automation, robotics and vision (ICARCV). IEEE, Shenzhen, pp 817–823
31.
go back to reference Cao Y, Kou Y-X, Li Z-W, Xu A (2023) Autonomous maneuver decision of UCAV air combat based on double deep Q network algorithm and stochastic game theory. Int J Aerosp Eng 2023:1–20CrossRef Cao Y, Kou Y-X, Li Z-W, Xu A (2023) Autonomous maneuver decision of UCAV air combat based on double deep Q network algorithm and stochastic game theory. Int J Aerosp Eng 2023:1–20CrossRef
32.
go back to reference Zhang J, Yu Y, Zheng L, Yang Q, Shi G, Wu Y (2023) Situational continuity-based air combat autonomous maneuvering decision-making. Def Technol 29:66–79CrossRef Zhang J, Yu Y, Zheng L, Yang Q, Shi G, Wu Y (2023) Situational continuity-based air combat autonomous maneuvering decision-making. Def Technol 29:66–79CrossRef
33.
go back to reference Yang Q, Zhu Y, Zhang J, Qiao S, Liu J (2019) UAV air combat autonomous maneuver decision based on DDPG algorithm. In: 2019 IEEE 15th international conference on control automation. ICCA. IEEE, Edinburgh, pp 37–42 Yang Q, Zhu Y, Zhang J, Qiao S, Liu J (2019) UAV air combat autonomous maneuver decision based on DDPG algorithm. In: 2019 IEEE 15th international conference on control automation. ICCA. IEEE, Edinburgh, pp 37–42
34.
go back to reference Zhang J, Yang Q, Shi G (2021) UAV cooperative air combat maneuver decision based on multi-agent reinforcement learning. J Syst Eng Electron 32:1421–1438CrossRef Zhang J, Yang Q, Shi G (2021) UAV cooperative air combat maneuver decision based on multi-agent reinforcement learning. J Syst Eng Electron 32:1421–1438CrossRef
35.
go back to reference Wang Z, Guo Y, Li N, Hu S, Wang M (2023) Autonomous collaborative combat strategy of unmanned system group in continuous dynamic environment based on PD-MADDPG. Comput Commun 200:182–204CrossRef Wang Z, Guo Y, Li N, Hu S, Wang M (2023) Autonomous collaborative combat strategy of unmanned system group in continuous dynamic environment based on PD-MADDPG. Comput Commun 200:182–204CrossRef
36.
go back to reference Li L, Zhang X, Qian C et al (2023) Basic flight maneuver generation of fixed-wing plane based on proximal policy optimization. Neural Comput Appl 2023:1–17 Li L, Zhang X, Qian C et al (2023) Basic flight maneuver generation of fixed-wing plane based on proximal policy optimization. Neural Comput Appl 2023:1–17
37.
go back to reference Wang Z, Li H, Wu Z, Wu H (2021) A pretrained proximal policy optimization algorithm with reward shaping for aircraft guidance to a moving destination in three-dimensional continuous space. Int J Adv Robot Syst 18:172988142198954CrossRef Wang Z, Li H, Wu Z, Wu H (2021) A pretrained proximal policy optimization algorithm with reward shaping for aircraft guidance to a moving destination in three-dimensional continuous space. Int J Adv Robot Syst 18:172988142198954CrossRef
38.
go back to reference Liu X, Yin Y, Su Y, Ming R (2022) A multi-UCAV cooperative decision-making method based on an MAPPO algorithm for beyond-visual-range air combat. Aerospace 9:563–582CrossRef Liu X, Yin Y, Su Y, Ming R (2022) A multi-UCAV cooperative decision-making method based on an MAPPO algorithm for beyond-visual-range air combat. Aerospace 9:563–582CrossRef
39.
go back to reference Xu J, Zhang J, Yang L, Liu C (2022) Autonomous decision-making for dogfights based on a tactical pursuit point approach. Aerosp Sci Technol 129:107857CrossRef Xu J, Zhang J, Yang L, Liu C (2022) Autonomous decision-making for dogfights based on a tactical pursuit point approach. Aerosp Sci Technol 129:107857CrossRef
40.
go back to reference Li B, Bai S, Liang S, Ma R, Neretin E, Huang J (2023) Manoeuvre decision-making of unmanned aerial vehicles in air combat based on an expert actor-based soft actor critic algorithm. CAAI Trans Intell Technol 8:1608–1619CrossRef Li B, Bai S, Liang S, Ma R, Neretin E, Huang J (2023) Manoeuvre decision-making of unmanned aerial vehicles in air combat based on an expert actor-based soft actor critic algorithm. CAAI Trans Intell Technol 8:1608–1619CrossRef
41.
go back to reference Li B, Huang J, Bai S, Gan Z, Liang S, Evgeny N, Yao S (2023) Autonomous air combat decision-making of UAV based on parallel self-play reinforcement learning. CAAI Trans Intell Technol 8:64–81CrossRef Li B, Huang J, Bai S, Gan Z, Liang S, Evgeny N, Yao S (2023) Autonomous air combat decision-making of UAV based on parallel self-play reinforcement learning. CAAI Trans Intell Technol 8:64–81CrossRef
42.
go back to reference Huang C, Dong K, Huang H, Tang S (2018) Autonomous air combat maneuver decision using Bayesian inference and moving horizon optimization. J Syst Eng Electron 29:86–97CrossRef Huang C, Dong K, Huang H, Tang S (2018) Autonomous air combat maneuver decision using Bayesian inference and moving horizon optimization. J Syst Eng Electron 29:86–97CrossRef
43.
go back to reference Johnson J (2023) Automating the OODA loop in the age of intelligent machines: reaffirming the role of humans in command-and-control decision-making in the digital age. Def Stud 23:43–67CrossRef Johnson J (2023) Automating the OODA loop in the age of intelligent machines: reaffirming the role of humans in command-and-control decision-making in the digital age. Def Stud 23:43–67CrossRef
44.
go back to reference Wang LX, Guo YG, Zhang Q, Yue T (2017) Suggestion for aircraft flying qualities requirements of a short-range air combat mission. Chin J Aeronaut 30:881–897CrossRef Wang LX, Guo YG, Zhang Q, Yue T (2017) Suggestion for aircraft flying qualities requirements of a short-range air combat mission. Chin J Aeronaut 30:881–897CrossRef
45.
go back to reference Li Y, Lyu Y, Shi J, Li W (2022) Autonomous maneuver decision of air combat based on simulated operation command and FRV-DDPG algorithm. Aerospace 9:658–676CrossRef Li Y, Lyu Y, Shi J, Li W (2022) Autonomous maneuver decision of air combat based on simulated operation command and FRV-DDPG algorithm. Aerospace 9:658–676CrossRef
46.
go back to reference Austin F, Carbone G, Falco M, Hinz H, Lewis M (1987) Automated maneuvering decisions for air-to-air combat. In: Guidance, navigation and control conference, pp 2393 Austin F, Carbone G, Falco M, Hinz H, Lewis M (1987) Automated maneuvering decisions for air-to-air combat. In: Guidance, navigation and control conference, pp 2393
Metadata
Title
Air combat maneuver decision based on deep reinforcement learning with auxiliary reward
Authors
Tingyu Zhang
Yongshuai Wang
Mingwei Sun
Zengqiang Chen
Publication date
26-04-2024
Publisher
Springer London
Published in
Neural Computing and Applications
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-024-09720-z

Premium Partner