Published in: Neural Processing Letters 1/2022

19 October 2021

Model-Free Optimal Consensus Control for Multi-agent Systems Based on DHP Algorithm

Authors: Haoen Shi, Yanghe Feng, Chaoxu Mu, Yunkai Wu


Abstract

This paper develops a novel model-free dual heuristic dynamic programming (DHP) algorithm that combines policy iteration with least-squares techniques to achieve optimal consensus control of discrete-time multi-agent systems. Optimal consensus control requires solving the coupled Hamilton-Jacobi-Bellman (HJB) equations, which is generally difficult, especially when the mathematical model of the system is unknown. To overcome these difficulties, the DHP method is carried out through reinforcement learning, using data collected online rather than the exact system dynamics. First, the performance index and the corresponding Bellman equation are derived; each agent's value function has a quadratic form. Then, a model network is employed to approximate the system dynamics, and the Q-function Bellman equation is obtained. Taking the derivative of the Q-function, the DHP method is applied to construct the update formula. Convergence and stability analyses of the proposed algorithm are presented. Two simulation examples illustrate the validity of the proposed algorithm.
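To make the ingredients named in the abstract concrete (a quadratic Q-function, a Q-function Bellman equation, least squares, and policy iteration), the following is a minimal sketch for a single agent with linear dynamics. It is not the paper's multi-agent DHP algorithm and uses no model network. The matrices `A`, `B`, `Qx`, `R`, the helper names, and the use of a simulator in place of measured data are all assumptions made for illustration. The Q-function is taken as Q(x, u) = zᵀHz with z = [x; u]; setting its derivative with respect to u, 2(H_ux x + H_uu u), to zero gives the policy improvement step u = -H_uu⁻¹ H_ux x.

```python
import numpy as np

def quad_features(z):
    """Upper-triangular entries of z z^T, off-diagonals doubled,
    so that quad_features(z) @ theta == z^T H z for symmetric H."""
    i, j = np.triu_indices(len(z))
    return np.outer(z, z)[i, j] * np.where(i == j, 1.0, 2.0)

def unpack_H(theta, n):
    """Rebuild the symmetric n x n matrix H from its upper triangle."""
    H = np.zeros((n, n))
    H[np.triu_indices(n)] = theta
    return H + H.T - np.diag(np.diag(H))

def policy_iteration(A, B, Qx, R, K0, iters=10, samples=60, seed=0):
    """Least-squares policy iteration on Q(x, u) = z^T H z.
    A and B are used only to generate transitions (hypothetical
    stand-in for online data); the learner never reads them."""
    rng = np.random.default_rng(seed)
    nx, nu = B.shape
    K = K0.copy()
    for _ in range(iters):
        Phi, cost = [], []
        for _ in range(samples):
            x = rng.standard_normal(nx)
            u = -K @ x + rng.standard_normal(nu)   # exploration noise
            x_next = A @ x + B @ u                 # observed transition
            u_next = -K @ x_next                   # current policy
            z = np.concatenate([x, u])
            z_next = np.concatenate([x_next, u_next])
            # Bellman equation: Q(z) - Q(z_next) = stage cost
            Phi.append(quad_features(z) - quad_features(z_next))
            cost.append(x @ Qx @ x + u @ R @ u)
        theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(cost), rcond=None)
        H = unpack_H(theta, nx + nu)
        # Policy improvement from dQ/du = 0
        K = np.linalg.solve(H[nx:, nx:], H[nx:, :nx])
    return K, H

if __name__ == "__main__":
    A = np.array([[0.9, 0.1], [0.0, 0.8]])   # stable, so K0 = 0 is admissible
    B = np.array([[0.0], [1.0]])
    K, H = policy_iteration(A, B, np.eye(2), np.array([[1.0]]), np.zeros((1, 2)))
    print("learned feedback gain K =", K)
```

In this simplified setting the learned gain can be checked against the gain from the discrete-time Riccati equation. The initial gain must be stabilizing for the policy evaluation step to be well posed, which is why the example uses a stable `A` with a zero initial gain.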


Metadata
Title
Model-Free Optimal Consensus Control for Multi-agent Systems Based on DHP Algorithm
Authors
Haoen Shi
Yanghe Feng
Chaoxu Mu
Yunkai Wu
Publication date
19 October 2021
Publisher
Springer US
Published in
Neural Processing Letters / Issue 1/2022
Print ISSN: 1370-4621
Electronic ISSN: 1573-773X
DOI
https://doi.org/10.1007/s11063-021-10641-4
