Published in: Artificial Intelligence Review 1/2018

12-01-2018

Iterative ADP learning algorithms for discrete-time multi-player games

Authors: He Jiang, Huaguang Zhang

Abstract

Adaptive dynamic programming (ADP) is an important branch of reinforcement learning for solving optimal control problems. Most practical nonlinear systems are driven by more than one controller. Each controller can be regarded as a player, and balancing cooperation and conflict among these players can be viewed as a game. Multi-player games fall into two main categories: zero-sum games and non-zero-sum games. To obtain the optimal control policy for each player, one must solve Hamilton–Jacobi–Isaacs equations for zero-sum games and a set of coupled Hamilton–Jacobi equations for non-zero-sum games. Unfortunately, these equations are generally difficult or even impossible to solve analytically. To overcome this bottleneck, this paper proposes two ADP methods: a modified gradient-descent-based online algorithm and a novel iterative offline learning approach. Furthermore, to implement the proposed methods, we employ a single-network structure, which reduces the computational burden compared with the traditional multiple-network architecture. Simulation results demonstrate the effectiveness of our schemes.
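The abstract does not state the system model or the update laws, so the following is only a hedged illustration of what solving a zero-sum game equation iteratively can look like. It is the classical value-iteration (game Riccati) recursion for a scalar linear-quadratic discrete-time zero-sum game, not the authors' algorithm. The dynamics x_{k+1} = a x_k + b u_k + e w_k, the stage cost q x^2 + r u^2 - gamma^2 w^2, the quadratic value function V(x) = p x^2, and all function names and parameter values are assumptions introduced for this sketch.

```python
def game_riccati_step(p, a, b, e, q, r, gamma):
    """One value-iteration step for V(x) = p * x**2 (assumed scalar model).

    Solves min over u, max over w of
        q*x^2 + r*u^2 - gamma^2*w^2 + p*(a*x + b*u + e*w)^2
    in closed form and returns the updated coefficient p.
    """
    # 2x2 matrix coupling control u and disturbance w in the
    # first-order optimality conditions.
    m11 = r + b * b * p
    m12 = b * e * p
    m22 = e * e * p - gamma ** 2
    det = m11 * m22 - m12 * m12
    if m22 >= 0 or det == 0:
        raise ValueError("gamma is too small for a saddle point to exist")
    v1 = a * b * p
    v2 = a * e * p
    # Quadratic form v^T M^{-1} v, written out for the 2x2 case.
    quad = (v1 * v1 * m22 - 2 * v1 * v2 * m12 + v2 * v2 * m11) / det
    return q + a * a * p - quad


def solve_game(a, b, e, q, r, gamma, tol=1e-12, max_iter=10000):
    """Iterate from p = 0 until successive values differ by less than tol."""
    p = 0.0
    for i in range(max_iter):
        p_next = game_riccati_step(p, a, b, e, q, r, gamma)
        if abs(p_next - p) < tol:
            return p_next, i + 1
        p = p_next
    raise RuntimeError("value iteration did not converge")


def saddle_point_gains(p, a, b, e, r, gamma):
    """Return (k_u, k_w) such that u = -k_u * x and w = -k_w * x."""
    m11 = r + b * b * p
    m12 = b * e * p
    m22 = e * e * p - gamma ** 2
    det = m11 * m22 - m12 * m12
    v1 = a * b * p
    v2 = a * e * p
    k_u = (m22 * v1 - m12 * v2) / det
    k_w = (m11 * v2 - m12 * v1) / det
    return k_u, k_w


if __name__ == "__main__":
    # Illustrative parameter values only; they are not taken from the paper.
    params = dict(a=0.9, b=1.0, e=0.5, q=1.0, r=1.0, gamma=5.0)
    p_star, iterations = solve_game(**params)
    k_u, k_w = saddle_point_gains(
        p_star, params["a"], params["b"], params["e"],
        params["r"], params["gamma"],
    )
    print("p* =", p_star, "after", iterations, "iterations")
    print("control gain =", k_u, "disturbance gain =", k_w)
```

In this linear-quadratic special case the game equation collapses to a scalar fixed-point problem, which is why a plain loop suffices. For general nonlinear systems no such closed form exists, which is the gap that approximation-based ADP schemes such as those described in the abstract aim to fill.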

Metadata
Title
Iterative ADP learning algorithms for discrete-time multi-player games
Authors
He Jiang
Huaguang Zhang
Publication date
12-01-2018
Publisher
Springer Netherlands
Published in
Artificial Intelligence Review / Issue 1/2018
Print ISSN: 0269-2821
Electronic ISSN: 1573-7462
DOI
https://doi.org/10.1007/s10462-017-9603-1
