Published in: International Journal of Machine Learning and Cybernetics, Issue 6/2016

1 December 2016 | Original Article

Reinforcement learning and neural networks for multi-agent nonzero-sum games of nonlinear constrained-input systems

Authors: Sholeh Yasini, Mohammad Bagher Naghibi Sitani, Ali Kirampor

Abstract

This paper presents an online adaptive optimal control method based on reinforcement learning to solve the multi-agent nonzero-sum (NZS) differential games of nonlinear constrained-input continuous-time systems. A non-quadratic cost functional associated with each agent is employed to encode the saturation nonlinearity into the NZS game. The algorithm is implemented as a separate actor-critic neural network (NN) structure for every participant in the game, where adaptation of both NNs is performed simultaneously and continuously. The technique of concurrent learning is utilized to obtain novel update laws for the critic NN weights. That is, recorded data and current data are used concurrently for adaptation of the critic NN weights. This results in an algorithm where an easier and verifiable condition is sufficient for parameter convergence rather than the restrictive persistence of excitation (PE) condition. The stability of the closed-loop systems is guaranteed and the convergence to the Nash equilibrium solution of the game is shown. Simulation results show the effectiveness of the proposed method.
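The concurrent-learning idea mentioned in the abstract can be illustrated with a small sketch. The abstract does not state the update laws, so everything here is an assumption made for illustration: a linear-in-parameters model `y = phi @ w`, a plain discrete-time gradient step instead of the paper's continuous-time critic update, and hypothetical names (`concurrent_learning_step`, `demo`). The "verifiable condition" is taken to be the rank condition on recorded regressors that is standard in the concurrent-learning literature; whether the paper uses exactly this condition is not confirmed by the abstract.

```python
import numpy as np


def concurrent_learning_step(w, phi_now, y_now, memory, lr=0.05):
    """One gradient step on the squared error of the current sample
    plus the squared errors of all recorded samples in `memory`."""
    grad = phi_now * (phi_now @ w - y_now)
    for phi_j, y_j in memory:
        grad = grad + phi_j * (phi_j @ w - y_j)
    return w - lr * grad


def demo(steps=2000):
    # Hypothetical "ideal" weights that the estimator should recover.
    w_true = np.array([1.0, -2.0, 0.5])

    # Recorded regressors: linearly independent, so the rank condition holds.
    recorded = [np.array([1.0, 0.0, 0.0]),
                np.array([0.0, 1.0, 0.0]),
                np.array([0.0, 0.0, 1.0])]
    memory = [(phi, phi @ w_true) for phi in recorded]
    rank = np.linalg.matrix_rank(np.stack(recorded))

    # The current regressor never changes, so it is not persistently exciting.
    phi_now = np.array([1.0, 1.0, 0.0])
    y_now = phi_now @ w_true

    w_cl = np.zeros(3)      # current data + recorded data
    w_plain = np.zeros(3)   # current data only
    for _ in range(steps):
        w_cl = concurrent_learning_step(w_cl, phi_now, y_now, memory)
        w_plain = concurrent_learning_step(w_plain, phi_now, y_now, [])
    return w_true, w_cl, w_plain, rank


if __name__ == "__main__":
    w_true, w_cl, w_plain, rank = demo()
    print("rank of recorded regressors:", rank)
    print("true weights:               ", w_true)
    print("with recorded data:         ", w_cl)
    print("current data only:          ", w_plain)
```

In this toy setting the estimate that reuses recorded data converges to the true weights, while the estimate driven only by the constant current regressor stalls on the one direction that regressor excites. The rank of the recorded regressors can be checked directly from stored data, which is the sense in which such a condition is easier to verify than persistence of excitation.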

Metadata
Title
Reinforcement learning and neural networks for multi-agent nonzero-sum games of nonlinear constrained-input systems
Authors
Sholeh Yasini
Mohammad Bagher Naghibi Sitani
Ali Kirampor
Publication date
1 December 2016
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Machine Learning and Cybernetics / Issue 6/2016
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-014-0300-y
