nach oben

International Journal of Machine Learning and Cybernetics

Erschienen in:

01.12.2016 | Original Article

Reinforcement learning and neural networks for multi-agent nonzero-sum games of nonlinear constrained-input systems

verfasst von: Sholeh Yasini, Mohammad Bagher Naghibi Sitani, Ali Kirampor

Erschienen in: International Journal of Machine Learning and Cybernetics | Ausgabe 6/2016

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

This paper presents an online adaptive optimal control method based on reinforcement learning to solve the multi-agent nonzero-sum (NZS) differential games of nonlinear constrained-input continuous-time systems. A non-quadratic cost functional associated with each agent is employed to encode the saturation nonlinearity into the NZS game. The algorithm is implemented as a separate actor-critic neural network (NN) structure for every participant in the game, where adaptation of both NNs is performed simultaneously and continuously. The technique of concurrent learning is utilized to obtain novel update laws for the critic NN weights. That is, recorded data and current data are used concurrently for adaptation of the critic NN weights. This results in an algorithm where an easier and verifiable condition is sufficient for parameter convergence rather than the restrictive persistence of excitation (PE) condition. The stability of the closed-loop systems is guaranteed and the convergence to the Nash equilibrium solution of the game is shown. Simulation results show the effectiveness of the proposed method.

Vorheriger Artikel Linguistic rough sets

Nächster Artikel Erratum to: Reinforcement learning and neural networks for multi-agent nonzero-sum games of nonlinear constrained-input systems

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

ATZelectronics worldwide

ATZlectronics worldwide is up-to-speed on new trends and developments in automotive electronics on a scientific level with a high depth of information.

Order your 30-days-trial for free and without any commitment.

Jetzt informieren

ATZelektronik

Die Fachzeitschrift ATZelektronik bietet für Entwickler und Entscheider in der Automobil- und Zulieferindustrie qualitativ hochwertige und fundierte Informationen aus dem gesamten Spektrum der Pkw- und Nutzfahrzeug-Elektronik.

Lassen Sie sich jetzt unverbindlich 2 kostenlose Ausgabe zusenden.

Jetzt informieren

Nur mit Berechtigung zugänglich

Shah V (1998) Power control for wireless data services based on utility and pricing. Dissertation, Rutgers University

Mukaidani H (2007) Newton’s method for solving cross-coupled sign-indefinite algebraic Riccati equations for weakly coupled large-scale systems. J Appl Math Comput 188(1):103–115MathSciNetCrossRefMATH

Isaacs R (1965) Differential Games. Wiley, New YorkMATH

Starr A, Ho Y (1969) Nonzero-sum differential games. J Optim Theory Appl 3(3):148–206CrossRefMATH

Basar T, Olsder GJ (1998) Dynamic Noncooperative Game Theory, 2nd edn. SIAM, PhiladelphiaCrossRefMATH

Li T, Gajic Z (1994) Lyapunov iterations for solving coupled algebraic Lyapunov equations of Nash differential games and algebraic Riccati equations of zero-sum games. New Trends Dynam Appl. Birkhäuser, Boston, pp 489–494

Freiling G, Jank G, Abou-Kandil H (2002) On global existence of solutions to coupled matrix Riccati equations in closed-loop Nash games. IEEE Trans Autom Control 41(2):264–269MathSciNetCrossRefMATH

Jungers M, De Pieri E, Abu-Kandil H (2007) Solving coupled Riccati equations for closed-loop Nash strategy by lack of trust approach. Int J Tomography Stat 7:49–54MathSciNet

Sutton R (1988) Learning to predictive by the method of temporal differences. Mach Learn 3(1):9–44

10.

Lewis FL, Vrabie D (2009) Reinforcement learning and adaptive dynamic programming for feedback control. IEEE Circuits Syst Mag 9(3):32–50MathSciNetCrossRef

11.

Lewis FL, Vrabie D, Vamvoudakis K (2012) Reinforcement learning and feedback control. IEEE Control Syst 32(6):76–105MathSciNetCrossRef

12.

Werbos PJ (1992) Approximate dynamic programming for real-time control and neural modeling. In: White DA, Sofge DA (eds) Handbook of intelligent control. Multiscience Press, Brentwood

13.

Murray JJ, Cox CJ, Lendaris GG, Saeks R (2002) Adaptive dynamic programming. IEEE Trans Syst Man Cybern Part C Appl Rev 32(2):140–153CrossRef

14.

Bertsekas DP, Tsitsiklis JN (1996) Neuro-dynamic Programming. Athena Scientific, MAMATH

15.

Vrabie D, Lewis FL (2009) Neural network approach to continuous-time direct adaptive optimal control for partially unknown nonlinear systems. Neural Netw 22(3):237–246CrossRefMATH

16.

Vamvoudakis K, Lewis FL (2010) Online actor-critic algorithm to solve the continuous infinite time horizon optimal control problem. Automatica 46(5):878–888MathSciNetCrossRefMATH

17.

Bhasin S, Kamalapurkar R, Johnson M, Vamvoudakis K, Lewis FL, Dixon WD (2012) A novel actor-critic-identifier architecture for approximate optimal control of uncertain nonlinear systems. Automatica 49(1):82–92MathSciNetCrossRefMATH

18.

Modares H, Lewis FL, Naghibi Sistani MB (2013) Adaptive optimal control of unknown constrained-input systems using policy iteration and neural networks. IEEE Trans Neural Netw Learning Syst 24(10):1513–1525CrossRef

19.

Vrabie D, Lewis FL (2011) Adaptive dynamic programming for online solution of a zero-sum differential game. J Control Theory Appl 9(3):353–360MathSciNetCrossRefMATH

20.

Vamvoudakis K, Lewis FL (2010) Online solution of nonlinear two-player zero-sum games using synchronous policy iteration. In Proc. 49th IEEE CDC, pp 3040-3047

21.

Modares H, Lewis FL, Naghibi Sistani MB (2014) Online solution of nonquadratic two-player zero-sum games arising in the H _∞ control of constrained input systems. Int J Adapt Cont Sig Proc 28(3–5):232–254MathSciNetCrossRefMATH

22.

Johnson M, Bhasin S, Dixon WE (2011) Nonlinear two-player zero-sum game approximate solution using a policy iteration algorithm. In: Proc. IEEE CDC, pp 142–147

23.

Vrabie D, Lewis FL (2010) Integral reinforcement learning for online computation of feedback Nash strategies of nonzero-sum differential games. In: Proc. 49th IEEE CDC, pp 3066–3071

24.

Vamvoudakis K, Lewis FL (2011) Multi-player non-zero-sum games: online adaptive learning solution of coupled Hamilton-Jacobi equations. Automatica 47(8):1556–1569MathSciNetCrossRefMATH

25.

Zhang H, Cui L, Luo Y (2013) Near-optimal control for nonzero-sum differential games of continuous-time nonlinear systems using single-network ADP. IEEE Trans Cybern 45(1):206–216CrossRef

26.

Abu-Khalaf M, Lewis FL (2005) Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach. Automatica 41(5):779–791MathSciNetCrossRefMATH

27.

Abu-Khalaf M, Lewis FL, Huang J (2008) Neurodynamic programming and zero-sum games for constrained control systems. IEEE Trans Neural Netw 19(7):1243–1252CrossRef

28.

Chowdhary GV (2010) Concurrent learning for convergence in adaptive control without persistency of excitation. Dissertation, Georgia Institute of Technology

29.

Modares H, Lewis FL, Naghibi Sistani MB, Chowdhary GV, Yucelen T (2013) Adaptive optimal control for the partially-unknown constrained-input using policy iteration with experience replay. AIAA Guidance Navigation and Control Conference, Boston, Massachusetts

30.

Modares H, Lewis FL, Naghibi Sistani MB (2014) Integral reinforcement learning and experience replay for adaptive optimal control of partially-unknown constrained-input continuous-time systems. Automatica 50(1):193–202MathSciNetCrossRefMATH

31.

Yasini S, Karimpour A, Naghibi Sistani MB, Modares H (2014) Online concurrent reinforcement learning algorithm to solve two-player zero-sum games for partially unknown nonlinear continuous-time systems. Int J Adapt Cont Sig Proc. doi:10.1002/acs.2485 MathSciNetMATH

32.

Lewis FL, Vrabie D, Syrmos VL (2012) Optimal control, 3rd edn. Wiley, New YorkCrossRefMATH

33.

Lyshevski SE (1998) Optimal control of nonlinear continuous-time systems: design of bounded controllers via generalized nonquadratic functionals. In Proc. IEEE ACC. pp 205–209

34.

Hornik K, Stinchcombe M, White H (1990) Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks. Neural Netw 3(5):551–560CrossRef

35.

Wang XZ, Li CG, Yeung DS, Song S, Feng H (2008) A definition of partial derivative of random functions and its application to RBFNN sensitivity analysis. Neurocomputing 71(7–9):1515–1526CrossRef

36.

Ghazikhani A, Monsefi R, Sadoghi Yazdi H (2014) Online neural network model for non-stationary and imbalanced data stream classification. Int J Mach Learn Cyber 5(1):51–62. doi:10.1007/s13042-013-0180-6 CrossRef

37.

Barakat M, Lefebvre D, Khalil M, Druaux F, Mustapha O (2013) Parameter selection algorithm with self adaptive growing neural network classifier for diagnosis issues. Int J Mach Learn Cyber 4(3):217–233. doi:10.1007/s13042-012-0089-5 CrossRef

38.

Nevisitc V, Primbs JA (1996) Constrained nonlinear optimal control: A converse HJB approach. California Institute of Technology, Tech. Rep

39.

Raja R, Karthik Raja U, Samidurai R, Leelamani A (2014) Dynamic analysis of discrete-time BAM neural networks with stochastic perturbations and impulses. Int J Mach Learn Cyber 5(1):39–50. doi:10.1007/s13042-013-0199-8 CrossRefMATH

40.

Hardy G, Littlewood J, Polya G (1998) Inequalities, 2nd edn. Cambridge University Press, CambridgeMATH

Titel: Reinforcement learning and neural networks for multi-agent nonzero-sum games of nonlinear constrained-input systems
verfasst von: Sholeh Yasini
Mohammad Bagher Naghibi Sitani
Ali Kirampor
Publikationsdatum: 01.12.2016
Verlag: Springer Berlin Heidelberg
Erschienen in: International Journal of Machine Learning and Cybernetics / Ausgabe 6/2016
Print ISSN: 1868-8071
Elektronische ISSN: 1868-808X
DOI: https://doi.org/10.1007/s13042-014-0300-y

Neuer Inhalt

Bildnachweise

VDI-Icon, Profil Icon, inhalt2, Springer Professional Modul/© Springer Fachmedien Wiesbaden GmbH, Nachhaltigkeitsaward Key Visual/© Cometis AG/Global ESG Monitor | Daniel Rupp | Generiert mit KI, Search Icon, Banner Hanser, Arbeitszeit/© granata68 / Fotolia, E-Autos im Fuhrpark: Lohnt sich das noch?/© Petair / stock.adobe.com, Kryptowährungen/© gopixa / Getty Images / iStock, Zeitschrift Wissensmanagement Cover, PatentFit-Logo/© Springer Fachmedien Wiesbaden GmbH, Sustainibility Finance/© Robert Kneschke / stock.adobe.com / Springer Fachmedien Wiesbaden GmbH, Zukunftswerkstatt Sales Excellence 2024/© AndreyPopov / Getty Images / iStock, 2023_Antrieb/© supervisuell

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

ATZelectronics worldwide

ATZelektronik

Weitere Artikel der Ausgabe 6/2016

Structural-damage detection with big data using parallel computing based on MPSoC

On generalized fuzzy ideals of ordered -groupoids

Robust stability analysis of uncertain genetic regulatory networks with mixed time delays

SVD based fragile watermarking scheme for tamper localization and self-recovery

Fuzzy parameterized fuzzy soft sets and decision making

Integration of data fusion and reinforcement learning techniques for the rank-aggregation problem

Neuer Inhalt

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.