2018 | Original Paper | Book Chapter

3. Excitation-Based Online Approximate Optimal Control

Authors: Rushikesh Kamalapurkar, Patrick Walters, Joel Rosenfeld, Warren Dixon

Published in: Reinforcement Learning for Optimal Feedback Control

Publisher: Springer International Publishing

Abstract

In this chapter, online adaptive reinforcement learning-based solutions are developed for infinite-horizon optimal control problems for continuous-time uncertain nonlinear systems. An actor-critic-identifier structure is developed to approximate the solution to the Hamilton–Jacobi–Bellman equation using three neural network structures. The actor and the critic neural networks approximate the optimal control and the optimal value function, respectively, and a robust dynamic neural network identifier asymptotically approximates the uncertain system dynamics. An advantage of the actor-critic-identifier architecture is that learning by the actor, critic, and identifier is continuous and concurrent, without requiring knowledge of the system drift dynamics. Convergence of the algorithm is analyzed using Lyapunov-based adaptive control methods. A persistence of excitation condition is required to guarantee exponential convergence to a bounded region in the neighborhood of the optimal control and uniformly ultimately bounded stability of the closed-loop system. The developed actor-critic method is extended to solve trajectory tracking problems under the assumption that the system dynamics are completely known. The actor-critic-identifier architecture is also extended to generate approximate feedback-Nash equilibrium solutions to N-player nonzero-sum differential games. Simulation results are provided to demonstrate the performance of the developed actor-critic-identifier method.
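
For orientation, the kind of problem the abstract refers to can be written out in a standard textbook form. The block below is a sketch under stated assumptions, not a reproduction of the chapter's own equations: it assumes control-affine dynamics with an unknown drift f and a known input gain g, a positive definite state penalty Q, a symmetric positive definite control penalty R, and linear-in-the-weights approximators with a basis σ and weight estimates Ŵ_c (critic) and Ŵ_a (actor). All of these symbols are introduced here for illustration.

```latex
% Assumed setting: control-affine system and infinite-horizon cost.
\dot{x} = f(x) + g(x)\,u, \qquad
J(x_0, u) = \int_{0}^{\infty} \Big( Q\big(x(t)\big) + u(t)^{\top} R\, u(t) \Big)\, dt

% Hamilton-Jacobi-Bellman equation for the optimal value function V^*
% (\nabla V^* denotes the row-vector gradient with respect to x).
0 = \min_{u} \Big[ \nabla V^{*}(x)\,\big( f(x) + g(x)\,u \big)
      + Q(x) + u^{\top} R\, u \Big], \qquad V^{*}(0) = 0

% Minimizing the bracket over u gives the optimal feedback policy.
u^{*}(x) = -\tfrac{1}{2}\, R^{-1} g(x)^{\top}\, \nabla V^{*}(x)^{\top}

% Hypothetical critic and actor approximations with basis \sigma(x);
% \nabla\sigma is the Jacobian of \sigma with respect to x.
\hat{V}(x) = \hat{W}_c^{\top} \sigma(x), \qquad
\hat{u}(x) = -\tfrac{1}{2}\, R^{-1} g(x)^{\top}\, \nabla\sigma(x)^{\top} \hat{W}_a
```

In this form the drift f enters the Hamilton–Jacobi–Bellman equation but not the policy expression, which is one way to see why an identifier for the uncertain dynamics is paired with the actor and critic.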

Title: Excitation-Based Online Approximate Optimal Control
Authors: Rushikesh Kamalapurkar, Patrick Walters, Joel Rosenfeld, Warren Dixon
Publisher: Springer International Publishing
Book: Reinforcement Learning for Optimal Feedback Control
Print ISBN: 978-3-319-78383-3
Electronic ISBN: 978-3-319-78384-0
Copyright year: 2018
DOI: https://doi.org/10.1007/978-3-319-78384-0_3
