2018 | Original Paper | Book Chapter

3. Excitation-Based Online Approximate Optimal Control

Authors: Rushikesh Kamalapurkar, Patrick Walters, Joel Rosenfeld, Warren Dixon

Published in: Reinforcement Learning for Optimal Feedback Control

Publisher: Springer International Publishing

Abstract

In this chapter, online adaptive reinforcement learning-based solutions are developed for infinite-horizon optimal control problems for continuous-time uncertain nonlinear systems. An actor-critic-identifier structure is developed to approximate the solution to the Hamilton–Jacobi–Bellman equation using three neural network structures. The actor and the critic neural networks approximate the optimal control and the optimal value function, respectively, and a robust dynamic neural network identifier asymptotically approximates the uncertain system dynamics. An advantage of using the actor-critic-identifier architecture is that learning by the actor, critic, and identifier is continuous and concurrent, without requiring knowledge of system drift dynamics. Convergence of the algorithm is analyzed using Lyapunov-based adaptive control methods. A persistence of excitation condition is required to guarantee exponential convergence to a bounded region in the neighborhood of the optimal control and uniformly ultimately bounded stability of the closed-loop system. The developed actor-critic method is extended to solve trajectory tracking problems under the assumption that the system dynamics are completely known. The actor-critic-identifier architecture is also extended to generate approximate feedback-Nash equilibrium solutions to N-player nonzero-sum differential games. Simulation results are provided to demonstrate the performance of the developed actor-critic-identifier method.
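
For orientation, the infinite-horizon problem named in the abstract is commonly posed as sketched below. This is a generic textbook formulation, assuming a control-affine system and a cost that is quadratic in the control. The symbols f, g, Q, R, V^*, \psi, T, and \mu are chosen here for illustration and may differ from the chapter's own notation.

```latex
% Assumed control-affine dynamics: drift f (uncertain), control effectiveness g
\dot{x} = f(x) + g(x)\,u

% Assumed infinite-horizon cost with state penalty Q and control weight R \succ 0
J(x_0, u) = \int_{0}^{\infty} \bigl( Q(x(t)) + u(t)^{\top} R\, u(t) \bigr)\, dt

% Hamilton-Jacobi-Bellman equation for the optimal value function V^*
0 = \min_{u} \Bigl[ \nabla V^{*}(x) \bigl( f(x) + g(x)\,u \bigr) + Q(x) + u^{\top} R\, u \Bigr]

% Minimizer of the bracket above: the optimal feedback control
u^{*}(x) = -\tfrac{1}{2}\, R^{-1} g(x)^{\top} \bigl( \nabla V^{*}(x) \bigr)^{\top}

% Standard persistence of excitation condition on a signal \psi,
% for some constants T > 0 and \mu > 0
\int_{t}^{t+T} \psi(\tau)\, \psi(\tau)^{\top}\, d\tau \;\succeq\; \mu I
\qquad \text{for all } t \ge 0
```

Read against the abstract, the critic would approximate V^*, the actor would approximate u^*, and the identifier would supply an estimate of the drift f. Which signal plays the role of \psi in the chapter is not stated in the abstract, so it is left generic here.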

Footnotes
1
Parts of the text in this section are reproduced, with permission, from [9], ©2013, Elsevier.
 
2
Parts of the text in this section are reproduced, with permission, from [22], ©2015, Elsevier.
 
3
Parts of the text in this section are reproduced, with permission, from [30], ©2015, IEEE.
 
References
1.
Werbos PJ (1992) Approximate dynamic programming for real-time control and neural modeling. In: White DA, Sofge DA (eds) Handbook of intelligent control: neural, fuzzy, and adaptive approaches, vol 15. Van Nostrand Reinhold, New York, pp 493–525
2.
Hopfield J (1984) Neurons with graded response have collective computational properties like those of two-state neurons. Proc Nat Acad Sci USA 81(10):3088
3.
Kirk D (2004) Optimal Control Theory: An Introduction. Dover, Mineola, NY
4.
Lewis FL, Vrabie D, Syrmos VL (2012) Optimal Control, 3rd edn. Wiley, Hoboken
7.
8.
9.
Bhasin S, Kamalapurkar R, Johnson M, Vamvoudakis KG, Lewis FL, Dixon WE (2013) A novel actor-critic-identifier architecture for approximate optimal control of uncertain nonlinear systems. Automatica 49(1):89–92
10.
Xian B, Dawson DM, de Queiroz MS, Chen J (2004) A continuous asymptotic tracking control strategy for uncertain nonlinear systems. IEEE Trans Autom Control 49(7):1206–1211
11.
Patre PM, MacKunis W, Kaiser K, Dixon WE (2008) Asymptotic tracking for uncertain dynamic systems via a multilayer neural network feedforward and RISE feedback control structure. IEEE Trans Autom Control 53(9):2180–2185
12.
Filippov AF (1988) Differential equations with discontinuous right-hand sides. Kluwer Academic Publishers, Dordrecht
13.
Kamalapurkar R, Rosenfeld JA, Klotz J, Downey RJ, Dixon WE (2014) Supporting lemmas for RISE-based control methods. arXiv:1306.3432
14.
Vamvoudakis KG, Lewis FL (2010) Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem. Automatica 46(5):878–888
15.
Sastry S, Bodson M (1989) Adaptive control: stability, convergence, and robustness. Prentice-Hall, Upper Saddle River
16.
Panteley E, Loria A, Teel A (2001) Relaxed persistency of excitation for uniform asymptotic stability. IEEE Trans Autom Control 46(12):1874–1886
17.
Loría A, Panteley E (2002) Uniform exponential stability of linear time-varying systems: revisited. Syst Control Lett 47(1):13–24
18.
Khalil HK (2002) Nonlinear systems, 3rd edn. Prentice Hall, Upper Saddle River
19.
Bertsekas D, Tsitsiklis J (1996) Neuro-dynamic programming. Athena Scientific, Belmont
20.
Busoniu L, Babuska R, De Schutter B, Ernst D (2010) Reinforcement learning and dynamic programming using function approximators. CRC Press, Boca Raton
21.
Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. MIT Press, Cambridge
22.
Kamalapurkar R, Dinh H, Bhasin S, Dixon WE (2015) Approximate optimal trajectory tracking for continuous-time nonlinear systems. Automatica 51:40–48
23.
Zhang H, Wei Q, Luo Y (2008) A novel infinite-time optimal tracking control scheme for a class of discrete-time nonlinear systems via the greedy HDP iteration algorithm. IEEE Trans Syst Man Cybern Part B Cybern 38(4):937–942
24.
Hornik K, Stinchcombe M, White H (1990) Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks. Neural Netw 3(5):551–560
25.
Lewis FL, Selmic R, Campos J (2002) Neuro-fuzzy control of industrial systems with actuator nonlinearities. Society for Industrial and Applied Mathematics, Philadelphia
26.
Ioannou P, Sun J (1996) Robust adaptive control. Prentice Hall, Upper Saddle River
27.
Misovec KM (1999) Friction compensation using adaptive non-linear control with persistent excitation. Int J Control 72(5):457–479
28.
Narendra K, Annaswamy A (1986) Robust adaptive control in the presence of bounded disturbances. IEEE Trans Autom Control 31(4):306–315
29.
Rao AV, Benson DA, Darby CL, Patterson MA, Francolin C, Huntington GT (2010) Algorithm 902: GPOPS, A MATLAB software for solving multiple-phase optimal control problems using the Gauss pseudospectral method. ACM Trans Math Softw 37(2):1–39
30.
Johnson M, Kamalapurkar R, Bhasin S, Dixon WE (2015) Approximate n-player nonzero-sum game solution for an uncertain continuous nonlinear system. IEEE Trans Neural Netw Learn Syst 26(8):1645–1658
31.
Basar T, Olsder GJ (1999) Dynamic noncooperative game theory. Classics in applied mathematics, 2nd edn. SIAM, Philadelphia
32.
Vamvoudakis KG, Lewis FL (2011) Multi-player non-zero-sum games: online adaptive learning solution of coupled Hamilton-Jacobi equations. Automatica 47:1556–1569
33.
Basar T, Bernhard P (2008) \(H^{\infty }\)-optimal control and related minimax design problems: a dynamic game approach, 2nd edn. Modern Birkhäuser Classics, Birkhäuser, Boston
34.
Patre PM, Dixon WE, Makkar C, MacKunis W (2006) Asymptotic tracking for systems with structured and unstructured uncertainties. In: Proceedings of the IEEE conference on decision and control, San Diego, California, pp 441–446
35.
Dixon WE, Behal A, Dawson DM, Nagarkatti S (2003) Nonlinear control of engineering systems: a Lyapunov-based approach. Birkhäuser, Boston
36.
Krstic M, Kanellakopoulos I, Kokotovic PV (1995) Nonlinear and adaptive control design. Wiley, New York
37.
Nevistic V, Primbs JA (1996) Constrained nonlinear optimal control: a converse HJB approach. Technical report. CIT-CDS 96-021, California Institute of Technology, Pasadena, CA 91125
38.
Vamvoudakis KG, Lewis FL (2009) Online synchronous policy iteration method for optimal control. In: Yu W (ed) Recent advances in intelligent control systems. Springer, Berlin, pp 357–374
39.
Vamvoudakis KG, Lewis FL (2010) Online neural network solution of nonlinear two-player zero-sum games using synchronous policy iteration. In: Proceedings of the IEEE conference on decision and control
40.
Zhang H, Cui L, Luo Y (2013) Near-optimal control for nonzero-sum differential games of continuous-time nonlinear systems using single-network ADP. IEEE Trans Cybern 43(1):206–216
41.
Doya K (2000) Reinforcement learning in continuous time and space. Neural Comput 12(1):219–245
42.
Chen Z, Jagannathan S (2008) Generalized Hamilton-Jacobi-Bellman formulation-based neural network control of affine nonlinear discrete-time systems. IEEE Trans Neural Netw 19(1):90–106
43.
Dierks T, Thumati B, Jagannathan S (2009) Optimal control of unknown affine nonlinear discrete-time systems using offline-trained neural networks with proof of convergence. Neural Netw 22(5–6):851–860
44.
Zhang H, Liu D, Luo Y, Wang D (2013) Adaptive dynamic programming for control algorithms and stability. Communications and control engineering, Springer, London
45.
Liu D, Wei Q (2014) Policy iteration adaptive dynamic programming algorithm for discrete-time nonlinear systems. IEEE Trans Neural Netw Learn Syst 25(3):621–634
46.
Modares H, Lewis FL, Naghibi-Sistani MB (2014) Integral reinforcement learning and experience replay for adaptive optimal control of partially-unknown constrained-input continuous-time systems. Automatica 50(1):193–202
47.
Yang X, Liu D, Wang D (2014) Reinforcement learning for adaptive optimal control of unknown continuous-time nonlinear systems with input constraints. Int J Control 87(3):553–566
48.
Dierks T, Jagannathan S (2009) Optimal tracking control of affine nonlinear discrete-time systems with unknown internal dynamics. In: Proceedings of the IEEE conference on decision and control, Shanghai, CN, pp 6750–6755
49.
Zhang H, Cui L, Zhang X, Luo Y (2011) Data-driven robust approximate optimal tracking control for unknown general nonlinear systems using adaptive dynamic programming method. IEEE Trans Neural Netw 22(12):2226–2236
50.
Wei Q, Liu D (2013) Optimal tracking control scheme for discrete-time nonlinear systems with approximation errors. In: Guo C, Hou ZG, Zeng Z (eds) Advances in neural networks - ISNN 2013, vol 7952. Lecture notes in computer science. Springer, Berlin, pp 1–10
51.
Kiumarsi B, Lewis FL, Modares H, Karimpour A, Naghibi-Sistani MB (2014) Reinforcement Q-learning for optimal tracking control of linear discrete-time systems with unknown dynamics. Automatica 50(4):1167–1175
52.
Qin C, Zhang H, Luo Y (2014) Online optimal tracking control of continuous-time linear systems with unknown dynamics by using adaptive dynamic programming. Int J Control 87(5):1000–1009
53.
Murray J, Cox C, Lendaris G, Saeks R (2002) Adaptive dynamic programming. IEEE Trans Syst Man Cybern Part C Appl Rev 32(2):140–153
54.
Beard R, Saridis G, Wen J (1997) Galerkin approximations of the generalized Hamilton-Jacobi-Bellman equation. Automatica 33:2159–2178
55.
Abu-Khalaf M, Lewis FL (2005) Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach. Automatica 41(5):779–791
56.
Vrabie D, Lewis FL (2009) Neural network approach to continuous-time direct adaptive optimal control for partially unknown nonlinear systems. Neural Netw 22(3):237–246
57.
Wang K, Liu Y, Li L (2014) Visual servoing trajectory tracking of nonholonomic mobile robots without direct position measurement. IEEE Trans Robot 30(4):1026–1035
58.
Wang D, Liu D, Zhang Q, Zhao D (2016) Data-based adaptive critic designs for nonlinear robust optimal control with uncertain dynamics. IEEE Trans Syst Man Cybern Syst 46(11):1544–1555
59.
Li H, Liu D, Wang D (2014) Integral reinforcement learning for linear continuous-time zero-sum games with completely unknown dynamics. IEEE Trans Autom Sci Eng 11(3):706–714
60.
Dierks T, Jagannathan S (2010) Optimal control of affine nonlinear continuous-time systems. In: Proceedings of the American control conference, pp 1568–1573
61.
Park YM, Choi MS, Lee KY (1996) An optimal tracking neuro-controller for nonlinear dynamic systems. IEEE Trans Neural Netw 7(5):1099–1110
62.
Luo Y, Liang M (2011) Approximate optimal tracking control for a class of discrete-time non-affine systems based on GDHP algorithm. In: IWACI International Workshop on Advanced Computational Intelligence, pp 143–149
63.
Wang D, Liu D, Wei Q (2012) Finite-horizon neuro-optimal tracking control for a class of discrete-time nonlinear systems using adaptive dynamic programming approach. Neurocomputing 78(1):14–22
64.
Modares H, Lewis FL (2014) Optimal tracking control of nonlinear partially-unknown constrained-input systems using integral reinforcement learning. Automatica 50(7):1780–1792
65.
Luo B, Liu D, Huang T, Wang D (2016) Model-free optimal tracking control via critic-only Q-learning. IEEE Trans Neural Netw Learn Syst 27(10):2134–2144
66.
Yang X, Liu D, Wei Q, Wang D (2016) Guaranteed cost neural tracking control for a class of uncertain nonlinear systems using adaptive dynamic programming. Neurocomputing 198:80–90
67.
Zhao B, Liu D, Yang X, Li Y (2017) Observer-critic structure-based adaptive dynamic programming for decentralised tracking control of unknown large-scale nonlinear systems. Int J Syst Sci 48(9):1978–1989
68.
Wang D, Liu D, Zhang Y, Li H (2018) Neural network robust tracking control with adaptive critic framework for uncertain nonlinear systems. Neural Netw 97:11–18
69.
Vamvoudakis KG, Mojoodi A, Ferraz H (2017) Event-triggered optimal tracking control of nonlinear systems. Int J Robust Nonlinear Control 27(4):598–619
70.
Wei Q, Zhang H (2008) A new approach to solve a class of continuous-time nonlinear quadratic zero-sum game using ADP. In: IEEE international conference on networking, sensing and control, pp 507–512
71.
Zhang H, Wei Q, Liu D (2010) An iterative adaptive dynamic programming method for solving a class of nonlinear zero-sum differential games. Automatica 47:207–214
72.
Zhang X, Zhang H, Luo Y, Dong M (2010) Iteration algorithm for solving the optimal strategies of a class of nonaffine nonlinear quadratic zero-sum games. In: Proceedings of the IEEE conference on decision and control, pp 1359–1364
73.
Mellouk A (ed) (2011) Advances in reinforcement learning. InTech
74.
Littman M (2001) Value-function reinforcement learning in Markov games. Cogn Syst Res 2(1):55–66
75.
Johnson M, Bhasin S, Dixon WE (2011) Nonlinear two-player zero-sum game approximate solution using a policy iteration algorithm. In: Proceedings of the IEEE conference on decision and control, pp 142–147
Metadata
Title
Excitation-Based Online Approximate Optimal Control
Authors
Rushikesh Kamalapurkar
Patrick Walters
Joel Rosenfeld
Warren Dixon
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-319-78384-0_3
