
2018 | Chapter

7. Computational Considerations

Authors: Rushikesh Kamalapurkar, Patrick Walters, Joel Rosenfeld, Warren Dixon

Published in: Reinforcement Learning for Optimal Feedback Control

Publisher: Springer International Publishing


Abstract

Motivated by issues arising in adaptive dynamic programming for optimal control, a function approximation method is developed that aims to approximate a function in a small neighborhood of a state that travels within a compact set. The development is based on the theory of universal reproducing kernel Hilbert spaces over the n-dimensional Euclidean space. Several theorems are introduced that support the development of this State Following (StaF) method. In particular, it is shown that there is a bound on the number of kernel functions required to maintain an accurate function approximation as the state moves through a compact set. Additionally, a weight update law based on gradient descent is introduced; as detailed in Theorem 7.5, good accuracy can be achieved provided the update law is iterated at a sufficiently high frequency. Simulation results demonstrate the utility of the StaF methodology for maintaining an accurate function approximation and for solving an infinite-horizon optimal regulation problem. The simulation results indicate that fewer basis functions are required to guarantee stability and approximate optimality than are required when a global approximation approach is used.
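To make the idea concrete, the sketch below illustrates a StaF-style local approximation: a few Gaussian kernels are centered at fixed offsets from the current state, so the centers follow the state through the compact set, and the kernel weights are adjusted by gradient descent on the local approximation error at each step. This is only a minimal illustration of the concept described in the abstract, not the update law analyzed in Theorem 7.5; the kernel choice, the target function, the offsets, the learning rate, and the number of inner iterations are all assumptions made for the example.

```python
import numpy as np

# Minimal StaF-style sketch (illustrative assumptions, not the authors' implementation):
# approximate a target function near the current state using a few Gaussian kernels
# whose centers follow the state, with gradient-descent weight updates.

def gaussian_kernel(x, c, width=1.0):
    # Radial kernel centered at c, evaluated at x.
    return np.exp(-np.sum((x - c) ** 2) / (2.0 * width ** 2))

def staf_centers(x, offsets):
    # Centers are placed at fixed offsets from the current state, so they
    # "follow" the state as it moves through the compact set.
    return [x + d for d in offsets]

def approximate(x, weights, centers):
    return sum(w * gaussian_kernel(x, c) for w, c in zip(weights, centers))

def target(x):
    # Example function to approximate locally (illustrative only).
    return np.sin(x[0]) * np.cos(x[1])

# Three centers arranged around the state; weights start at zero.
offsets = [np.array([0.3, 0.0]), np.array([-0.15, 0.26]), np.array([-0.15, -0.26])]
weights = np.zeros(len(offsets))
learning_rate = 0.5     # step size of the gradient descent (assumed)
steps_per_state = 20    # inner iterations per state, i.e., update "frequency"

# Simulated state trajectory over a compact set (a circle).
t = np.linspace(0.0, 10.0, 200)
trajectory = np.stack([np.cos(0.5 * t), np.sin(0.5 * t)], axis=1)

for x in trajectory:
    centers = staf_centers(x, offsets)
    for _ in range(steps_per_state):
        # Gradient descent on the squared local approximation error at x.
        phi = np.array([gaussian_kernel(x, c) for c in centers])
        error = float(weights @ phi) - target(x)
        weights -= learning_rate * error * phi

x_final = trajectory[-1]
final_error = abs(approximate(x_final, weights, staf_centers(x_final, offsets)) - target(x_final))
print("final local approximation error:", final_error)
```

Increasing steps_per_state plays the role of iterating the update law at a higher frequency: the local error at the current state is driven down before the state moves on.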


Footnotes

1. For \(z \in \mathbb {C}\), the quantity Re(z) is the real part of z, and \({\overline{z}}\) represents the complex conjugate of z.

2. Parts of the text in this section are reproduced, with permission, from [21], ©2016, Elsevier.
Literature
1. Kirk D (2004) Optimal control theory: an introduction. Dover, Mineola
2. Liberzon D (2012) Calculus of variations and optimal control theory: a concise introduction. Princeton University Press, Princeton
3. Christmann A, Steinwart I (2010) Universal kernels on non-standard input spaces. In: Advances in neural information processing, pp 406–414
5. Park J, Sandberg I (1991) Universal approximation using radial-basis-function networks. Neural Comput 3(2):246–257
6. Folland GB (1999) Real analysis: modern techniques and their applications, 2nd edn. Pure and applied mathematics, Wiley, New York
7. Steinwart I, Christmann A (2008) Support vector machines. Information science and statistics, Springer, New York
8. Gaggero M, Gnecco G, Sanguineti M (2013) Dynamic programming and value-function approximation in sequential decision problems: error analysis and numerical results. J Optim Theory Appl 156
9. Gaggero M, Gnecco G, Sanguineti M (2014) Approximate dynamic programming for stochastic n-stage optimization with application to optimal consumption under uncertainty. Comput Optim Appl 58(1):31–85
10. Zoppoli R, Sanguineti M, Parisini T (2002) Approximating networks and extended Ritz method for the solution of functional optimization problems. J Optim Theory Appl 112(2):403–440
11. Kamalapurkar R, Walters P, Dixon WE (2013) Concurrent learning-based approximate optimal regulation. In: Proceedings of the IEEE conference on decision and control, Florence, IT, pp 6256–6261
12. Kamalapurkar R, Andrews L, Walters P, Dixon WE (2014) Model-based reinforcement learning for infinite-horizon approximate optimal tracking. In: Proceedings of the IEEE conference on decision and control, Los Angeles, CA, pp 5083–5088
13. Kamalapurkar R, Klotz J, Dixon WE (2014) Concurrent learning-based online approximate feedback Nash equilibrium solution of N-player nonzero-sum differential games. IEEE/CAA J Autom Sin 1(3):239–247
15. Zhu K (2012) Analysis on Fock spaces, vol 263. Graduate texts in mathematics, Springer, New York
16.
17. Rosenfeld JA, Kamalapurkar R, Dixon WE (2015) State following (StaF) kernel functions for function approximation part I: theory and motivation. In: Proceedings of the American control conference, pp 1217–1222
18. Beylkin G, Monzon L (2005) On approximation of functions by exponential sums. Appl Comput Harmon Anal 19(1):17–48
19. Bertsekas DP (1999) Nonlinear programming. Athena Scientific, Belmont
20. Pedersen GK (1989) Analysis now, vol 118. Graduate texts in mathematics, Springer, New York
21. Kamalapurkar R, Rosenfeld J, Dixon WE (2016) Efficient model-based reinforcement learning for approximate online optimal control. Automatica 74:247–258
22. Lorentz GG (1986) Bernstein polynomials, 2nd edn. Chelsea Publishing Co., New York
23. Ioannou P, Sun J (1996) Robust adaptive control. Prentice Hall, Upper Saddle River
24. Khalil HK (2002) Nonlinear systems, 3rd edn. Prentice Hall, Upper Saddle River
25. Doya K (2000) Reinforcement learning in continuous time and space. Neural Comput 12(1):219–245
26. Padhi R, Unnikrishnan N, Wang X, Balakrishnan S (2006) A single network adaptive critic (SNAC) architecture for optimal control synthesis for a class of nonlinear systems. Neural Netw 19(10):1648–1660
27. Al-Tamimi A, Lewis FL, Abu-Khalaf M (2008) Discrete-time nonlinear HJB solution using approximate dynamic programming: convergence proof. IEEE Trans Syst Man Cybern Part B Cybern 38:943–949
28. Lewis FL, Vrabie D (2009) Reinforcement learning and adaptive dynamic programming for feedback control. IEEE Circuits Syst Mag 9(3):32–50
29. Dierks T, Thumati B, Jagannathan S (2009) Optimal control of unknown affine nonlinear discrete-time systems using offline-trained neural networks with proof of convergence. Neural Netw 22(5–6):851–860
30. Mehta P, Meyn S (2009) Q-learning and Pontryagin's minimum principle. In: Proceedings of the IEEE conference on decision and control, pp 3598–3605
31. Vamvoudakis KG, Lewis FL (2010) Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem. Automatica 46(5):878–888
32. Zhang H, Cui L, Zhang X, Luo Y (2011) Data-driven robust approximate optimal tracking control for unknown general nonlinear systems using adaptive dynamic programming method. IEEE Trans Neural Netw 22(12):2226–2236
33. Sadegh N (1993) A perceptron network for functional identification and control of nonlinear systems. IEEE Trans Neural Netw 4(6):982–988
34. Chowdhary G, Yucelen T, Mühlegg M, Johnson EN (2013) Concurrent learning adaptive control of linear systems with exponentially convergent bounds. Int J Adapt Control Signal Process 27(4):280–301
35. Savitzky A, Golay MJE (1964) Smoothing and differentiation of data by simplified least squares procedures. Anal Chem 36(8):1627–1639
36. Bertsekas D, Tsitsiklis J (1996) Neuro-dynamic programming. Athena Scientific, Belmont
37. Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. MIT Press, Cambridge
39. Bertsekas D (2007) Dynamic programming and optimal control, vol 2, 3rd edn. Athena Scientific, Belmont
40. Szepesvári C (2010) Algorithms for reinforcement learning. Synthesis lectures on artificial intelligence and machine learning. Morgan & Claypool Publishers, San Rafael
41. Vamvoudakis KG, Lewis FL (2009) Online synchronous policy iteration method for optimal control. In: Yu W (ed) Recent advances in intelligent control systems. Springer, Berlin, pp 357–374
42. Bhasin S, Kamalapurkar R, Johnson M, Vamvoudakis KG, Lewis FL, Dixon WE (2013) A novel actor-critic-identifier architecture for approximate optimal control of uncertain nonlinear systems. Automatica 49(1):89–92
43. Chowdhary G (2010) Concurrent learning for convergence in adaptive control without persistency of excitation. Ph.D. thesis, Georgia Institute of Technology
44. Chowdhary G, Johnson E (2011) A singular value maximizing data recording algorithm for concurrent learning. In: Proceedings of the American control conference, pp 3547–3552
45. Modares H, Lewis FL, Naghibi-Sistani MB (2014) Integral reinforcement learning and experience replay for adaptive optimal control of partially-unknown constrained-input continuous-time systems. Automatica 50(1):193–202
46. Zhang H, Cui L, Luo Y (2013) Near-optimal control for nonzero-sum differential games of continuous-time nonlinear systems using single-network ADP. IEEE Trans Cybern 43(1):206–216
47. Zhang H, Liu D, Luo Y, Wang D (2013) Adaptive dynamic programming for control: algorithms and stability. Communications and control engineering, Springer, London
48. Luo B, Wu HN, Huang T, Liu D (2014) Data-based approximate policy iteration for affine nonlinear continuous-time optimal control design. Automatica
49. Yang X, Liu D, Wei Q (2014) Online approximate optimal control for affine non-linear systems with unknown internal dynamics using adaptive dynamic programming. IET Control Theory Appl 8(16):1676–1688
50. Ge SS, Zhang J (2003) Neural-network control of nonaffine nonlinear system with zero dynamics by state and output feedback. IEEE Trans Neural Netw 14(4):900–918
51. Wang D, Liu D, Wei Q, Zhao D, Jin N (2012) Optimal control of unknown nonaffine nonlinear discrete-time systems based on adaptive dynamic programming. Automatica 48(8):1825–1832
52. Zhang X, Zhang H, Sun Q, Luo Y (2012) Adaptive dynamic programming-based optimal control of unknown nonaffine nonlinear discrete-time systems with proof of convergence. Neurocomputing 91:48–55
53. Liu D, Huang Y, Wang D, Wei Q (2013) Neural-network-observer-based optimal control for unknown nonlinear systems using adaptive dynamic programming. Int J Control 86(9):1554–1566
54. Bian T, Jiang Y, Jiang ZP (2014) Adaptive dynamic programming and optimal control of nonlinear nonaffine systems. Automatica 50(10):2624–2632
55. Yang X, Liu D, Wei Q, Wang D (2015) Direct adaptive control for a class of discrete-time unknown nonaffine nonlinear systems using neural networks. Int J Robust Nonlinear Control 25(12):1844–1861
56. Kiumarsi B, Kang W, Lewis FL (2016) H-\(\infty \) control of nonaffine aerial systems using off-policy reinforcement learning. Unmanned Syst 4(1):1–10
57. Song R, Wei Q, Xiao W (2016) Off-policy neuro-optimal control for unknown complex-valued nonlinear systems based on policy iteration. Neural Comput Appl 46(1):85–95
Metadata
Title
Computational Considerations
Authors
Rushikesh Kamalapurkar
Patrick Walters
Joel Rosenfeld
Warren Dixon
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-78384-0_7