
2018 | Chapter

7. Computational Considerations

Authors: Rushikesh Kamalapurkar, Patrick Walters, Joel Rosenfeld, Warren Dixon

Published in: Reinforcement Learning for Optimal Feedback Control

Publisher: Springer International Publishing


Abstract

Motivated by issues arising in adaptive dynamic programming for optimal control, a function approximation method is developed that aims to approximate a function in a small neighborhood of a state that travels within a compact set. The development is based on the theory of universal reproducing kernel Hilbert spaces over the n-dimensional Euclidean space. Several theorems are introduced that support the development of this State Following (StaF) method. In particular, it is shown that there is a bound on the number of kernel functions required to maintain an accurate function approximation as the state moves through a compact set. Additionally, a weight update law based on gradient descent is introduced; as detailed in Theorem 7.5, good accuracy can be achieved provided the update law is iterated at a sufficiently high frequency. Simulation results demonstrate the utility of the StaF methodology for maintaining an accurate function approximation and for solving an infinite-horizon optimal regulation problem. The simulation results indicate that fewer basis functions are required to guarantee stability and approximate optimality than are required when a global approximation approach is used.
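To make the idea concrete, the sketch below illustrates a StaF-style local approximation: a few Gaussian kernels are centered at fixed offsets from the current state, so the centers follow the state through the compact set, and the kernel weights are adjusted by gradient descent on the local approximation error at each step. This is only a minimal illustration of the concept described in the abstract, not the update law analyzed in Theorem 7.5; the kernel choice, the target function, the offsets, the learning rate, and the number of inner iterations are all assumptions made for the example.

```python
import numpy as np

# Minimal StaF-style sketch (illustrative assumptions, not the authors' implementation):
# approximate a target function near the current state using a few Gaussian kernels
# whose centers follow the state, with gradient-descent weight updates.

def gaussian_kernel(x, c, width=1.0):
    # Radial kernel centered at c, evaluated at x.
    return np.exp(-np.sum((x - c) ** 2) / (2.0 * width ** 2))

def staf_centers(x, offsets):
    # Centers are placed at fixed offsets from the current state, so they
    # "follow" the state as it moves through the compact set.
    return [x + d for d in offsets]

def approximate(x, weights, centers):
    return sum(w * gaussian_kernel(x, c) for w, c in zip(weights, centers))

def target(x):
    # Example function to approximate locally (illustrative only).
    return np.sin(x[0]) * np.cos(x[1])

# Three centers arranged around the state; weights start at zero.
offsets = [np.array([0.3, 0.0]), np.array([-0.15, 0.26]), np.array([-0.15, -0.26])]
weights = np.zeros(len(offsets))
learning_rate = 0.5     # step size of the gradient descent (assumed)
steps_per_state = 20    # inner iterations per state, i.e., update "frequency"

# Simulated state trajectory over a compact set (a circle).
t = np.linspace(0.0, 10.0, 200)
trajectory = np.stack([np.cos(0.5 * t), np.sin(0.5 * t)], axis=1)

for x in trajectory:
    centers = staf_centers(x, offsets)
    for _ in range(steps_per_state):
        # Gradient descent on the squared local approximation error at x.
        phi = np.array([gaussian_kernel(x, c) for c in centers])
        error = float(weights @ phi) - target(x)
        weights -= learning_rate * error * phi

x_final = trajectory[-1]
final_error = abs(approximate(x_final, weights, staf_centers(x_final, offsets)) - target(x_final))
print("final local approximation error:", final_error)
```

Increasing steps_per_state plays the role of iterating the update law at a higher frequency: the local error at the current state is driven down before the state moves on.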


Footnotes

1. For \(z \in \mathbb {C}\), the quantity Re(z) is the real part of z, and \({\overline{z}}\) represents the complex conjugate of z.

2. Parts of the text in this section are reproduced, with permission, from [21], ©2016, Elsevier.
Literature
1. Kirk D (2004) Optimal control theory: an introduction. Dover, Mineola
2. Liberzon D (2012) Calculus of variations and optimal control theory: a concise introduction. Princeton University Press, Princeton
3. Christmann A, Steinwart I (2010) Universal kernels on non-standard input spaces. In: Advances in neural information processing, pp 406–414
5. Park J, Sandberg I (1991) Universal approximation using radial-basis-function networks. Neural Comput 3(2):246–257
6. Folland GB (1999) Real analysis: modern techniques and their applications, 2nd edn. Pure and applied mathematics, Wiley, New York
7. Steinwart I, Christmann A (2008) Support vector machines. Information science and statistics, Springer, New York
8. Gaggero M, Gnecco G, Sanguineti M (2013) Dynamic programming and value-function approximation in sequential decision problems: error analysis and numerical results. J Optim Theory Appl 156
9. Gaggero M, Gnecco G, Sanguineti M (2014) Approximate dynamic programming for stochastic n-stage optimization with application to optimal consumption under uncertainty. Comput Optim Appl 58(1):31–85
10. Zoppoli R, Sanguineti M, Parisini T (2002) Approximating networks and extended Ritz method for the solution of functional optimization problems. J Optim Theory Appl 112(2):403–440
11. Kamalapurkar R, Walters P, Dixon WE (2013) Concurrent learning-based approximate optimal regulation. In: Proceedings of the IEEE conference on decision and control, Florence, IT, pp 6256–6261
12. Kamalapurkar R, Andrews L, Walters P, Dixon WE (2014) Model-based reinforcement learning for infinite-horizon approximate optimal tracking. In: Proceedings of the IEEE conference on decision and control, Los Angeles, CA, pp 5083–5088
13. Kamalapurkar R, Klotz J, Dixon WE (2014) Concurrent learning-based online approximate feedback Nash equilibrium solution of N-player nonzero-sum differential games. IEEE/CAA J Autom Sin 1(3):239–247
15. Zhu K (2012) Analysis on Fock spaces, vol 263. Graduate texts in mathematics, Springer, New York
16.
17. Rosenfeld JA, Kamalapurkar R, Dixon WE (2015) State following (StaF) kernel functions for function approximation part I: theory and motivation. In: Proceedings of the American control conference, pp 1217–1222
18. Beylkin G, Monzon L (2005) On approximation of functions by exponential sums. Appl Comput Harmon Anal 19(1):17–48
19. Bertsekas DP (1999) Nonlinear programming. Athena Scientific, Belmont
20. Pedersen GK (1989) Analysis now, vol 118. Graduate texts in mathematics, Springer, New York
21. Kamalapurkar R, Rosenfeld J, Dixon WE (2016) Efficient model-based reinforcement learning for approximate online optimal control. Automatica 74:247–258
22. Lorentz GG (1986) Bernstein polynomials, 2nd edn. Chelsea Publishing Co., New York
23. Ioannou P, Sun J (1996) Robust adaptive control. Prentice Hall, Upper Saddle River
24. Khalil HK (2002) Nonlinear systems, 3rd edn. Prentice Hall, Upper Saddle River
25. Doya K (2000) Reinforcement learning in continuous time and space. Neural Comput 12(1):219–245
26. Padhi R, Unnikrishnan N, Wang X, Balakrishnan S (2006) A single network adaptive critic (SNAC) architecture for optimal control synthesis for a class of nonlinear systems. Neural Netw 19(10):1648–1660
27. Al-Tamimi A, Lewis FL, Abu-Khalaf M (2008) Discrete-time nonlinear HJB solution using approximate dynamic programming: convergence proof. IEEE Trans Syst Man Cybern Part B Cybern 38:943–949
28. Lewis FL, Vrabie D (2009) Reinforcement learning and adaptive dynamic programming for feedback control. IEEE Circuits Syst Mag 9(3):32–50
29. Dierks T, Thumati B, Jagannathan S (2009) Optimal control of unknown affine nonlinear discrete-time systems using offline-trained neural networks with proof of convergence. Neural Netw 22(5–6):851–860
30. Mehta P, Meyn S (2009) Q-learning and Pontryagin's minimum principle. In: Proceedings of the IEEE conference on decision and control, pp 3598–3605
31. Vamvoudakis KG, Lewis FL (2010) Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem. Automatica 46(5):878–888
32. Zhang H, Cui L, Zhang X, Luo Y (2011) Data-driven robust approximate optimal tracking control for unknown general nonlinear systems using adaptive dynamic programming method. IEEE Trans Neural Netw 22(12):2226–2236
33. Sadegh N (1993) A perceptron network for functional identification and control of nonlinear systems. IEEE Trans Neural Netw 4(6):982–988
34. Chowdhary G, Yucelen T, Mühlegg M, Johnson EN (2013) Concurrent learning adaptive control of linear systems with exponentially convergent bounds. Int J Adapt Control Signal Process 27(4):280–301
35. Savitzky A, Golay MJE (1964) Smoothing and differentiation of data by simplified least squares procedures. Anal Chem 36(8):1627–1639
36. Bertsekas D, Tsitsiklis J (1996) Neuro-dynamic programming. Athena Scientific, Belmont
37. Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. MIT Press, Cambridge
39. Bertsekas D (2007) Dynamic programming and optimal control, vol 2, 3rd edn. Athena Scientific, Belmont
40. Szepesvári C (2010) Algorithms for reinforcement learning. Synthesis lectures on artificial intelligence and machine learning. Morgan & Claypool Publishers, San Rafael
41. Vamvoudakis KG, Lewis FL (2009) Online synchronous policy iteration method for optimal control. In: Yu W (ed) Recent advances in intelligent control systems. Springer, Berlin, pp 357–374
42. Bhasin S, Kamalapurkar R, Johnson M, Vamvoudakis KG, Lewis FL, Dixon WE (2013) A novel actor-critic-identifier architecture for approximate optimal control of uncertain nonlinear systems. Automatica 49(1):89–92
43. Chowdhary G (2010) Concurrent learning for convergence in adaptive control without persistency of excitation. Ph.D. thesis, Georgia Institute of Technology
44. Chowdhary G, Johnson E (2011) A singular value maximizing data recording algorithm for concurrent learning. In: Proceedings of the American control conference, pp 3547–3552
45. Modares H, Lewis FL, Naghibi-Sistani MB (2014) Integral reinforcement learning and experience replay for adaptive optimal control of partially-unknown constrained-input continuous-time systems. Automatica 50(1):193–202
46. Zhang H, Cui L, Luo Y (2013) Near-optimal control for nonzero-sum differential games of continuous-time nonlinear systems using single-network ADP. IEEE Trans Cybern 43(1):206–216
47. Zhang H, Liu D, Luo Y, Wang D (2013) Adaptive dynamic programming for control: algorithms and stability. Communications and control engineering, Springer, London
48. Luo B, Wu HN, Huang T, Liu D (2014) Data-based approximate policy iteration for affine nonlinear continuous-time optimal control design. Automatica
49. Yang X, Liu D, Wei Q (2014) Online approximate optimal control for affine non-linear systems with unknown internal dynamics using adaptive dynamic programming. IET Control Theory Appl 8(16):1676–1688
50. Ge SS, Zhang J (2003) Neural-network control of nonaffine nonlinear system with zero dynamics by state and output feedback. IEEE Trans Neural Netw 14(4):900–918
51. Wang D, Liu D, Wei Q, Zhao D, Jin N (2012) Optimal control of unknown nonaffine nonlinear discrete-time systems based on adaptive dynamic programming. Automatica 48(8):1825–1832
52. Zhang X, Zhang H, Sun Q, Luo Y (2012) Adaptive dynamic programming-based optimal control of unknown nonaffine nonlinear discrete-time systems with proof of convergence. Neurocomputing 91:48–55
53. Liu D, Huang Y, Wang D, Wei Q (2013) Neural-network-observer-based optimal control for unknown nonlinear systems using adaptive dynamic programming. Int J Control 86(9):1554–1566
54. Bian T, Jiang Y, Jiang ZP (2014) Adaptive dynamic programming and optimal control of nonlinear nonaffine systems. Automatica 50(10):2624–2632
55. Yang X, Liu D, Wei Q, Wang D (2015) Direct adaptive control for a class of discrete-time unknown nonaffine nonlinear systems using neural networks. Int J Robust Nonlinear Control 25(12):1844–1861
56. Kiumarsi B, Kang W, Lewis FL (2016) H-\(\infty \) control of nonaffine aerial systems using off-policy reinforcement learning. Unmanned Syst 4(1):1–10
57. Song R, Wei Q, Xiao W (2016) Off-policy neuro-optimal control for unknown complex-valued nonlinear systems based on policy iteration. Neural Comput Appl 46(1):85–95
Metadata
Title
Computational Considerations
Authors
Rushikesh Kamalapurkar
Patrick Walters
Joel Rosenfeld
Warren Dixon
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-78384-0_7