Published in: Neural Computing and Applications 7-8/2013

01 December 2013 | ISNN2012

Dual iterative adaptive dynamic programming for a class of discrete-time nonlinear systems with time-delays

Authors: Qinglai Wei, Ding Wang, Dehua Zhang

Abstract

In this paper, a new dual iterative adaptive dynamic programming (ADP) algorithm is developed to solve optimal control problems for a class of nonlinear systems with time-delays in the state and control variables. Dynamic programming theory is first used to derive expressions for the optimal performance index function and the optimal control. The dual iterative ADP algorithm is then introduced to obtain the optimal solution iteratively; in each iteration, both the performance index function and the system states are updated. A convergence analysis is presented to prove that the performance index function reaches the optimum under the proposed method. To facilitate implementation of the dual iterative ADP algorithm, neural networks are used to approximate the performance index function and to compute the optimal control policy, respectively. Simulation examples are given to demonstrate the validity of the proposed optimal control scheme.
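The iterative idea in the abstract can be illustrated on a much simpler problem. The sketch below is not the dual iterative ADP algorithm of the paper: it is plain value iteration for a hypothetical linear scalar system with a one-step state delay, x(k+1) = a x(k) + a_d x(k-1) + b u(k), and a quadratic utility q x² + r u². The delay is handled by stacking z(k) = [x(k), x(k-1)], a standard augmentation that is assumed here for illustration. In this special case the performance index function is quadratic, V_i(z) = zᵀ P_i z, so each iteration reduces to a Riccati update and no neural networks are needed. All names and parameter values (`a`, `a_d`, `b`, `q`, `r`, `riccati_step`, `value_iteration`, `simulate`) are assumptions, and the control delay considered in the paper is omitted.

```python
def riccati_step(P, A, B, Q, r):
    """One update V_{i+1}(z) = min_u [z'Qz + r*u^2 + V_i(Az + Bu)], with V_i(z) = z'Pz.

    P, A, Q are n-by-n nested lists (P symmetric), B is a length-n list
    (single input), r is a positive scalar. Returns (P_next, gain), where
    u = -gain . z is the greedy control with respect to V_i.
    """
    n = len(A)
    PA = [[sum(P[i][k] * A[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    PB = [sum(P[i][k] * B[k] for k in range(n)) for i in range(n)]
    s = r + sum(B[i] * PB[i] for i in range(n))                      # scalar r + B'PB
    BPA = [sum(B[k] * PA[k][j] for k in range(n)) for j in range(n)]  # row vector B'PA
    APA = [[sum(A[k][i] * PA[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    gain = [BPA[j] / s for j in range(n)]
    P_next = [[Q[i][j] + APA[i][j] - BPA[i] * BPA[j] / s for j in range(n)] for i in range(n)]
    return P_next, gain


def value_iteration(a=0.8, a_d=0.3, b=1.0, q=1.0, r=1.0, iters=200):
    """Iterate the performance index from V_0 = 0 on the augmented state z = [x(k), x(k-1)]."""
    A = [[a, a_d], [1.0, 0.0]]   # second row shifts x(k) into the delayed slot
    B = [b, 0.0]
    Q = [[q, 0.0], [0.0, 0.0]]   # only the current state is penalised
    P = [[0.0, 0.0], [0.0, 0.0]]
    gain = [0.0, 0.0]
    history = []
    for _ in range(iters):
        P, gain = riccati_step(P, A, B, Q, r)
        history.append(P)
    return history, gain


def simulate(gain, x0=1.0, x_prev0=0.5, steps=50, a=0.8, a_d=0.3, b=1.0):
    """Run the delayed system in closed loop with u = -gain . [x(k), x(k-1)]."""
    x, x_prev = x0, x_prev0
    for _ in range(steps):
        u = -(gain[0] * x + gain[1] * x_prev)
        x, x_prev = a * x + a_d * x_prev + b * u, x
    return x


if __name__ == "__main__":
    history, gain = value_iteration()
    print("final P:", history[-1])
    print("feedback gain:", gain)
    print("state after 50 steps:", simulate(gain))
```

Starting from V_0 = 0, the iterates P_i form a nondecreasing sequence that settles to a fixed point for this example, which mirrors (in a far simpler setting) the kind of convergence property the abstract claims for the performance index function. For nonlinear dynamics the quadratic form no longer applies, which is where function approximators such as the neural networks mentioned in the abstract come in.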

Metadata
Title
Dual iterative adaptive dynamic programming for a class of discrete-time nonlinear systems with time-delays
Authors
Qinglai Wei
Ding Wang
Dehua Zhang
Publication date
01 December 2013
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 7-8/2013
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-012-1188-7
