2017 | Book

# Adaptive Dynamic Programming with Applications in Optimal Control

Authors: Derong Liu, Qinglai Wei, Ding Wang, Xiong Yang, Hongliang Li

Publisher: Springer International Publishing

Book Series: Advances in Industrial Control


This book covers the most recent developments in adaptive dynamic programming (ADP). The text begins with a thorough background review of ADP making sure that readers are sufficiently familiar with the fundamentals. In the core of the book, the authors address first discrete- and then continuous-time systems. Coverage of discrete-time systems starts with a more general form of value iteration to demonstrate its convergence, optimality, and stability with complete and thorough theoretical analysis. A more realistic form of value iteration is studied where value function approximations are assumed to have finite errors. Adaptive Dynamic Programming also details another avenue of the ADP approach: policy iteration. Both basic and generalized forms of policy-iteration-based ADP are studied with complete and thorough theoretical analysis in terms of convergence, optimality, stability, and error bounds. Among continuous-time systems, the control of affine and nonaffine nonlinear systems is studied using the ADP approach which is then extended to other branches of control theory including decentralized control, robust and guaranteed cost control, and game theory. In the last part of the book the real-world significance of ADP theory is presented, focusing on three application examples developed from the authors’ work:

• renewable energy scheduling for smart power grids;
• coal gasification processes; and
• water–gas shift reactions.

Researchers studying intelligent control methods and practitioners looking to apply them in the chemical-process and power-supply industries will find much to interest them in this thorough treatment of an advanced approach to control.


Abstract

This chapter reviews the development of adaptive dynamic programming (ADP). It starts with a background overview of reinforcement learning and dynamic programming. It then moves on to the basic forms of ADP and then to the iterative forms. ADP is an emerging advanced control technology developed for nonlinear dynamical systems. It is based on the idea of approximating dynamic programming solutions. Dynamic programming was introduced by Bellman in the 1950s for solving optimal control problems of nonlinear dynamical systems. Due to its high computational complexity, applications of dynamic programming have been limited to simple and small-scale problems. The key step in finding approximate solutions to dynamic programming is to estimate the cost function. The optimal control signal can then be determined by minimizing the cost function (or maximizing a reward function). Due to their universal approximation capability, artificial neural networks are often used to represent the cost function in dynamic programming. The implementation of ADP usually requires the use of three modules—critic, model, and action. These three modules perform the functions of evaluation, prediction, and decision, respectively.
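As a brief reminder of the principle these methods approximate, Bellman's optimality equation for a discrete-time system \(x_{k+1}=F(x_k,u_k)\) with utility \(U\) can be written (in generic notation, not necessarily the book's) as

\[
J^*(x_k)=\min_{u_k}\bigl\{U(x_k,u_k)+J^*\bigl(F(x_k,u_k)\bigr)\bigr\},\qquad
u^*(x_k)=\arg\min_{u_k}\bigl\{U(x_k,u_k)+J^*\bigl(F(x_k,u_k)\bigr)\bigr\}.
\]

ADP approximates \(J^*\) with the critic network and \(u^*\) with the action network, while the model network predicts \(x_{k+1}\); the curse of dimensionality in exact dynamic programming comes from carrying out this minimization over the entire state space.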

Abstract

In this chapter, optimal control problems of discrete-time nonlinear systems, including optimal regulation, optimal tracking control, and constrained optimal control, are studied by using a series of value iteration (VI) adaptive dynamic programming (ADP) approaches. First, an ADP scheme based on general value iteration (GVI) is developed to obtain near-optimal control for discrete-time affine nonlinear systems. Then, the GVI-based ADP algorithm is employed to solve the infinite-horizon optimal tracking control problem for a class of discrete-time nonlinear systems. Moreover, using the globalized dual heuristic programming technique, the VI-based optimal control strategy of unknown discrete-time nonlinear systems with input constraints is established as a special case. Finally, an iterative \(\theta \)-ADP algorithm is developed to solve the optimal control problem of infinite-horizon discrete-time nonlinear systems; it is shown that each of the iterative control laws stabilizes the nonlinear system and that the requirement of an initial admissible control law is effectively avoided. Simulation examples are included to verify the effectiveness of the present control strategies.
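A minimal sketch of the value-iteration idea, on a scalar linear-quadratic toy problem where each Bellman backup can be carried out in closed form. The system \(x_{k+1}=ax_k+bu_k\), the weights q and r, and the quadratic ansatz \(V_i(x)=p_ix^2\) are illustrative assumptions, not the book's examples:

```python
# Value iteration (VI) on a scalar linear-quadratic problem:
# x_{k+1} = a*x_k + b*u_k, cost sum of q*x^2 + r*u^2.
a, b, q, r = 1.2, 1.0, 1.0, 1.0

p = 0.0  # VI may start from the zero value function V_0(x) = 0
for _ in range(200):
    # V_{i+1}(x) = min_u [ q*x^2 + r*u^2 + V_i(a*x + b*u) ];
    # minimizing over u in closed form yields a scalar Riccati recursion
    p = q + p * a**2 - (p * a * b) ** 2 / (r + p * b**2)

# p now approximates the fixed point of the recursion (the discrete-time
# algebraic Riccati solution); the greedy feedback gain for u = -k*x is:
k = p * a * b / (r + p * b**2)
print(p, k)
```

Even though the open-loop system is unstable (|a| > 1), the iterates increase monotonically from zero to the optimal value, and the resulting closed loop \(x_{k+1}=(a-bk)x_k\) is stable.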

Abstract

In this chapter, iterative adaptive dynamic programming (ADP) algorithms are developed to solve optimal control problems for infinite-horizon discrete-time nonlinear systems with finite approximation errors. The idea is to use iterative ADP algorithms to obtain iterative control laws that guarantee the iterative value functions converge to the optimum. Then, the numerical optimal control problems are solved by an adaptive learning control scheme based on the ADP algorithm. Stability properties of the system under the numerical iterative controls are proven, which allows the present iterative ADP algorithm to be implemented both online and offline. Moreover, a general value iteration (GVI) algorithm with finite approximation errors is developed to guarantee that the iterative value function converges to the solution of the Bellman equation. The GVI algorithm permits an arbitrary positive semidefinite function as its initialization, which overcomes the disadvantage of traditional value iteration algorithms. Simulation examples are also included to demonstrate the effectiveness of the present control strategies.

Abstract

This chapter is concerned with discrete-time policy iteration adaptive dynamic programming (ADP) methods for solving the infinite-horizon optimal control problem of nonlinear systems. The idea is to use a policy iteration ADP technique to obtain the iterative control laws which minimize the iterative value functions. The main contribution of this chapter is to analyze the convergence and stability properties of the policy iteration method for discrete-time nonlinear systems. It is shown that the iterative value function is nonincreasingly convergent to the optimal solution of the Bellman equation. It is also proven that any of the iterative control laws can stabilize the nonlinear system. Neural networks are used to approximate the iterative value functions and to compute the iterative control laws, facilitating the implementation of the iterative ADP algorithm, where the convergence of the weight matrices is analyzed. Finally, numerical results and analysis are presented to illustrate the performance of the present method.
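For contrast with value iteration, here is a policy-iteration sketch on the same kind of scalar linear-quadratic toy problem (all names and values are illustrative, not the book's examples). Note that the initial gain must be admissible, i.e., stabilizing — exactly the requirement the chapter analyzes:

```python
# Policy iteration (PI) on a scalar linear-quadratic problem:
# x_{k+1} = a*x_k + b*u_k, cost sum of q*x^2 + r*u^2, policy u = -k*x.
a, b, q, r = 1.2, 1.0, 1.0, 1.0
k = 1.0  # initial admissible gain: |a - b*k| = 0.2 < 1, so it stabilizes

for _ in range(20):
    # policy evaluation: under u = -k*x the value V(x) = p*x^2 satisfies
    # p = q + r*k^2 + p*(a - b*k)^2, solvable exactly for p — but only
    # if |a - b*k| < 1, which is why admissibility is needed
    ac = a - b * k
    p = (q + r * k**2) / (1.0 - ac**2)
    # policy improvement: greedy gain w.r.t. the evaluated value function
    k = p * a * b / (r + p * b**2)
```

In this toy problem the evaluated coefficient p decreases monotonically toward the optimal value, mirroring the nonincreasing convergence of the iterative value functions analyzed in the chapter.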

Abstract

In this chapter, generalized policy iteration (GPI) algorithms are developed to solve infinite-horizon optimal control problems for discrete-time nonlinear systems. GPI algorithms combine the ideas of the policy iteration and value iteration algorithms of adaptive dynamic programming (ADP). They permit an arbitrary positive semidefinite function as the initialization, and use two interleaved iteration procedures for policy evaluation and policy improvement, respectively. Then, the monotonicity, convergence, admissibility, and optimality properties of the present GPI algorithms for discrete-time nonlinear systems are analyzed. For implementation of the GPI algorithms, neural networks are employed for approximating the iterative value functions and computing the iterative control laws, respectively, to obtain the approximate optimal control law. Simulation examples are included to verify the effectiveness of the present algorithm.
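The interleaving of the two iteration procedures can be sketched by performing only a finite number of policy-evaluation backups between greedy improvements; with one backup per improvement this reduces to value iteration, and letting the number grow recovers policy iteration. Again this is a scalar linear-quadratic toy with illustrative values, not the book's setting:

```python
# Generalized policy iteration (GPI) on a scalar linear-quadratic problem:
# x_{k+1} = a*x_k + b*u_k, cost sum of q*x^2 + r*u^2.
a, b, q, r = 1.2, 1.0, 1.0, 1.0
p = 0.5      # a positive semidefinite initialization V_0(x) = p*x^2
n_eval = 3   # evaluation backups per improvement: 1 recovers value
             # iteration; a large value approaches policy iteration

for _ in range(100):
    # policy improvement: greedy gain w.r.t. the current value function
    k = p * a * b / (r + p * b**2)
    ac = a - b * k
    # partial policy evaluation: n_eval backups under the fixed gain k
    for _ in range(n_eval):
        p = q + r * k**2 + p * ac**2
```

The knob n_eval is what distinguishes GPI: it trades the cost of each evaluation phase against how many improvement steps are needed overall.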

Abstract

In this chapter, we first establish error bounds of adaptive dynamic programming (ADP) algorithms for solving undiscounted infinite-horizon optimal control problems of discrete-time deterministic nonlinear dynamical systems. We establish the error bounds for approximate value iteration, approximate policy iteration, and approximate optimistic policy iteration algorithms based on a new error condition. It is shown that the iterative approximate value function converges to a finite neighborhood of the optimal value function under some mild conditions. In addition, we also establish the error bound for the Q-function of approximate policy iteration for optimal control of unknown discounted discrete-time nonlinear dynamical systems. We develop an iterative ADP algorithm by using a Q-function, which depends on both the state and the action, to solve the nonlinear optimal control problems. Function approximation structures such as neural networks are used to approximate the Q-function and the control policy. These results provide theoretical guarantees for using neural network approximation to solve optimal control problems for nonlinear dynamical systems.
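The Q-function formulation referred to above can be summarized, in generic notation for a discounted problem with dynamics \(x_{k+1}=F(x_k,u_k)\), utility \(U\), and discount factor \(\gamma\in(0,1)\), as

\[
Q^*(x_k,u_k)=U(x_k,u_k)+\gamma\min_{u}Q^*\bigl(F(x_k,u_k),u\bigr),\qquad
u^*(x_k)=\arg\min_{u}Q^*(x_k,u).
\]

Because the greedy policy is recovered by minimizing \(Q^*\) over the action alone, no model of \(F\) is needed once \(Q^*\) is (approximately) known, which is why Q-function methods suit systems with unknown dynamics.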

Abstract

In this chapter, optimal control problems of continuous-time affine nonlinear systems are studied using the adaptive dynamic programming (ADP) approach. First, an identifier–critic architecture based on ADP methods is presented to derive the approximate optimal control for partially unknown continuous-time nonlinear systems. Based on the ADP approach developed in this chapter, the identifier neural network (NN) and the critic NN are tuned simultaneously. Meanwhile, by using recorded and instantaneous data simultaneously for the adaptation of the critic NN, the restrictive persistence-of-excitation condition is relaxed. Second, an ADP algorithm is developed to obtain the optimal control for continuous-time nonlinear systems with control constraints. By using the present algorithm, a single critic NN is utilized to derive the optimal control. Moreover, unlike in the case of policy iteration, where an initial stabilizing control is indispensable, there is no special requirement imposed on the initial control law.

Abstract

In this chapter, we consider optimal control problems of continuous-time nonaffine nonlinear systems with completely unknown dynamics via adaptive dynamic programming (ADP) methods. First, we develop an ADP-based identifier–actor–critic architecture to obtain the approximate optimal control for continuous-time unknown nonaffine nonlinear systems. The identifier is constructed by a dynamic neural network, which transforms nonaffine nonlinear systems into a class of affine nonlinear systems. After that, the actor–critic dual networks are employed to derive the optimal control for the newly formulated affine nonlinear systems. Second, we present an ADP-based observer–critic architecture to obtain the approximate optimal output regulation for unknown nonaffine nonlinear systems. The present observer is composed of a three-layer feedforward neural network, which aims to obtain the knowledge of system states. Meanwhile, a single critic neural network is employed for estimating the performance of the systems as well as for constructing the optimal control signal.

Abstract

In this chapter, the robust control and optimal guaranteed cost control of continuous-time uncertain nonlinear systems are studied using adaptive dynamic programming (ADP) methods. First, a novel strategy is established to design the robust controller for a class of nonlinear systems with uncertainties based on an online policy iteration algorithm. By properly choosing a cost function that reflects the uncertainties, states, and controls, the robust control problem is transformed into an optimal control problem, which is solved under the framework of ADP. Then, the infinite-horizon optimal guaranteed cost control of uncertain nonlinear systems is investigated. A critic neural network is constructed to facilitate the solution of the modified Hamilton–Jacobi–Bellman equation corresponding to the nominal system. An additional stabilizing term is introduced to ensure stability, which reinforces the updating process of the weight vector and reduces the requirement of an initial stabilizing control. The uniform ultimate boundedness of the closed-loop system is analyzed by using Lyapunov's direct method. Simulation examples are provided to verify the effectiveness of the present control approaches.
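One common way such a transformation is arranged (a sketch under generic assumptions; the chapter's precise conditions may differ): for a system \(\dot{x}=f(x)+g(x)\bigl(u+d(x)\bigr)\) with matched uncertainty bounded by \(\|d(x)\|\le d_M(x)\), the robust stabilization problem is recast as the optimal control problem for the nominal system with the modified cost

\[
J(x_0,u)=\int_0^\infty\Bigl(d_M^2\bigl(x(t)\bigr)+x^\top(t)Qx(t)+u^\top(t)Ru(t)\Bigr)\,dt,
\]

so that, under suitable conditions, the optimal controller for this modified cost also robustly stabilizes the original uncertain system.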

Abstract

In this chapter, by using neural network (NN)-based online learning optimal control approach, a decentralized control strategy is developed to stabilize a class of continuous-time large-scale interconnected nonlinear systems. It is proven that the decentralized control strategy of the overall system can be established by adding appropriate feedback gains to the optimal control laws of the isolated subsystems. Then, an online policy iteration (PI) algorithm is developed to solve the Hamilton–Jacobi–Bellman equations related to the optimal control problem. By constructing a set of critic NNs, the cost functions can be obtained by NN approximation, followed by the control laws. Furthermore, as a generalization, an NN-based decentralized control law is developed to stabilize the large-scale interconnected nonlinear systems using an online model-free integral PI algorithm. The model-free PI approach can solve the decentralized control problem for large-scale interconnected nonlinear systems with unknown dynamics. Finally, two simulation examples are provided to illustrate the effectiveness of the present decentralized control scheme.

Abstract

In this chapter, differential games are studied for continuous-time linear and nonlinear systems, including two-player zero-sum games, multi-player zero-sum games, and multi-player nonzero-sum games, via a series of adaptive dynamic programming (ADP) approaches. First, an integral policy iteration algorithm is developed to learn online the Nash equilibrium of two-player zero-sum differential games with completely unknown continuous-time linear dynamics using the state and control data. Second, multi-player zero-sum differential games for a class of continuous-time uncertain nonlinear systems are solved by using a novel iterative ADP algorithm. Via neural network modeling for the system dynamics, the ADP technique is employed to obtain the optimal control pair iteratively so that the iterative value function reaches the optimal solution of the zero-sum differential games. Finally, an online synchronous approximate optimal learning algorithm based on policy iteration is developed to solve multi-player nonzero-sum games of continuous-time nonlinear systems without the requirement of exact knowledge of system dynamics.
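For reference, the two-player zero-sum setting mentioned above is typically posed with the control \(u\) minimizing and the disturbance \(w\) maximizing a common cost (generic notation, not necessarily the book's):

\[
V^*(x_0)=\min_{u}\max_{w}\int_0^\infty\Bigl(x^\top Qx+u^\top Ru-\gamma^2 w^\top w\Bigr)\,dt.
\]

When the minimax and maximin values coincide (the Isaacs condition), \(V^*\) satisfies a Hamilton–Jacobi–Isaacs equation, and it is this equation that the ADP iterations approximate.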

Abstract

In the present chapter, intelligent dynamic optimization methods based on adaptive dynamic programming (ADP) are applied to deal with the challenges of intelligent price-responsive management of residential energy, with an emphasis on home batteries connected to the power grid. First, an action-dependent heuristic dynamic programming method is developed to obtain the optimal residential energy control law. Second, a dual iterative Q-learning algorithm is developed to solve the optimal battery management and control problem in smart residential environments, in which two iteration loops, internal and external, are employed. Based on the dual iterative Q-learning algorithm, the convergence property of the iterative Q-learning method for the optimal battery management and control problem is proven. Finally, a distributed iterative ADP method is developed to solve the multi-battery optimal coordination control problems for home energy management systems.

Abstract

In this chapter, a coal gasification optimal tracking control problem is solved through a data-based optimal learning control scheme using an iterative adaptive dynamic programming (ADP) approach. According to the system data, neural networks (NNs) are used to construct the dynamics of the coal gasification process, the coal quality function, and the reference control, respectively, so that a mathematical model of the system is unnecessary. The approximation errors from NN construction of the disturbance and the controls are both considered. Via system transformation, the optimal tracking control problem with approximation errors and disturbances is effectively transformed into a two-person zero-sum game. An iterative ADP algorithm is then developed to obtain the optimal control laws for the transformed system. A convergence property is established to guarantee that the cost function converges to a finite neighborhood of the optimal cost function, and the convergence criterion is also obtained. Finally, numerical results are given to illustrate the performance of the present method.

Abstract

In this chapter, a data-driven stable iterative adaptive dynamic programming (ADP) algorithm is developed to solve the optimal temperature control problem for the water–gas shift (WGS) reaction system. According to the system data, neural networks (NNs) are used to construct the dynamics of the WGS system and to solve the reference control, respectively, so that a mathematical model of the WGS system is unnecessary. Considering the reconstruction errors of the NNs and the disturbances of the system and control input, a stable iterative ADP algorithm is developed to obtain the optimal control law. A convergence property is established to guarantee that the iterative value function converges to a finite neighborhood of the optimal cost function. A stability property is established so that each of the iterative control laws can guarantee the tracking error to be uniformly ultimately bounded. NNs are employed to implement the stable iterative ADP algorithm. Finally, numerical results are given to illustrate the effectiveness of the present method.