Abstract
This paper presents an approximate/adaptive dynamic programming (ADP) algorithm that uses the idea of integral reinforcement learning (IRL) to determine online the Nash equilibrium solution of the two-player zero-sum differential game with linear dynamics and an infinite-horizon quadratic cost. The algorithm is built around an iterative method developed in the control engineering community for solving the continuous-time game algebraic Riccati equation (CT-GARE), which underlies the game problem. We show how ADP techniques enhance the capabilities of the offline method, allowing an online solution without requiring complete knowledge of the system dynamics. The feasibility of the ADP scheme is demonstrated in simulation for a power system control application, where the adaptation goal is the control policy that optimally counteracts the worst-case load disturbance.
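The CT-GARE named in the abstract has the form A'P + PA + Q - PBB'P + (1/gamma^2)PDD'P = 0, with an indefinite quadratic term coming from the disturbance channel D. As a rough illustration of the offline baseline the paper builds on (not the paper's online IRL algorithm), the sketch below solves this equation by a simplified Newton-type iteration of Lyapunov equations, initialized from the standard ARE; the system matrices A, B, D, the weight Q, and the attenuation level gamma in the example are assumptions chosen for demonstration only.

```python
import numpy as np
from scipy.linalg import solve_continuous_are, solve_continuous_lyapunov

def solve_gare(A, B, D, Q, gamma, iters=100, tol=1e-12):
    """Newton-type iteration for the CT-GARE
       A'P + PA + Q - P B B' P + (1/gamma^2) P D D' P = 0."""
    # Initialize with the stabilizing solution of the standard ARE
    # (disturbance channel ignored, i.e., gamma -> infinity).
    P = solve_continuous_are(A, B, Q, np.eye(B.shape[1]))
    g2 = 1.0 / gamma**2
    for _ in range(iters):
        # Closed-loop matrix under the current control/disturbance policies.
        Ac = A - B @ B.T @ P + g2 * D @ D.T @ P
        # Each Newton step solves a Lyapunov equation: Ac' X + X Ac = -rhs.
        rhs = Q + P @ B @ B.T @ P - g2 * P @ D @ D.T @ P
        P_next = solve_continuous_lyapunov(Ac.T, -rhs)
        if np.linalg.norm(P_next - P) < tol:
            return P_next
        P = P_next
    return P

# Scalar check: A=-1, B=D=1, Q=1, gamma=2 reduces the CT-GARE to
# 3P^2 + 8P - 4 = 0, with positive root P = (-8 + sqrt(112))/6 ≈ 0.4305.
A = np.array([[-1.0]]); B = np.array([[1.0]])
D = np.array([[1.0]]); Q = np.array([[1.0]])
P = solve_gare(A, B, D, Q, gamma=2.0)
print(P[0, 0])  # ≈ 0.4305
```

Convergence of this plain Newton recursion from a stabilizing initial guess is not guaranteed for every indefinite-term GARE; the recursive methods in the control literature add safeguards that this sketch omits for brevity.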
Additional information
This work was supported by the National Science Foundation (No. ECCS-0801330) and the Army Research Office (No. W91NF-05-1-0314).
Draguna VRABIE is a senior research scientist at the United Technologies Research Center, East Hartford, Connecticut. She received her B.S. degree in 2003 and M.S. degree in 2004 from the Automatic Control and Computer Engineering Department, ‘Gh. Asachi’ Technical University of Iasi, and her Ph.D. degree in Electrical Engineering in 2009 from the University of Texas at Arlington. She is coauthor of the book ‘Automatic Systems with PID Controllers’, 3 book chapters, and 25 technical publications. She received the Best Paper Award at the International Joint Conference on Neural Networks (IJCNN’10), Barcelona, Spain, 2010, and the Best Student Award from the Automation & Robotics Research Institute, University of Texas at Arlington, in 2009. She serves as Associate Editor for the IEEE Transactions on Neural Networks and the Transactions of the Institute of Measurement and Control, and serves on the Technical Program Committees of several international conferences.
Frank LEWIS was born in Würzburg, Germany, subsequently studying in Chile and at Gordonstoun School in Scotland. He obtained his Bachelor’s degree in Physics/Electrical Engineering and his Master’s of Electrical Engineering degree at Rice University in 1971. He spent six years in the U.S. Navy, serving as Navigator aboard the frigate USS Trippe (FF-1075), and as Executive Officer and Acting Commanding Officer aboard USS Salinan (ATF-161). In 1977, he received his Master’s of Science degree in Aeronautical Engineering from the University of West Florida. In 1981, he obtained his Ph.D. degree at the Georgia Institute of Technology in Atlanta, where he was employed as a professor from 1981 to 1990. He is a professor of Electrical Engineering at the University of Texas at Arlington, where he was awarded the Moncrief-O’Donnell Endowed Chair in 1990 at the Automation & Robotics Research Institute. He is a Fellow of the IEEE, a Fellow of IFAC, a Fellow of the U.K. Institute of Measurement & Control, and a Member of the New York Academy of Sciences. He is a Registered Professional Engineer in the State of Texas and a Chartered Engineer with the U.K. Engineering Council, a Charter Member (2004) of the UTA Academy of Distinguished Scholars, a Senior Research Fellow of the Automation & Robotics Research Institute, and a Founding Member of the Board of Governors of the Mediterranean Control Association. He has served as Visiting Professor at Democritus University in Greece, the Hong Kong University of Science and Technology, the Chinese University of Hong Kong, the City University of Hong Kong, the National University of Singapore, and Nanyang Technological University, Singapore, and was elected Guest Consulting Professor at Shanghai Jiao Tong University and the South China University of Technology.
His current interests include intelligent control, distributed control on graphs, neural and fuzzy systems, wireless sensor networks, nonlinear systems, robotics, condition-based maintenance, microelectromechanical systems (MEMS) control, and manufacturing process control. He is the author of 6 U.S. patents, 222 journal papers, 47 chapters and encyclopedia articles, 333 refereed conference papers, and 14 books, including ‘Optimal Control’, ‘Optimal Estimation’, ‘Applied Optimal Control and Estimation’, ‘Aircraft Control and Simulation’, ‘Control of Robot Manipulators’, ‘Neural Network Control’, ‘High-Level Feedback Control with Neural Networks’, and the IEEE reprint volume ‘Robot Control’. He is the editor of the Taylor & Francis Book Series on Automation & Control Engineering, has served or serves on many editorial boards, including the International Journal of Control, Neural Computing and Applications, Optimal Control Applications & Methods, and the International Journal of Intelligent Control Systems, and served as Editor for the flagship journal Automatica. He is the recipient of an NSF Research Initiation Grant and has been continuously funded by NSF since 1982. Since 1991, he has received $7 million in funding from NSF, ARO, AFOSR, and other government agencies, including significant DoD SBIR and industry funding. His SBIR program was instrumental in ARRI’s receipt of the US SBA Tibbets Award in 1996. He received the Fulbright Research Award in 1988, the American Society of Engineering Education F.E. Terman Award in 1989, the International Neural Network Society Gabor Award in 2009, the U.K. Institute of Measurement & Control Honeywell Field Engineering Medal in 2009, three Sigma Xi Research Awards, the UTA Halliburton Engineering Research Award, the UTA Distinguished Research Award, ARRI Patent Awards, various Best Paper Awards, the IEEE Control Systems Society Best Chapter Award (as Founding Chairman of the DFW Chapter), and the National Sigma Xi Award for Outstanding Chapter (as President of the UTA Chapter).
He received the Outstanding Service Award from the Dallas IEEE Section and was selected as Engineer of the Year by the Ft. Worth IEEE Section, and is listed in the Ft. Worth Business Press Top 200 Leaders in Manufacturing. He was appointed to the NAE Committee on Space Station in 1995 and to the IEEE Control Systems Society Board of Governors in 1996, and was selected in 1998 as an IEEE Control Systems Society Distinguished Lecturer. He received the 2010 IEEE Region 5 Outstanding Engineering Educator Award and the 2010 UTA Graduate Dean’s Excellence in Doctoral Mentoring Award.
Cite this article
Vrabie, D., Lewis, F. Adaptive dynamic programming for online solution of a zero-sum differential game. J. Control Theory Appl. 9, 353–360 (2011). https://doi.org/10.1007/s11768-011-0166-4