Abstract
This paper presents an approximate/adaptive dynamic programming (ADP) algorithm that uses the idea of integral reinforcement learning (IRL) to determine online the Nash equilibrium solution of the two-player zero-sum differential game with linear dynamics and an infinite-horizon quadratic cost. The algorithm is built around an iterative method developed in the control engineering community for solving the continuous-time game algebraic Riccati equation (CT-GARE), which underlies the game problem. We show how ADP techniques enhance the capabilities of the offline method, allowing an online solution without requiring complete knowledge of the system dynamics. The feasibility of the ADP scheme is demonstrated in simulation for a power system control application, where the adaptation goal is the control policy that optimally counteracts the worst-case load disturbance.
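The CT-GARE named in the abstract has the form A'P + PA + Q - PBB'P + (1/gamma^2)PDD'P = 0, with an indefinite quadratic term coming from the disturbance channel D. As a rough illustration of the offline baseline the paper builds on (not the paper's online IRL algorithm), the sketch below solves this equation by a simplified Newton-type iteration of Lyapunov equations, initialized from the standard ARE; the system matrices A, B, D, the weight Q, and the attenuation level gamma in the example are assumptions chosen for demonstration only.

```python
import numpy as np
from scipy.linalg import solve_continuous_are, solve_continuous_lyapunov

def solve_gare(A, B, D, Q, gamma, iters=100, tol=1e-12):
    """Newton-type iteration for the CT-GARE
       A'P + PA + Q - P B B' P + (1/gamma^2) P D D' P = 0."""
    # Initialize with the stabilizing solution of the standard ARE
    # (disturbance channel ignored, i.e., gamma -> infinity).
    P = solve_continuous_are(A, B, Q, np.eye(B.shape[1]))
    g2 = 1.0 / gamma**2
    for _ in range(iters):
        # Closed-loop matrix under the current control/disturbance policies.
        Ac = A - B @ B.T @ P + g2 * D @ D.T @ P
        # Each Newton step solves a Lyapunov equation: Ac' X + X Ac = -rhs.
        rhs = Q + P @ B @ B.T @ P - g2 * P @ D @ D.T @ P
        P_next = solve_continuous_lyapunov(Ac.T, -rhs)
        if np.linalg.norm(P_next - P) < tol:
            return P_next
        P = P_next
    return P

# Scalar check: A=-1, B=D=1, Q=1, gamma=2 reduces the CT-GARE to
# 3P^2 + 8P - 4 = 0, with positive root P = (-8 + sqrt(112))/6 ≈ 0.4305.
A = np.array([[-1.0]]); B = np.array([[1.0]])
D = np.array([[1.0]]); Q = np.array([[1.0]])
P = solve_gare(A, B, D, Q, gamma=2.0)
print(P[0, 0])  # ≈ 0.4305
```

Convergence of this plain Newton recursion from a stabilizing initial guess is not guaranteed for every indefinite-term GARE; the recursive methods in the control literature add safeguards that this sketch omits for brevity.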
Additional information
This work was supported by the National Science Foundation (No. ECCS-0801330) and the Army Research Office (No. W91NF-05-1-0314).
Draguna VRABIE is a senior research scientist at the United Technologies Research Center, East Hartford, Connecticut. She received her B.S. degree in 2003 and M.S. degree in 2004 from the Automatic Control and Computer Engineering Department, ‘Gh. Asachi’ Technical University of Iasi, and her Ph.D. degree in Electrical Engineering in 2009 from the University of Texas at Arlington. She is coauthor of the book ‘Automatic Systems with PID Controllers’, 3 book chapters, and 25 technical publications. She received the Best Paper Award at the International Joint Conference on Neural Networks (IJCNN’10), Barcelona, Spain, 2010, and the Best Student Award from the Automation & Robotics Research Institute, University of Texas at Arlington, in 2009. She serves as Associate Editor for the IEEE Transactions on Neural Networks and the Transactions of the Institute of Measurement and Control, and serves on the Technical Program Committees of several international conferences.
Frank LEWIS was born in Würzburg, Germany, subsequently studying in Chile and at Gordonstoun School in Scotland. He obtained his Bachelor’s degree in Physics/Electrical Engineering and his Master’s of Electrical Engineering degree at Rice University in 1971. He spent six years in the U.S. Navy, serving as Navigator aboard the frigate USS Trippe (FF-1075), and as Executive Officer and Acting Commanding Officer aboard USS Salinan (ATF-161). In 1977, he received his Master’s of Science degree in Aeronautical Engineering from the University of West Florida. In 1981, he obtained his Ph.D. degree at the Georgia Institute of Technology in Atlanta, where he was employed as a professor from 1981 to 1990. He is a professor of Electrical Engineering at the University of Texas at Arlington, where he was awarded the Moncrief-O’Donnell Endowed Chair in 1990 at the Automation & Robotics Research Institute. He is a Fellow of the IEEE, a Fellow of IFAC, a Fellow of the U.K. Institute of Measurement & Control, and a Member of the New York Academy of Sciences. He is a Registered Professional Engineer in the State of Texas and a Chartered Engineer with the U.K. Engineering Council, a Charter Member (2004) of the UTA Academy of Distinguished Scholars, a Senior Research Fellow of the Automation & Robotics Research Institute, and a Founding Member of the Board of Governors of the Mediterranean Control Association. He has served as Visiting Professor at Democritus University in Greece, the Hong Kong University of Science and Technology, the Chinese University of Hong Kong, the City University of Hong Kong, the National University of Singapore, and Nanyang Technological University, Singapore, and was elected Guest Consulting Professor at Shanghai Jiao Tong University and the South China University of Technology.
His current interests include intelligent control, distributed control on graphs, neural and fuzzy systems, wireless sensor networks, nonlinear systems, robotics, condition-based maintenance, microelectromechanical systems (MEMS) control, and manufacturing process control. He is the author of 6 U.S. patents, 222 journal papers, 47 chapters and encyclopedia articles, 333 refereed conference papers, and 14 books, including ‘Optimal Control’, ‘Optimal Estimation’, ‘Applied Optimal Control and Estimation’, ‘Aircraft Control and Simulation’, ‘Control of Robot Manipulators’, ‘Neural Network Control’, ‘High-Level Feedback Control with Neural Networks’, and the IEEE reprint volume ‘Robot Control’. He is the editor of the Taylor & Francis Book Series on Automation & Control Engineering, has served or serves on many editorial boards, including the International Journal of Control, Neural Computing and Applications, Optimal Control Applications & Methods, and the International Journal of Intelligent Control Systems, and served as Editor for the flagship journal Automatica. He is the recipient of an NSF Research Initiation Grant and has been continuously funded by NSF since 1982. Since 1991, he has received $7 million in funding from NSF, ARO, AFOSR, and other government agencies, including significant DoD SBIR and industry funding. His SBIR program was instrumental in ARRI’s receipt of the US SBA Tibbets Award in 1996. He received the Fulbright Research Award in 1988, the American Society of Engineering Education F.E. Terman Award in 1989, the International Neural Network Society Gabor Award in 2009, the U.K. Institute of Measurement & Control Honeywell Field Engineering Medal in 2009, three Sigma Xi Research Awards, the UTA Halliburton Engineering Research Award, the UTA Distinguished Research Award, ARRI Patent Awards, various Best Paper Awards, the IEEE Control Systems Society Best Chapter Award (as Founding Chairman of the DFW Chapter), and the National Sigma Xi Award for Outstanding Chapter (as President of the UTA Chapter).
He received the Outstanding Service Award from the Dallas IEEE Section and was selected as Engineer of the Year by the Ft. Worth IEEE Section, and is listed in the Ft. Worth Business Press Top 200 Leaders in Manufacturing. He was appointed to the NAE Committee on Space Station in 1995 and to the IEEE Control Systems Society Board of Governors in 1996, and was selected in 1998 as an IEEE Control Systems Society Distinguished Lecturer. He received the 2010 IEEE Region 5 Outstanding Engineering Educator Award and the 2010 UTA Graduate Dean’s Excellence in Doctoral Mentoring Award.
Cite this article
Vrabie, D., Lewis, F. Adaptive dynamic programming for online solution of a zero-sum differential game. J. Control Theory Appl. 9, 353–360 (2011). https://doi.org/10.1007/s11768-011-0166-4