
Adaptive dynamic programming for online solution of a zero-sum differential game

Journal of Control Theory and Applications

Abstract

This paper presents an approximate/adaptive dynamic programming (ADP) algorithm that uses the idea of integral reinforcement learning (IRL) to determine online the Nash equilibrium solution of the two-player zero-sum differential game with linear dynamics and infinite-horizon quadratic cost. The algorithm is built around an iterative method developed in the control engineering community for solving the continuous-time game algebraic Riccati equation (CT-GARE), which underlies the game problem. We show how the ADP techniques enhance the capabilities of the offline method, allowing an online solution without requiring complete knowledge of the system dynamics. The feasibility of the ADP scheme is demonstrated in simulation for a power system control application, where the adaptation goal is the control policy that optimally counteracts the worst-case load disturbance.
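As a rough sketch of the objects the abstract refers to, the code below sets up a two-player zero-sum game with dynamics dx/dt = A x + B1 w + B2 u and cost J = ∫ (x'Qx + u'Ru − γ² w'w) dt, and computes the stabilizing solution P of the CT-GARE, A'P + PA + Q + (1/γ²) P B1 B1' P − P B2 R⁻¹ B2' P = 0, by iterating on the disturbance policy and solving one standard ARE per step with SciPy's solve_continuous_are. This is a generic offline iteration from the H-infinity game literature, not necessarily the exact recursion used in the paper, and the matrices A, B1, B2, Q, R and the value γ = 5 are illustrative placeholders rather than the power-system model from the simulation study.

```python
# Illustrative sketch only: a generic offline disturbance-policy iteration for the
# continuous-time game algebraic Riccati equation (CT-GARE); system data are made up.
import numpy as np
from scipy.linalg import solve_continuous_are

A  = np.array([[0.0, 1.0], [-2.0, -3.0]])   # drift matrix (known here; the online IRL scheme avoids using it)
B2 = np.array([[0.0], [1.0]])               # control input channel (u, the minimizer)
B1 = np.array([[1.0], [0.5]])               # disturbance input channel (w, the maximizer)
Q, R, gamma = np.eye(2), np.array([[1.0]]), 5.0

L = np.zeros((1, 2))                        # current disturbance policy w = L x
P_prev = np.zeros((2, 2))
for i in range(50):
    Ai = A + B1 @ L                         # dynamics with the disturbance policy frozen
    Qi = Q - gamma**2 * (L.T @ L)           # effective state weight after substituting w = L x
    P  = solve_continuous_are(Ai, B2, (Qi + Qi.T) / 2, R)   # optimal-control ARE for the minimizer
    L  = (1.0 / gamma**2) * B1.T @ P        # improved disturbance policy
    if np.linalg.norm(P - P_prev) < 1e-10:
        break
    P_prev = P

# At convergence P should satisfy the CT-GARE; the saddle-point policies are
# u* = -K x and w* = L x with the gains below.
residual = (A.T @ P + P @ A + Q
            + (1.0 / gamma**2) * P @ B1 @ B1.T @ P
            - P @ B2 @ np.linalg.solve(R, B2.T) @ P)
K = np.linalg.solve(R, B2.T @ P)
print("GARE residual norm:", np.linalg.norm(residual))
print("control gain K:", K, " disturbance gain L:", L)
```

In the online ADP/IRL scheme described in the abstract, each such model-based step would instead be carried out from measured data: along closed-loop trajectories, x(t)'P x(t) − x(t+T)'P x(t+T) equals the integral of (x'Qx + u'Ru − γ² w'w) over [t, t+T], so the entries of P can be estimated by least squares from state measurements and the integrated reward, and the drift matrix A is never needed.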



Author information

Correspondence to Draguna Vrabie.

Additional information

This work was supported by the National Science Foundation (No. ECCS-0801330) and the Army Research Office (No. W91NF-05-1-0314).

Draguna VRABIE is a senior research scientist at the United Technologies Research Center, East Hartford, Connecticut. She received her B.S. degree in 2003 and M.S. degree in 2004 from the Automatic Control and Computer Engineering Department, ‘Gh. Asachi’ Technical University of Iasi, and her Ph.D. in Electrical Engineering in 2009 from the University of Texas at Arlington. She is coauthor of the book ‘Automatic Systems with PID Controllers’, 3 book chapters, and 25 technical publications. She received the Best Paper award at the International Joint Conference on Neural Networks (IJCNN’10), Barcelona, Spain, 2010, and the Best Student award from the Automation & Robotics Research Institute, University of Texas at Arlington, in 2009. She serves as Associate Editor for the IEEE Transactions on Neural Networks and the Transactions of the Institute of Measurement and Control, and serves on the Technical Program Committee for several international conferences.

Frank LEWIS was born in Würzburg, Germany, subsequently studying in Chile and at Gordonstoun School in Scotland. He obtained his Bachelor’s degree in Physics/Electrical Engineering and his Master of Electrical Engineering degree at Rice University in 1971. He spent six years in the U.S. Navy, serving as Navigator aboard the frigate USS Trippe (FF-1075), and Executive Officer and Acting Commanding Officer aboard USS Salinan (ATF-161). In 1977, he received his Master of Science degree in Aeronautical Engineering from the University of West Florida. In 1981, he obtained his Ph.D. degree at the Georgia Institute of Technology in Atlanta, where he was employed as a professor from 1981 to 1990. He is a professor of Electrical Engineering at the University of Texas at Arlington, where he was awarded the Moncrief-O’Donnell Endowed Chair in 1990 at the Automation & Robotics Research Institute. He is a Fellow of the IEEE, a Fellow of IFAC, a Fellow of the U.K. Institute of Measurement & Control, and a Member of the New York Academy of Sciences. He is a Registered Professional Engineer in the State of Texas and a Chartered Engineer with the U.K. Engineering Council. He is a Charter Member (2004) of the UTA Academy of Distinguished Scholars, a Senior Research Fellow of the Automation & Robotics Research Institute, and a Founding Member of the Board of Governors of the Mediterranean Control Association. He has served as a Visiting Professor at Democritus University in Greece, the Hong Kong University of Science and Technology, the Chinese University of Hong Kong, the City University of Hong Kong, the National University of Singapore, and Nanyang Technological University, Singapore, and has been elected Guest Consulting Professor at Shanghai Jiao Tong University and the South China University of Technology.

His current interests include intelligent control, distributed control on graphs, neural and fuzzy systems, wireless sensor networks, nonlinear systems, robotics, condition-based maintenance, microelectromechanical systems (MEMS) control, and manufacturing process control. He is the author of 6 U.S. patents, 222 journal papers, 47 chapters and encyclopedia articles, 333 refereed conference papers, and 14 books, including ‘Optimal Control’, ‘Optimal Estimation’, ‘Applied Optimal Control and Estimation’, ‘Aircraft Control and Simulation’, ‘Control of Robot Manipulators’, ‘Neural Network Control’, ‘High-Level Feedback Control with Neural Networks’, and the IEEE reprint volume ‘Robot Control’. He is Editor of the Taylor & Francis Book Series on Automation & Control Engineering, has served or serves on many editorial boards, including the International Journal of Control, Neural Computing and Applications, Optimal Control Applications & Methods, and the International Journal of Intelligent Control Systems, and served as an Editor for the flagship journal Automatica. He is the recipient of an NSF Research Initiation Grant and has been continuously funded by NSF since 1982. Since 1991, he has received $7 million in funding from NSF, ARO, AFOSR and other government agencies, including significant DoD SBIR and industry funding. His SBIR program was instrumental in ARRI’s receipt of the US SBA Tibbets Award in 1996. He received the Fulbright Research Award (1988), the American Society of Engineering Education F.E. Terman Award (1989), the International Neural Network Society Gabor Award (2009), the U.K. Institute of Measurement & Control Honeywell Field Engineering Medal (2009), three Sigma Xi Research Awards, the UTA Halliburton Engineering Research Award, the UTA Distinguished Research Award, ARRI Patent Awards, various Best Paper Awards, the IEEE Control Systems Society Best Chapter Award (as Founding Chairman of the DFW Chapter), and the National Sigma Xi Award for Outstanding Chapter (as President of the UTA Chapter). He received the Outstanding Service Award from the Dallas IEEE Section, was selected as Engineer of the Year by the Ft. Worth IEEE Section, and is listed in the Ft. Worth Business Press Top 200 Leaders in Manufacturing. He was appointed to the NAE Committee on Space Station in 1995 and to the IEEE Control Systems Society Board of Governors in 1996, and was selected in 1998 as an IEEE Control Systems Society Distinguished Lecturer. He received the 2010 IEEE Region 5 Outstanding Engineering Educator Award and the 2010 UTA Graduate Dean’s Excellence in Doctoral Mentoring Award.


Cite this article

Vrabie, D., Lewis, F. Adaptive dynamic programming for online solution of a zero-sum differential game. J. Control Theory Appl. 9, 353–360 (2011). https://doi.org/10.1007/s11768-011-0166-4

