
2020 | Original Paper | Book chapter

An Improved Reinforcement Learning Based Heuristic Dynamic Programming Algorithm for Model-Free Optimal Control

Authors: Jia Li, Zhaolin Yuan, Xiaojuan Ban

Published in: Artificial Neural Networks and Machine Learning – ICANN 2020

Publisher: Springer International Publishing


Abstract

For complex industrial processes, model-free adaptive control in a data-driven scheme is a classic problem. This paper proposes an improved reinforcement learning (RL) based heuristic dynamic programming algorithm for optimal tracking control in industrial systems. The proposed method uses a framework of two neural networks and employs a gradient-based optimization scheme to obtain the optimal control law. Inspired by the experience replay buffer in deep RL, short-term historical system trajectories are also used in the training phase, which stabilizes network learning. An experimental study based on a simulated industrial device shows that the proposed method is superior to other algorithms in terms of time consumption and control accuracy.
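
Two of the ideas named in the abstract, a short-term replay buffer and a gradient-based search for the control input, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class `ShortTermBuffer`, the function `optimize_control`, the linear `model`, the quadratic `critic`, and every numeric constant are hypothetical stand-ins chosen so that the example runs with NumPy alone. In particular, the two functions `model` and `critic` merely take the place of trained networks, and the pairing of a model with a critic is itself an assumption, since the abstract does not say which two networks are used.

```python
from collections import deque

import numpy as np


class ShortTermBuffer:
    """Keeps only the most recent `capacity` transitions (hypothetical helper)."""

    def __init__(self, capacity):
        # A deque with maxlen silently discards the oldest entry when full.
        self.data = deque(maxlen=capacity)

    def add(self, state, action, next_state):
        self.data.append((np.asarray(state, float),
                          np.asarray(action, float),
                          np.asarray(next_state, float)))

    def batch(self):
        """Return all stored transitions as three stacked arrays."""
        states, actions, next_states = zip(*self.data)
        return np.stack(states), np.stack(actions), np.stack(next_states)


def optimize_control(cost, u0, lr=0.1, steps=200, eps=1e-5):
    """Gradient descent on the control input, using central finite differences."""
    u = np.asarray(u0, float).copy()
    for _ in range(steps):
        grad = np.zeros_like(u)
        for i in range(u.size):
            d = np.zeros_like(u)
            d[i] = eps
            grad[i] = (cost(u + d) - cost(u - d)) / (2 * eps)
        u -= lr * grad
    return u


# Toy stand-ins (assumptions): a linear plant model and a quadratic critic.
A = np.array([[0.9, 0.0], [0.1, 0.8]])
B = np.array([[0.5], [0.2]])
target = np.array([1.0, 0.5])


def model(x, u):
    return A @ x + B @ u


def critic(x):
    return float(np.sum((x - target) ** 2))


def one_step_cost(x, u, gamma=0.9):
    # Control effort plus the discounted critic value of the predicted state.
    return 0.01 * float(u @ u) + gamma * critic(model(x, u))


x = np.zeros(2)
buffer = ShortTermBuffer(capacity=5)
for _ in range(20):
    u = optimize_control(lambda v: one_step_cost(x, v), np.zeros(1))
    x_next = model(x, u)
    buffer.add(x, u, x_next)  # only the last 5 transitions are retained
    x = x_next
```

In a full training loop, `buffer.batch()` would supply the recent transitions for each network update, so that a single noisy sample has less influence on the learned weights.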


Metadata
Title
An Improved Reinforcement Learning Based Heuristic Dynamic Programming Algorithm for Model-Free Optimal Control
Authors
Jia Li
Zhaolin Yuan
Xiaojuan Ban
Copyright year
2020
DOI
https://doi.org/10.1007/978-3-030-61616-8_23
