
26-02-2022 | Original Article

Improving actor-critic structure by relatively optimal historical information for discrete system

Authors: Xinyu Zhang, Weidong Li, Xiaoke Zhu, Xiao-Yuan Jing

Published in: Neural Computing and Applications | Issue 12/2022


Abstract

Recently, neural networks based on the actor-critic structure have been widely used in many reinforcement learning tasks. The structure consists of two main parts: (i) an actor module, which outputs a probability distribution over actions, and (ii) a critic module, which outputs a predicted value for the current state of the environment. Actor-critic networks usually need expert demonstrations to properly pre-train the actor module, but such demonstration data is often hard or even impossible to obtain. Consequently, most of these networks, such as those used in maze and robot-control tasks, suffer from a lack of proper pre-training and from unstable error propagation from the critic module to the actor module, which results in poor and unstable performance. We therefore propose a specially designed module called relatively optimal historical information learning (ROHI). The ROHI module records the historical explored information and extracts the relatively optimal information through a customized merging algorithm. This relatively optimal historical information is then used to assist in training the actor module during the main learning process. We introduce two complex experimental environments, a complex maze problem and a flipping game, to evaluate the effectiveness of the proposed module. The experimental results demonstrate that models extended with ROHI significantly improve the success rate of the original actor-critic models and slightly decrease the number of iterations required to reach the stable phase of value-based networks.
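To make the abstract's description concrete, the sketch below shows one plausible reading of the idea in PyTorch: an actor-critic pair, a buffer that records explored episodes and merges them down to the relatively optimal ones, and an actor loss augmented with an imitation term toward those recorded actions. The class and function names, the keep-the-best-by-return merging rule, and the weighting coefficient beta are illustrative assumptions, not the paper's actual implementation.

```python
# A minimal sketch of the structure described in the abstract.
# All names (ActorCritic, ROHIBuffer, actor_loss_with_rohi) and the
# "keep the best episodes by return" merging rule are assumptions;
# the paper's customized merging algorithm may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ActorCritic(nn.Module):
    """Actor outputs a probability distribution over discrete actions;
    critic outputs a predicted value for the current state."""

    def __init__(self, state_dim, n_actions, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.actor = nn.Linear(hidden, n_actions)
        self.critic = nn.Linear(hidden, 1)

    def forward(self, state):
        h = self.body(state)
        return F.softmax(self.actor(h), dim=-1), self.critic(h).squeeze(-1)


class ROHIBuffer:
    """Records explored episodes and keeps the relatively optimal ones."""

    def __init__(self, capacity=100):
        self.capacity = capacity
        self.episodes = []  # list of (episode_return, [(state, action), ...])

    def record(self, trajectory, episode_return):
        self.episodes.append((episode_return, trajectory))
        self._merge()

    def _merge(self):
        # Assumed merging rule: rank episodes by return, keep the best.
        self.episodes.sort(key=lambda e: e[0], reverse=True)
        self.episodes = self.episodes[: self.capacity]

    def sample(self):
        return [pair for _, traj in self.episodes for pair in traj]


def actor_loss_with_rohi(model, states, actions, advantages, buffer, beta=0.1):
    """Policy-gradient loss plus an imitation term that pulls the actor
    toward historically good actions -- a hedged sketch of 'assisting the
    actor with relatively optimal historical information'."""
    probs, _ = model(states)
    log_probs = torch.log(probs.gather(1, actions.unsqueeze(1)).squeeze(1) + 1e-8)
    loss = -(log_probs * advantages).mean()

    hist = buffer.sample()
    if hist:
        h_states = torch.stack([s for s, _ in hist])
        h_actions = torch.tensor([a for _, a in hist])
        h_probs, _ = model(h_states)
        # Negative log-likelihood of the recorded relatively optimal actions.
        loss = loss + beta * F.nll_loss(torch.log(h_probs + 1e-8), h_actions)
    return loss
```

In a training loop, one would call buffer.record(trajectory, episode_return) at the end of each episode and use actor_loss_with_rohi in place of the plain policy-gradient loss, so the actor receives a stable learning signal from past successes even when the critic's error signal is noisy.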


Metadata
Title: Improving actor-critic structure by relatively optimal historical information for discrete system
Authors: Xinyu Zhang, Weidong Li, Xiaoke Zhu, Xiao-Yuan Jing
Publication date: 26-02-2022
Publisher: Springer London
Published in: Neural Computing and Applications, Issue 12/2022
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI: https://doi.org/10.1007/s00521-022-06988-x
