Published in: Neural Computing and Applications 15/2021

03.03.2021 | Original Article

Self-guided deep deterministic policy gradient with multi-actor

Authors: Hongming Chen, Quan Liu, Shan Zhong


Abstract

Reinforcement learning algorithms have made remarkable progress in recent years by leveraging the power of deep neural networks. Despite this success, the performance of deep reinforcement learning algorithms depends heavily on the exploration strategy: many of them explore by injecting external noise into the action space or by adopting a Gaussian policy. This paper presents a deep reinforcement learning algorithm without external noise, called self-guided deep deterministic policy gradient with multi-actor (SDDPGM), which combines the deep deterministic policy gradient with generative adversarial networks (GANs). It employs the GAN generator, trained on excellent experiences, to guide the agent's learning, and uses the discriminator to construct a subjective reward. Moreover, to make learning more stable, SDDPGM applies a multi-actor mechanism in which a distinct actor is used for each temporal phase of an episode. Finally, experiments show that SDDPGM is a promising deep reinforcement learning method.
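
The two mechanisms described above (a discriminator-based subjective reward and one actor per temporal phase of an episode) can be illustrated with a minimal, hypothetical sketch. The code below is not the authors' implementation; it assumes PyTorch and made-up dimensions and hyper-parameters (STATE_DIM, ACTION_DIM, N_ACTORS, LAMBDA) purely to show how the pieces could fit together.

    # Minimal sketch of the ideas in the abstract (assumed structure, not the authors' code).
    import torch
    import torch.nn as nn

    STATE_DIM, ACTION_DIM = 17, 6   # assumed sizes for a continuous-control task
    N_ACTORS = 3                    # one actor per temporal phase of an episode (assumed)
    LAMBDA = 0.1                    # weight of the discriminator-based reward (assumed)

    def mlp(in_dim, out_dim, out_act=None):
        layers = [nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, out_dim)]
        if out_act is not None:
            layers.append(out_act)
        return nn.Sequential(*layers)

    # Multi-actor mechanism: a distinct deterministic actor for each episode phase.
    actors = [mlp(STATE_DIM, ACTION_DIM, nn.Tanh()) for _ in range(N_ACTORS)]
    critic = mlp(STATE_DIM + ACTION_DIM, 1)

    # GAN components: the generator would be trained on excellent (high-return)
    # experience; the discriminator scores how "excellent" a (state, action) pair looks.
    generator = mlp(STATE_DIM, ACTION_DIM, nn.Tanh())
    discriminator = mlp(STATE_DIM + ACTION_DIM, 1, nn.Sigmoid())

    def select_action(state, step, horizon):
        """Pick the actor assigned to the current temporal phase; no external noise is added."""
        phase = min(step * N_ACTORS // horizon, N_ACTORS - 1)
        with torch.no_grad():
            return actors[phase](state)

    def subjective_reward(state, action):
        """Discriminator output used as an internally generated reward signal."""
        with torch.no_grad():
            return LAMBDA * discriminator(torch.cat([state, action], dim=-1))

    # Example of combining the subjective reward with the environment reward:
    state = torch.zeros(1, STATE_DIM)
    action = select_action(state, step=0, horizon=1000)
    env_reward = 0.0                                    # would come from the environment
    total_reward = env_reward + subjective_reward(state, action).item()

The key design point suggested by the abstract is that exploration is driven by the learned generator and discriminator rather than by external action noise, and that splitting the policy across phase-specific actors is intended to stabilize learning.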

Metadata
Title
Self-guided deep deterministic policy gradient with multi-actor
Authors
Hongming Chen
Quan Liu
Shan Zhong
Publication date
03.03.2021
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 15/2021
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-021-05738-9
