Published in: World Wide Web 4/2021

Publication date: 11.02.2020

Accelerated deep reinforcement learning with efficient demonstration utilization techniques

Authors: Sangho Yeo, Sangyoon Oh, Minsu Lee



Abstract

The use of demonstrations for deep reinforcement learning (RL) agents usually accelerates their training and guides them toward learning complicated policies. Most current deep RL approaches with demonstrations assume that a sufficient amount of high-quality demonstrations is available. However, in most real-world learning settings, the available demonstrations are limited in both quantity and quality. In this paper, we present an accelerated deep RL approach with dual replay buffer management and dynamic frame skipping on demonstrations. The dual replay buffer manager maintains a human replay buffer and an actor replay buffer with independent sampling policies. We also propose dynamic frame skipping on demonstrations, called DFS-ER (Dynamic Frame Skipping-Experience Replay), which learns the action repetition factor of the demonstrations. DFS-ER accelerates deep RL by improving the efficiency of demonstration utilization, yielding faster exploration of the environment. We verified the training acceleration in three dense-reward environments and one sparse-reward environment against the conventional approach. In our evaluation using Atari game environments, the proposed approach reduced training iterations by 21.7%–39.1% in the sparse-reward environment.
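The two mechanisms named in the abstract can be sketched roughly as follows. This is a minimal, hypothetical Python sketch, not the authors' implementation: the class and function names, the fixed per-batch demonstration fraction, the buffer capacities, and the (action, repetition_factor) encoding are all illustrative assumptions.

```python
import random

class DualReplayBufferManager:
    """Sketch of dual replay buffer management: a human (demonstration)
    buffer and an actor (self-collected) buffer, each with its own
    sampling policy."""

    def __init__(self, human_capacity=50_000, actor_capacity=200_000,
                 demo_fraction=0.25):
        self.human_buffer = []              # human demonstration transitions
        self.actor_buffer = []              # transitions collected by the agent
        self.human_capacity = human_capacity
        self.actor_capacity = actor_capacity
        self.demo_fraction = demo_fraction  # share of each batch drawn from demos

    def add_demo(self, transition):
        self.human_buffer.append(transition)
        if len(self.human_buffer) > self.human_capacity:
            self.human_buffer.pop(0)        # drop the oldest entry when full

    def add_actor(self, transition):
        self.actor_buffer.append(transition)
        if len(self.actor_buffer) > self.actor_capacity:
            self.actor_buffer.pop(0)

    def sample(self, batch_size):
        # Independent sampling policies: a fixed fraction of every batch
        # comes from the demonstration buffer, the rest from the actor buffer.
        n_demo = min(int(batch_size * self.demo_fraction), len(self.human_buffer))
        n_actor = min(batch_size - n_demo, len(self.actor_buffer))
        return (random.sample(self.human_buffer, n_demo)
                + random.sample(self.actor_buffer, n_actor))

def compress_repetitions(actions):
    """DFS-ER-style preprocessing (assumed form): collapse runs of identical
    consecutive actions in a demonstration into (action, repetition_factor)
    pairs, so one stored step can cover several environment frames."""
    runs = []
    for a in actions:
        if runs and runs[-1][0] == a:
            runs[-1] = (a, runs[-1][1] + 1)
        else:
            runs.append((a, 1))
    return runs
```

For example, `compress_repetitions([2, 2, 2, 0, 1, 1])` yields `[(2, 3), (0, 1), (1, 2)]`, so a demonstration holding an action for many frames is stored as one step with a learned-from repetition factor rather than many duplicate transitions.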


Metadata
Title
Accelerated deep reinforcement learning with efficient demonstration utilization techniques
Authors
Sangho Yeo
Sangyoon Oh
Minsu Lee
Publication date
11.02.2020
Publisher
Springer US
Published in
World Wide Web / Issue 4/2021
Print ISSN: 1386-145X
Electronic ISSN: 1573-1413
DOI
https://doi.org/10.1007/s11280-019-00763-0
