Published in: World Wide Web 4/2021

Publication date: 11.02.2020

Accelerated deep reinforcement learning with efficient demonstration utilization techniques

Authors: Sangho Yeo, Sangyoon Oh, Minsu Lee



Abstract

The use of demonstrations for deep reinforcement learning (RL) agents usually accelerates their training and guides them toward learning complicated policies. Most current deep RL approaches with demonstrations assume that a sufficient amount of high-quality demonstrations is available. However, in most real-world learning settings, the available demonstrations are limited in both quantity and quality. In this paper, we present an accelerated deep RL approach with dual replay buffer management and dynamic frame skipping on demonstrations. The dual replay buffer manager maintains a human replay buffer and an actor replay buffer with independent sampling policies. We also propose dynamic frame skipping on demonstrations, called DFS-ER (Dynamic Frame Skipping-Experience Replay), which learns the action repetition factor of the demonstrations. DFS-ER accelerates deep RL by improving the efficiency of demonstration utilization, yielding faster exploration of the environment. We verified the training acceleration in three dense-reward environments and one sparse-reward environment against the conventional approach. In our evaluation using Atari game environments, the proposed approach reduced training iterations by 21.7%–39.1% in the sparse-reward environment.
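The two mechanisms named in the abstract can be sketched roughly as follows. This is a minimal, hypothetical Python sketch, not the authors' implementation: the class and function names, the fixed per-batch demonstration fraction, the buffer capacities, and the (action, repetition_factor) encoding are all illustrative assumptions.

```python
import random

class DualReplayBufferManager:
    """Sketch of dual replay buffer management: a human (demonstration)
    buffer and an actor (self-collected) buffer, each with its own
    sampling policy."""

    def __init__(self, human_capacity=50_000, actor_capacity=200_000,
                 demo_fraction=0.25):
        self.human_buffer = []              # human demonstration transitions
        self.actor_buffer = []              # transitions collected by the agent
        self.human_capacity = human_capacity
        self.actor_capacity = actor_capacity
        self.demo_fraction = demo_fraction  # share of each batch drawn from demos

    def add_demo(self, transition):
        self.human_buffer.append(transition)
        if len(self.human_buffer) > self.human_capacity:
            self.human_buffer.pop(0)        # drop the oldest entry when full

    def add_actor(self, transition):
        self.actor_buffer.append(transition)
        if len(self.actor_buffer) > self.actor_capacity:
            self.actor_buffer.pop(0)

    def sample(self, batch_size):
        # Independent sampling policies: a fixed fraction of every batch
        # comes from the demonstration buffer, the rest from the actor buffer.
        n_demo = min(int(batch_size * self.demo_fraction), len(self.human_buffer))
        n_actor = min(batch_size - n_demo, len(self.actor_buffer))
        return (random.sample(self.human_buffer, n_demo)
                + random.sample(self.actor_buffer, n_actor))

def compress_repetitions(actions):
    """DFS-ER-style preprocessing (assumed form): collapse runs of identical
    consecutive actions in a demonstration into (action, repetition_factor)
    pairs, so one stored step can cover several environment frames."""
    runs = []
    for a in actions:
        if runs and runs[-1][0] == a:
            runs[-1] = (a, runs[-1][1] + 1)
        else:
            runs.append((a, 1))
    return runs
```

For example, `compress_repetitions([2, 2, 2, 0, 1, 1])` yields `[(2, 3), (0, 1), (1, 2)]`, so a demonstration holding an action for many frames is stored as one step with a learned-from repetition factor rather than many duplicate transitions.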


Metadata
Title
Accelerated deep reinforcement learning with efficient demonstration utilization techniques
Authors
Sangho Yeo
Sangyoon Oh
Minsu Lee
Publication date
11.02.2020
Publisher
Springer US
Published in
World Wide Web / Issue 4/2021
Print ISSN: 1386-145X
Electronic ISSN: 1573-1413
DOI
https://doi.org/10.1007/s11280-019-00763-0
