
2018 | Original Paper | Book Chapter

The Dreaming Variational Autoencoder for Reinforcement Learning Environments

Authors: Per-Arne Andersen, Morten Goodwin, Ole-Christoffer Granmo

Published in: Artificial Intelligence XXXV

Publisher: Springer International Publishing


Abstract

Reinforcement learning has shown great potential in generalizing over raw sensory data using only a single neural network for value optimization. Several challenges in current state-of-the-art reinforcement learning algorithms prevent them from converging towards a global optimum. The solution to these problems likely lies in short- and long-term planning, exploration, and memory management for reinforcement learning algorithms. Games are often used to benchmark reinforcement learning algorithms because they provide flexible, reproducible, and easy-to-control environments. However, few games feature a state-space in which results on exploration, memory, and planning are easily interpreted. This paper presents the Dreaming Variational Autoencoder (DVAE), a neural-network-based generative modeling architecture for exploration in environments with sparse feedback. We further present Deep Maze, a novel and flexible maze engine that challenges DVAE with partially and fully observable state-spaces, long-horizon tasks, and deterministic and stochastic problems. We show initial findings and encourage further work in reinforcement learning driven by generative exploration.
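
To make the architecture concrete, the following is a minimal sketch of a DVAE-style transition model in PyTorch: a variational autoencoder that encodes a state-action pair into a latent code and decodes a prediction of the next state. The class name, layer sizes, and loss weighting here are illustrative assumptions, not the authors' reference implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DreamingVAE(nn.Module):
    """Hypothetical DVAE-style model: encode (state, action), decode the next state."""

    def __init__(self, state_dim, action_dim, latent_dim=32, hidden_dim=256):
        super().__init__()
        # Encoder maps a concatenated (state, action) vector to latent statistics.
        self.encoder = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden_dim), nn.ReLU(),
        )
        self.fc_mu = nn.Linear(hidden_dim, latent_dim)
        self.fc_logvar = nn.Linear(hidden_dim, latent_dim)
        # Decoder reconstructs the predicted next state from the latent code.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, state_dim), nn.Sigmoid(),  # assumes states scaled to [0, 1]
        )

    def forward(self, state, action):
        h = self.encoder(torch.cat([state, action], dim=-1))
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return self.decoder(z), mu, logvar

def dvae_loss(predicted_next_state, next_state, mu, logvar):
    # Reconstruction error on the observed next state plus the standard KL regularizer.
    reconstruction = F.mse_loss(predicted_next_state, next_state, reduction="sum")
    kl_divergence = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return reconstruction + kl_divergence

Once trained on transitions collected from the real environment, such a model can be fed candidate actions to roll out imagined ("dreamed") trajectories without further environment interaction, which is one way to realize the generative exploration described in the abstract.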


Footnotes
1
The Deep Maze is open-source and publicly available at https://github.com/CAIR/deep-maze.
 
Metadata
Title
The Dreaming Variational Autoencoder for Reinforcement Learning Environments
Authors
Per-Arne Andersen
Morten Goodwin
Ole-Christoffer Granmo
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-030-04191-5_11