
2018 | Original Paper | Book Chapter

The Dreaming Variational Autoencoder for Reinforcement Learning Environments

Authors: Per-Arne Andersen, Morten Goodwin, Ole-Christoffer Granmo

Published in: Artificial Intelligence XXXV

Publisher: Springer International Publishing


Abstract

Reinforcement learning has shown great potential in generalizing over raw sensory data using only a single neural network for value optimization. Several challenges in current state-of-the-art reinforcement learning algorithms prevent them from converging towards a global optimum. The solution to these problems likely lies in short- and long-term planning, exploration, and memory management for reinforcement learning algorithms. Games are often used to benchmark reinforcement learning algorithms because they provide flexible, reproducible, and easy-to-control environments. However, few games feature a state-space in which results on exploration, memory, and planning are easily interpreted. This paper presents the Dreaming Variational Autoencoder (DVAE), a neural-network-based generative modeling architecture for exploration in environments with sparse feedback. We further present Deep Maze, a novel and flexible maze engine that challenges DVAE with partially and fully observable state-spaces, long-horizon tasks, and deterministic and stochastic problems. We show initial findings and encourage further work in reinforcement learning driven by generative exploration.
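
To make the architecture concrete, the following is a minimal sketch of a DVAE-style transition model in PyTorch: a variational autoencoder that encodes a state-action pair into a latent code and decodes a prediction of the next state. The class name, layer sizes, and loss weighting here are illustrative assumptions, not the authors' reference implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DreamingVAE(nn.Module):
    """Hypothetical DVAE-style model: encode (state, action), decode the next state."""

    def __init__(self, state_dim, action_dim, latent_dim=32, hidden_dim=256):
        super().__init__()
        # Encoder maps a concatenated (state, action) vector to latent statistics.
        self.encoder = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden_dim), nn.ReLU(),
        )
        self.fc_mu = nn.Linear(hidden_dim, latent_dim)
        self.fc_logvar = nn.Linear(hidden_dim, latent_dim)
        # Decoder reconstructs the predicted next state from the latent code.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, state_dim), nn.Sigmoid(),  # assumes states scaled to [0, 1]
        )

    def forward(self, state, action):
        h = self.encoder(torch.cat([state, action], dim=-1))
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return self.decoder(z), mu, logvar

def dvae_loss(predicted_next_state, next_state, mu, logvar):
    # Reconstruction error on the observed next state plus the standard KL regularizer.
    reconstruction = F.mse_loss(predicted_next_state, next_state, reduction="sum")
    kl_divergence = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return reconstruction + kl_divergence

Once trained on transitions collected from the real environment, such a model can be fed candidate actions to roll out imagined ("dreamed") trajectories without further environment interaction, which is one way to realize the generative exploration described in the abstract.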


Footnotes
1
The Deep Maze is open-source and publicly available at https://github.com/CAIR/deep-maze.
 
Metadata
Title
The Dreaming Variational Autoencoder for Reinforcement Learning Environments
Authors
Per-Arne Andersen
Morten Goodwin
Ole-Christoffer Granmo
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-030-04191-5_11