
2018 | Original Paper | Book Chapter

Deep Reinforcement Learning: An Overview

Authors: Seyed Sajad Mousavi, Michael Schukat, Enda Howley

Published in: Proceedings of SAI Intelligent Systems Conference (IntelliSys) 2016

Publisher: Springer International Publishing

Abstract

In recent years, a particular machine learning approach called deep learning has attracted enormous attention, owing to its striking results in broad applications such as pattern recognition, speech recognition, computer vision, and natural language processing. Recent research has also shown that deep learning techniques can be combined with reinforcement learning methods to learn useful representations for problems with high-dimensional raw input data. This article reviews recent advances in deep reinforcement learning, with a focus on the most widely used deep architectures, such as autoencoders, convolutional neural networks, and recurrent neural networks, that have been successfully combined with the reinforcement learning framework.
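To make this combination concrete, the following is a minimal, self-contained sketch, assuming PyTorch, of the idea underlying the value-based deep RL methods the chapter surveys: a neural network approximates the action-value function Q(s, a) directly from raw observations and is trained against a bootstrapped temporal-difference target, as in the deep Q-network family. The names here (QNetwork, td_update) and the toy dimensions are illustrative assumptions, not the implementation of any work reviewed in the chapter.

```python
# Illustrative sketch only: a neural Q-function trained with a TD target.
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a raw observation vector to one Q-value per discrete action."""
    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        # For raw image input one would use convolutional layers instead.
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64),
            nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def td_update(q_net, target_net, optimizer, batch, gamma=0.99):
    """One TD update on a minibatch drawn from an experience-replay buffer."""
    obs, actions, rewards, next_obs, done = batch
    # Q-values of the actions actually taken.
    q = q_net(obs).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Bootstrapped target from a periodically synchronized target network.
        target = rewards + gamma * (1.0 - done) * target_net(next_obs).max(1).values
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with random data, just to show the shapes involved.
q_net = QNetwork(obs_dim=4, n_actions=2)
target_net = QNetwork(obs_dim=4, n_actions=2)
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
batch = (torch.randn(32, 4), torch.randint(0, 2, (32,)),
         torch.randn(32), torch.randn(32, 4), torch.zeros(32))
print(td_update(q_net, target_net, optimizer, batch))
```

The separate target network and the replay buffer (assumed to supply the minibatch) are the two stabilization tricks most associated with this line of work; the sketch shows only the update itself.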

Metadata
Title
Deep Reinforcement Learning: An Overview
Authors
Seyed Sajad Mousavi
Michael Schukat
Enda Howley
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-56991-8_32