
2020 | OriginalPaper | Chapter

9. Introduction to Reinforcement Learning

Authors: Matthew F. Dixon, Igor Halperin, Paul Bilokon

Published in: Machine Learning in Finance

Publisher: Springer International Publishing


Abstract

This chapter introduces Markov Decision Processes and the classical methods of dynamic programming, before building familiarity with the ideas of reinforcement learning and other approximate methods for solving MDPs. After describing Bellman optimality and iterative value and policy updates, the chapter moves to Q-learning and then advances towards a more engineering-style exposition of the topic, covering key computational concepts such as greediness and batch learning. Through a number of mini-case studies, the chapter provides insight into how RL is applied to optimization problems in asset management and trading.
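As a concrete illustration of the tabular setting the chapter builds up to, below is a minimal sketch of Q-learning with an epsilon-greedy policy on a toy chain MDP. The environment, reward structure, and parameter values are assumptions made for this sketch, not the chapter's own case studies; the epsilon-greedy selection and the bootstrapped update correspond to the notions of greediness and iterative value updates mentioned in the abstract.

```python
# A minimal sketch of tabular Q-learning with an epsilon-greedy policy on a
# toy 5-state chain MDP (all parameters and dynamics here are illustrative
# assumptions, not taken from the chapter's case studies).
import numpy as np

n_states, n_actions = 5, 2           # actions: 0 = step left, 1 = step right
gamma, alpha, epsilon = 0.95, 0.1, 0.1
rng = np.random.default_rng(0)

def step(state, action):
    """Deterministic chain dynamics: reaching the right end pays a reward of 1."""
    next_state = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == n_states - 1 else 0.0
    done = next_state == n_states - 1
    return next_state, reward, done

Q = np.zeros((n_states, n_actions))  # tabular action-value estimates
for episode in range(500):
    state, done = 0, False
    while not done:
        # epsilon-greedy action selection (the exploration/greediness trade-off)
        if rng.random() < epsilon:
            action = int(rng.integers(n_actions))
        else:  # greedy action with random tie-breaking
            action = int(rng.choice(np.flatnonzero(Q[state] == Q[state].max())))
        next_state, reward, done = step(state, action)
        # Q-learning update: bootstrap from the greedy value of the next state
        td_target = reward + gamma * Q[next_state].max() * (not done)
        Q[state, action] += alpha * (td_target - Q[state, action])
        state = next_state

print(np.round(Q, 3))  # learned values increase towards the rewarded right end
```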


Footnotes
1
These are portfolio strategies that do not adapt to changing macroeconomic conditions and thus do not need to be rebalanced frequently.
 
2
An alternative formulation for infinite-horizon MDPs is to maximize an average reward rather than the total reward. Such an approach allows one to proceed without introducing a discount factor. We will not pursue reinforcement learning with average rewards in this book.
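In standard notation (assumed here for illustration, not taken from the chapter), the discounted criterion maximizes
$$V^{\pi}(s) \;=\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} R_{t+1} \,\Big|\, S_0 = s\right], \qquad 0 \le \gamma < 1,$$
whereas the average-reward criterion maximizes
$$\rho^{\pi} \;=\; \lim_{T \to \infty} \frac{1}{T}\,\mathbb{E}_{\pi}\!\left[\sum_{t=1}^{T} R_t\right],$$
so no discount factor is needed.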
 
3
While high dimensionality is a curse for DP approaches, rendering them infeasible for high-dimensional problems, for some other approaches it may instead bring simplifications, in which case the "curse of dimensionality" is replaced by the "blessing of dimensionality."
 
4
Computing times that are polynomial in the number of states and actions refer to worst-case scenarios for DP. In practical applications of DP, convergence is sometimes faster than these worst-case bounds suggest.
 
5
We will discuss function approximation below, after we present TD algorithms in a tabular setting that is appropriate for finite MDPs with a sufficiently small number of possible states and actions.
 
6
These may be Monte Carlo trajectories or trajectories obtained from real-world data.
 
7
See Sect. 3 for further details of MDPs.
 
Metadata
Title
Introduction to Reinforcement Learning
Authors
Matthew F. Dixon
Igor Halperin
Paul Bilokon
Copyright Year
2020
DOI
https://doi.org/10.1007/978-3-030-41068-1_9