
2020 | OriginalPaper | Chapter

9. Introduction to Reinforcement Learning

Authors: Matthew F. Dixon, Igor Halperin, Paul Bilokon

Published in: Machine Learning in Finance

Publisher: Springer International Publishing


Abstract

This chapter introduces Markov Decision Processes and the classical methods of dynamic programming, before building familiarity with the ideas of reinforcement learning and other approximate methods for solving MDPs. After describing Bellman optimality and iterative value and policy updates, the chapter moves to Q-learning and then advances towards a more engineering-style exposition of the topic, covering key computational concepts such as greediness and batch learning. Through a number of mini-case studies, the chapter provides insight into how RL is applied to optimization problems in asset management and trading.
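As a concrete illustration of the tabular setting the chapter builds up to, below is a minimal sketch of Q-learning with an epsilon-greedy policy on a toy chain MDP. The environment, reward structure, and parameter values are assumptions made for this sketch, not the chapter's own case studies; the epsilon-greedy selection and the bootstrapped update correspond to the notions of greediness and iterative value updates mentioned in the abstract.

```python
# A minimal sketch of tabular Q-learning with an epsilon-greedy policy on a
# toy 5-state chain MDP (all parameters and dynamics here are illustrative
# assumptions, not taken from the chapter's case studies).
import numpy as np

n_states, n_actions = 5, 2           # actions: 0 = step left, 1 = step right
gamma, alpha, epsilon = 0.95, 0.1, 0.1
rng = np.random.default_rng(0)

def step(state, action):
    """Deterministic chain dynamics: reaching the right end pays a reward of 1."""
    next_state = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == n_states - 1 else 0.0
    done = next_state == n_states - 1
    return next_state, reward, done

Q = np.zeros((n_states, n_actions))  # tabular action-value estimates
for episode in range(500):
    state, done = 0, False
    while not done:
        # epsilon-greedy action selection (the exploration/greediness trade-off)
        if rng.random() < epsilon:
            action = int(rng.integers(n_actions))
        else:  # greedy action with random tie-breaking
            action = int(rng.choice(np.flatnonzero(Q[state] == Q[state].max())))
        next_state, reward, done = step(state, action)
        # Q-learning update: bootstrap from the greedy value of the next state
        td_target = reward + gamma * Q[next_state].max() * (not done)
        Q[state, action] += alpha * (td_target - Q[state, action])
        state = next_state

print(np.round(Q, 3))  # learned values increase towards the rewarded right end
```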


Footnotes
1
These are portfolio strategies that do not adapt to changing macroeconomic conditions and thus do not need to be rebalanced frequently.
 
2
An alternative formulation for infinite-horizon MDPs is to maximize an average reward rather than the total reward. Such an approach allows one to proceed without introducing a discount factor. We will not pursue reinforcement learning with average rewards in this book.
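In standard notation (assumed here for illustration, not taken from the chapter), the discounted criterion maximizes
$$V^{\pi}(s) \;=\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} R_{t+1} \,\Big|\, S_0 = s\right], \qquad 0 \le \gamma < 1,$$
whereas the average-reward criterion maximizes
$$\rho^{\pi} \;=\; \lim_{T \to \infty} \frac{1}{T}\,\mathbb{E}_{\pi}\!\left[\sum_{t=1}^{T} R_t\right],$$
so no discount factor is needed.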
 
3
While high dimensionality is a curse for DP approaches, rendering them infeasible for high-dimensional problems, for some other approaches it may instead bring simplifications, in which case the "curse of dimensionality" is replaced by the "blessing of dimensionality."
 
4
Computing times that are polynomial in the number of states and actions refer to worst-case scenarios for DP. In practical applications of DP, convergence is sometimes faster than these worst-case bounds suggest.
 
5
We will discuss function approximation below, after we present TD algorithms in a tabular setting that is appropriate for finite MDPs with a sufficiently small number of possible states and actions.
 
6
These may be Monte Carlo trajectories or trajectories obtained from real-world data.
 
7
See Sect. 3 for further details of MDPs.
 
Metadata
Title
Introduction to Reinforcement Learning
Authors
Matthew F. Dixon
Igor Halperin
Paul Bilokon
Copyright Year
2020
DOI
https://doi.org/10.1007/978-3-030-41068-1_9