
2021 | Original Paper | Book Chapter

7. Approximate Dynamic Programming and Reinforcement Learning for Continuous States

Author: Paolo Brandimarte

Published in: From Shortest Paths to Reinforcement Learning

Publisher: Springer International Publishing


Abstract

The numerical methods for stochastic dynamic programming that we have discussed in Chap. 6 are certainly useful tools for tackling some dynamic optimization problems under uncertainty. However, they are not a radical antidote against the curses of DP.


Footnotes
1
Other underlying financial variables may be interest rates, volatilities, or futures prices.
 
2
See, e.g., [4, Chapters 13 and 14].
 
3
See Sect. 3.2.1 and Eq. (3.12) in particular.
 
4
We should introduce such a state variable in the case of an option with multiple exercise opportunities. Such options are traded, for instance, on energy markets.
 
5
If we want to generate several sample paths, but we only have a single history of actual data, we may consider bootstrapping; see, e.g., [6].
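To make the footnote concrete, here is a minimal sketch of an i.i.d. bootstrap in NumPy: sample paths are built by resampling, with replacement, from the single observed history. The function name and parameters are ours, and a block bootstrap (as in [6]) would be needed to preserve serial dependence.

```python
import numpy as np

def bootstrap_paths(returns, n_paths, horizon, seed=None):
    """Generate n_paths sample paths of given horizon by resampling,
    with replacement, from a single observed history of returns.
    This is a plain i.i.d. bootstrap; it ignores serial dependence."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(returns), size=(n_paths, horizon))
    return np.asarray(returns)[idx]

# A single observed history of five returns...
history = [0.01, -0.02, 0.005, 0.03, -0.01]
# ...turned into 1000 resampled paths of length 10.
paths = bootstrap_paths(history, n_paths=1000, horizon=10, seed=42)
```

Every entry of every path is one of the originally observed values; only their arrangement is random.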
 
6
Details of sample path generation are irrelevant for our purposes. See, e.g., [2] or [3] for more details.
 
7
In this section, we adapt material borrowed from [11].
 
8
The difference between the two sides of Eq. (7.9) is called the Bellman error. Alternative strategies have been proposed for its minimization; see, e.g., [5].
 
9
Recursive least squares is often used in ADP. To give the reader just a flavour of it, we may mention that incremental approaches to matrix inversion are based on the Sherman–Morrison formula: (A + uv^T)^{-1} = A^{-1} − (A^{-1} u v^T A^{-1}) / (1 + v^T A^{-1} u). This allows us to update the inverse of a matrix A efficiently when additional data are gathered.
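As an illustration of the formula in this footnote, the following NumPy sketch (the function name is ours) applies the Sherman–Morrison rank-one update and checks it against direct inversion. This is the O(n^2) update that makes recursive least squares cheap, versus O(n^3) for refactorizing.

```python
import numpy as np

def sherman_morrison_update(A_inv, u, v):
    """Given A^{-1}, return (A + u v^T)^{-1} via the Sherman-Morrison
    formula, at O(n^2) cost instead of a fresh O(n^3) inversion."""
    Au = A_inv @ u                 # A^{-1} u
    vA = v @ A_inv                 # v^T A^{-1}
    denom = 1.0 + v @ Au           # 1 + v^T A^{-1} u
    return A_inv - np.outer(Au, vA) / denom

# Sanity check against direct inversion on a small random example.
rng = np.random.default_rng(0)
A = np.eye(3) + 0.1 * rng.standard_normal((3, 3))
u = rng.standard_normal(3)
v = rng.standard_normal(3)
updated = sherman_morrison_update(np.linalg.inv(A), u, v)
direct = np.linalg.inv(A + np.outer(u, v))
assert np.allclose(updated, direct)
```

In recursive least squares, u = v is the new feature vector, so each new observation updates the inverse Gram matrix in place.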
 
References
1. Bertsekas, D.P.: Dynamic Programming and Optimal Control, vol. 2, 4th edn. Athena Scientific, Belmont (2012)
2. Brandimarte, P.: Numerical Methods in Finance and Economics: A MATLAB-Based Introduction, 2nd edn. Wiley, Hoboken (2006)
3. Brandimarte, P.: Handbook in Monte Carlo Simulation: Applications in Financial Engineering, Risk Management, and Economics. Wiley, Hoboken (2014)
4. Brandimarte, P.: An Introduction to Financial Markets: A Quantitative Approach. Wiley, Hoboken (2018)
5. Buşoniu, L., Lazaric, A., Ghavamzadeh, M., Munos, R., Babuška, R., De Schutter, B.: Least-squares methods for policy iteration. In: Wiering, M., van Otterlo, M. (eds.) Reinforcement Learning: State of the Art, pp. 75–109. Springer, Heidelberg (2012)
6. Demirel, O.F., Willemain, T.R.: Generation of simulation input scenarios using bootstrap methods. J. Oper. Res. Soc. 53, 69–78 (2002)
7. Glasserman, P.: Monte Carlo Methods in Financial Engineering. Springer, New York (2004)
8. Lagoudakis, M.G., Parr, R.: Model-free least squares policy iteration. In: 14th Neural Information Processing Systems, NIPS-14, Vancouver (2001)
9. Lagoudakis, M.G., Parr, R.: Least-squares policy iteration. J. Mach. Learn. Res. 4, 1107–1149 (2003)
10. Longstaff, F., Schwartz, E.: Valuing American options by simulation: a simple least-squares approach. Rev. Financ. Stud. 14, 113–147 (2001)
11. Powell, W.B.: Approximate Dynamic Programming: Solving the Curses of Dimensionality, 2nd edn. Wiley, Hoboken (2011)
12. Tsitsiklis, J.N., Van Roy, B.: Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives. IEEE Trans. Autom. Control 44, 1840–1851 (1999)
13. Tsitsiklis, J.N., Van Roy, B.: Regression methods for pricing complex American-style options. IEEE Trans. Neural Netw. 12, 694–703 (2001)
14. Zoppoli, R., Sanguineti, M., Gnecco, G., Parisini, T.: Neural Approximation for Optimal Control and Decision. Springer, Cham (2020)
Metadata
Title
Approximate Dynamic Programming and Reinforcement Learning for Continuous States
Author
Paolo Brandimarte
Copyright year
2021
DOI
https://doi.org/10.1007/978-3-030-61867-4_7