
2021 | Original Paper | Book Chapter

7. Approximate Dynamic Programming and Reinforcement Learning for Continuous States

Author: Paolo Brandimarte

Published in: From Shortest Paths to Reinforcement Learning

Publisher: Springer International Publishing


Abstract

The numerical methods for stochastic dynamic programming that we have discussed in Chap. 6 are certainly useful tools for tackling some dynamic optimization problems under uncertainty. However, they are not a radical antidote against the curses of DP.


Footnotes
1
Other underlying financial variables may be interest rates, volatilities, or futures prices.
 
2
See, e.g., [4, Chapters 13 and 14].
 
3
See Sect. 3.2.1 and Eq. (3.12) in particular.
 
4
We should introduce such a state variable in the case of an option with multiple exercise opportunities. Such options are traded, for instance, on energy markets.
 
5
If we want to generate several sample paths, but we only have a single history of actual data, we may consider bootstrapping; see, e.g., [6].
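To make the footnote concrete, here is a minimal sketch of an i.i.d. bootstrap in NumPy: sample paths are built by resampling, with replacement, from the single observed history. The function name and parameters are ours, and a block bootstrap (as in [6]) would be needed to preserve serial dependence.

```python
import numpy as np

def bootstrap_paths(returns, n_paths, horizon, seed=None):
    """Generate n_paths sample paths of given horizon by resampling,
    with replacement, from a single observed history of returns.
    This is a plain i.i.d. bootstrap; it ignores serial dependence."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(returns), size=(n_paths, horizon))
    return np.asarray(returns)[idx]

# A single observed history of five returns...
history = [0.01, -0.02, 0.005, 0.03, -0.01]
# ...turned into 1000 resampled paths of length 10.
paths = bootstrap_paths(history, n_paths=1000, horizon=10, seed=42)
```

Every entry of every path is one of the originally observed values; only their arrangement is random.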
 
6
Details of sample path generation are irrelevant for our purposes. See, e.g., [2] or [3] for more details.
 
7
In this section, we adapt material borrowed from [11].
 
8
The difference between the two sides of Eq. (7.9) is called the Bellman error. Alternative strategies have been proposed for its minimization; see, e.g., [5].
 
9
Recursive least squares is often used in ADP. To give the reader just a flavour of it, we may mention that incremental approaches to matrix inversion are based on the Sherman–Morrison formula: (A + uv^T)^{-1} = A^{-1} − (A^{-1} u v^T A^{-1}) / (1 + v^T A^{-1} u). This allows us to update the inverse of a matrix A efficiently when additional data are gathered.
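As an illustration of the formula in this footnote, the following NumPy sketch (the function name is ours) applies the Sherman–Morrison rank-one update and checks it against direct inversion. This is the O(n^2) update that makes recursive least squares cheap, versus O(n^3) for refactorizing.

```python
import numpy as np

def sherman_morrison_update(A_inv, u, v):
    """Given A^{-1}, return (A + u v^T)^{-1} via the Sherman-Morrison
    formula, at O(n^2) cost instead of a fresh O(n^3) inversion."""
    Au = A_inv @ u                 # A^{-1} u
    vA = v @ A_inv                 # v^T A^{-1}
    denom = 1.0 + v @ Au           # 1 + v^T A^{-1} u
    return A_inv - np.outer(Au, vA) / denom

# Sanity check against direct inversion on a small random example.
rng = np.random.default_rng(0)
A = np.eye(3) + 0.1 * rng.standard_normal((3, 3))
u = rng.standard_normal(3)
v = rng.standard_normal(3)
updated = sherman_morrison_update(np.linalg.inv(A), u, v)
direct = np.linalg.inv(A + np.outer(u, v))
assert np.allclose(updated, direct)
```

In recursive least squares, u = v is the new feature vector, so each new observation updates the inverse Gram matrix in place.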
 
References
1. Bertsekas, D.P.: Dynamic Programming and Optimal Control, vol. 2, 4th edn. Athena Scientific, Belmont (2012)
2. Brandimarte, P.: Numerical Methods in Finance and Economics: A MATLAB-Based Introduction, 2nd edn. Wiley, Hoboken (2006)
3. Brandimarte, P.: Handbook in Monte Carlo Simulation: Applications in Financial Engineering, Risk Management, and Economics. Wiley, Hoboken (2014)
4. Brandimarte, P.: An Introduction to Financial Markets: A Quantitative Approach. Wiley, Hoboken (2018)
5. Buşoniu, L., Lazaric, A., Ghavamzadeh, M., Munos, R., Babuška, R., De Schutter, B.: Least-squares methods for policy iteration. In: Wiering, M., van Otterlo, M. (eds.) Reinforcement Learning: State of the Art, pp. 75–109. Springer, Heidelberg (2012)
6. Demirel, O.F., Willemain, T.R.: Generation of simulation input scenarios using bootstrap methods. J. Oper. Res. Soc. 53, 69–78 (2002)
7. Glasserman, P.: Monte Carlo Methods in Financial Engineering. Springer, New York (2004)
8. Lagoudakis, M.G., Parr, R.: Model-free least squares policy iteration. In: 14th Neural Information Processing Systems, NIPS-14, Vancouver (2001)
9. Lagoudakis, M.G., Parr, R.: Least-squares policy iteration. J. Mach. Learn. Res. 4, 1107–1149 (2003)
10. Longstaff, F., Schwartz, E.: Valuing American options by simulation: a simple least-squares approach. Rev. Financ. Stud. 14, 113–147 (2001)
11. Powell, W.B.: Approximate Dynamic Programming: Solving the Curses of Dimensionality, 2nd edn. Wiley, Hoboken (2011)
12. Tsitsiklis, J.N., Van Roy, B.: Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives. IEEE Trans. Autom. Control 44, 1840–1851 (1999)
13. Tsitsiklis, J.N., Van Roy, B.: Regression methods for pricing complex American-style options. IEEE Trans. Neural Netw. 12, 694–703 (2001)
14. Zoppoli, R., Sanguineti, M., Gnecco, G., Parisini, T.: Neural Approximation for Optimal Control and Decision. Springer, Cham (2020)
Metadata
Title
Approximate Dynamic Programming and Reinforcement Learning for Continuous States
Author
Paolo Brandimarte
Copyright year
2021
DOI
https://doi.org/10.1007/978-3-030-61867-4_7