2021 | Original Paper | Book Chapter

3. Modeling for Dynamic Programming

Author: Paolo Brandimarte

Published in: From Shortest Paths to Reinforcement Learning

Publisher: Springer International Publishing


Abstract

The DP principle, in its essence, is a rather simple and quite flexible concept for the decomposition of a multistage decision problem into a sequence of single-stage problems. The trick is to introduce a value function to balance the immediate contribution and the expected contributions from future decisions.
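In symbols, the decomposition takes the form of a recursive functional equation. The following is a generic sketch for a maximization problem, where the state \(\mathbf{s}_t\), the decision \(\mathbf{x}_t \in \mathcal{X}(\mathbf{s}_t)\), and the immediate contribution \(f_t\) are placeholder symbols rather than the chapter's exact notation:

\[ V_t(\mathbf{s}_t) = \max_{\mathbf{x}_t \in \mathcal{X}(\mathbf{s}_t)} \Big\{ f_t(\mathbf{s}_t, \mathbf{x}_t) + \mathrm{E}\big[ V_{t+1}(\mathbf{s}_{t+1}) \mid \mathbf{s}_t, \mathbf{x}_t \big] \Big\}. \]

Since the value function \(V_{t+1}\) summarizes all future consequences of the current decision, each stage reduces to a single-stage optimization. The following minimal Python sketch illustrates the resulting backward recursion on a toy problem; all sizes, contributions, and transition probabilities are hypothetical data invented for illustration:

import numpy as np

T = 3                        # number of decision stages
n_states, n_actions = 3, 2   # toy state and decision spaces

rng = np.random.default_rng(42)
# f[t, s, a]: immediate contribution; P[a, s, s']: transition probabilities
f = rng.uniform(size=(T, n_states, n_actions))
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))

V = np.zeros((T + 1, n_states))        # terminal condition: V_T = 0
policy = np.zeros((T, n_states), dtype=int)

for t in reversed(range(T)):           # solve single-stage problems backward
    for s in range(n_states):
        q = [f[t, s, a] + P[a, s] @ V[t + 1] for a in range(n_actions)]
        policy[t, s] = int(np.argmax(q))
        V[t, s] = max(q)

print(V[0])   # optimal expected total contribution from each initial state

Once \(V_{t+1}\) is available, the stage-t problem no longer involves the whole future: this is the decomposition at work.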


Footnotes
1
We assume that the feasible set of decisions does not depend on time. If it does, we may just introduce time-varying feasible sets \(\mathcal {X}_t ({\mathbf s}_t)\).
 
2
We provide some background material on Markov chains in Sect. 4.1.
 
3
We use subscript t + 1 to stress that the transition takes place during time interval t + 1, i.e., from time instant t to time instant t + 1. We choose to emphasize the time interval over which risk factors are realized, rather than the time instant at which we make a decision.
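Under this convention, the transition may be sketched in the generic form

\[ \mathbf{s}_{t+1} = g_t(\mathbf{s}_t, \mathbf{x}_t, \boldsymbol{\xi}_{t+1}), \]

where the transition function \(g_t\) and the risk factor \(\boldsymbol{\xi}_{t+1}\) are assumed placeholder symbols: the decision \(\mathbf{x}_t\) is made at time instant t, and the risk factor \(\boldsymbol{\xi}_{t+1}\) is realized during the subsequent interval.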
 
4
We could denote the optimal Q-factors by \(Q^*(i,a)\), but since we do not use the asterisk for the optimal value function, we avoid it here as well. We may interpret Q(i, a) as shorthand for \(Q_{\mu^*}(i,a)\), where \(\mu^*\) is an optimal policy.
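For reference, the Q-factors and the optimal value function are linked by the standard relations for a discounted Markov decision process, where the immediate reward \(r(i,a)\), the discount factor \(\gamma\), and the transition probabilities \(\pi_{ij}(a)\) are assumed notation:

\[ Q(i,a) = r(i,a) + \gamma \sum_{j} \pi_{ij}(a)\, V(j), \qquad V(i) = \max_{a}\, Q(i,a). \]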
 
5
In the case of patient customers, demand that is not immediately met from inventory is backlogged for satisfaction after the next delivery from suppliers. When customers are impatient, there is no backlog, as unmet demand results in lost sales.
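The two cases correspond to different state transition equations. As a sketch with assumed symbols, let \(s_t\) be the inventory state, \(x_t\) the quantity delivered, and \(d_{t+1}\) the demand realized during the next time interval; then

\[ s_{t+1} = s_t + x_t - d_{t+1} \quad \text{(backlog; negative states are allowed)}, \qquad s_{t+1} = \max\{0,\; s_t + x_t - d_{t+1}\} \quad \text{(lost sales)}. \]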
 
6
See, e.g., [8] for a DP application to blood platelet production and inventory management.
 
7
The treatment borrows from [15], which in turn is based on [14].
 
8
To be precise, we should account for the cost of capacity expansions when the number of subscribers grows considerably.
 
9
We neglect additional fuel consumption. Apparently, some airlines are considering increased fares for overweight passengers.
 
10
See, e.g., [15] for essential concepts like nesting of policies.
 
11
We assume that no customer is willing to switch, but a large number of low-budget requests might be a signal of an unexpectedly overcrowded flight, including the high-revenue class. In such a case, we might wish to increase the protection level for high fare classes.
 
12
Readers familiar with the Poisson process may recall that the probability of a single arrival, or event, during a time interval of length δt is λ δt, up to terms that are o(δt), where λ is the rate of the process. The probability of two or more arrivals is o(δt). Note that the Poisson process is a Markov process, which is compatible with DP. In this specific example, we are also ruling out pairs or groups of passengers.
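More precisely, a standard expansion of the Poisson probabilities, using \(e^{-\lambda\,\delta t} = 1 - \lambda\,\delta t + o(\delta t)\), gives

\[ P\{N(\delta t) = 1\} = \lambda\,\delta t\, e^{-\lambda\,\delta t} = \lambda\,\delta t + o(\delta t), \qquad P\{N(\delta t) \ge 2\} = 1 - e^{-\lambda\,\delta t} - \lambda\,\delta t\, e^{-\lambda\,\delta t} = o(\delta t). \]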
 
13
See, e.g., [14, pp. 64–66] for a numerical example.
 
14
See, e.g., [2, Chapter 3] for more information about discounting and the time value of money.
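As a minimal illustration, not specific to [2]: a cash flow C received t periods from now, discounted at a constant per-period rate r, has present value

\[ PV = \frac{C}{(1+r)^t}, \]

so the corresponding one-period discount factor in a DP recursion is \(\gamma = 1/(1+r)\).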
 
15
The change of probability measure is related to a particular kind of stochastic process, known as a martingale. Existence and uniqueness of pricing probability measures critically depend on features of the market model we assume, i.e., lack of arbitrage opportunities and market completeness. See, e.g., [2, Chapter 2].
 
16
This is inconsequential from our viewpoint, and the only effect is on how we will generate scenarios in Sect. 7.1.
 
17
It is a simple matter to show that it is never optimal to exercise a call option early, unless the asset pays dividends before option expiration. See, e.g., [2, p. 513].
 
18
A stopping time is a measurable random variable with respect to the filtration generated by the stochastic process of stock prices. In plain English, each stopping time is associated with a non-anticipative exercise policy, which relies on the sample path observed so far, but cannot peek into the future. See, e.g., [4, p. 247].
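Formally, by the standard definition, a random time \(\tau\) is a stopping time with respect to a filtration \((\mathcal{F}_t)\) if

\[ \{\tau \le t\} \in \mathcal{F}_t \quad \text{for all } t, \]

that is, whether exercise has already occurred by time t must be decidable from the information available at time t.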
 
19
Believe it or not, since such options are halfway between European- and American-style ones, they are called Bermudan-style. For instance, we might hold a Bermudan-style option expiring in ten months, which may be exercised at the end of each month.
 
20
See the discussion in Sect. 5.1.
 
21
See, e.g., [7, Chapter 8].
 
22
This is a streamlined version of the model described in [3, Chapter 7].
 
References
1.
Bertsekas, D.P.: Dynamic Programming and Optimal Control, vol. 1, 4th edn. Athena Scientific, Belmont (2017)
2.
Brandimarte, P.: An Introduction to Financial Markets: A Quantitative Approach. Wiley, Hoboken (2018)
3.
Campbell, J.Y., Viceira, L.M.: Strategic Asset Allocation. Oxford University Press, Oxford (2002)
4.
Campolieti, G., Makarov, R.N.: Financial Mathematics: A Comprehensive Treatment. CRC Press, Boca Raton (2014)
5.
Chang, F.R.: Stochastic Optimization in Continuous Time. Cambridge University Press, Cambridge (2004)
6.
Gershwin, S.B.: Manufacturing Systems Engineering. Prentice Hall, Englewood Cliffs (1994)
7.
Glasserman, P.: Monte Carlo Methods in Financial Engineering. Springer, New York (2004)
8.
Haijema, R., van Dijk, N., van der Wal, J., Sibinga, C.S.: Blood platelet production with breaks: optimization by SDP and simulation. Int. J. Production Economics 121, 464–473 (2009)
9.
Krishnamurthy, V.: Partially Observed Markov Decision Processes: From Filtering to Controlled Sensing. Cambridge University Press, Cambridge (2016)
10.
Lewis, F.L., Liu, D. (eds.): Reinforcement Learning and Approximate Dynamic Programming for Feedback Control. IEEE Press, Piscataway (2013)
11.
Pinedo, M.L.: Scheduling: Theory, Algorithms, and Systems, 5th edn. Springer, New York (2016)
12.
Powell, W.B.: Approximate Dynamic Programming: Solving the Curses of Dimensionality, 2nd edn. Wiley, Hoboken (2011)
13.
14.
Talluri, K.T., van Ryzin, G.J.: The Theory and Practice of Revenue Management. Springer, New York (2005)
Metadata
Title
Modeling for Dynamic Programming
Author
Paolo Brandimarte
Copyright Year
2021
DOI
https://doi.org/10.1007/978-3-030-61867-4_3