2021 | Original Paper | Book Chapter

3. Modeling for Dynamic Programming

Author: Paolo Brandimarte

Published in: From Shortest Paths to Reinforcement Learning

Publisher: Springer International Publishing


Abstract

The DP principle, in its essence, is a rather simple and quite flexible concept for the decomposition of a multistage decision problem into a sequence of single-stage problems. The trick is to introduce a value function to balance the immediate contribution and the expected contributions from future decisions.
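In symbols, the decomposition takes the form of a recursive functional equation. The following is a generic sketch for a maximization problem, where the state \(\mathbf{s}_t\), the decision \(\mathbf{x}_t \in \mathcal{X}(\mathbf{s}_t)\), and the immediate contribution \(f_t\) are placeholder symbols rather than the chapter's exact notation:

\[ V_t(\mathbf{s}_t) = \max_{\mathbf{x}_t \in \mathcal{X}(\mathbf{s}_t)} \Big\{ f_t(\mathbf{s}_t, \mathbf{x}_t) + \mathrm{E}\big[ V_{t+1}(\mathbf{s}_{t+1}) \mid \mathbf{s}_t, \mathbf{x}_t \big] \Big\}. \]

Since the value function \(V_{t+1}\) summarizes all future consequences of the current decision, each stage reduces to a single-stage optimization. The following minimal Python sketch illustrates the resulting backward recursion on a toy problem; all sizes, contributions, and transition probabilities are hypothetical data invented for illustration:

import numpy as np

T = 3                        # number of decision stages
n_states, n_actions = 3, 2   # toy state and decision spaces

rng = np.random.default_rng(42)
# f[t, s, a]: immediate contribution; P[a, s, s']: transition probabilities
f = rng.uniform(size=(T, n_states, n_actions))
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))

V = np.zeros((T + 1, n_states))        # terminal condition: V_T = 0
policy = np.zeros((T, n_states), dtype=int)

for t in reversed(range(T)):           # solve single-stage problems backward
    for s in range(n_states):
        q = [f[t, s, a] + P[a, s] @ V[t + 1] for a in range(n_actions)]
        policy[t, s] = int(np.argmax(q))
        V[t, s] = max(q)

print(V[0])   # optimal expected total contribution from each initial state

Once \(V_{t+1}\) is available, the stage-t problem no longer involves the whole future: this is the decomposition at work.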


Footnotes
1
We assume that the feasible set of decisions does not depend on time. If it does, we may just introduce time-varying feasible sets \(\mathcal {X}_t ({\mathbf s}_t)\).
 
2
We provide some background material on Markov chains in Sect. 4.1.
 
3
We use subscript t + 1 to stress that the transition takes place during time interval t + 1, i.e., from time instant t to time instant t + 1. We choose to emphasize the time interval over which risk factors are realized, rather than the time instant at which we make a decision.
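Under this convention, the transition may be sketched in the generic form

\[ \mathbf{s}_{t+1} = g_t(\mathbf{s}_t, \mathbf{x}_t, \boldsymbol{\xi}_{t+1}), \]

where the transition function \(g_t\) and the risk factor \(\boldsymbol{\xi}_{t+1}\) are assumed placeholder symbols: the decision \(\mathbf{x}_t\) is made at time instant t, and the risk factor \(\boldsymbol{\xi}_{t+1}\) is realized during the subsequent interval.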
 
4
We could denote the optimal Q-factors by \(Q^*(i,a)\), but since we do not use the asterisk for the optimal value function, we avoid it here as well. We may interpret Q(i, a) as shorthand for \(Q_{\mu^*}(i,a)\), where \(\mu^*\) is an optimal policy.
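For reference, the Q-factors and the optimal value function are linked by the standard relations for a discounted Markov decision process, where the immediate reward \(r(i,a)\), the discount factor \(\gamma\), and the transition probabilities \(\pi_{ij}(a)\) are assumed notation:

\[ Q(i,a) = r(i,a) + \gamma \sum_{j} \pi_{ij}(a)\, V(j), \qquad V(i) = \max_{a}\, Q(i,a). \]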
 
5
In the case of patient customers, demand that is not immediately met from inventory is backlogged for satisfaction after the next delivery from suppliers. When customers are impatient, there is no backlog, as unmet demand results in lost sales.
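The two cases correspond to different state transition equations. As a sketch with assumed symbols, let \(s_t\) be the inventory state, \(x_t\) the quantity delivered, and \(d_{t+1}\) the demand realized during the next time interval; then

\[ s_{t+1} = s_t + x_t - d_{t+1} \quad \text{(backlog; negative states are allowed)}, \qquad s_{t+1} = \max\{0,\; s_t + x_t - d_{t+1}\} \quad \text{(lost sales)}. \]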
 
6
See, e.g., [8] for a DP application to blood platelet production and inventory management.
 
7
The treatment borrows from [15], which in turn is based on [14].
 
8
To be precise, we should account for the cost of capacity expansions when the number of subscribers grows considerably.
 
9
We neglect additional fuel consumption. Apparently, some airlines are considering increased fares for overweight passengers.
 
10
See, e.g., [15] for essential concepts like nesting of policies.
 
11
We assume that no customer is willing to switch, but a large number of low-budget requests might be a signal of an unexpectedly overcrowded flight, including the high-revenue class. In such a case, we might wish to increase the protection level for high fare classes.
 
12
Readers familiar with the Poisson process may recall that the probability of a single arrival, or event, during a time interval of length δt is λ δt, up to terms that are o(δt), where λ is the rate of the process. The probability of two or more arrivals is o(δt). Note that the Poisson process is a Markov process, which is compatible with DP. In this specific example, we are also ruling out pairs or groups of passengers.
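More precisely, a standard expansion of the Poisson probabilities, using \(e^{-\lambda\,\delta t} = 1 - \lambda\,\delta t + o(\delta t)\), gives

\[ P\{N(\delta t) = 1\} = \lambda\,\delta t\, e^{-\lambda\,\delta t} = \lambda\,\delta t + o(\delta t), \qquad P\{N(\delta t) \ge 2\} = 1 - e^{-\lambda\,\delta t} - \lambda\,\delta t\, e^{-\lambda\,\delta t} = o(\delta t). \]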
 
13
See, e.g., [14, pp. 64–66] for a numerical example.
 
14
See, e.g., [2, Chapter 3] for more information about discounting and the time value of money.
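As a minimal illustration, not specific to [2]: a cash flow C received t periods from now, discounted at a constant per-period rate r, has present value

\[ PV = \frac{C}{(1+r)^t}, \]

so the corresponding one-period discount factor in a DP recursion is \(\gamma = 1/(1+r)\).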
 
15
The change of probability measure is related to a particular kind of stochastic process, known as a martingale. Existence and uniqueness of pricing probability measures critically depend on features of the market model we assume, i.e., lack of arbitrage opportunities and market completeness. See, e.g., [2, Chapter 2].
 
16
This is inconsequential from our viewpoint, and the only effect is on how we will generate scenarios in Sect. 7.1.
 
17
It is a simple matter to show that it is never optimal to exercise a call option early, unless the asset pays dividends before option expiration. See, e.g., [2, p. 513].
 
18
A stopping time is a measurable random variable with respect to the filtration generated by the stochastic process of stock prices. In plain English, each stopping time is associated with a non-anticipative exercise policy, which relies on the sample path observed so far, but cannot peek into the future. See, e.g., [4, p. 247].
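Formally, by the standard definition, a random time \(\tau\) is a stopping time with respect to a filtration \((\mathcal{F}_t)\) if

\[ \{\tau \le t\} \in \mathcal{F}_t \quad \text{for all } t, \]

that is, whether exercise has already occurred by time t must be decidable from the information available at time t.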
 
19
Believe it or not, since such options are halfway between European- and American-style ones, they are called Bermudan-style. For instance, we might hold a Bermudan-style option expiring in ten months, which may be exercised at the end of each month.
 
20
See the discussion in Sect. 5.1.
 
21
See, e.g., [7, Chapter 8].
 
22
This is a streamlined version of the model described in [3, Chapter 7].
 
References
1.
Bertsekas, D.P.: Dynamic Programming and Optimal Control, vol. 1, 4th edn. Athena Scientific, Belmont (2017)
2.
Brandimarte, P.: An Introduction to Financial Markets: A Quantitative Approach. Wiley, Hoboken (2018)
3.
Campbell, J.Y., Viceira, L.M.: Strategic Asset Allocation. Oxford University Press, Oxford (2002)
4.
Campolieti, G., Makarov, R.N.: Financial Mathematics: A Comprehensive Treatment. CRC Press, Boca Raton (2014)
5.
Chang, F.R.: Stochastic Optimization in Continuous Time. Cambridge University Press, Cambridge (2004)
6.
Gershwin, S.B.: Manufacturing Systems Engineering. Prentice Hall, Englewood Cliffs (1994)
7.
Glasserman, P.: Monte Carlo Methods in Financial Engineering. Springer, New York (2004)
8.
Haijema, R., van Dijk, N., van der Wal, J., Sibinga, C.S.: Blood platelet production with breaks: optimization by SDP and simulation. Int. J. Production Economics 121, 464–473 (2009)
9.
Krishnamurthy, V.: Partially Observed Markov Decision Processes: From Filtering to Controlled Sensing. Cambridge University Press, Cambridge (2016)
10.
Lewis, F.L., Liu, D. (eds.): Reinforcement Learning and Approximate Dynamic Programming for Feedback Control. IEEE Press, Piscataway (2013)
11.
Pinedo, M.L.: Scheduling: Theory, Algorithms, and Systems, 5th edn. Springer, New York (2016)
12.
Powell, W.B.: Approximate Dynamic Programming: Solving the Curses of Dimensionality, 2nd edn. Wiley, Hoboken (2011)
13.
14.
Talluri, K.T., van Ryzin, G.J.: The Theory and Practice of Revenue Management. Springer, New York (2005)
Metadata
Title
Modeling for Dynamic Programming
Author
Paolo Brandimarte
Copyright Year
2021
DOI
https://doi.org/10.1007/978-3-030-61867-4_3