2021 | Original Paper | Book Chapter

1. The Dynamic Programming Principle

Author: Paolo Brandimarte

Published in: From Shortest Paths to Reinforcement Learning

Publisher: Springer International Publishing

Abstract

Dynamic programming (DP) is a powerful principle for solving quite challenging optimization problems. However, it is not a tool like, e.g., linear programming. If we are able to cast a decision problem within the framework of linear programming models, in most cases we are done. We may just use a state-of-the-art commercial solver, since linear programming is a rather robust and mature technology.
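
As a concrete illustration of this point, here is a minimal sketch of handing a small linear program to an off-the-shelf solver; the toy data are hypothetical, and SciPy's linprog stands in for a state-of-the-art commercial solver:

    from scipy.optimize import linprog

    # Toy LP (hypothetical data): max x1 + 2*x2, written as min -x1 - 2*x2,
    # subject to x1 + x2 <= 4, x1 + 3*x2 <= 6, x1, x2 >= 0.
    res = linprog(c=[-1, -2], A_ub=[[1, 1], [1, 3]], b_ub=[4, 6],
                  bounds=[(0, None), (0, None)], method="highs")
    print(res.x, res.fun)  # optimizer [3. 1.] and objective value -5.0

Once the model is cast in this form, the solver does the rest; DP, by contrast, is a principle that must be adapted to each problem.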

Footnotes
1
Actually, there are software tools, like the MATLAB Reinforcement Learning Toolbox, that implement some form of DP, but they offer a restricted set of options.
 
2
Risk factors are typically modeled as random variables. Given a set Ω, the sample space of a random experiment, random variables are actually functions ξ(⋅) mapping random outcomes ω ∈ Ω, also called scenarios, to numerical values ξ(ω). Such numerical values, for a specific scenario, are the realizations of the random variable. In other words, the random variable is a function, and its realization is a specific value attained by this function.
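
A minimal sketch of this view in code (the scenarios and values are hypothetical):

    # The sample space Omega: three scenarios of a random experiment.
    Omega = ["boom", "normal", "recession"]
    # The random variable xi, literally a function omega -> xi(omega).
    xi = {"boom": 0.08, "normal": 0.03, "recession": -0.05}
    omega = "normal"   # a specific scenario...
    print(xi[omega])   # ...and the corresponding realization: 0.03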
 
3
More often than not, we do not deal with single random variables, but with sequences ξ_t(⋅) of random variables indexed by time t, i.e., with stochastic processes. For a fixed scenario ω, we have a sequence ξ_t(ω), t = 1, 2, 3, …, of realizations of the random variables over time; such a sequence is a sample path of the stochastic process. We should note that each scenario ω corresponds to a numerical value in the case of a random variable, but to a function of time in the case of a stochastic process.
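
The distinction can be sketched in code (assuming NumPy; fixing the seed plays the role of fixing a scenario ω, so the generated sequence is one sample path):

    import numpy as np

    # Fixing the seed fixes the scenario omega: the whole sequence
    # xi_1(omega), ..., xi_T(omega) below is a single sample path.
    rng = np.random.default_rng(seed=42)
    T = 5
    sample_path = rng.normal(loc=0.0, scale=1.0, size=T)
    print(sample_path)  # a function of time, not a single number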
 
4
See, e.g., [16].
 
5
In this booklet, we only consider discrete-time models; some references about continuous-time models are provided at the end of this chapter.
 
6
Specifying the exact sequence of events in a discrete-time model is a key ingredient in modeling. We discuss modeling in more depth in Chap. 3. For an extensive treatment of modeling issues in DP, see [30, Chapter 5].
 
7
The term exogenous factor may not be quite adequate when a factor is partially endogenous. This happens, for instance, when the probability distribution of risk factors is influenced by our decisions. A standard example is the dependence of random demand on price.
 
8
Clearly, we use autonomous in the mathematical sense, referring to a time-invariant system. In the theory of stochastic processes, we also use terms like homogeneous and inhomogeneous Markov chains.
 
9
This may not be true in the midst of a liquidity crisis, most notably for a big fish trading in the muddy financial pond.
 
10
See, e.g., [31] for an introduction to Bayesian learning and its role in optimal learning problems.
 
11
We use ≜ when we are defining something, ≈ when we are approximating something, and ≡ when we emphasize that two things are actually the same.
 
12
In measure-theoretic probability, this requirement is captured by measurability conditions. The data process generates a filtration, i.e., a sequence of sigma-algebras modeling the increase of available information over time, to which the decision process must adapt. We will do without these advanced concepts. See, e.g., [11].
 
13
In this case we are not only referring to the usual uncertainty about the realization of random variables, but also to the incomplete knowledge of their probability distribution.
 
14
Furthermore, as we shall see, we will find simple stationary policies.
 
15
We are cutting several corners here, as we assume that an optimal policy in feedback form can be found. Actually, this depends on technical conditions that we take for granted, referring the reader to the end-of-chapter references for a rigorous treatment.
 
16
See Sect. 2.3.
 
17
See, e.g., [17] for a discussion concerning the lot-sizing problem, [12] for an example in a financial management setting, and [21] for a more general analysis.
 
18
See, e.g., [27].
 
19
We should note that the approach we describe here has an illustrative purpose, since more efficient algorithms are available for the shortest path problem; see, e.g., [1, 4]. However, here we want to investigate a general principle, rather than efficient ad hoc methods.
 
20
You may have a peek at Fig. 1.4a, where the network structure is related to time.
 
21
In a large network, topological ordering may be time-consuming, and this is precisely why the approach that we are describing is not necessarily the best one. There are two classes of algorithms for the shortest path problem, called label setting and label correcting methods, respectively. We are just outlining one of the alternative label setting methods. Methods differ in efficiency and generality, i.e., the ability to deal with a mix of positive and negative costs and with networks including cycles. See [1, Chapters 4 and 5].
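
A minimal sketch of the label setting idea on a small acyclic network with nonnegative costs (the arc data are hypothetical; graphlib, from the Python standard library, provides the topological ordering):

    from graphlib import TopologicalSorter

    # Hypothetical acyclic network: (tail, head) -> arc cost.
    arcs = {("s", "a"): 2, ("s", "b"): 5, ("a", "b"): 1,
            ("a", "t"): 7, ("b", "t"): 3}

    # Map each node to its set of predecessors for the topological sort.
    preds = {}
    for (i, j) in arcs:
        preds.setdefault(i, set())
        preds.setdefault(j, set()).add(i)

    INF = float("inf")
    label = {n: INF for n in preds}
    label["s"] = 0
    # Scanning nodes in topological order makes each label permanent.
    for i in TopologicalSorter(preds).static_order():
        for (u, v), c in arcs.items():
            if u == i and label[i] + c < label[v]:
                label[v] = label[i] + c

    print(label["t"])  # shortest s -> t cost: 6, along s -> a -> b -> t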
 
22
One such case occurs with certain deterministic scheduling problems. See, e.g., [29]. See also [14] for an illustration of DP within a high-quality heuristic approach for scheduling.
 
23
It is worth noting that, with a network featuring this specific structure, we can neglect issues related to the topological ordering of nodes: we simply have to label nodes by stepping backwards in time.
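
A minimal sketch under these assumptions (the staged data are hypothetical): with a time-expanded network, stepping backwards over stages is already a valid processing order, so no explicit topological sort is required.

    # stage_arcs[t] maps (state at t, state at t+1) -> transition cost.
    stage_arcs = [
        {("s", "A"): 1, ("s", "B"): 4},   # stage 0 -> stage 1
        {("A", "T"): 5, ("B", "T"): 1},   # stage 1 -> stage 2
    ]

    V = {"T": 0.0}                               # terminal label
    for t in reversed(range(len(stage_arcs))):   # step backwards in time
        V_new = {}
        for (i, j), c in stage_arcs[t].items():
            V_new[i] = min(V_new.get(i, float("inf")), c + V[j])
        V = V_new

    print(V["s"])  # 5.0, via s -> B -> T (4+1), beating s -> A -> T (1+5)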
 
24
As we shall see in Sect. 2.4.1, we may take advantage of structural properties of the optimal solution, leading to a drastically reduced state space. Unfortunately, we are not always so lucky.
 
25
Readers with some background in finance may notice a similarity with risk-free interest rates. If we invest cash at a risk-free rate r_f(t, t+1), over the time interval [t, t+1], we do know the return of our investment over the next time interval. However, if we roll the investment over and over, we face reinvestment risk, since the risk-free rate may change, and we shall discover the rate r_f(t+1, t+2) only at time instant t+1. On the contrary, the return from risky investments in stock shares is always uncertain, even over the next time period.
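
In the footnote's notation, the rollover can be written as follows (a sketch, where W_t denotes wealth at time t):

    % Rolling cash over two periods: the first growth factor is known at
    % time t, while the second is revealed only at time t+1.
    W_{t+2} = W_t \bigl(1 + r_f(t, t+1)\bigr)\bigl(1 + r_f(t+1, t+2)\bigr)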
 
26
In this book, we consider standard additive decomposition, but there are other forms of DP decomposition. For an abstract view of DP, see [7].
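
In generic notation (the symbols f_t, s_t, x_t, and the discount factor γ are placeholders, not necessarily the book's own), the standard additive form and the recursion it enables can be sketched as:

    % The objective decomposes into a (discounted) sum of per-stage
    % contributions, which yields the recursive Bellman form.
    \min\; \mathbb{E}\Biggl[\sum_{t=0}^{T} \gamma^t f_t(s_t, x_t)\Biggr]
    \quad\Longrightarrow\quad
    V_t(s_t) = \min_{x_t}\Bigl\{ f_t(s_t, x_t)
        + \gamma\,\mathbb{E}\bigl[V_{t+1}(s_{t+1})\bigr] \Bigr\}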
 
27
As we have remarked, in some engineering applications the average contribution per stage is of more interest than the discounted criterion. Since discounted DP is the rule in business and finance applications, we will mostly stick to it. However, we will discuss the average case in Sect. 4.7.
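
The two criteria can be contrasted as follows (again in generic placeholder notation):

    % Discounted criterion vs. average contribution per stage.
    \mathbb{E}\Biggl[\sum_{t=0}^{\infty} \gamma^t f(s_t, x_t)\Biggr]
    \qquad\text{vs.}\qquad
    \lim_{T\to\infty} \frac{1}{T}\,\mathbb{E}\Biggl[\sum_{t=0}^{T-1} f(s_t, x_t)\Biggr]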
 
28
Iterative methods are discussed in Chap. 4.
 
References
1.
Ahuja, R.K., Magnanti, T.L., Orlin, J.B.: Network Flows: Theory, Algorithms, and Applications. Prentice Hall, Englewood Cliffs (1993)
2.
Bellman, R.: Dynamic Programming. Princeton University Press, Princeton (1957)
3.
Ben-Tal, A., El Ghaoui, L., Nemirovski, A.: Robust Optimization. Princeton University Press, Princeton (2009)
4.
Bertsekas, D.P.: Network Optimization: Continuous and Discrete Models. Athena Scientific, Belmont (1998)
5.
Bertsekas, D.P.: Dynamic Programming and Optimal Control, vol. 2, 4th edn. Athena Scientific, Belmont (2012)
6.
Bertsekas, D.P.: Dynamic Programming and Optimal Control, vol. 1, 4th edn. Athena Scientific, Belmont (2017)
7.
Bertsekas, D.P.: Abstract Dynamic Programming, 2nd edn. Athena Scientific, Belmont (2018)
8.
Bertsekas, D.P., Shreve, S.E.: Stochastic Optimal Control: The Discrete-Time Case. Athena Scientific, Belmont (2007)
9.
Birge, J.R., Louveaux, F.: Introduction to Stochastic Programming, 2nd edn. Springer, New York (2011)
10.
Buşoniu, L., Babuška, R., De Schutter, B., Ernst, D.: Reinforcement Learning and Dynamic Programming Using Function Approximations. CRC Press, Boca Raton (2010)
11.
Campolieti, G., Makarov, R.N.: Financial Mathematics: A Comprehensive Treatment. CRC Press, Boca Raton (2014)
12.
Cariño, D.C., Myers, D.H., Ziemba, W.T.: Concepts, technical issues, and uses of the Russell-Yasuda-Kasai financial planning model. Oper. Res. 46, 450–462 (1998)
13.
Chang, F.R.: Stochastic Optimization in Continuous Time. Cambridge University Press, Cambridge (2004)
14.
Congram, R.K., Potts, C.N., van de Velde, S.L.: An iterated dynasearch algorithm for the single-machine total weighted tardiness scheduling problem. INFORMS J. Comput. 14, 52–67 (2002)
15.
Denardo, E.V.: Dynamic Programming: Models and Applications. Dover Publications, New York (2003)
16.
Desaulniers, G., Desrosiers, J., Solomon, M.M. (eds.): Column Generation. Springer, New York (2005)
17.
Fisher, M., Ramdas, K., Zheng, Y.S.: Ending inventory valuation in multiperiod production scheduling. Manag. Sci. 45, 679–692 (2001)
19.
Fu, M.C. (ed.): Handbook of Simulation Optimization. Springer, New York (2015)
21.
Grinold, R.C.: Model building techniques for the correction of end effects in multistage convex programs. Oper. Res. 31, 407–431 (1983)
22.
Howard, R.: Dynamic Programming and Markov Processes. MIT Press, Cambridge, MA (1960)
23.
Kamien, M.I., Schwartz, N.L.: Dynamic Optimization: The Calculus of Variations and Optimal Control in Economics and Management, 2nd edn. Elsevier, Amsterdam (2000)
24.
King, A., Wallace, S.: Modeling with Stochastic Programming. Springer, Berlin (2012)
25.
Lewis, F.L., Liu, D. (eds.): Reinforcement Learning and Approximate Dynamic Programming for Feedback Control. IEEE Press, Piscataway (2013)
26.
Ljungqvist, L., Sargent, T.J.: Recursive Macroeconomic Theory. MIT Press, Cambridge, MA (2000)
27.
Mersereau, A.J.: Demand estimation from censored observations with inventory record inaccuracy. Manuf. Serv. Oper. Manag. 17, 335–349 (2015)
28.
Merton, R.C.: Continuous-Time Finance. Blackwell, Malden (1992)
29.
Pinedo, M.L.: Scheduling: Theory, Algorithms, and Systems, 5th edn. Springer, New York (2016)
30.
Powell, W.B.: Approximate Dynamic Programming: Solving the Curses of Dimensionality, 2nd edn. Wiley, Hoboken (2011)
31.
Powell, W.B., Ryzhov, I.O.: Optimal Learning. Wiley, Hoboken (2012)
32.
Ross, S.: Introduction to Stochastic Dynamic Programming. Academic, New York (1983)
33.
Si, J., Barto, A.G., Powell, W.B., Wunsch II, D. (eds.): Handbook of Learning and Approximate Dynamic Programming. IEEE Press, Piscataway (2004)
34.
Spall, J.C.: Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control. Wiley, Hoboken (2003)
35.
Stokey, N., Lucas Jr, R.E.: Recursive Methods in Economic Dynamics. Harvard University Press, Cambridge, MA (1989)
Metadata
Title
The Dynamic Programming Principle
Author
Paolo Brandimarte
Copyright Year
2021
DOI
https://doi.org/10.1007/978-3-030-61867-4_1