Abstract
We introduce the concept of a Markov risk measure and we use it to formulate risk-averse control problems for two Markov decision models: a finite horizon model and a discounted infinite horizon model. For both models we derive risk-averse dynamic programming equations and a value iteration method. For the infinite horizon problem we develop a risk-averse policy iteration method and we prove its convergence. We also propose a version of the Newton method to solve a nonsmooth equation arising in the policy iteration method and we prove its global convergence. Finally, we discuss relations to min–max Markov decision models.
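The risk-averse value iteration mentioned above can be illustrated with a small sketch. This is not the paper's code. It assumes a finite state and action space and cost minimization. As one concrete choice of transition risk mapping, it uses the mean-upper-semideviation of order 1 with a hypothetical weight kappa in [0, 1], applied to the next-state distribution P[x, u, :]. The function names risk_averse_value_iteration and mean_semideviation, and the array layout of cost and P, are illustrative assumptions. The loop repeatedly applies a risk-averse Bellman update of the form v(x) <- min_u { c(x, u) + gamma * sigma(v, P(x, u)) } until successive iterates agree.

```python
"""Minimal sketch: risk-averse value iteration for a finite discounted MDP.

Assumption (not taken from the abstract): the one-step risk mapping is the
mean-upper-semideviation of order 1,
    sigma(v, m) = E_m[v] + kappa * E_m[(v - E_m[v])_+],   0 <= kappa <= 1,
evaluated on the next-state distribution m = P[x, u, :].
"""
import numpy as np


def mean_semideviation(v, m, kappa):
    """Risk of the cost-to-go v (shape [n]) under the probability vector m (shape [n])."""
    mean = m @ v
    # Penalize only outcomes whose cost exceeds the mean (upper semideviation).
    return mean + kappa * (m @ np.maximum(v - mean, 0.0))


def risk_averse_value_iteration(cost, P, gamma, kappa, tol=1e-10, max_iter=10_000):
    """Iterate v <- min_u [cost[x, u] + gamma * sigma(v, P[x, u, :])].

    cost: array [n_states, n_actions] of one-step costs (to be minimized).
    P:    array [n_states, n_actions, n_states] of transition probabilities.
    Returns the approximate value function and a greedy deterministic policy.
    """
    n_states, n_actions = cost.shape
    v = np.zeros(n_states)
    q = np.empty((n_states, n_actions))
    for _ in range(max_iter):
        # Risk-adjusted cost of each state-action pair under the current v.
        for x in range(n_states):
            for u in range(n_actions):
                q[x, u] = cost[x, u] + gamma * mean_semideviation(v, P[x, u], kappa)
        v_new = q.min(axis=1)
        # Stop when two successive iterates agree in the sup norm.
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, q.argmin(axis=1)
        v = v_new
    return v, q.argmin(axis=1)


# Usage on a hypothetical 2-state, 2-action model.
cost = np.array([[1.0, 0.5], [2.0, 3.0]])
P = np.array([[[0.9, 0.1], [0.5, 0.5]],
              [[0.2, 0.8], [0.7, 0.3]]])
v_neutral, policy_neutral = risk_averse_value_iteration(cost, P, gamma=0.9, kappa=0.0)
v_averse, policy_averse = risk_averse_value_iteration(cost, P, gamma=0.9, kappa=0.5)
```

For 0 <= kappa <= 1 this mean-semideviation is a coherent risk measure, so the update is monotone and translation-equivariant and therefore a gamma-contraction in the sup norm; this is why the loop converges. Setting kappa = 0 recovers ordinary expected-cost value iteration, and any kappa > 0 can only raise the computed values. The sketch does not reproduce the policy iteration or the nonsmooth Newton method described in the abstract.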
Additional information
An erratum to this article is available at http://dx.doi.org/10.1007/s10107-014-0783-z.
About this article
Cite this article
Ruszczyński, A. Risk-averse dynamic programming for Markov decision processes. Math. Program. 125, 235–261 (2010). https://doi.org/10.1007/s10107-010-0393-3
DOI: https://doi.org/10.1007/s10107-010-0393-3
Keywords
- Dynamic risk measures
- Markov risk measures
- Value iteration
- Policy iteration
- Nonsmooth Newton’s method
- Min-max Markov models