Skip to main content
Log in

Risk-averse dynamic programming for Markov decision processes

  • Full Length Paper
  • Series B
  • Published:
Mathematical Programming Submit manuscript

An Erratum to this article was published on 10 May 2014

Abstract

We introduce the concept of a Markov risk measure and we use it to formulate risk-averse control problems for two Markov decision models: a finite horizon model and a discounted infinite horizon model. For both models we derive risk-averse dynamic programming equations and a value iteration method. For the infinite horizon problem we develop a risk-averse policy iteration method and we prove its convergence. We also propose a version of the Newton method to solve a nonsmooth equation arising in the policy iteration method and we prove its global convergence. Finally, we discuss relations to min–max Markov decision models.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Artzner P., Delbaen F., Eber J.M., Heath D.: Coherent measures of risk. Math. Finance 9, 203–228 (1999)

    Article  MATH  MathSciNet  Google Scholar 

  2. Artzner P., Delbaen F., Eber J.-M., Heath D., Ku H.: Coherent multiperiod risk adjusted values and Bellmans principle. Ann. Oper. Res. 152, 5–22 (2007)

    Article  MATH  MathSciNet  Google Scholar 

  3. Aubin J.-P., Frankowska H.: Set-Valued Analysis. Birkhäuser, Boston (1990)

    MATH  Google Scholar 

  4. Bellman R.: On the theory of dynamic programming. Proc. Natl. Acad. Sci 38, 716 (1952)

    Article  MATH  Google Scholar 

  5. Bellman R.: Applied Dynamic Programming. Princeton University Press, Princeton (1957)

    Google Scholar 

  6. Bertsekas D., Shreve S.E.: Stochastic Optimal Control. The Discrete Time Case. Academic Press, New York (1978)

    MATH  Google Scholar 

  7. Boda K., Filar J.A.: Time consistent dynamic risk measures. Math. Methods Oper. Res. 63, 169–186 (2006)

    Article  MATH  MathSciNet  Google Scholar 

  8. Cheridito P., Delbaen F., Kupper M.: Dynamic monetary risk measures for bounded discrete-time processes. Electron. J. Probab. 11, 57–106 (2006)

    MathSciNet  Google Scholar 

  9. Chung K.-J., Sobel M.J.: Discounted MDPs: distribution functions and exponential utility maximization. SIAM J. Control Optim. 25, 49–62 (1987)

    Article  MATH  MathSciNet  Google Scholar 

  10. Delbaen F.: Coherent risk measures on general probability spaces, In essays in honour of Dieter Sondermann. Springer, Berlin (2002)

    Google Scholar 

  11. Eichhorn A., Römisch W.: Polyhedral risk measures in stochastic programming. SIAM J. Optim. 16, 69–95 (2005)

    Article  MATH  MathSciNet  Google Scholar 

  12. Fleming W.H., Sheu S.J.: Optimal long term growth rate of expected utility of wealth. Ann. Appl. Probab. 9, 871–903 (1999)

    Article  MATH  MathSciNet  Google Scholar 

  13. Fleming W.H., Sheu S.J.: Risk-sensitive control and an optimal investment model. Math. Finance 10, 197–213 (2000)

    Article  MATH  MathSciNet  Google Scholar 

  14. Föllmer H., Penner I.: Convex risk measures and the dynamics of their penalty functions. Stat. Decis. 24, 61–96 (2006)

    Article  MATH  Google Scholar 

  15. Föllmer H., Schied A.: Convex measures of risk and trading constraints. Finance Stoch. 6, 429–447 (2002)

    Article  MATH  MathSciNet  Google Scholar 

  16. Föllmer H., Schied A.: Stochastic Finance. An Introduction in Discrete Time. de Gruyter, Berlin (2004)

    Book  MATH  Google Scholar 

  17. Fritelli M., Rosazza Gianin E.: Putting order in risk measures. J. Bank. Finance 26, 1473–1486 (2002)

    Article  Google Scholar 

  18. Frittelli M., Rosazza Gianin E.: Dynamic convex risk measures. In: Szegö, G. (eds) Risk Measures for the 21st Century, pp. 227–248. Wiley, Chichester (2005)

    Google Scholar 

  19. Fritelli M., Scandolo G.: Risk measures and capital requirements for processes. Math. Finance 16, 589–612 (2006)

    Article  MathSciNet  Google Scholar 

  20. González-Trejo J.I., Hernández-Lerma O., Hoyos-Reyes L.F.: Minimax control of discrete-time stochastic systems. SIAM J. Control Optim. 41, 1626–1659 (2003)

    Article  MATH  Google Scholar 

  21. Hernández-Lerma O., Lasserre J.B.: Discrete-time Markov Control Processes. Basic Optimality Criteria. Springer, New York (1996)

    Google Scholar 

  22. Howard R.A.: Dynamic Programming and Markov Processes. Wiley, New York (1960)

    MATH  Google Scholar 

  23. Jaquette S.C.: Markov decision processes with a new optimality criterion: Discrete time. Ann. Stat. 1, 496–505 (1973)

    Article  MATH  MathSciNet  Google Scholar 

  24. Jaquette S.C.: A utility criterion for Markov decision processes. Manag. Sci. 23, 43–49 (1976)

    Article  MATH  MathSciNet  Google Scholar 

  25. Jobert L., Rogers L.C.G.: Valuations and dynamic convex risk measures. Math. Finance 18, 1–22 (2008)

    Article  MATH  MathSciNet  Google Scholar 

  26. Klatte D., Kummer B.: Nonsmooth Equations in Optimization. Kluwer, Dordrecht (2002)

    MATH  Google Scholar 

  27. Klein Haneveld, W.: Duality in stochastic linear and dynamic programming. Lecture notes economics and mathematical systems 274. Springer, Berlin (1986)

  28. Klöppel S., Schweizer M.: Dynamic indifference valuation via convex risk measures. Math. Finance 17, 599–627 (2007)

    Article  MATH  MathSciNet  Google Scholar 

  29. Koopmans T.C.: Stationary ordinal utility and impatience. Econometrica 28, 287–309 (1960)

    Article  MATH  MathSciNet  Google Scholar 

  30. Kreps M.K., Porteus E.L.: Temporal resolution of uncertainty and dynamic choice theory. Econometrica 46, 185–200 (1978)

    Article  MATH  MathSciNet  Google Scholar 

  31. Kummer B. et al.: Newton’s method for non-differentiable functions. In: Guddat, J. (eds) Advances in Mathematical Optimization, pp. 114–125. Academie Verlag, Berlin (1988)

    Google Scholar 

  32. Kushner H.J.: Introduction to Stochastic Control. Holt, Rhinehart, and Winston, New York (1971)

    MATH  Google Scholar 

  33. Küenle H.-U.: Stochastiche Spiele und Entscheidungsmodelle. B. G. Teubner, Leipzig (1986)

    Google Scholar 

  34. Leitner J.: A short note on second-order stochastic dominance preserving coherent risk measures. Math. Finance 15, 649–651 (2005)

    Article  MATH  MathSciNet  Google Scholar 

  35. Ogryczak W., Ruszczyński A.: From stochastic dominance to mean-risk models: Semideviations as risk measures. Eur. J. Oper. Res. 116, 33–50 (1999)

    Article  MATH  Google Scholar 

  36. Ogryczak W., Ruszczyński A.: On consistency of stochastic dominance and mean-semideviation models. Math. Program. 89, 217–232 (2001)

    Article  MATH  MathSciNet  Google Scholar 

  37. Ogryczak W., Ruszczyński A.: Dual stochastic dominance and related mean-risk models. SIAM J. Optim. 13(1), 60–78 (2002)

    Article  MATH  MathSciNet  Google Scholar 

  38. Pflug G.Ch., Römisch W.: Modeling, Measuring and Managing Risk. World Scientific, Singapore (2007)

    Book  MATH  Google Scholar 

  39. Puterman M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, New York (1994)

    MATH  Google Scholar 

  40. Riedel F.: Dynamic coherent risk measures. Stoch. Process. Appl. 112, 185–200 (2004)

    Article  MATH  MathSciNet  Google Scholar 

  41. Rockafellar R.T., Uryasev S.P.: Conditional value-at-risk for general loss distributions. J. Bank. Finance 26, 1443–1471 (2002)

    Article  Google Scholar 

  42. Rockafellar R.T., Wets R.J.-B.: Variational Analysis. Springer, Berlin (1998)

    Book  MATH  Google Scholar 

  43. Rockafellar R.T., Uryasev S., Zabarankin M.: Deviation measures in risk analysis and optimization. Finance Stoch. 10, 51–74 (2006)

    Article  MATH  MathSciNet  Google Scholar 

  44. Robinson S.M.: Newton’s method for a class of nonsmooth functions. Set-Valued Anal. 2, 291–305 (1994)

    Article  MATH  MathSciNet  Google Scholar 

  45. Ruszczyński A., Shapiro A.: Optimization of risk measures. In: Calafiore, G., Dabbene, F. (eds) Probabilistic and Randomized Methods for Design Under Uncertainty, Springer, London (2005)

    Google Scholar 

  46. Ruszczyński A., Shapiro A.: Optimization of convex risk functions. Math. Oper. Res. 31, 433–452 (2006)

    Article  MATH  MathSciNet  Google Scholar 

  47. Ruszczyński A., Shapiro A.: Conditional risk mappings. Math. Oper. Res. 31, 544–561 (2006)

    Article  MATH  MathSciNet  Google Scholar 

  48. Scandolo, G.: Risk measures in a dynamic setting. PhD Thesis, Università degli Studi di Milano, Milan (2003)

  49. Shapiro A.: On a time consistency concept in risk averse multistage stochastic programming. Oper. Res. Lett. 37, 143–147 (2009)

    Article  MATH  MathSciNet  Google Scholar 

  50. White D.J.: Markov Decision Processes. Wiley, New York (1993)

    MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Andrzej Ruszczyński.

Additional information

An erratum to this article is available at http://dx.doi.org/10.1007/s10107-014-0783-z.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Ruszczyński, A. Risk-averse dynamic programming for Markov decision processes. Math. Program. 125, 235–261 (2010). https://doi.org/10.1007/s10107-010-0393-3

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10107-010-0393-3

Keywords

Mathematics Subject Classification (2000)

Navigation