Abstract
We investigate the problem of minimizing the Average-Value-at-Risk (AVaR_τ) of the discounted cost generated by a Markov Decision Process (MDP) over a finite and an infinite horizon. We show that this problem can be reduced to an ordinary MDP with an extended state space and give conditions under which an optimal policy exists. We also give a time-consistent interpretation of the AVaR_τ criterion. Finally, we consider a numerical example, a simple repeated casino game, and use it to discuss the influence of the risk-aversion parameter τ of the AVaR_τ criterion.
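As background for the criterion used above, the AVaR of a cost variable admits a well-known minimization representation (due to Rockafellar and Uryasev), AVaR_τ(X) = min_s { s + E[(X − s)^+] / (1 − τ) }, whose minimizer is the τ-quantile (Value-at-Risk). A minimal sketch of an empirical estimator under this convention — the function name `avar` and the sample-based setup are illustrative, not taken from the paper:

```python
import numpy as np

def avar(costs, tau):
    """Empirical Average-Value-at-Risk of a cost sample at level tau in (0, 1).

    Uses the Rockafellar-Uryasev representation
        AVaR_tau(X) = min_s { s + E[(X - s)^+] / (1 - tau) },
    whose minimizer s* is the tau-quantile of X (the Value-at-Risk).
    For costs, larger tau means averaging over a thinner upper tail,
    i.e. stronger risk aversion.
    """
    costs = np.asarray(costs, dtype=float)
    s = np.quantile(costs, tau)  # Value-at-Risk at level tau
    # Average shortfall beyond s, rescaled by the tail probability 1 - tau.
    return s + np.mean(np.maximum(costs - s, 0.0)) / (1.0 - tau)

# Illustration: for costs 1..100 at tau = 0.9, AVaR is roughly the
# average of the worst (largest) 10% of costs.
print(avar(np.arange(1, 101), 0.9))
```

For τ → 0 the criterion approaches the ordinary expected cost; for τ → 1 it approaches the worst-case cost, which matches the abstract's reading of τ as a risk-aversion parameter.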
Additional information
The underlying projects have been funded by the Bundesministerium für Bildung und Forschung of Germany under promotional reference 03BAPAC1. The authors are responsible for the content of this article.
Cite this article
Bäuerle, N., Ott, J. Markov Decision Processes with Average-Value-at-Risk criteria. Math Meth Oper Res 74, 361–379 (2011). https://doi.org/10.1007/s00186-011-0367-0