Abstract
This paper deals with a class of discounted discrete-time Markov control models with non-constant discount factors of the form \(\tilde{\alpha } (x_{n},a_{n},\xi _{n+1})\), where \(x_{n}\), \(a_{n}\), and \(\xi _{n+1}\) are the state, the action, and a random disturbance at time \(n\), respectively, all taking values in Borel spaces. Assuming that the one-stage cost is possibly unbounded and that the distributions of \(\xi _{n}\) are unknown, we study the corresponding optimal control problem in two settings. In the first, the random disturbance process \(\left\{ \xi _{n}\right\} \) consists of observable independent and identically distributed random variables, and we introduce an estimation and control procedure to construct strategies. In the second, \(\left\{ \xi _{n}\right\} \) is non-observable and its distributions may change from stage to stage; in this case the problem is studied as a minimax control problem in which the controller faces an opponent selecting the distribution of the corresponding random disturbance at each stage.
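To illustrate the criterion described above, the following is a minimal simulation sketch of a total discounted cost in which the discount factor accumulates multiplicatively through \(\tilde{\alpha }(x_{n},a_{n},\xi _{n+1})\). The transition law `F`, the discount function `alpha`, the cost `cost`, and the linear policy below are all illustrative assumptions, not the model studied in the paper.

```python
import random

# Hypothetical one-dimensional system: x_{n+1} = F(x_n, a_n, xi_{n+1}),
# with a state-action-disturbance-dependent discount factor in (0, 1).

def F(x, a, xi):
    # Toy transition law (assumption for illustration only).
    return 0.5 * x + a + xi

def alpha(x, a, xi):
    # Toy discount factor: always lies in (0, 0.9], so it is in (0, 1).
    return 0.9 / (1.0 + 0.1 * abs(x) + 0.05 * abs(a) + 0.05 * abs(xi))

def cost(x, a):
    # Possibly unbounded one-stage cost (quadratic, for illustration).
    return x * x + a * a

def total_discounted_cost(policy, x0, xis):
    """Compute sum_n (prod_{k<n} alpha(x_k, a_k, xi_{k+1})) * cost(x_n, a_n)
    along one trajectory driven by the disturbance sequence `xis`."""
    x, discount, total = x0, 1.0, 0.0
    for xi in xis:
        a = policy(x)
        total += discount * cost(x, a)
        discount *= alpha(x, a, xi)  # discount accumulates multiplicatively
        x = F(x, a, xi)
    return total

random.seed(0)
disturbances = [random.gauss(0.0, 1.0) for _ in range(200)]
print(total_discounted_cost(lambda x: -0.3 * x, 1.0, disturbances))
```

Note that, unlike the constant-discount case, the effective discount applied at stage \(n\) depends on the whole realized history, which is what makes estimation of the unknown disturbance distribution relevant to the controller.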
Acknowledgments
Work supported by Consejo Nacional de Ciencia y Tecnología (CONACYT, MEXICO) under Grant CB2010/154612.
Minjárez-Sosa, J.A. Markov control models with unknown random state–action-dependent discount factors. TOP 23, 743–772 (2015). https://doi.org/10.1007/s11750-015-0360-5
Keywords
- Discounted optimality
- Non-constant discount factors
- Estimation and control procedures
- Minimax control systems