
Markov control models with unknown random state–action-dependent discount factors

  • Original Paper

Abstract

The paper deals with a class of discounted discrete-time Markov control models with non-constant discount factors of the form \(\tilde{\alpha } (x_{n},a_{n},\xi _{n+1})\), where \(x_{n},a_{n},\) and \(\xi _{n+1}\) are the state, the action, and a random disturbance at time \(n,\) respectively, taking values in Borel spaces. Assuming that the one-stage cost is possibly unbounded and that the distributions of \(\xi _{n}\) are unknown, we study the corresponding optimal control problem under two settings. In the first, the random disturbance process \(\left\{ \xi _{n}\right\} \) is formed by observable independent and identically distributed random variables, and we introduce an estimation and control procedure to construct strategies. In the second, \(\left\{ \xi _{n}\right\} \) is assumed to be non-observable and its distributions may change from stage to stage; in this case the problem is studied as a minimax control problem in which the controller faces an opponent selecting the distribution of the corresponding random disturbance at each stage.
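Under such a model, a natural performance criterion (a plausible reading consistent with the abstract, not necessarily the paper's exact definition) is the expected total cost discounted by the running product of the random factors,

\(V(\pi ,x)=E_{x}^{\pi }\left[ \sum _{n=0}^{\infty }\left( \prod _{k=0}^{n-1}\tilde{\alpha }(x_{k},a_{k},\xi _{k+1})\right) c(x_{n},a_{n})\right] ,\)

with the empty product (for \(n=0\)) taken to be 1 and \(c\) denoting the one-stage cost.

The sketch below illustrates, with made-up dynamics, cost, and discount functions, the flavor of an estimation and control procedure for the first setting: the unknown disturbance distribution is replaced by the empirical distribution of the disturbances observed so far, and actions are chosen by a one-step lookahead. It is a minimal illustration under hypothetical model ingredients, not the construction used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Hypothetical model ingredients (illustrative only) ---------------------
# State x in R, action a in a finite grid, disturbance xi ~ unknown law.
ACTIONS = np.linspace(0.0, 1.0, 11)

def dynamics(x, a, xi):
    """Next state x_{n+1} = F(x_n, a_n, xi_{n+1}); a toy linear system."""
    return 0.9 * x + a + xi

def cost(x, a):
    """One-stage cost c(x, a); may grow unboundedly in x."""
    return x ** 2 + 0.1 * a ** 2

def discount(x, a, xi):
    """Random state-action-dependent discount factor alpha~(x, a, xi) in (0, 1)."""
    return 0.9 / (1.0 + 0.05 * abs(x) + 0.05 * a + 0.05 * abs(xi))

# --- Estimation and control with observable i.i.d. disturbances -------------
def greedy_action(x, xi_samples):
    """One-step-lookahead policy using the empirical distribution of xi.

    Chooses a minimizing c(x,a) + E_hat[ alpha~(x,a,xi) * c(F(x,a,xi), a) ],
    where E_hat averages over the disturbances observed so far.
    """
    best_a, best_val = ACTIONS[0], np.inf
    for a in ACTIONS:
        nxt = dynamics(x, a, xi_samples)
        val = cost(x, a) + np.mean(discount(x, a, xi_samples) * cost(nxt, a))
        if val < best_val:
            best_a, best_val = a, val
    return best_a

def simulate(horizon=200, x0=1.0):
    """Run the adaptive controller, accumulating the randomly discounted cost."""
    x, total, disc_prod = x0, 0.0, 1.0
    xi_samples = np.array([0.0])             # empirical sample of disturbances
    for _ in range(horizon):
        a = greedy_action(x, xi_samples)
        xi = rng.normal(0.0, 0.5)            # true (unknown) disturbance law
        total += disc_prod * cost(x, a)
        disc_prod *= discount(x, a, xi)      # product of random discount factors
        x = dynamics(x, a, xi)
        xi_samples = np.append(xi_samples, xi)
    return total

print(simulate())
```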

Acknowledgments

Work supported by Consejo Nacional de Ciencia y Tecnología (CONACYT, MEXICO) under Grant CB2010/154612.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to J. Adolfo Minjárez-Sosa.

About this article

Cite this article

Minjárez-Sosa, J.A. Markov control models with unknown random state–action-dependent discount factors. TOP 23, 743–772 (2015). https://doi.org/10.1007/s11750-015-0360-5
