Abstract
This paper deals with a class of discounted discrete-time Markov control models with non-constant discount factors of the form \(\tilde{\alpha } (x_{n},a_{n},\xi _{n+1})\), where \(x_{n}\), \(a_{n}\), and \(\xi _{n+1}\) are the state, the action, and a random disturbance at time \(n\), respectively, all taking values in Borel spaces. Assuming that the one-stage cost is possibly unbounded and that the distributions of \(\xi _{n}\) are unknown, we study the corresponding optimal control problem in two settings. In the first, the random disturbance process \(\left\{ \xi _{n}\right\} \) consists of observable independent and identically distributed random variables, and we introduce an estimation and control procedure to construct strategies. In the second, \(\left\{ \xi _{n}\right\} \) is non-observable and its distributions may change from stage to stage; in this case the problem is studied as a minimax control problem in which the controller faces an opponent selecting the distribution of the corresponding random disturbance at each stage.
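To illustrate the criterion described above, the following is a minimal simulation sketch of a total discounted cost in which the discount factor accumulates multiplicatively through \(\tilde{\alpha }(x_{n},a_{n},\xi _{n+1})\). The transition law `F`, the discount function `alpha`, the cost `cost`, and the linear policy below are all illustrative assumptions, not the model studied in the paper.

```python
import random

# Hypothetical one-dimensional system: x_{n+1} = F(x_n, a_n, xi_{n+1}),
# with a state-action-disturbance-dependent discount factor in (0, 1).

def F(x, a, xi):
    # Toy transition law (assumption for illustration only).
    return 0.5 * x + a + xi

def alpha(x, a, xi):
    # Toy discount factor: always lies in (0, 0.9], so it is in (0, 1).
    return 0.9 / (1.0 + 0.1 * abs(x) + 0.05 * abs(a) + 0.05 * abs(xi))

def cost(x, a):
    # Possibly unbounded one-stage cost (quadratic, for illustration).
    return x * x + a * a

def total_discounted_cost(policy, x0, xis):
    """Compute sum_n (prod_{k<n} alpha(x_k, a_k, xi_{k+1})) * cost(x_n, a_n)
    along one trajectory driven by the disturbance sequence `xis`."""
    x, discount, total = x0, 1.0, 0.0
    for xi in xis:
        a = policy(x)
        total += discount * cost(x, a)
        discount *= alpha(x, a, xi)  # discount accumulates multiplicatively
        x = F(x, a, xi)
    return total

random.seed(0)
disturbances = [random.gauss(0.0, 1.0) for _ in range(200)]
print(total_discounted_cost(lambda x: -0.3 * x, 1.0, disturbances))
```

Note that, unlike the constant-discount case, the effective discount applied at stage \(n\) depends on the whole realized history, which is what makes estimation of the unknown disturbance distribution relevant to the controller.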
Acknowledgments
Work supported by Consejo Nacional de Ciencia y Tecnología (CONACYT, MEXICO) under Grant CB2010/154612.
Minjárez-Sosa, J.A. Markov control models with unknown random state–action-dependent discount factors. TOP 23, 743–772 (2015). https://doi.org/10.1007/s11750-015-0360-5
Keywords
- Discounted optimality
- Non-constant discount factors
- Estimation and control procedures
- Minimax control systems