Abstract.
In this paper, the optimization problem for a stopped Markov decision process with finite states and actions is considered over stopping times τ constrained so that ?τ≦α for some fixed α>0. The problem is solved through randomization of stopping times and mathematical programming formulation by occupation measures. Another representation, called F-representation, of randomized stopping times is given, by which the concept of Markov or stationary randomized stopping times is introduced. We treat two types of occupation measures, running and stopped, but stopped occupation measure is shown to be expressed by running one. We study the properties of the set of running occupation measures achieved by different classes of pairs of policies and randomized stopping times. Analyzing the equivalent mathematical programming problem formulated by running occupation measures corresponding with stationary policies and stationary randomized stopping times, we prove the existence of an optimal constrained pair of stationary policy and stopping time requiring randomization in at most one state.
Similar content being viewed by others
Author information
Authors and Affiliations
Additional information
Manuscript received: September 2000/Final version received: November 2000
Rights and permissions
About this article
Cite this article
Horiguchi, M. Markov decision processes with a stopping time constraint. Mathematical Methods of OR 53, 279–295 (2001). https://doi.org/10.1007/PL00003996
Issue Date:
DOI: https://doi.org/10.1007/PL00003996