Markov decision processes (MDPs) are controllable discrete event systems with stochastic transitions. The performance of an MDP is evaluated by a payoff function, and the controller of the MDP seeks to optimize this performance by using an optimal strategy.
There exist various ways of measuring performance, i.e., various classes of payoff functions. For example, average performance can be evaluated by a mean-payoff function, peak performance by a limsup payoff function, and the parity payoff function can be used to encode logical specifications.
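As an illustrative sketch (not from the paper), the three payoff functions above can be approximated on a finite prefix of a reward sequence; the function names and the example sequence are our own:

```python
def mean_payoff(rewards):
    # Average reward over a finite prefix; approximates the limit of
    # averages that the mean-payoff function takes over an infinite play.
    return sum(rewards) / len(rewards)

def limsup_payoff(rewards):
    # Peak performance: the limsup of an infinite reward sequence is
    # approximated here by the maximum over a long finite prefix.
    return max(rewards)

def parity_payoff(priorities):
    # Parity condition: the play is winning (payoff 1) if the highest
    # priority seen infinitely often is even; approximated on a prefix
    # by the maximum priority observed.
    return 1 if max(priorities) % 2 == 0 else 0

prefix = [0, 1, 0, 1, 0, 1]       # a periodic reward sequence
print(mean_payoff(prefix))        # 0.5
print(limsup_payoff(prefix))      # 1
print(parity_payoff([2, 1, 2]))   # 1 (highest recurring priority 2 is even)
```

On a truly infinite play these quantities are defined as limits; the finite-prefix versions here only convey the intuition behind each class.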
Surprisingly, all the MDPs equipped with mean, limsup or parity payoff functions share a common non-trivial property: they admit optimal strategies that are both pure and stationary.
In this paper, we introduce the class of
payoff functions, and we prove that any MDP equipped with such a payoff function admits pure stationary optimal strategies.
This result unifies and simplifies several existing proofs. Moreover, it is a key tool for generating new examples of MDPs with pure stationary optimal strategies.
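To make the central notion concrete, a pure stationary strategy is deterministic (pure) and depends only on the current state (stationary), so it is simply a map from states to actions. The following toy MDP and strategy are our own hypothetical example, not taken from the paper:

```python
import random

# Hypothetical 2-state MDP: transitions[state][action] is a list of
# (probability, next_state, reward) triples.
transitions = {
    "s0": {"a": [(1.0, "s1", 0)], "b": [(1.0, "s0", 1)]},
    "s1": {"a": [(0.5, "s0", 2), (0.5, "s1", 0)]},
}

# A pure stationary strategy: just a map from states to actions,
# with no randomization and no dependence on the history.
strategy = {"s0": "b", "s1": "a"}

def simulate(state, steps, rng):
    """Follow the strategy for `steps` moves; return the reward sequence."""
    rewards = []
    for _ in range(steps):
        outcomes = transitions[state][strategy[state]]
        r, acc = rng.random(), 0.0
        for p, nxt, rew in outcomes:
            acc += p
            if r <= acc:
                rewards.append(rew)
                state = nxt
                break
    return rewards

rewards = simulate("s0", 1000, random.Random(0))
print(sum(rewards) / len(rewards))  # empirical mean payoff: 1.0
```

Under this strategy the play loops in `s0` with reward 1, so its mean payoff is exactly 1; the paper's result says that for the payoff classes it studies, restricting attention to such simple strategies loses no optimality.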