2007 | OriginalPaper | Buchkapitel
Pure Stationary Optimal Strategies in Markov Decision Processes
Markov decision processes (MDPs) are controllable discrete event systems with stochastic transitions. The performance of an MDP is evaluated by a payoff function, and the controller of the MDP seeks to optimize this performance using optimal strategies.
There exist various ways of measuring performance, i.e., various classes of payoff functions. For example, average performance can be evaluated by a mean-payoff function, peak performance by a limsup payoff function, and the parity payoff function can be used to encode logical specifications.
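As an illustration (not from the paper), the three payoff functions mentioned above can be approximated on a finite prefix of a play; the true definitions take liminf/limsup over an infinite sequence of rewards or priorities.

```python
# Illustrative sketch: payoff functions map a sequence of rewards
# (or priorities) to a value. On a finite prefix we use the natural
# finite analogues of the infinite-play definitions.

def mean_payoff(rewards):
    # Mean payoff: limit of average rewards; on a finite prefix, the average.
    return sum(rewards) / len(rewards)

def limsup_payoff(rewards):
    # Limsup payoff measures peak performance; on a finite prefix, the max.
    return max(rewards)

def parity_payoff(priorities):
    # Parity payoff: 1 iff the maximal priority seen infinitely often is
    # even; on a finite prefix we inspect the maximal priority seen.
    return 1 if max(priorities) % 2 == 0 else 0

rewards = [3, 1, 4, 1, 5]
print(mean_payoff(rewards))      # 2.8
print(limsup_payoff(rewards))    # 5
print(parity_payoff([2, 1, 2]))  # 1
```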
Surprisingly, all MDPs equipped with mean, limsup, or parity payoff functions share a common non-trivial property: they admit pure stationary optimal strategies.
In this paper, we introduce the class of prefix-independent and submixing payoff functions, and we prove that any MDP equipped with such a payoff function admits pure stationary optimal strategies.
This result unifies and simplifies several existing proofs. Moreover, it is a key tool for generating new examples of MDPs with pure stationary optimal strategies.
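A pure stationary strategy chooses one fixed action per state, independently of history and without randomization. The following sketch (a hypothetical toy MDP, not an example from the paper) enumerates all pure stationary strategies of a tiny deterministic-transition MDP and picks the one with the best mean payoff.

```python
import itertools

# Hypothetical tiny MDP with deterministic transitions: states 0 and 1,
# actions 'a' and 'b'. trans[(state, action)] = (next_state, reward).
trans = {
    (0, 'a'): (0, 1), (0, 'b'): (1, 0),
    (1, 'a'): (1, 3), (1, 'b'): (0, 0),
}

def mean_payoff_of(strategy, start=0, steps=1000):
    # Follow the pure stationary strategy from `start`. With deterministic
    # transitions the play eventually cycles, so a long finite average
    # approximates the mean payoff of the induced play.
    state, total = start, 0
    for _ in range(steps):
        state, reward = trans[(state, strategy[state])]
        total += reward
    return total / steps

# A pure stationary strategy is just a map state -> action;
# enumerate all of them and keep the best.
best = max(
    (dict(zip([0, 1], acts)) for acts in itertools.product('ab', repeat=2)),
    key=mean_payoff_of,
)
print(best)  # {0: 'b', 1: 'a'}: move to state 1, then loop there for reward 3
```

The point of the theorem is that for prefix-independent, submixing payoffs this restricted search space already contains an optimal strategy, so no history-dependent or randomized strategy can do better.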