ABSTRACT
Markov decision processes (MDPs) are an effective tool for modeling decision making in uncertain dynamic environments. Since the parameters of these models are typically estimated from data, learned from experience, or designed by hand, it is not surprising that the actual performance of a chosen strategy often differs significantly from the designer's initial expectations due to unavoidable model uncertainty. In this paper, we present a percentile criterion that captures the trade-off between optimistic and pessimistic points of view on an MDP with parameter uncertainty. We describe tractable methods that take parameter uncertainty into account in the process of decision making. Finally, we propose a cost-effective exploration strategy for settings where it is possible to invest (money, time, or computational effort) in actions that reduce the uncertainty in the parameters.
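The percentile criterion described above can be illustrated with a small Monte-Carlo sketch: sample plausible transition matrices from a posterior over the parameters, evaluate a fixed policy under each sample, and report the value level that the policy exceeds with probability at least 1 - eps. This is only an illustrative sketch, not the paper's method; the Dirichlet posterior counts `alpha`, the reward vector `R`, and the helper names `policy_value` and `percentile_performance` are all hypothetical choices made for the example.

```python
import numpy as np

def policy_value(P, R, gamma=0.95):
    """Expected discounted return of a fixed policy whose induced
    transition matrix is P and per-state reward vector is R:
    V = (I - gamma * P)^{-1} R."""
    n = P.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P, R)

def percentile_performance(alpha, R, rng, eps=0.1, n_samples=2000, gamma=0.95):
    """Monte-Carlo estimate of the percentile criterion at state 0:
    the largest level y such that, under the sampled parameter posterior,
    the policy's return exceeds y with probability >= 1 - eps.
    `alpha` holds (hypothetical) Dirichlet posterior counts, one row per
    state, describing uncertainty in that state's transition distribution."""
    values = np.empty(n_samples)
    for k in range(n_samples):
        # Draw one plausible transition matrix from the posterior.
        P = np.vstack([rng.dirichlet(a) for a in alpha])
        values[k] = policy_value(P, R, gamma)[0]  # value at the initial state
    # The eps-quantile of the sampled returns is the percentile value:
    # the policy beats this level on a (1 - eps) fraction of sampled models.
    return np.quantile(values, eps)

rng = np.random.default_rng(0)
alpha = np.array([[5.0, 2.0], [1.0, 8.0]])  # posterior counts for 2 states
R = np.array([1.0, 0.0])                    # reward 1 in state 0, else 0
y = percentile_performance(alpha, R, rng)
```

Compared with optimizing the posterior-mean value (optimistic) or the worst sampled model (pessimistic), the eps-quantile interpolates between the two extremes as eps varies, which is the trade-off the criterion is meant to capture.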
Percentile optimization in uncertain Markov decision processes with application to efficient exploration