ABSTRACT
Markov decision processes are an effective tool for modeling decision-making in uncertain dynamic environments. Since the parameters of these models are typically estimated from data, learned from experience, or designed by hand, it is not surprising that the actual performance of a chosen strategy often differs significantly from the designer's initial expectations due to unavoidable model uncertainty. In this paper, we present a percentile criterion that captures the trade-off between optimistic and pessimistic points of view on an MDP with parameter uncertainty. We describe tractable methods that take parameter uncertainty into account in the decision-making process. Finally, we propose a cost-effective exploration strategy for settings where it is possible to invest resources (money, time, or computational effort) in actions that reduce the uncertainty in the parameters.
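To make the percentile idea concrete, it can be sketched as a chance-constrained program: maximize a performance level that the policy is guaranteed to meet with a chosen probability under the posterior over the unknown model. The formulation below is an illustrative sketch in our own notation (discount factor \(\gamma\), risk level \(\epsilon\), and a Bayesian posterior over the unknown transition and reward parameters \(P, R\)), not necessarily the exact objective used in the paper:

\[
\max_{\pi,\, y}\; y
\quad \text{s.t.} \quad
\Pr_{(P,R)\,\sim\,\text{posterior}}\!\left( \mathbb{E}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} r_t \,\middle|\, \pi, P, R \right] \ge y \right) \;\ge\; 1-\epsilon .
\]

Under this reading, taking \(\epsilon\) close to 0 recovers a pessimistic (robust) guarantee, while larger values of \(\epsilon\) yield an increasingly optimistic criterion.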