
Percentile optimization in uncertain Markov decision processes with application to efficient exploration

Published: 20 June 2007

ABSTRACT

Markov decision processes (MDPs) are an effective tool for modeling decision making in uncertain dynamic environments. Because the parameters of these models are typically estimated from data, learned from experience, or designed by hand, the actual performance of a chosen strategy often differs significantly from the designer's initial expectations due to unavoidable model uncertainty. In this paper, we present a percentile criterion that captures the trade-off between optimistic and pessimistic views of an MDP with parameter uncertainty. We describe tractable methods that account for parameter uncertainty in the process of decision making. Finally, we propose a cost-effective exploration strategy for settings where it is possible to invest resources (money, time, or computational effort) in actions that reduce the uncertainty in the parameters.
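To make the percentile criterion concrete, here is a minimal Monte Carlo sketch for the evaluation side of the problem: given a fixed policy and a posterior over transition parameters (a per-state Dirichlet model is assumed here for illustration; it is not necessarily the paper's exact formulation, and all function and variable names are hypothetical), sample transition matrices, compute the policy's value under each sample, and report the value y that the policy achieves with probability at least 1 - epsilon, i.e., the empirical epsilon-quantile.

```python
# Monte Carlo sketch of the percentile criterion for a *fixed* policy in an
# uncertain MDP. Assumption (illustrative, not from the paper): parameter
# uncertainty is a per-state Dirichlet posterior over transition rows.
import numpy as np

def policy_value(P_pi, r_pi, gamma):
    """Exact value of a fixed policy: solve (I - gamma * P_pi) V = r_pi."""
    n = P_pi.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)

def percentile_value(dirichlet_counts, r_pi, gamma, epsilon=0.1,
                     n_samples=2000, start=0, seed=0):
    """Empirical epsilon-quantile of the start-state value over sampled models.

    The policy achieves at least the returned value with probability
    roughly 1 - epsilon under the assumed posterior.
    """
    rng = np.random.default_rng(seed)
    values = []
    for _ in range(n_samples):
        # Sample one transition row per state from its Dirichlet posterior.
        P = np.vstack([rng.dirichlet(row) for row in dirichlet_counts])
        values.append(policy_value(P, r_pi, gamma)[start])
    return np.quantile(values, epsilon)

if __name__ == "__main__":
    counts = np.array([[8.0, 2.0], [3.0, 7.0]])  # pseudo-counts per state
    r = np.array([1.0, 0.0])                     # reward under the policy
    print(percentile_value(counts, r, gamma=0.9, epsilon=0.1))
```

Optimizing this criterion over policies (rather than merely evaluating it, as above) is the hard part the paper addresses with tractable chance-constrained methods; the sketch only illustrates what the percentile objective measures.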

References

  1. Ben-Tal, A., & Nemirovski, A. (1998). Robust convex optimization. Mathematics of Operations Research, 23, 769--805.
  2. Brafman, R., & Tennenholtz, M. (2003). R-max - a general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research, 3, 213--231.
  3. Calafiore, G., & El Ghaoui, L. (2006). On distributionally robust chance-constrained linear programs. Journal of Optimization Theory and Applications, 130, 1--22.
  4. Charnes, A., & Cooper, W. (1959). Chance constrained programming. Management Science, 6, 73--79.
  5. Dearden, R., Friedman, N., & Andre, D. (1999). Model-based Bayesian exploration. Proc. of Uncertainty in AI (pp. 150--159).
  6. Filar, J., Krass, D., & Ross, K. (1995). Percentile performance criteria for limiting average Markov control problems. IEEE Trans. on Automatic Control, 40, 2--10.
  7. Gelman, A., Carlin, J., Stern, H., & Rubin, D. (2003). Bayesian data analysis, second edition. Chapman & Hall/CRC.
  8. Givan, R., Leach, S., & Dean, T. (2000). Bounded-parameter Markov decision processes. Artificial Intelligence, 122, 71--109.
  9. Howard, R. (1966). Information value theory. IEEE Trans. on Systems Science and Cybernetics, SSC-2, 22--26.
  10. Iyengar, G. (2005). Robust dynamic programming. Mathematics of Operations Research, 30, 257--280.
  11. Kearns, M., & Singh, S. (1998). Near-optimal reinforcement learning in polynomial time. Proc. ICML (pp. 260--268).
  12. Lobo, M., Vandenberghe, L., Boyd, S., & Lebret, H. (1998). Applications of second order cone programming. Linear Algebra and its Applications, 284, 193--228.
  13. Mannor, S., Simester, D., Sun, P., & Tsitsiklis, J. (2007). Bias and variance in value function estimation. Management Science, 53, 308--322.
  14. Nemirovski, A., & Shapiro, A. (2006). Convex approximations of chance constrained programs. SIAM Journal on Optimization, 17, 969--996.
  15. Nilim, A., & El Ghaoui, L. (2005). Robust control of Markov decision processes with uncertain transition matrices. Operations Research, 53, 780--798.
  16. Prékopa, A. (1995). Stochastic programming. Kluwer Academic Publishers.
  17. Puterman, M. (1994). Markov decision processes: Discrete stochastic dynamic programming. Wiley.
  18. Silver, E. (1963). Markovian decision processes with uncertain transition probabilities or rewards (Technical Report 1). Operations Research Center, MIT.
  19. Strehl, A., & Littman, M. (2005). A theoretical analysis of model-based interval estimation. Proc. ICML (pp. 857--864).

Published in

            ICML '07: Proceedings of the 24th international conference on Machine learning
            June 2007
            1233 pages
ISBN: 9781595937933
DOI: 10.1145/1273496

            Copyright © 2007 ACM


            Publisher

            Association for Computing Machinery

            New York, NY, United States



            Acceptance Rates

Overall Acceptance Rate: 140 of 548 submissions, 26%
