ABSTRACT
Markov decision processes are an effective tool for modeling decision-making in uncertain dynamic environments. Since the parameters of these models are typically estimated from data, learned from experience, or designed by hand, it is not surprising that the actual performance of a chosen strategy often differs significantly from the designer's initial expectations due to unavoidable model uncertainty. In this paper, we present a percentile criterion that captures the trade-off between optimistic and pessimistic points of view on an MDP with parameter uncertainty. We describe tractable methods that take parameter uncertainty into account in the decision-making process. Finally, we propose a cost-effective exploration strategy for settings where it is possible to invest resources (money, time, or computational effort) in actions that reduce the uncertainty in the parameters.
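To make the percentile idea concrete, it can be sketched as a chance-constrained program: maximize a performance level that the policy is guaranteed to meet with a chosen probability under the posterior over the unknown model. The formulation below is an illustrative sketch in our own notation (discount factor \(\gamma\), risk level \(\epsilon\), and a Bayesian posterior over the unknown transition and reward parameters \(P, R\)), not necessarily the exact objective used in the paper:

\[
\max_{\pi,\, y}\; y
\quad \text{s.t.} \quad
\Pr_{(P,R)\,\sim\,\text{posterior}}\!\left( \mathbb{E}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} r_t \,\middle|\, \pi, P, R \right] \ge y \right) \;\ge\; 1-\epsilon .
\]

Under this reading, taking \(\epsilon\) close to 0 recovers a pessimistic (robust) guarantee, while larger values of \(\epsilon\) yield an increasingly optimistic criterion.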