Abstract
We survey several computational procedures for the partially observed Markov decision process (POMDP) that have been developed since the Monahan survey was published in 1982. The POMDP generalizes the standard, completely observed Markov decision process by allowing state observations to be noise-corrupted and/or costly. Several of the computational procedures presented are convergence-accelerating variants of, or approximations to, the Smallwood-Sondik algorithm. Finite-memory suboptimal design results are reported, and new research directions involving heuristic search are discussed.
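Central to every procedure surveyed is the reduction of the POMDP to a completely observed problem over belief states: after each action and noisy observation, the controller's probability distribution over the hidden state is revised by Bayes' rule. The following sketch illustrates that belief update on a hypothetical two-state machine-replacement example; the states, observation labels, and all numerical values are illustrative assumptions, not taken from the survey.

```python
# Hypothetical two-state replacement POMDP (illustrative numbers).
# Hidden states: 0 = working, 1 = failed.
# The controller sees only a noisy inspection signal: 0 = "ok", 1 = "alarm".
T = [[0.9, 0.1],   # T[s][s']: transition probabilities under one fixed action
     [0.0, 1.0]]   # a failed machine stays failed
O = [[0.8, 0.2],   # O[s'][z]: probability of signal z given next state s'
     [0.3, 0.7]]

def belief_update(b, z):
    """Bayes revision of the belief vector b after one transition and signal z."""
    n = len(b)
    # Predict: propagate the belief through the transition model.
    predicted = [sum(b[s] * T[s][sp] for s in range(n)) for sp in range(n)]
    # Correct: weight by the likelihood of the observed signal, then normalize.
    unnormalized = [predicted[sp] * O[sp][z] for sp in range(n)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

b1 = belief_update([0.5, 0.5], z=1)  # an "alarm" shifts belief toward "failed"
```

The belief vector is a sufficient statistic for the observation history, so dynamic programming can be carried out over the (continuous) belief simplex; the Smallwood-Sondik algorithm exploits the fact that the resulting finite-horizon value function is piecewise linear and convex in the belief.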
References
K.J. Åström, Optimal control of Markov processes with incomplete state information, J. Math. Anal. Appl. 10(1965)174–205.
K.J. Åström, Optimal control of Markov processes with incomplete state information II. The convexity of the loss function, J. Math. Anal. Appl. 26(1969)403–406.
D.P. Bertsekas, Dynamic Programming: Deterministic and Stochastic Models (Prentice Hall, Englewood Cliffs, NJ, 1987).
H.T. Cheng, Algorithms for partially observed Markov decision processes, Ph.D. Dissertation, Commerce and Business Administration, University of British Columbia (1988).
J.E. Eckles, Optimum replacement of stochastically failing systems, Ph.D. Dissertation, Dept. Eng.-Econ. Syst., Stanford University, Stanford, CA (1966).
O. Hernández-Lerma and S.I. Marcus, Nonparametric adaptive control of discrete-time partially observable stochastic systems, Technical Report, Department of Electrical and Computer Engineering, University of Texas, Austin, TX (1987).
J.S. Kakalik, Optimum policies for partially observable Markov systems, Technical Report 18, Operations Research Center, MIT, Cambridge, MA (1965).
J.W. Lark, private communication (1989).
J.W. Lark and C.C. White, A heuristic search approach for solving finite-horizon, completely unobserved Markov decision processes, in preparation (1989).
W.S. Lovejoy, Computationally feasible bounds for partially observed Markov decision processes, Oper. Res. 39(1991)162–175.
G. Monahan, A survey of partially observable Markov decision processes: Theory, models, and algorithms, Manag. Sci. 28(1982)1–16.
A. Paz, Introduction to Probabilistic Automata (Academic Press, New York, 1971).
J. Pearl, Heuristics (Addison-Wesley, Reading, MA, 1984).
L.K. Platzman, Finite memory estimation and control of finite probabilistic systems, Ph.D. Dissertation (ESL-R1723), Dept. of Elec. Eng. and Comput. Sci., MIT, Cambridge, MA (1977).
L.K. Platzman, Optimal infinite-horizon undiscounted control of finite probabilistic systems, SIAM J. Control Optim. 18(1980)362–380.
M.L. Puterman and M.C. Shin, Action elimination procedures for modified policy iteration algorithms, Oper. Res. 30(1982)301–318.
M.L. Puterman and M.C. Shin, Modified policy iteration algorithms for discounted Markov decision processes, Manag. Sci. 24(1978)1127–1138.
R.D. Smallwood and E.J. Sondik, The optimal control of partially observable Markov processes over a finite horizon, Oper. Res. 21(1973)1071–1088.
E.J. Sondik, The optimal control of partially observable Markov processes over the infinite horizon: Discounted costs, Oper. Res. 26(1978)282–304.
E.J. Sondik, The optimal control of partially-observable Markov processes, Ph.D. Dissertation, Eng.-Econ. Syst., Stanford University, Stanford, CA (1971).
E.J. Sondik and R. Mendelssohn, Information seeking in Markov decision processes, Southwest Fisheries Center Admin. Report H-79-13, National Marine Fisheries Service, Honolulu, HI (1979).
B.S. Stewart, Heuristic search with general order relation, Ph.D. Dissertation, Department of Systems Engineering, University of Virginia, Charlottesville, VA (1988).
C.C. White and D. Harrington, Application of Jensen's inequality for adaptive suboptimal design, J. Optim. Theory Appl. 32(1980)89–100.
C.C. White and W.T. Scherer, Solution procedures for partially observed Markov decision processes, Oper. Res. 37(1989)791–797.
C.C. White and W.T. Scherer, Finite-memory suboptimal design for partially observed Markov decision processes, submitted for publication (1989).
C.C. White, L.C. Thomas and W.T. Scherer, Reward revision for discounted Markov decision processes, Oper. Res. 33(1985)1299–1315.
C.C. White and D.J. White, Markov decision processes, Eur. J. Oper. Res. 39(1989)1–16.
D.J. White, Further real applications of Markov decision processes, Interfaces 18(1988)55–61.
D.J. White, Real applications of Markov decision processes, Interfaces 15(1985)73–83.
Additional information
This research was supported by NSF Grant ECS-8708183 and ARO Contract DAAG-29-85-K0089.
Cite this article
White, C.C. A survey of solution techniques for the partially observed Markov decision process. Ann Oper Res 32, 215–230 (1991). https://doi.org/10.1007/BF02204836