Abstract
We survey several computational procedures for the partially observed Markov decision process (POMDP) that have been developed since the Monahan survey was published in 1982. The POMDP generalizes the standard, completely observed Markov decision process by allowing state observations to be noise-corrupted and/or costly. Several of the computational procedures presented are convergence-accelerating variants of, or approximations to, the Smallwood-Sondik algorithm. Finite-memory suboptimal design results are reported, and new research directions involving heuristic search are discussed.
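Central to every procedure surveyed is the reduction of the POMDP to a completely observed problem over belief states: after each action and noisy observation, the controller's probability distribution over the hidden state is revised by Bayes' rule. The following sketch illustrates that belief update on a hypothetical two-state machine-replacement example; the states, observation labels, and all numerical values are illustrative assumptions, not taken from the survey.

```python
# Hypothetical two-state replacement POMDP (illustrative numbers).
# Hidden states: 0 = working, 1 = failed.
# The controller sees only a noisy inspection signal: 0 = "ok", 1 = "alarm".
T = [[0.9, 0.1],   # T[s][s']: transition probabilities under one fixed action
     [0.0, 1.0]]   # a failed machine stays failed
O = [[0.8, 0.2],   # O[s'][z]: probability of signal z given next state s'
     [0.3, 0.7]]

def belief_update(b, z):
    """Bayes revision of the belief vector b after one transition and signal z."""
    n = len(b)
    # Predict: propagate the belief through the transition model.
    predicted = [sum(b[s] * T[s][sp] for s in range(n)) for sp in range(n)]
    # Correct: weight by the likelihood of the observed signal, then normalize.
    unnormalized = [predicted[sp] * O[sp][z] for sp in range(n)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

b1 = belief_update([0.5, 0.5], z=1)  # an "alarm" shifts belief toward "failed"
```

The belief vector is a sufficient statistic for the observation history, so dynamic programming can be carried out over the (continuous) belief simplex; the Smallwood-Sondik algorithm exploits the fact that the resulting finite-horizon value function is piecewise linear and convex in the belief.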
References
K.J. Åström, Optimal control of Markov processes with incomplete state information, J. Math. Anal. Appl. 10(1965)174–205.
K.J. Åström, Optimal control of Markov processes with incomplete state information II. The convexity of the loss function, J. Math. Anal. Appl. 26(1969)403–406.
D.P. Bertsekas, Dynamic Programming: Deterministic and Stochastic Models (Prentice Hall, Englewood Cliffs, NJ, 1987).
H.T. Cheng, Algorithms for partially observed Markov decision processes, Ph.D. Dissertation, Commerce and Business Administration, University of British Columbia (1988).
J.E. Eckles, Optimum replacement of stochastically failing systems, Ph.D. Dissertation, Dept. Eng.-Econ. Syst., Stanford University, Stanford, CA (1966).
O. Hernández-Lerma and S.I. Marcus, Nonparametric adaptive control of discrete-time partially observable stochastic systems, Technical Report, Department of Electrical and Computer Engineering, University of Texas, Austin, TX (1987).
J.S. Kakalik, Optimum policies for partially observable Markov systems, Technical Report 18, Operations Research Center, MIT, Cambridge, MA (1965).
J.W. Lark, private communication (1989).
J.W. Lark and C.C. White, A heuristic search approach for solving finite-horizon, completely unobserved Markov decision processes, in preparation (1989).
W.S. Lovejoy, Computationally feasible bounds for partially observed Markov decision processes, Oper. Res. 39(1991)162–175.
G. Monahan, A survey of partially observable Markov decision processes: Theory, models, and algorithms, Manag. Sci. 28(1982)1–16.
A. Paz, Introduction to Probabilistic Automata (Academic Press, New York, 1971).
J. Pearl, Heuristics (Addison-Wesley, Reading, MA, 1984).
L.K. Platzman, Finite memory estimation and control of finite probabilistic systems, Ph.D. Dissertation (ESL-R1723), Dept. of Elec. Eng. and Comput. Sci., MIT, Cambridge, MA (1977).
L.K. Platzman, Optimal infinite-horizon undiscounted control of finite probabilistic systems, SIAM J. Control Optim. 18(1980)362–380.
M.L. Puterman and M.C. Shin, Action elimination procedures for modified policy iteration algorithms, Oper. Res. 30(1982)301–318.
M.L. Puterman and M.C. Shin, Modified policy iteration algorithms for discounted Markov decision processes, Manag. Sci. 24(1978)1127–1138.
R.D. Smallwood and E.J. Sondik, The optimal control of partially observable Markov processes over a finite horizon, Oper. Res. 21(1973)1071–1088.
E.J. Sondik, The optimal control of partially observable Markov processes over the infinite horizon: Discounted costs, Oper. Res. 26(1978)282–304.
E.J. Sondik, The optimal control of partially-observable Markov processes, Ph.D. Dissertation, Eng.-Econ. Syst., Stanford University, Stanford, CA (1971).
E.J. Sondik and R. Mendelssohn, Information seeking in Markov decision processes, Southwest Fisheries Center Admin. Report H-79-13, National Marine Fisheries Service, Honolulu, HI (1979).
B.S. Stewart, Heuristic search with general order relation, Ph.D. Dissertation, Department of Systems Engineering, University of Virginia, Charlottesville, VA (1988).
C.C. White and D. Harrington, Application of Jensen's inequality for adaptive suboptimal design, J. Optim. Theory Appl. 32(1980)89–100.
C.C. White and W.T. Scherer, Solution procedures for partially observed Markov decision processes, Oper. Res. 37(1989)791–797.
C.C. White and W.T. Scherer, Finite-memory suboptimal design for partially observed Markov decision processes, submitted for publication (1989).
C.C. White, L.C. Thomas and W.T. Scherer, Reward revision for discounted Markov decision processes, Oper. Res. 33(1985)1299–1315.
C.C. White and D.J. White, Markov decision processes, Eur. J. Oper. Res. 39(1989)1–16.
D.J. White, Further real applications of Markov decision processes, Interfaces 18(1988)55–61.
D.J. White, Real applications of Markov decision processes, Interfaces 15(1985)73–83.
Additional information
This research was supported by NSF Grant ECS-8708183 and ARO Contract DAAG-29-85-K0089.
Cite this article
White, C.C. A survey of solution techniques for the partially observed Markov decision process. Ann Oper Res 32, 215–230 (1991). https://doi.org/10.1007/BF02204836