
A survey of solution techniques for the partially observed Markov decision process

Published in Annals of Operations Research (1991)

Abstract

We survey several computational procedures for the partially observed Markov decision process (POMDP) that have been developed since the Monahan survey was published in 1982. The POMDP generalizes the standard, completely observed Markov decision process by permitting the possibility that state observations may be noise-corrupted and/or costly. Several of the computational procedures presented are convergence-accelerating variants of, or approximations to, the Smallwood-Sondik algorithm. Finite-memory suboptimal design results are reported, and new research directions involving heuristic search are discussed.
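The algorithms surveyed all operate on the belief state: since the true state is only observed through noisy signals, the decision maker maintains a probability distribution over states and updates it by Bayes' rule after each action and observation. The sketch below illustrates that update on a hypothetical two-state, two-observation, one-action model; the matrices and numbers are illustrative assumptions, not taken from the survey.

```python
# Belief-state (Bayes filter) update for a POMDP -- a minimal sketch on a
# made-up two-state model. T, O, and the prior b0 are illustrative only.

# T[a][s][s2]: probability of moving from state s to s2 under action a.
T = [[[0.9, 0.1],
      [0.2, 0.8]]]          # a single action, a = 0

# O[a][s2][z]: probability of observing z after landing in state s2 under a.
O = [[[0.8, 0.2],
      [0.3, 0.7]]]

def belief_update(b, a, z):
    """Posterior belief: b'(s2) proportional to O[a][s2][z] * sum_s T[a][s][s2] * b(s)."""
    n = len(b)
    unnorm = [O[a][s2][z] * sum(T[a][s][s2] * b[s] for s in range(n))
              for s2 in range(n)]
    total = sum(unnorm)               # probability of observing z; normalizer
    return [p / total for p in unnorm]

b0 = [0.5, 0.5]                       # uniform prior over the two states
b1 = belief_update(b0, a=0, z=0)      # posterior after acting and seeing z = 0
```

Because the belief state is a sufficient statistic for the observation history, the POMDP becomes a completely observed MDP over this (continuous) belief simplex, which is the formulation the Smallwood-Sondik algorithm and its variants exploit.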


References

  1. K.J. Astrom, Optimal control of Markov processes with incomplete state information, J. Math. Anal. Appl. 10(1965)174–205.

  2. K.J. Astrom, Optimal control of Markov processes with incomplete state information II. The convexity of the loss function, J. Math. Anal. Appl. 26(1969)403–406.

  3. D.P. Bertsekas, Dynamic Programming: Deterministic and Stochastic Models (Prentice Hall, Englewood Cliffs, NJ, 1987).

  4. H.T. Cheng, Algorithms for partially observed Markov decision processes, Ph.D. Dissertation, Commerce and Business Administration, University of British Columbia (1988).

  5. J.E. Eckles, Optimum replacement of stochastically failing systems, Ph.D. Dissertation, Dept. Eng.-Econ. Syst., Stanford University, Stanford, CA (1966).

  6. O. Hernández-Lerma and S.I. Marcus, Nonparametric adaptive control of discrete-time partially observable stochastic systems, Technical Report, Department of Electrical and Computer Engineering, University of Texas, Austin, TX (1987).

  7. J.S. Kakalik, Optimum policies for partially observable Markov systems, Technical Report 18, Operations Research Center, MIT, Cambridge, MA (1965).

  8. J.W. Lark, private communication (1989).

  9. J.W. Lark and C.C. White, A heuristic search approach for solving finite-horizon, completely unobserved Markov decision processes, in preparation (1989).

  10. W.S. Lovejoy, Computationally feasible bounds for partially observed Markov decision processes, Oper. Res. 39(1991)162–175.

  11. G. Monahan, A survey of partially observable Markov decision processes: Theory, models, and algorithms, Manag. Sci. 28(1982)1–16.

  12. A. Paz, Introduction to Probabilistic Automata (Academic Press, New York, 1971).

  13. J. Pearl, Heuristics (Addison-Wesley, Reading, MA, 1984).

  14. L.K. Platzman, Finite memory estimation and control of finite probabilistic systems, Ph.D. Dissertation (ESL-R1723), Dept. of Elec. Eng. and Comput. Sci., MIT, Cambridge, MA (1977).

  15. L.K. Platzman, Optimal infinite-horizon undiscounted control of finite probabilistic systems, SIAM J. Control Optim. 18(1980)362–380.

  16. M.L. Puterman and M.C. Shin, Action elimination procedures for modified policy iteration algorithms, Oper. Res. 30(1982)301–318.

  17. M.L. Puterman and M.C. Shin, Modified policy iteration algorithms for discounted Markov decision processes, Manag. Sci. 24(1978)1127–1138.

  18. R.D. Smallwood and E.J. Sondik, The optimal control of partially observable Markov processes over a finite horizon, Oper. Res. 21(1973)1071–1088.

  19. E.J. Sondik, The optimal control of partially observable Markov processes over the infinite horizon: Discounted costs, Oper. Res. 26(1978)282–304.

  20. E.J. Sondik, The optimal control of partially-observable Markov processes, Ph.D. Dissertation, Eng.-Econ. Syst., Stanford University, Stanford, CA (1971).

  21. E.J. Sondik and R. Mendelssohn, Information seeking in Markov decision processes, Southwest Fisheries Center Admin. Report H-79-13, National Marine Fisheries Service, Honolulu, HI (1979).

  22. B.S. Stewart, Heuristic search with general order relation, Ph.D. Dissertation, Department of Systems Engineering, University of Virginia, Charlottesville, VA (1988).

  23. C.C. White and D. Harrington, Application of Jensen's inequality for adaptive suboptimal design, J. Optim. Theory Appl. 32(1980)89–100.

  24. C.C. White and W.T. Scherer, Solution procedures for partially observed Markov decision processes, Oper. Res. 37(1989)791–797.

  25. C.C. White and W.T. Scherer, Finite-memory suboptimal design for partially observed Markov decision processes, submitted for publication (1989).

  26. C.C. White, L.C. Thomas and W.T. Scherer, Reward revision for discounted Markov decision processes, Oper. Res. 33(1985)1299–1315.

  27. C.C. White and D.J. White, Markov decision processes, Eur. J. Oper. Res. 39(1989)1–16.

  28. D.J. White, Further real applications of Markov decision processes, Interfaces 18(1988)55–61.

  29. D.J. White, Real applications of Markov decision processes, Interfaces 15(1985)7–83.

Additional information

This research was supported by NSF Grant ECS-8708183 and ARO Contract DAAG-29-85-K0089.


Cite this article

White, C.C. A survey of solution techniques for the partially observed Markov decision process. Ann Oper Res 32, 215–230 (1991). https://doi.org/10.1007/BF02204836
