
2016 | Original Paper | Book Chapter

Planning with Information-Processing Constraints and Model Uncertainty in Markov Decision Processes

Authors: Jordi Grau-Moya, Felix Leibfried, Tim Genewein, Daniel A. Braun

Published in: Machine Learning and Knowledge Discovery in Databases

Publisher: Springer International Publishing


Abstract

Information-theoretic principles for learning and acting have been proposed to solve particular classes of Markov Decision Problems. Mathematically, such approaches are governed by a variational free energy principle and allow solving MDP planning problems with information-processing constraints expressed in terms of a Kullback-Leibler divergence with respect to a reference distribution. Here we consider a generalization of such MDP planners by taking model uncertainty into account. As model uncertainty can also be formalized as an information-processing constraint, we can derive a unified solution from a single generalized variational principle. We provide a generalized value iteration scheme together with a convergence proof. As limit cases, this generalized scheme includes standard value iteration with a known model, Bayesian MDP planning, and robust planning. We demonstrate the benefits of this approach in a grid world simulation.
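
To make the abstract's "generalized value iteration scheme" concrete, here is a minimal NumPy sketch of a two-temperature free-energy backup: a soft (log-sum-exp) aggregation over candidate transition models to account for model uncertainty, followed by a soft aggregation over actions relative to a reference policy to account for the information-processing constraint. The function names, the array layout, the reference policy `rho`, and the specific placement of the temperatures `alpha` and `beta` are illustrative assumptions, not the paper's exact update equations.

```python
import numpy as np


def weighted_logsumexp(z, w, axis):
    """Numerically stable log( sum_i w_i * exp(z_i) ) along `axis`.
    The weights `w` must broadcast against `z` and are assumed non-negative."""
    z_max = np.max(z, axis=axis, keepdims=True)
    total = np.sum(w * np.exp(z - z_max), axis=axis)
    return np.squeeze(z_max, axis=axis) + np.log(total)


def generalized_value_iteration(P_models, w, R, rho, alpha, beta,
                                gamma=0.95, tol=1e-8, max_iter=10_000):
    """Two-temperature free-energy value iteration (illustrative sketch).

    P_models : (M, S, A, S) candidate transition models (hypothetical layout)
    w        : (M,)         posterior weights over the candidate models
    R        : (S, A)       expected immediate reward
    rho      : (S, A)       prior/reference policy (rows sum to 1, entries > 0)
    alpha    : policy temperature; large alpha approaches the hard max over actions
    beta     : model temperature; beta -> 0 gives Bayesian model averaging,
               large negative beta a pessimistic/robust backup
    """
    M, S, A, _ = P_models.shape
    F = np.zeros(S)
    Q = np.zeros((S, A))
    for _ in range(max_iter):
        # Expected next-state value under each candidate model: shape (M, S, A)
        EV = np.einsum('msat,t->msa', P_models, F)
        if abs(beta) < 1e-12:
            # Limit beta -> 0: plain Bayesian averaging over models
            Q = R + gamma * np.einsum('m,msa->sa', w, EV)
        else:
            # Soft (log-sum-exp) aggregation over models
            Q = R + (1.0 / beta) * weighted_logsumexp(
                beta * gamma * EV, w[:, None, None], axis=0)
        # Soft aggregation over actions relative to the reference policy rho
        F_new = (1.0 / alpha) * weighted_logsumexp(alpha * Q, rho, axis=1)
        if np.max(np.abs(F_new - F)) < tol:
            F = F_new
            break
        F = F_new
    # Boltzmann policy: the reference policy tilted by the soft Q-values
    logits = alpha * Q + np.log(rho)
    pi = np.exp(logits - logits.max(axis=1, keepdims=True))
    pi /= pi.sum(axis=1, keepdims=True)
    return F, pi
```

In this sketch the limit cases named in the abstract appear as parameter settings: `beta -> 0` with large `alpha` approximates a standard Bayesian backup, large negative `beta` pushes the backup toward the worst-case model (robust planning), and finite `alpha` keeps the policy close to the reference distribution, which is how the KL information-processing constraint enters.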

Footnotes
1
Base case: \(T_{\pi,\psi} F \le F\). Inductive step: assuming \(T^{i}_{\pi,\psi} F \le T^{i-1}_{\pi,\psi} F\), then \(T^{i+1}_{\pi,\psi} F = g_{\pi,\psi} + \gamma P_{\pi,\psi} T^{i}_{\pi,\psi} F \le g_{\pi,\psi} + \gamma P_{\pi,\psi} T^{i-1}_{\pi,\psi} F = T^{i}_{\pi,\psi} F\). The argument is analogous for the base case \(T_{\pi,\psi} F \ge F\). \(\square\)
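
For readability, a compact restatement of the induction in footnote 1, assuming the affine backup \(T_{\pi,\psi} F = g_{\pi,\psi} + \gamma P_{\pi,\psi} F\) with entrywise non-negative \(P_{\pi,\psi}\) (this monotonicity of the operator is implicit in the footnote):

```latex
% Monotone sequence generated by the affine operator T_{\pi,\psi}
% (assumes T_{\pi,\psi} F = g_{\pi,\psi} + \gamma P_{\pi,\psi} F with P_{\pi,\psi} \ge 0 entrywise)
\begin{align*}
T_{\pi,\psi} F \le F
  \;\Longrightarrow\;
  T^{i+1}_{\pi,\psi} F
  = g_{\pi,\psi} + \gamma P_{\pi,\psi}\, T^{i}_{\pi,\psi} F
  \;\le\;
  g_{\pi,\psi} + \gamma P_{\pi,\psi}\, T^{i-1}_{\pi,\psi} F
  = T^{i}_{\pi,\psi} F .
\end{align*}
```

Starting from \(T_{\pi,\psi} F \le F\) therefore yields a monotonically decreasing sequence \((T^{i}_{\pi,\psi} F)_{i \ge 0}\); starting from \(T_{\pi,\psi} F \ge F\) yields, by the same argument, a monotonically increasing one.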
 
Metadata
Title
Planning with Information-Processing Constraints and Model Uncertainty in Markov Decision Processes
Authors
Jordi Grau-Moya
Felix Leibfried
Tim Genewein
Daniel A. Braun
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-46227-1_30