
2016 | Original Paper | Book Chapter

Planning with Information-Processing Constraints and Model Uncertainty in Markov Decision Processes

Authors: Jordi Grau-Moya, Felix Leibfried, Tim Genewein, Daniel A. Braun

Published in: Machine Learning and Knowledge Discovery in Databases

Publisher: Springer International Publishing


Abstract

Information-theoretic principles for learning and acting have been proposed to solve particular classes of Markov Decision Problems. Mathematically, such approaches are governed by a variational free energy principle and allow solving MDP planning problems with information-processing constraints expressed in terms of a Kullback-Leibler divergence with respect to a reference distribution. Here we consider a generalization of such MDP planners by taking model uncertainty into account. As model uncertainty can also be formalized as an information-processing constraint, we can derive a unified solution from a single generalized variational principle. We provide a generalized value iteration scheme together with a convergence proof. As limit cases, this generalized scheme includes standard value iteration with a known model, Bayesian MDP planning, and robust planning. We demonstrate the benefits of this approach in a grid world simulation.
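
To make the abstract's "generalized value iteration scheme" concrete, here is a minimal NumPy sketch of a two-temperature free-energy backup: a soft (log-sum-exp) aggregation over candidate transition models to account for model uncertainty, followed by a soft aggregation over actions relative to a reference policy to account for the information-processing constraint. The function names, the array layout, the reference policy `rho`, and the specific placement of the temperatures `alpha` and `beta` are illustrative assumptions, not the paper's exact update equations.

```python
import numpy as np


def weighted_logsumexp(z, w, axis):
    """Numerically stable log( sum_i w_i * exp(z_i) ) along `axis`.
    The weights `w` must broadcast against `z` and are assumed non-negative."""
    z_max = np.max(z, axis=axis, keepdims=True)
    total = np.sum(w * np.exp(z - z_max), axis=axis)
    return np.squeeze(z_max, axis=axis) + np.log(total)


def generalized_value_iteration(P_models, w, R, rho, alpha, beta,
                                gamma=0.95, tol=1e-8, max_iter=10_000):
    """Two-temperature free-energy value iteration (illustrative sketch).

    P_models : (M, S, A, S) candidate transition models (hypothetical layout)
    w        : (M,)         posterior weights over the candidate models
    R        : (S, A)       expected immediate reward
    rho      : (S, A)       prior/reference policy (rows sum to 1, entries > 0)
    alpha    : policy temperature; large alpha approaches the hard max over actions
    beta     : model temperature; beta -> 0 gives Bayesian model averaging,
               large negative beta a pessimistic/robust backup
    """
    M, S, A, _ = P_models.shape
    F = np.zeros(S)
    Q = np.zeros((S, A))
    for _ in range(max_iter):
        # Expected next-state value under each candidate model: shape (M, S, A)
        EV = np.einsum('msat,t->msa', P_models, F)
        if abs(beta) < 1e-12:
            # Limit beta -> 0: plain Bayesian averaging over models
            Q = R + gamma * np.einsum('m,msa->sa', w, EV)
        else:
            # Soft (log-sum-exp) aggregation over models
            Q = R + (1.0 / beta) * weighted_logsumexp(
                beta * gamma * EV, w[:, None, None], axis=0)
        # Soft aggregation over actions relative to the reference policy rho
        F_new = (1.0 / alpha) * weighted_logsumexp(alpha * Q, rho, axis=1)
        if np.max(np.abs(F_new - F)) < tol:
            F = F_new
            break
        F = F_new
    # Boltzmann policy: the reference policy tilted by the soft Q-values
    logits = alpha * Q + np.log(rho)
    pi = np.exp(logits - logits.max(axis=1, keepdims=True))
    pi /= pi.sum(axis=1, keepdims=True)
    return F, pi
```

In this sketch the limit cases named in the abstract appear as parameter settings: `beta -> 0` with large `alpha` approximates a standard Bayesian backup, large negative `beta` pushes the backup toward the worst-case model (robust planning), and finite `alpha` keeps the policy close to the reference distribution, which is how the KL information-processing constraint enters.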

Footnotes
1
Base case: \(T_{\pi,\psi} F \le F\). Inductive step: assuming \(T^{i}_{\pi,\psi} F \le T^{i-1}_{\pi,\psi} F\), then \(T^{i+1}_{\pi,\psi} F = g_{\pi,\psi} + \gamma P_{\pi,\psi} T^{i}_{\pi,\psi} F \le g_{\pi,\psi} + \gamma P_{\pi,\psi} T^{i-1}_{\pi,\psi} F = T^{i}_{\pi,\psi} F\). The argument is analogous for the base case \(T_{\pi,\psi} F \ge F\). \(\square\)
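
For readability, a compact restatement of the induction in footnote 1, assuming the affine backup \(T_{\pi,\psi} F = g_{\pi,\psi} + \gamma P_{\pi,\psi} F\) with entrywise non-negative \(P_{\pi,\psi}\) (this monotonicity of the operator is implicit in the footnote):

```latex
% Monotone sequence generated by the affine operator T_{\pi,\psi}
% (assumes T_{\pi,\psi} F = g_{\pi,\psi} + \gamma P_{\pi,\psi} F with P_{\pi,\psi} \ge 0 entrywise)
\begin{align*}
T_{\pi,\psi} F \le F
  \;\Longrightarrow\;
  T^{i+1}_{\pi,\psi} F
  = g_{\pi,\psi} + \gamma P_{\pi,\psi}\, T^{i}_{\pi,\psi} F
  \;\le\;
  g_{\pi,\psi} + \gamma P_{\pi,\psi}\, T^{i-1}_{\pi,\psi} F
  = T^{i}_{\pi,\psi} F .
\end{align*}
```

Starting from \(T_{\pi,\psi} F \le F\) therefore yields a monotonically decreasing sequence \((T^{i}_{\pi,\psi} F)_{i \ge 0}\); starting from \(T_{\pi,\psi} F \ge F\) yields, by the same argument, a monotonically increasing one.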
 
Metadata
Title
Planning with Information-Processing Constraints and Model Uncertainty in Markov Decision Processes
Authors
Jordi Grau-Moya
Felix Leibfried
Tim Genewein
Daniel A. Braun
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-46227-1_30