2017 | Original Paper | Book Chapter

Bayesian Inference for Least Squares Temporal Difference Regularization

Authors: Nikolaos Tziortziotis, Christos Dimitrakakis

Published in: Machine Learning and Knowledge Discovery in Databases

Publisher: Springer International Publishing

Abstract

This paper proposes a fully Bayesian approach to Least-Squares Temporal Differences (LSTD), resulting in fully probabilistic inference of value functions that avoids the overfitting commonly experienced with classical LSTD when the number of features is larger than the number of samples. Sparse Bayesian learning provides an elegant solution through the introduction of a prior over the value function parameters. This gives us the advantages of probabilistic predictions, a sparse model, and good generalisation capabilities, as irrelevant parameters are marginalised out. The algorithm efficiently approximates the posterior distribution through variational inference. We demonstrate experimentally the algorithm's ability to avoid overfitting.
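
The paper's variational scheme is not reproduced here, but the core idea described in the abstract (a per-parameter Gaussian prior over the value-function weights, with hyperparameters re-estimated so that irrelevant features are pruned) can be illustrated with a relevance-vector-machine-style sketch on a Bellman-residual regression of the form \(r_t \approx (\phi(s_t) - \gamma\,\phi(s_{t+1}))^{\top}\boldsymbol{w}\). This is a minimal sketch under those assumptions: the function name is hypothetical and the updates follow Tipping's sparse Bayesian regression rather than the authors' algorithm.

```python
import numpy as np

def sparse_bayes_value_weights(phi, phi_next, rewards, gamma=0.99, n_iters=50):
    """RVM-style sparse Bayesian estimate of linear value-function weights.

    phi, phi_next : (n, k) feature matrices for states s_t and s_{t+1}
    rewards       : (n,) observed rewards
    Returns the posterior mean and covariance of the weights.
    """
    X = phi - gamma * phi_next          # Bellman-residual design: r ~ X w + noise
    n, k = X.shape
    alpha = np.ones(k)                  # per-weight prior precisions
    beta = 1.0                          # observation-noise precision
    for _ in range(n_iters):
        # Gaussian posterior over the weights given the current hyperparameters
        Sigma = np.linalg.inv(beta * X.T @ X + np.diag(alpha))
        mu = beta * Sigma @ X.T @ rewards
        # Evidence-style re-estimation of the hyperparameters (Tipping 2001)
        gamma_i = 1.0 - alpha * np.diag(Sigma)
        alpha = gamma_i / np.maximum(mu ** 2, 1e-12)
        residual = rewards - X @ mu
        beta = max(n - gamma_i.sum(), 1e-12) / max(residual @ residual, 1e-12)
    return mu, Sigma

# Example: 5 transitions but 20 random features. A classical LSTD solve would be
# singular/overfit here, while the sparse posterior remains well defined.
rng = np.random.default_rng(0)
phi, phi_next = rng.normal(size=(5, 20)), rng.normal(size=(5, 20))
mu, Sigma = sparse_bayes_value_weights(phi, phi_next, rng.normal(size=5))
```

Because each precision \(\alpha_i\) can grow without bound, weights attached to irrelevant features are driven to zero, which is the sparsity and overfitting-avoidance the abstract refers to.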

Footnotes
1
With the starting state \(s_0 \sim d(\cdot )\) sampled from some starting distribution d.
 
2
The squared norm \(\Vert \boldsymbol{u}\Vert ^2_D = \boldsymbol{u}^{\top } D \boldsymbol{u}\) is weighted by the non-negative diagonal matrix \(D \in \mathbb{R}^{|\mathcal{S}|\times |\mathcal{S}|}\) with the elements d(s) on its diagonal.
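Equivalently, since D is diagonal with entries \(d(s)\), this is a d-weighted sum of squares: \(\Vert \boldsymbol{u}\Vert ^2_D = \sum_{s \in \mathcal{S}} d(s)\, u(s)^2\).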
 
3
Based on LASSO regression, which uses \(\ell _1\)-regularization.
 
4
A P-matrix is a square matrix all of whose principal minors are positive (a superset of the class of positive definite matrices).
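As a concrete illustration of this definition (not from the paper), a brute-force check over all principal submatrices might look as follows; it enumerates every index subset, so it is only practical for small matrices, and the helper name is ours.

```python
import itertools
import numpy as np

def is_p_matrix(M, tol=1e-12):
    """Return True if every principal minor of M is (numerically) positive."""
    M = np.asarray(M, dtype=float)
    n = M.shape[0]
    for size in range(1, n + 1):
        for idx in itertools.combinations(range(n), size):
            if np.linalg.det(M[np.ix_(idx, idx)]) <= tol:
                return False
    return True

# A P-matrix that is not positive definite (its symmetric part is indefinite):
print(is_p_matrix(np.array([[1.0, -3.0], [0.0, 1.0]])))  # True
```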
 
Metadata
Title
Bayesian Inference for Least Squares Temporal Difference Regularization
Authors
Nikolaos Tziortziotis
Christos Dimitrakakis
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-71246-8_8