2017 | Original Paper | Book Chapter

Bayesian Inference for Least Squares Temporal Difference Regularization

Authors: Nikolaos Tziortziotis, Christos Dimitrakakis

Published in: Machine Learning and Knowledge Discovery in Databases

Publisher: Springer International Publishing

Abstract

This paper proposes a fully Bayesian approach to Least-Squares Temporal Differences (LSTD), resulting in fully probabilistic inference of value functions that avoids the overfitting commonly experienced with classical LSTD when the number of features is larger than the number of samples. Sparse Bayesian learning provides an elegant solution through the introduction of a prior over the value function parameters. This gives us the advantages of probabilistic predictions, a sparse model, and good generalisation capabilities, as irrelevant parameters are marginalised out. The algorithm efficiently approximates the posterior distribution through variational inference. We demonstrate experimentally the algorithm's ability to avoid overfitting.
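
The paper's variational scheme is not reproduced here, but the core idea described in the abstract (a per-parameter Gaussian prior over the value-function weights, with hyperparameters re-estimated so that irrelevant features are pruned) can be illustrated with a relevance-vector-machine-style sketch on a Bellman-residual regression of the form \(r_t \approx (\phi(s_t) - \gamma\,\phi(s_{t+1}))^{\top}\boldsymbol{w}\). This is a minimal sketch under those assumptions: the function name is hypothetical and the updates follow Tipping's sparse Bayesian regression rather than the authors' algorithm.

```python
import numpy as np

def sparse_bayes_value_weights(phi, phi_next, rewards, gamma=0.99, n_iters=50):
    """RVM-style sparse Bayesian estimate of linear value-function weights.

    phi, phi_next : (n, k) feature matrices for states s_t and s_{t+1}
    rewards       : (n,) observed rewards
    Returns the posterior mean and covariance of the weights.
    """
    X = phi - gamma * phi_next          # Bellman-residual design: r ~ X w + noise
    n, k = X.shape
    alpha = np.ones(k)                  # per-weight prior precisions
    beta = 1.0                          # observation-noise precision
    for _ in range(n_iters):
        # Gaussian posterior over the weights given the current hyperparameters
        Sigma = np.linalg.inv(beta * X.T @ X + np.diag(alpha))
        mu = beta * Sigma @ X.T @ rewards
        # Evidence-style re-estimation of the hyperparameters (Tipping 2001)
        gamma_i = 1.0 - alpha * np.diag(Sigma)
        alpha = gamma_i / np.maximum(mu ** 2, 1e-12)
        residual = rewards - X @ mu
        beta = max(n - gamma_i.sum(), 1e-12) / max(residual @ residual, 1e-12)
    return mu, Sigma

# Example: 5 transitions but 20 random features. A classical LSTD solve would be
# singular/overfit here, while the sparse posterior remains well defined.
rng = np.random.default_rng(0)
phi, phi_next = rng.normal(size=(5, 20)), rng.normal(size=(5, 20))
mu, Sigma = sparse_bayes_value_weights(phi, phi_next, rng.normal(size=5))
```

Because each precision \(\alpha_i\) can grow without bound, weights attached to irrelevant features are driven to zero, which is the sparsity and overfitting-avoidance the abstract refers to.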

Footnotes
1
With the starting state \(s_0 \sim d(\cdot )\) sampled from some starting distribution d.
 
2
The squared norm \(\Vert \boldsymbol{u}\Vert ^2_D = \boldsymbol{u}^{\top } D \boldsymbol{u}\) is weighted by the non-negative diagonal matrix \(D \in \mathbb{R}^{|\mathcal{S}|\times |\mathcal{S}|}\) with the elements d(s) on its diagonal.
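Equivalently, since D is diagonal with entries \(d(s)\), this is a d-weighted sum of squares: \(\Vert \boldsymbol{u}\Vert ^2_D = \sum_{s \in \mathcal{S}} d(s)\, u(s)^2\).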
 
3
Based on LASSO regression, which uses \(\ell _1\)-regularization.
 
4
A P-matrix is a square matrix all of whose principal minors are positive (a superset of the class of positive definite matrices).
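As a concrete illustration of this definition (not from the paper), a brute-force check over all principal submatrices might look as follows; it enumerates every index subset, so it is only practical for small matrices, and the helper name is ours.

```python
import itertools
import numpy as np

def is_p_matrix(M, tol=1e-12):
    """Return True if every principal minor of M is (numerically) positive."""
    M = np.asarray(M, dtype=float)
    n = M.shape[0]
    for size in range(1, n + 1):
        for idx in itertools.combinations(range(n), size):
            if np.linalg.det(M[np.ix_(idx, idx)]) <= tol:
                return False
    return True

# A P-matrix that is not positive definite (its symmetric part is indefinite):
print(is_p_matrix(np.array([[1.0, -3.0], [0.0, 1.0]])))  # True
```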
 
Metadata
Title
Bayesian Inference for Least Squares Temporal Difference Regularization
Authors
Nikolaos Tziortziotis
Christos Dimitrakakis
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-71246-8_8