Published in: Neural Processing Letters 1/2015

01.02.2015

Neural Network Ensembles in Reinforcement Learning

Authors: Stefan Faußer, Friedhelm Schwenker

Abstract

The integration of function approximation methods into reinforcement learning models makes it possible to learn state- and state-action values in large state spaces. Model-free methods, such as temporal-difference learning or SARSA, yield good results for problems where the Markov property holds. However, temporal-difference-based methods are known to be unstable estimators of the value functions when combined with function approximation. This instability depends on the Markov chain, the discount factor, and the chosen function approximator. In this paper, we propose a meta-algorithm that learns state- or state-action values in a neural network ensemble, formed by a committee of multiple agents. The agents learn from joint decisions, and we show that the committee benefits from the diversity of its members' value estimates. We empirically evaluate our algorithm on a generalized maze problem and on SZ-Tetris. The empirical evaluations confirm our analytical results.
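The committee mechanism the abstract describes can be illustrated with a minimal sketch: several agents, each holding its own (here linear) value approximator, act jointly by greedily following the averaged value estimate, while each member runs its own TD(0) update on the shared trajectory. This is only an illustration under stated assumptions, not the authors' actual meta-algorithm; the names `TDAgent`, `Committee`, and `phi`, and the tiny 3-state chain environment, are all hypothetical.

```python
import random

class TDAgent:
    """One committee member: linear state-value approximator with TD(0) updates."""
    def __init__(self, n_features, alpha=0.1, gamma=0.9, seed=0):
        rng = random.Random(seed)
        # Diverse random initialization is one source of the committee's diversity.
        self.w = [rng.uniform(-0.5, 0.5) for _ in range(n_features)]
        self.alpha, self.gamma = alpha, gamma

    def value(self, phi):
        return sum(wi * xi for wi, xi in zip(self.w, phi))

    def update(self, phi, reward, phi_next):
        # TD(0): move this member's own estimate toward r + gamma * V(s').
        delta = reward + self.gamma * self.value(phi_next) - self.value(phi)
        for i, xi in enumerate(phi):
            self.w[i] += self.alpha * delta * xi

class Committee:
    """Joint decisions: act greedily on the averaged value estimate."""
    def __init__(self, agents):
        self.agents = agents

    def joint_value(self, phi):
        return sum(a.value(phi) for a in self.agents) / len(self.agents)

    def act(self, candidate_phis):
        # Choose the successor state with the highest committee-averaged value.
        return max(range(len(candidate_phis)),
                   key=lambda i: self.joint_value(candidate_phis[i]))

# Toy demo: 3-state chain s0 -> s1 -> s2 (terminal), reward 1 on reaching s2.
def phi(s, n=3):
    return [1.0 if i == s else 0.0 for i in range(n)]

agents = [TDAgent(n_features=3, seed=k) for k in range(5)]
committee = Committee(agents)
transitions = [(phi(0), 0.0, phi(1)), (phi(1), 1.0, [0.0, 0.0, 0.0])]
for _ in range(200):
    for p, r, p_next in transitions:
        for agent in agents:  # every member learns from the shared trajectory
            agent.update(p, r, p_next)
```

After training, the committee's averaged values increase along the chain, so its joint greedy policy moves toward the reward; the averaging step is what couples the otherwise independently updated members.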


Metadata
Title
Neural Network Ensembles in Reinforcement Learning
Authors
Stefan Faußer
Friedhelm Schwenker
Publication date
01.02.2015
Publisher
Springer US
Published in
Neural Processing Letters / Issue 1/2015
Print ISSN: 1370-4621
Electronic ISSN: 1573-773X
DOI
https://doi.org/10.1007/s11063-013-9334-5