Published in: Neural Processing Letters 1/2015

01.02.2015

Neural Network Ensembles in Reinforcement Learning

Authors: Stefan Faußer, Friedhelm Schwenker

Abstract

The integration of function approximation methods into reinforcement learning models makes it possible to learn state- and state-action values in large state spaces. Model-free methods, such as temporal-difference learning or SARSA, yield good results for problems where the Markov property holds. However, temporal-difference-based methods are known to be unstable estimators of the value functions when combined with function approximation. This instability depends on the Markov chain, the discount factor, and the chosen function approximator. In this paper, we propose a meta-algorithm that learns state- or state-action values in a neural network ensemble, formed by a committee of multiple agents. The agents learn from joint decisions, and we show that the committee benefits from the diversity of its members' value estimates. We empirically evaluate our algorithm on a generalized maze problem and on SZ-Tetris. The empirical evaluations confirm our analytical results.
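The committee mechanism the abstract describes can be illustrated with a minimal sketch: several agents, each holding its own (here linear) value approximator, act jointly by greedily following the averaged value estimate, while each member runs its own TD(0) update on the shared trajectory. This is only an illustration under stated assumptions, not the authors' actual meta-algorithm; the names `TDAgent`, `Committee`, and `phi`, and the tiny 3-state chain environment, are all hypothetical.

```python
import random

class TDAgent:
    """One committee member: linear state-value approximator with TD(0) updates."""
    def __init__(self, n_features, alpha=0.1, gamma=0.9, seed=0):
        rng = random.Random(seed)
        # Diverse random initialization is one source of the committee's diversity.
        self.w = [rng.uniform(-0.5, 0.5) for _ in range(n_features)]
        self.alpha, self.gamma = alpha, gamma

    def value(self, phi):
        return sum(wi * xi for wi, xi in zip(self.w, phi))

    def update(self, phi, reward, phi_next):
        # TD(0): move this member's own estimate toward r + gamma * V(s').
        delta = reward + self.gamma * self.value(phi_next) - self.value(phi)
        for i, xi in enumerate(phi):
            self.w[i] += self.alpha * delta * xi

class Committee:
    """Joint decisions: act greedily on the averaged value estimate."""
    def __init__(self, agents):
        self.agents = agents

    def joint_value(self, phi):
        return sum(a.value(phi) for a in self.agents) / len(self.agents)

    def act(self, candidate_phis):
        # Choose the successor state with the highest committee-averaged value.
        return max(range(len(candidate_phis)),
                   key=lambda i: self.joint_value(candidate_phis[i]))

# Toy demo: 3-state chain s0 -> s1 -> s2 (terminal), reward 1 on reaching s2.
def phi(s, n=3):
    return [1.0 if i == s else 0.0 for i in range(n)]

agents = [TDAgent(n_features=3, seed=k) for k in range(5)]
committee = Committee(agents)
transitions = [(phi(0), 0.0, phi(1)), (phi(1), 1.0, [0.0, 0.0, 0.0])]
for _ in range(200):
    for p, r, p_next in transitions:
        for agent in agents:  # every member learns from the shared trajectory
            agent.update(p, r, p_next)
```

After training, the committee's averaged values increase along the chain, so its joint greedy policy moves toward the reward; the averaging step is what couples the otherwise independently updated members.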


Metadata
Title
Neural Network Ensembles in Reinforcement Learning
Authors
Stefan Faußer
Friedhelm Schwenker
Publication date
01.02.2015
Publisher
Springer US
Published in
Neural Processing Letters / Issue 1/2015
Print ISSN: 1370-4621
Electronic ISSN: 1573-773X
DOI
https://doi.org/10.1007/s11063-013-9334-5