Published in: Soft Computing 6/2011

01.06.2011 | Focus

Continuous-action reinforcement learning with fast policy search and adaptive basis function selection

Authors: Xin Xu, Chunming Liu, Dewen Hu


Abstract

As an important approach to solving complex sequential decision problems, reinforcement learning (RL) has been widely studied in the artificial intelligence and machine learning communities. However, the generalization ability of RL is still an open problem, and it is difficult for existing RL algorithms to solve Markov decision problems (MDPs) with both continuous state and action spaces. In this paper, a novel RL approach with fast policy search and adaptive basis function selection, called Continuous-action Approximate Policy Iteration (CAPI), is proposed for RL in MDPs with both continuous state and action spaces. In CAPI, based on the value functions estimated by temporal-difference learning, a fast policy search technique is presented to search for optimal actions in continuous spaces; it is computationally efficient and easy to implement. To improve the generalization ability and learning efficiency of CAPI, two adaptive basis function selection methods are developed so that sparse approximations of value functions can be obtained efficiently both for linear function approximators and for kernel machines. Simulation results on benchmark learning control tasks with continuous state and action spaces show that the proposed approach not only converges to a near-optimal policy in a few iterations but also achieves performance comparable to or better than Sarsa-learning and previous approximate policy iteration methods such as LSPI and KLSPI.
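The abstract does not give the algorithmic details of CAPI, but the ingredients it names (TD-based value-function estimation inside an approximate policy iteration loop, a greedy search over a continuous action interval, and a sparse feature representation) can be illustrated with a short sketch. The Python snippet below is a generic illustration under assumed design choices (Gaussian RBF features over the joint state-action vector, an LSTD-style policy-evaluation step, and a simple candidate-grid action search); it is not the authors' CAPI implementation, and the names `rbf_features`, `greedy_action`, `lstd_q`, and `capi_like_iteration` are hypothetical.

```python
# Hedged sketch of approximate policy iteration with a linear Q-function and a
# simple continuous-action search. This illustrates the ideas named in the
# abstract, NOT the paper's CAPI algorithm: the feature map, the action search,
# and the evaluation step are all assumptions made for this example.

import numpy as np

def rbf_features(state, action, centers, width=0.5):
    """Gaussian RBF features over the joint (state, action) vector (assumed design)."""
    x = np.concatenate([np.atleast_1d(state), np.atleast_1d(action)])
    return np.exp(-np.sum((centers - x) ** 2, axis=1) / (2.0 * width ** 2))

def greedy_action(w, state, centers, a_low, a_high, n_candidates=21):
    """Coarse grid search over the continuous action interval; a stand-in for
    the paper's fast policy search."""
    candidates = np.linspace(a_low, a_high, n_candidates)
    q_values = [w @ rbf_features(state, a, centers) for a in candidates]
    return candidates[int(np.argmax(q_values))]

def lstd_q(samples, w_old, centers, a_low, a_high, gamma=0.95, reg=1e-3):
    """One policy-evaluation step in LSTD-Q style, using the greedy policy
    induced by the previous weight vector."""
    k = centers.shape[0]
    A, b = reg * np.eye(k), np.zeros(k)
    for s, a, r, s_next in samples:
        phi = rbf_features(s, a, centers)
        a_next = greedy_action(w_old, s_next, centers, a_low, a_high)
        phi_next = rbf_features(s_next, a_next, centers)
        A += np.outer(phi, phi - gamma * phi_next)
        b += r * phi
    return np.linalg.solve(A, b)

def capi_like_iteration(samples, centers, a_low=-1.0, a_high=1.0, n_iter=10):
    """Approximate policy iteration loop over a fixed batch of (s, a, r, s')
    transitions (sample collection from the environment is omitted)."""
    w = np.zeros(centers.shape[0])
    for _ in range(n_iter):
        w = lstd_q(samples, w, centers, a_low, a_high)
    return w
```

In this sketch the continuous action is handled by evaluating the approximate Q-function at a small set of candidate actions and taking the maximizer; the paper's fast policy search and adaptive basis function selection would replace the fixed grid and the fixed RBF centers, respectively.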


Metadata
Title
Continuous-action reinforcement learning with fast policy search and adaptive basis function selection
Authors
Xin Xu
Chunming Liu
Dewen Hu
Publication date
01.06.2011
Publisher
Springer-Verlag
Published in
Soft Computing / Issue 6/2011
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-010-0581-3
