2009 | Original Paper | Book Chapter
Feedback of Delayed Rewards in XCS for Environments with Aliasing States
Authors: Kuang-Yuan Chen, Peter A. Lindsay
Published in: Artificial Life: Borrowing from Biology
Publisher: Springer Berlin Heidelberg
Wilson [13] showed how delayed reward feedback can be used to solve many multi-step problems with the widely used XCS learning classifier system. However, Wilson's method, which is based on back-propagation with discounting from Q-learning, runs into difficulties in environments with aliasing states, since the local reward function often does not converge. This paper describes a different approach to reward feedback, in which a layered reward scheme for XCS classifiers is learnt during training. We show that, with a relatively minor modification to XCS feedback, the approach not only solves problems such as Woods1 but also solves aliasing-state problems such as Littman57, MiyazakiA and MazeB.
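To illustrate why discounted feedback can fail to converge under aliasing, the following is a minimal sketch and not the authors' method. It assumes a single classifier that matches two aliased states at different distances from the goal, a discounted target of the form reward plus gamma times the best next prediction, and a simple incremental prediction update. The function names and the values of `GAMMA`, `BETA` and the goal reward are assumptions chosen for illustration.

```python
GAMMA = 0.71          # discount factor (assumed value for illustration)
BETA = 0.2            # learning rate (assumed value for illustration)
GOAL_REWARD = 1000.0  # reward on reaching the goal (assumed)


def discounted_target(reward, next_max_prediction, gamma=GAMMA):
    """Q-learning style target: reward plus discounted best next prediction."""
    return reward + gamma * next_max_prediction


def update_prediction(prediction, target, beta=BETA):
    """Move the classifier's prediction a fraction beta toward the target."""
    return prediction + beta * (target - prediction)


# Two different states that the agent perceives as identical (aliased).
# In the near state the action reaches the goal directly; in the far state
# it reaches a state whose best prediction equals the goal reward.
target_near = discounted_target(GOAL_REWARD, 0.0)
target_far = discounted_target(0.0, GOAL_REWARD)

# One classifier matches both states, so it receives both targets in turn.
p = 0.0
history = []
for step in range(200):
    target = target_near if step % 2 == 0 else target_far
    p = update_prediction(p, target)
    history.append(p)

# The prediction keeps swinging between two values instead of settling.
print(round(history[-2], 2), round(history[-1], 2))
```

In this toy setting the prediction never settles on either target; it keeps alternating between two intermediate values, so the classifier's prediction error stays high. This is only one simplified view of the convergence problem that the abstract attributes to aliasing states.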