2008 | Original Paper | Book Chapter
Proposal of Exploitation-Oriented Learning PS-r#
Authors: Kazuteru Miyazaki, Shigenobu Kobayashi
Published in: Intelligent Data Engineering and Automated Learning – IDEAL 2008
Publisher: Springer Berlin Heidelberg
Exploitation-oriented Learning (XoL) is a novel approach to goal-directed learning from interaction. Whereas reinforcement learning focuses more on the learning process and can guarantee optimality in Markov Decision Process (MDP) environments, XoL aims to learn a rational policy, i.e., a policy whose expected reward per action is greater than zero, very quickly. PS-r* is one of the XoL methods. It can learn a useful rational policy, one that is not inferior to a random walk, in Partially Observable Markov Decision Process (POMDP) environments where there is a single type of reward. However, PS-r* requires O(MN^2) memory, where N and M are the numbers of types of sensory input and action, respectively. In this paper, we propose PS-r#, which can learn a useful rational policy in POMDP environments with O(MN) memory. We confirm the effectiveness of PS-r# in numerical examples.
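The practical impact of reducing the memory bound from O(MN^2) to O(MN) can be illustrated with a toy calculation. This is a sketch only: the cell counts below follow the asymptotic bounds stated in the abstract, and the table shapes are illustrative assumptions, not the actual data structures of PS-r* or PS-r#.

```python
def memory_cells(n_obs: int, n_act: int) -> dict:
    """Illustrative memory cell counts for N observation types and M action types.

    PS-r* is stated to need O(M * N^2) memory, PS-r# only O(M * N);
    here we simply evaluate those bounds with unit constants.
    """
    return {
        "PS-r* (O(MN^2))": n_act * n_obs ** 2,
        "PS-r# (O(MN))": n_act * n_obs,
    }

if __name__ == "__main__":
    # As the number of observation types N grows, the gap widens linearly in N.
    for n in (10, 100, 1000):
        print(n, memory_cells(n_obs=n, n_act=4))
```

For example, with N = 1000 observation types and M = 4 actions, the quadratic bound corresponds to 4,000,000 cells versus 4,000 for the linear one, a thousand-fold difference.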