2008 | OriginalPaper | Chapter
Proposal of Exploitation-Oriented Learning PS-r#
Authors : Kazuteru Miyazaki, Shigenobu Kobayashi
Published in: Intelligent Data Engineering and Automated Learning – IDEAL 2008
Publisher: Springer Berlin Heidelberg
Exploitation-oriented Learning (XoL) is a novel approach to goal-directed learning from interaction. Whereas reinforcement learning focuses more on learning and can guarantee optimality in Markov Decision Process (MDP) environments, XoL aims to learn a rational policy, whose expected reward per action is larger than zero, very quickly. PS-r*, one of the XoL methods, can learn a useful rational policy that is not inferior to a random walk in Partially Observable Markov Decision Process (POMDP) environments where there is only one type of reward. However, PS-r* requires O(MN²) memory, where N and M are the numbers of types of sensory input and action, respectively. In this paper, we propose PS-r#, which can learn a useful rational policy in POMDP environments with O(MN) memory. We confirm the effectiveness of PS-r# in numerical examples.
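The memory bounds stated above can be illustrated with a minimal sketch. This is a hypothetical table-size comparison only, not the authors' algorithm: we assume, for illustration, that an O(MN²) method indexes statistics by (observation, action, next observation), while an O(MN) method keeps one entry per (observation, action) pair.

```python
# Hypothetical illustration of the abstract's memory-footprint claim.
# Assumption (not from the paper's text): the O(MN^2) table is indexed by
# (observation, action, next observation), and the O(MN) table by
# (observation, action).

N = 10  # number of types of sensory input (observations)
M = 4   # number of types of action

o_mn2_entries = N * M * N  # O(MN^2) memory, as required by PS-r*
o_mn_entries = N * M       # O(MN) memory, as claimed for PS-r#

print(o_mn2_entries)  # 400
print(o_mn_entries)   # 40
```

For even modest N, the quadratic dependence on the number of sensory-input types dominates, which is why reducing the bound from O(MN²) to O(MN) matters in POMDP settings with large observation spaces.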