Transactions of the Japanese Society for Artificial Intelligence
Online ISSN : 1346-8030
Print ISSN : 1346-0714
ISSN-L : 1346-0714
Paper
A TD Algorithm for Estimating the Variance of Return and a Proposal of a Mean-Variance Reinforcement Learning Method
Makoto Sato, Hajime Kimura, Shigenobu Kobayashi
Journal, free access

2001, Volume 16, Issue 3, pp. 353-362

Abstract

Estimating the probability distribution of returns enables a range of sophisticated decision-making schemes for control problems in Markov environments, including risk-sensitive control and efficient exploration of environments. Many reinforcement learning algorithms, however, rely only on the expected return. This paper proposes a decision-making scheme that uses the mean and variance of the return distribution. It presents a TD algorithm for estimating the variance of return in Markov decision process (MDP) environments and a gradient-based reinforcement learning algorithm for the variance-penalized criterion, a typical criterion in risk-avoiding control. Empirical results demonstrate the behavior of the algorithms and validate the criterion for risk-avoiding sequential decision tasks.
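To make the idea of a TD-style variance estimate concrete, the following is a minimal sketch and not the paper's algorithm. It assumes the standard recursion Var(s) = E[delta^2] + gamma^2 E[Var(s')], where delta = r + gamma V(s') - V(s) is the TD error under the true value function. The toy two-step chain, the 1/n step size, the function name `td_mean_variance`, and the penalty weight `kappa` are all illustrative assumptions. On this chain the exact values are V = (1.9, 1.0) and Var = (1.81, 1.0) for gamma = 0.9, so the estimates can be checked by hand.

```python
import random

def td_mean_variance(episodes=20000, gamma=0.9, seed=0):
    """TD(0)-style estimates of the mean and variance of return on a toy chain.

    Hypothetical setup: state 0 -> state 1 -> terminal, and every step pays
    a reward of 0 or 2 with equal probability (mean 1, variance 1).
    """
    rng = random.Random(seed)
    n_states = 2
    v = [0.0] * (n_states + 1)      # last entry is the terminal state, fixed at 0
    var = [0.0] * (n_states + 1)    # variance of return; terminal entry stays 0
    visits = [0] * n_states
    for _ in range(episodes):
        for s in range(n_states):
            r = rng.choice([0.0, 2.0])
            s_next = s + 1
            visits[s] += 1
            alpha = 1.0 / visits[s]  # assumed step size: running average
            # TD error computed from the current (pre-update) value estimates.
            delta = r + gamma * v[s_next] - v[s]
            # Variance target: squared TD error plus discounted successor variance.
            var[s] += alpha * (delta ** 2 + gamma ** 2 * var[s_next] - var[s])
            v[s] += alpha * delta
    return v[:n_states], var[:n_states]

v, var = td_mean_variance()
kappa = 0.5  # hypothetical risk-aversion weight
# Variance-penalized value per state: mean minus kappa times variance.
penalized = [m - kappa * s2 for m, s2 in zip(v, var)]
print(v, var, penalized)
```

A larger `kappa` penalizes high-variance states more strongly, which is the risk-avoiding behavior the variance-penalized criterion is meant to capture. The policy-gradient step on that criterion is not sketched here.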

© 2001 JSAI (The Japanese Society for Artificial Intelligence)