2003 | OriginalPaper | Chapter

On-Line Learning with Imperfect Monitoring

Authors: Shie Mannor, Nahum Shimkin

Published in: Learning Theory and Kernel Machines

Publisher: Springer Berlin Heidelberg

We study on-line play of repeated matrix games in which the observations of the other player's past actions and of the obtained reward are partial and stochastic. We define the Partial Observation Bayes Envelope (POBE) as the best reward against the worst-case stationary strategy of the opponent that agrees with past observations. Our goal is to keep the (unobserved) average reward above the POBE. For the case where the observations (but not necessarily the rewards) depend on the opponent's play alone, an algorithm for attaining the POBE is derived. This algorithm is based on an application of approachability theory combined with a worst-case view over the unobserved rewards. We also suggest a simplified solution concept for general signaling structures. This concept may fall short of the POBE.
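
As an illustration of the POBE definition only, the following minimal sketch computes a "best reward against the worst-case consistent opponent strategy" in a toy game. It is one reading of the abstract's sentence, not the chapter's algorithm: it does not use approachability theory, and it assumes a deliberately simple signaling structure in which each opponent action deterministically emits one signal. The names `worst_case_reward`, `pobe_grid`, `R`, `classes`, and `sigma` are hypothetical and chosen for this example.

```python
# Toy illustration of a max-min value over opponent strategies that are
# consistent with observed signal frequencies. All names are hypothetical.

def worst_case_reward(p, R, classes, sigma):
    """Minimum expected reward of mixed action p over every stationary
    opponent strategy whose signal frequencies equal sigma.

    Assumption: signals are deterministic, so opponent action b always
    emits the signal s for which b is in classes[s].
    """
    n_rows, n_cols = len(R), len(R[0])
    # Expected reward of p against each pure opponent action.
    col = [sum(p[a] * R[a][b] for a in range(n_rows)) for b in range(n_cols)]
    # Within each signal class the adversary may place that class's whole
    # probability mass on the column that is worst for us.
    return sum(sigma[s] * min(col[b] for b in cls)
               for s, cls in enumerate(classes))

def pobe_grid(R, classes, sigma, steps=100):
    """Grid approximation of max over p of worst_case_reward,
    for a player with exactly two actions."""
    best_value, best_p = None, None
    for k in range(steps + 1):
        p = (k / steps, 1 - k / steps)
        v = worst_case_reward(p, R, classes, sigma)
        if best_value is None or v > best_value:
            best_value, best_p = v, p
    return best_value, best_p

# Example data (made up): 2 own actions, 3 opponent actions.
# Opponent action 0 is observed exactly; actions 1 and 2 look the same.
R = [[1.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
classes = [[0], [1, 2]]
sigma = [0.5, 0.5]  # observed frequency of each signal

value, strategy = pobe_grid(R, classes, sigma)
```

In this toy instance, any strategy that plays the first action with probability at least one half guarantees a reward of 0.5, whereas always playing the second action guarantees nothing, because the opponent may hide all unobserved mass on action 1. For general stochastic signals the inner minimization would become a linear program rather than a per-class minimum.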

Metadata
Title
On-Line Learning with Imperfect Monitoring
Authors
Shie Mannor
Nahum Shimkin
Copyright Year
2003
Publisher
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/978-3-540-45167-9_40
