2008 | OriginalPaper | Book chapter
Online Multiagent Learning against Memory Bounded Adversaries
Authors: Doran Chakraborty, Peter Stone
Published in: Machine Learning and Knowledge Discovery in Databases
Publisher: Springer Berlin Heidelberg
The traditional agenda in Multiagent Learning (MAL) has been to develop learners that guarantee convergence to an equilibrium in self-play or that converge to playing the best response against an opponent using one of a fixed set of known targeted strategies. This paper introduces an algorithm called Learn or Exploit for Adversary Induced Markov Decision Process (LoE-AIM) that targets optimality against any learning opponent that can be treated as a memory-bounded adversary. LoE-AIM makes no prior assumptions about the opponent and is tailored to optimally exploit any adversary that induces a Markov decision process in the state space of joint histories.
LoE-AIM either explores and gathers new information about the opponent or converges to the best response to the partially learned opponent strategy in repeated play. We further extend LoE-AIM to account for online repeated interactions against the same adversary, with plays against other adversaries interleaved in between. LoE-AIM-repeated stores learned knowledge about an adversary, identifies the adversary in case of repeated interaction, and reuses the stored knowledge about the adversary's behavior to enhance learning in the current epoch of play. LoE-AIM and LoE-AIM-repeated are fully implemented, with results demonstrating their superiority over existing MAL algorithms.