2012 | OriginalPaper | Buchkapitel
Solving Sparse Delayed Coordination Problems in Multi-Agent Reinforcement Learning
verfasst von : Yann-Michaël De Hauwere, Peter Vrancx, Ann Nowé
Erschienen in: Adaptive and Learning Agents
Verlag: Springer Berlin Heidelberg
Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.
Wählen Sie Textabschnitte aus um mit Künstlicher Intelligenz passenden Patente zu finden. powered by
Markieren Sie Textabschnitte, um KI-gestützt weitere passende Inhalte zu finden. powered by
One of the main advantages of Reinforcement Learning is the capability of dealing with a delayed reward signal. Using an appropriate backup diagram, rewards are backpropagated through the state space. This allows agents to learn to take the correct action that results in the highest future (discounted) reward, even if that action results in a suboptimal immediate reward in the current state. In a multi-agent environment, agents can use the same principles as in single agent RL, but have to apply them in a complete joint-state-joint-action space to guarantee optimality. Learning in such a state space can however be very slow. In this paper we present our approach for mitigating this problem.
Future Coordinating Q-learning
(FCQ-learning) detects strategic interactions between agents several timesteps before these interactions occur. FCQ-learning uses the same principles as CQ-learning [3] to detect the states in which interaction is required, but several timesteps before this is reflected in the reward signal. In these states, the algorithm will augment the state information to include information about other agents which is used to select actions. The techniques presented in this paper are the first to explicitly deal with a delayed reward signal when learning using sparse interactions.