
About this book

This volume constitutes the thoroughly refereed post-conference proceedings of the International Workshop on Adaptive and Learning Agents, ALA 2011, held at the 10th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2011, in Taipei, Taiwan, in May 2011. The 7 revised full papers presented together with 1 invited talk were carefully reviewed and selected from numerous submissions. The papers are organized in topical sections on single and multi-agent reinforcement learning, supervised multiagent learning, adaptation and learning in dynamic environments, learning trust and reputation, minority games and agent coordination.



Invited Contribution

Co-learning Segmentation in Marketplaces

We present the problem of automatic co-niching in which potential suppliers of some product or service need to determine which offers to make to the marketplace at the same time as potential buyers need to determine which offers (if any) to purchase. Because both groups typically face incomplete or uncertain information needed for these decisions, participants in repeated market interactions engage in a learning process, making tentative decisions and adjusting these in the light of experiences they gain. Perhaps surprisingly, real markets typically then exhibit a form of parallel clustering: buyers cluster into segments of similar preferences and suppliers into segments of similar offers. For computer scientists, the interesting question is whether such co-niching behaviours can be automated. We report on the first simulation experiments showing automated co-niching is possible using reinforcement learning in a multi-attribute product model. The work is of relevance to designers of online marketplaces, of computational resource allocation systems, and of automated software trading agents.
Edward Robinson, Peter McBurney, Xin Yao

Workshop Contributions

Reinforcement Learning Transfer via Common Subspaces

Agents in reinforcement learning tasks may learn slowly in large or complex tasks; transfer learning is one technique to speed up learning by providing an informative prior. How to best enable transfer between tasks with different state representations and/or actions is currently an open question. This paper introduces the concept of a common task subspace, which is used to autonomously learn how two tasks are related. Experiments in two different nonlinear domains empirically show that a learned inter-state mapping can successfully be used by fitted value iteration to (1) improve the performance of a policy learned with a fixed number of samples, and (2) reduce the time required to converge to a (near-)optimal policy with unlimited samples.
Haitham Bou Ammar, Matthew E. Taylor

A Convergent Multiagent Reinforcement Learning Approach for a Subclass of Cooperative Stochastic Games

We present a distributed Q-Learning approach for independently learning agents in a subclass of cooperative stochastic games called cooperative sequential stage games. In this subclass, several stage games are played one after the other. We also propose a transformation function for that class and prove that transformed and original games have the same set of optimal joint strategies. Under the condition that the played game is obtained through transformation, we prove that our approach converges to an optimal joint strategy for the last stage game of the transformed game and thus also for the original game. In addition, the ability to converge to ε-optimal joint strategies for each of the stage games is shown. The environment in our approach does not need to present a state signal to the agents. Instead, by the use of the aforementioned transformation function, the agents gain knowledge about state changes from an engineered reward. This allows agents to avoid storing a strategy for every state and instead use a single strategy that is adapted to the currently played stage game. Thus, the algorithm has very low space requirements and its complexity is comparable to single-agent Q-Learning. Besides theoretical analyses, we also underline the convergence properties with some experiments.
Thomas Kemmerich, Hans Kleine Büning

Multi-agent Reinforcement Learning for Simulating Pedestrian Navigation

In this paper we introduce a multi-agent system that uses Reinforcement Learning (RL) techniques to learn local navigational behaviors to simulate virtual pedestrian groups. The aim of the paper is to study empirically the validity of RL to learn agent-based navigation controllers and their transfer capabilities when they are used in simulation environments with a higher number of agents than in the learned scenario. Two RL algorithms which use Vector Quantization (VQ) as the generalization method for the state space are presented. Both strategies are focused on obtaining a good vector quantizer that adequately generalizes the state space of the agents. We empirically demonstrate the convergence of both methods in our navigational multi-agent learning domain. In addition, we use validation tools of pedestrian models to analyze the simulation results in the context of pedestrian dynamics. The simulations carried out, scaling up the number of agents in our environment (a closed room with a door through which the agents have to leave), have revealed that the basic characteristics of pedestrian movements have been learned.
Francisco Martinez-Gil, Miguel Lozano, Fernando Fernández
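
As a rough illustration of how vector quantization can generalize a continuous state space for tabular RL, the sketch below maps a state vector to the index of its nearest codeword. The function name, the squared Euclidean distance, and the example codebook are assumptions made for illustration; they are not taken from the paper, which does not specify its quantizer here.

```python
def quantize(state, codebook):
    """Return the index of the codeword nearest to `state`.

    Assumption: nearness is measured by squared Euclidean distance.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda i: sq_dist(state, codebook[i]))


# Hypothetical 2-D codebook with three codewords.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

# Nearby continuous states share one discrete index, so a tabular
# learner can keep a single table row per codeword.
index_a = quantize((0.9, 0.1), codebook)
index_b = quantize((1.1, -0.05), codebook)
```

Both example states fall into the cell of the second codeword, so they would share one entry in a value table indexed by codeword.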

Leveraging Domain Knowledge to Learn Normative Behavior: A Bayesian Approach

This paper addresses the problem of norm adaptation using Bayesian reinforcement learning. We are concerned with the effectiveness of adding prior domain knowledge when facing environments with different settings as well as with the speed of adapting to a new environment. Individuals develop their normative framework via interaction with their surrounding environment (including other individuals). An agent acquires the domain-dependent knowledge in a certain environment and later reuses it in different settings. This work is novel in that it represents normative behaviors as probabilities over belief sets. We propose a two-level learning framework to learn the values of normative actions and set them as prior knowledge, when agents are confident about them, to feed them back to their belief sets. Developing a prior belief set about a certain domain can improve an agent’s learning process to adjust its norms to the new environment’s dynamics. Our evaluation shows that a normative agent, having been trained in an initial environment, is able to adjust its beliefs about the dynamics and behavioral norms in a new environment. Therefore, it converges to the optimal policy more quickly, especially in the early stages of learning.
Hadi Hosseini, Mihaela Ulieru

Basis Function Discovery Using Spectral Clustering and Bisimulation Metrics

We study the problem of automatically generating features for function approximation in reinforcement learning. We build on the work of Mahadevan and his colleagues, who pioneered the use of spectral clustering methods for basis function construction. Their methods work on top of a graph that captures state adjacency. Instead, we use bisimulation metrics in order to provide state distances for spectral clustering. The advantage of these metrics is that they incorporate reward information in a natural way, in addition to the state transition information. We provide bisimulation metric bounds for general feature maps. This result suggests a new way of generating features, with strong theoretical guarantees on the quality of the obtained approximation. We also demonstrate empirically that the approximation quality improves when bisimulation metrics are used in the basis function construction process.
Gheorghe Comanici, Doina Precup

Heterogeneous Populations of Learning Agents in the Minority Game

We study how a group of adaptive agents can coordinate when competing for limited resources. A popular game theoretic model for this is the Minority Game. In this article we show that the coordination among learning agents can improve when agents use different learning parameters or even evolve their learning parameters. Better coordination leads to fewer resources being wasted and agents achieving higher individual performance. We also show that learning algorithms which achieve good results when all agents use that same algorithm may be outcompeted when directly confronting other learning algorithms in the Minority Game.
David Catteeuw, Bernard Manderick
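
For readers unfamiliar with the Minority Game, the sketch below shows one round under its standard rules: an odd number of agents each pick one of two sides, and those on the less crowded side win. The function names and the "wasted slots" measure (the gap between the largest possible minority and the actual number of winners) are illustrative assumptions, not the paper's definitions.

```python
from collections import Counter


def play_round(choices):
    """Play one round. `choices` is an odd-length list of 0/1 picks.

    Returns (minority_side, indices_of_winning_agents).
    """
    counts = Counter(choices)
    minority = 0 if counts[0] < counts[1] else 1
    winners = [i for i, c in enumerate(choices) if c == minority]
    return minority, winners


def wasted_slots(choices):
    """Winning places left unused (hypothetical measure of waste).

    With N agents (N odd), at most (N - 1) // 2 agents can win.
    """
    capacity = (len(choices) - 1) // 2
    _, winners = play_round(choices)
    return capacity - len(winners)


well_coordinated = [1, 1, 0, 1, 0]    # 2 of 5 agents on the minority side
poorly_coordinated = [1, 1, 1, 1, 0]  # only 1 of 5 agents wins
```

In the well-coordinated round no winning place goes unused, while in the poorly coordinated round one of the two possible winning places is wasted.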

Solving Sparse Delayed Coordination Problems in Multi-Agent Reinforcement Learning

One of the main advantages of Reinforcement Learning is the capability of dealing with a delayed reward signal. Using an appropriate backup diagram, rewards are backpropagated through the state space. This allows agents to learn to take the correct action that results in the highest future (discounted) reward, even if that action results in a suboptimal immediate reward in the current state. In a multi-agent environment, agents can use the same principles as in single-agent RL, but have to apply them in a complete joint-state-joint-action space to guarantee optimality. Learning in such a state space can, however, be very slow. In this paper we present our approach for mitigating this problem. Future Coordinating Q-learning (FCQ-learning) detects strategic interactions between agents several timesteps before these interactions occur. FCQ-learning uses the same principles as CQ-learning [3] to detect the states in which interaction is required, but several timesteps before this is reflected in the reward signal. In these states, the algorithm will augment the state information to include information about other agents which is used to select actions. The techniques presented in this paper are the first to explicitly deal with a delayed reward signal when learning using sparse interactions.
Yann-Michaël De Hauwere, Peter Vrancx, Ann Nowé
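
The point about delayed rewards can be made concrete with plain single-agent tabular Q-learning; this is not FCQ-learning or CQ-learning. The three-state chain, the reward values, and the deterministic sweep over all state-action pairs (used instead of random exploration so the result is reproducible) are all assumptions chosen for illustration.

```python
GAMMA = 0.9          # discount factor (assumed)
N_STATES = 3         # hypothetical chain: states 0, 1, 2
STOP, ADVANCE = 0, 1


def step(state, action):
    """Return (reward, next_state); next_state is None when the episode ends."""
    if action == STOP:
        return 1.0, None       # small immediate reward, episode ends
    if state == N_STATES - 1:
        return 10.0, None      # large reward only at the end of the chain
    return 0.0, state + 1      # no immediate reward for moving on


def q_learning_sweeps(n_sweeps=50, alpha=0.5):
    """Apply the Q-learning update to every (state, action) pair per sweep."""
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(n_sweeps):
        for s in range(N_STATES):
            for a in (STOP, ADVANCE):
                r, nxt = step(s, a)
                target = r if nxt is None else r + GAMMA * max(q[nxt])
                q[s][a] += alpha * (target - q[s][a])
    return q


q = q_learning_sweeps()
greedy = [max((STOP, ADVANCE), key=lambda a: q[s][a]) for s in range(N_STATES)]
```

Although stopping pays immediately in every state, the discounted final reward is backed up along the chain, so the greedy policy advances from every state.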

