Learning in groups of traffic signals

https://doi.org/10.1016/j.engappai.2009.11.009

Abstract

Computer science in general, and artificial intelligence and multiagent systems in particular, are part of an effort to build intelligent transportation systems. An efficient use of the existing infrastructure relates closely to multiagent systems as many problems in traffic management and control are inherently distributed. In particular, traffic signal controllers located at intersections can be seen as autonomous agents. However, challenging issues are involved in this kind of modeling: the number of agents is high; in general agents must be highly adaptive; they must react to changes in the environment at the individual level while also causing an unpredictable collective pattern, as they act in a highly coupled environment. Therefore, traffic signal control poses many challenges for standard techniques from multiagent systems such as learning. Despite the progress in multiagent reinforcement learning via formalisms based on stochastic games, these cannot cope with a high number of agents due to the combinatorial explosion in the number of joint actions. One possible way to reduce the complexity of the problem is to have agents organized in groups of limited size so that the number of joint actions is reduced. These groups are then coordinated by another agent, a tutor or supervisor. Thus, this paper investigates the task of multiagent reinforcement learning for control of traffic signals in two situations: agents act individually (individual learners) and agents can be “tutored”, meaning that another agent with a broader view will recommend a joint action.

Introduction

Given the increasing demand for mobility in our society and the fact that it is not always possible to provide additional capacity, a more efficient use of the available transportation infrastructure is necessary. This issue relates closely to artificial intelligence (AI) and multiagent systems. AI and multiagent techniques have been used in many stages of the process of traffic management. In fact many actors in a transportation system fit very well the concept of an autonomous agent: the driver, the pedestrian, and the traffic light.

This paper focuses on control strategies as defined in the control loop described by Papageorgiou (2003) (the physical network; its model; the model of demand and disturbances; control devices; surveillance devices; and the control strategy). Specifically, control strategies based on learning are emphasized. In order to discuss this relationship, the next section briefly reviews basic concepts on reinforcement learning and summarizes the current discussion on approaches for multiagent reinforcement learning (MARL).

Techniques and methods from control theory are applied to traffic control in a fine-grained way. As a consequence, those techniques can be applied only to single intersections or to a small number of them. For an arbitrarily large network, the real-time solution of the control loop faces a number of apparently insurmountable difficulties (Papageorgiou, 2003). Similarly, the problems posed by many actors in a multiagent reinforcement learning scenario are inherently more complex than in single-agent reinforcement learning (SARL). These problems arise mainly because no agent lives in a vacuum (Littman, 1994): while one agent is trying to model the environment (other agents included), the other agents are doing the same (or at least reacting). This results in an environment that is inherently non-stationary, and thus the convergence guarantees known from the single-agent case no longer hold.

One popular formalism for MARL is the one based on stochastic games (SG), which are investigated by game theory and extend the basic Markov decision process (MDP). However, the aforementioned increase in complexity has consequences for this formalization. First, the approaches proposed for general-sum SGs require several assumptions regarding the game structure (agents’ knowledge, self-play, etc.), and it is rarely stated what agents must know in order to use a particular approach. These assumptions restrict the convergence results to common-payoff (team) games and other special cases such as zero-sum games. Moreover, the focus is normally put on two-agent games and, not infrequently, on two-action stage games. Otherwise, an oracle is needed to deal with the problem of equilibrium selection when two or more equilibria exist; this can be implemented in several ways (e.g. using the concept of a focal point), but it would probably require extra communication. Second, despite recent results on formalizing multiagent reinforcement learning using SGs, these cannot be used for systems of many agents if any flavor of joint action is explicitly considered, unless the requirement of visiting all state-action pairs is relaxed, which affects convergence. The difficulty with a high number of agents arises mainly from the exponential increase in the space of joint actions.
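
To make this combinatorial explosion concrete: with n agents that each choose among k actions (signal plans, say), the joint action space has k^n elements. A minimal sketch with purely illustrative numbers, not taken from the paper's experiments:

```python
# Illustrative only: growth of the joint action space for n agents,
# each choosing among k individual actions (numbers are hypothetical).
def joint_action_space_size(n_agents: int, n_actions: int) -> int:
    return n_actions ** n_agents

print(joint_action_space_size(3, 2))   # 8
print(joint_action_space_size(5, 3))   # 243
print(joint_action_space_size(25, 3))  # 847,288,609,443 -- intractable to enumerate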

Up to now, these issues have prevented the use of SG-based MARL in real-world problems, unless simplifications are made, such as letting each agent learn individually through single-agent approaches. It is well known that this approach is not effective since agents converge to sub-optimal states. Therefore, partitioning the problem into several smaller multiagent systems may be a good compromise between complete distribution and complete centralization. This line of research has also been pursued by others: Boutilier et al. (1999) propose the decomposition of actions, rewards and other components of an MDP; in Kok and Vlassis (2006), a sparse cooperative reinforcement learning algorithm is used in which local reward functions depend only on a subset of all possible states and variables.

In traffic networks, where each agent represents a traffic-signal-controlled junction, the problems mentioned above may prevent the use of MARL in control optimization, since this is a typical many-agent system in which joint actions do matter. It is not clear whether one must address all agents of the network at once, e.g. in a single, centralized learning effort. This is neither the case nor the practice in traffic engineering, not least because control using standard measures (see e.g. Hunt et al., 1981; Diakaki et al., 2003) is computationally demanding as well. Rather, there is a trend in traffic engineering towards decentralization of control, with the network being divided into smaller groups. As shown later in this paper, even small groups pose a challenge for MARL, as the number of joint actions and states grows exponentially.

Thus, alternative approaches to cope with this increase are necessary. Here we propose the use of the standard individual agent-based reinforcement learning approach, with the addition of a supervisor that observes these individuals for a time period, collects joint actions and records their rewards in a kind of case base. A case is the set of states the agents find themselves in, plus the rewards they receive; the solution of the case is the joint action. After that period is over, the supervisor observes the joint state of the agents it controls, looks up the best reward it has observed so far for that same joint state, and recommends the corresponding joint action. This way, the present paper addresses a many-agent system in which the agents are divided into groups that are then supervised by further agents that have an overview of each group's performance.
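
As an illustration of this mechanism only (the names Supervisor, observe and recommend are hypothetical, not taken from the paper), such a case base could be organized as a mapping from joint states to the best-rewarded joint action observed so far:

```python
# A minimal sketch of the supervisor's case base described above, assuming a
# joint state is a tuple of the local agents' states and a joint action is a
# tuple of their local actions.
class Supervisor:
    def __init__(self):
        # joint_state -> (best total reward seen so far, joint action that produced it)
        self.case_base = {}

    def observe(self, joint_state, joint_action, rewards):
        # During the observation period: keep, for each joint state visited,
        # the joint action with the highest summed reward.
        total = sum(rewards)
        best = self.case_base.get(joint_state)
        if best is None or total > best[0]:
            self.case_base[joint_state] = (total, joint_action)

    def recommend(self, joint_state):
        # After the observation period: recommend the best joint action
        # recorded for the current joint state, or None if it was never seen.
        case = self.case_base.get(joint_state)
        return case[1] if case is not None else None
```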

The paper is organized as follows. In Section 2, some approaches to distributed traffic signal control based on AI and multiagent systems are discussed. Section 3 focuses on multiagent learning. In Sections 4 (Supervised learning based approach) and 5 (Experimental validation), our approach is introduced and experiments and results are discussed. Conclusions and future directions are presented in Section 6.

Section snippets

AI and MAS based approaches to distributed traffic signal control

It is beyond the scope of this paper to review the extensive literature on traffic signal control. The reader is referred to Bazzan (2009) for a survey on opportunities for learning and multiagent systems in traffic control. We notice, however, that classical approaches, e.g. those based on optimization via hill climbing (e.g. Robertson, 1969), signal synchronization and actuated control (e.g. Hunt et al., 1981), can be combined within the hierarchical approach proposed here in Section 4 in the sense

Single agent reinforcement learning

Usually, reinforcement learning (RL) problems are modeled as Markov decision processes (MDPs). These are described by a set of states, S, a set of actions, A, a reward function R(s,a) → ℝ, and a probabilistic state transition function T(s,a,s′) → [0,1]. An experience tuple ⟨s,a,s′,r⟩ denotes the fact that the agent was in state s, performed action a and ended up in s′ with reward r. The goal of an MDP is to calculate the optimal policy π*, which is a mapping from states to actions such that the
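
For concreteness, a minimal sketch of tabular Q-learning, a standard single-agent RL algorithm for such MDPs; the parameter values are illustrative assumptions, not taken from the paper:

```python
import random
from collections import defaultdict

# Tabular Q-learning sketch; alpha, gamma and epsilon are illustrative values.
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
Q = defaultdict(float)  # (state, action) -> estimated value

def choose_action(state, actions):
    # epsilon-greedy action selection
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, actions):
    # Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```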

Overview

The supervised learning control strategy proposed here, when applied to traffic controllers, is composed of two kinds of agents. Low-level, local agents each control one junction. In addition, hierarchically superior agents (supervisors or tutors) are in charge of controlling groups containing a small number of low-level agents. Control is used here in a loose sense: supervisors must in fact be seen as facilitators or tutors that observe the traffic situation from a broader perspective

Simulation infrastructure: ITSUMO

The simulations described in this section were performed using ITSUMO (Silva et al., 2006b), which is a microscopic traffic simulator based on the Nagel–Schreckenberg cellular automaton (CA) model (Nagel and Schreckenberg, 1992). In short, each road is divided into cells of fixed length. This allows the representation of a road as an array where vehicles occupy discrete positions. Each vehicle travels with a speed based on the number of cells it may advance without hitting another vehicle. The
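
For illustration, a minimal sketch of one update step of the Nagel–Schreckenberg rules on a circular single-lane road; the function name and parameter values are assumptions, not ITSUMO code:

```python
import random

# Sketch of one Nagel-Schreckenberg update step on a circular single-lane road.
# V_MAX and P_SLOW are illustrative values.
V_MAX, P_SLOW = 5, 0.3

def nasch_step(positions, speeds, road_length):
    """positions and speeds are parallel lists (one entry per vehicle), with
    positions sorted in increasing order along a ring of road_length cells."""
    n = len(positions)
    new_positions, new_speeds = [], []
    for i in range(n):
        # empty cells between vehicle i and the vehicle ahead (wrapping around the ring)
        gap = (positions[(i + 1) % n] - positions[i] - 1) % road_length
        v = min(speeds[i] + 1, V_MAX)            # 1. acceleration
        v = min(v, gap)                          # 2. braking to avoid a collision
        if v > 0 and random.random() < P_SLOW:   # 3. random slowdown
            v -= 1
        new_speeds.append(v)
        new_positions.append((positions[i] + v) % road_length)  # 4. movement
    return new_positions, new_speeds
```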

Conclusion

Multi-agent reinforcement learning is inherently more complex than single-agent reinforcement learning because while one agent is trying to model the environment, other agents are doing the same. This results in an environment that is non-stationary. MDP-based formalisms extended to multiagent environments (MMDPs) are an efficient solution only in the case of few agents, few states, and few actions. For a large number of agents, alternative solutions are necessary. In this paper the problem is

Acknowledgements

Ana Bazzan is partially supported by CNPq and by the Alexander v. Humboldt Foundation; Denise de Oliveira was partially supported by CAPES and CNPq.

References (25)

  • Y. Shoham et al. If multi-agent learning is the answer, what is the question? Artif. Intell. (2007)
  • G. Balan et al. History-based traffic control
  • A.L.C. Bazzan. A distributed approach for coordination of traffic signal agents. Autonomous Agents and Multiagent Systems (2005)
  • A.L.C. Bazzan. Opportunities for multiagent systems and multiagent reinforcement learning in traffic control. Autonomous Agents and Multiagent Systems (2009)
  • C. Boutilier et al. Decision-theoretic planning: structural assumptions and computational leverage. J. Artif. Intell. Res. (1999)
  • Camponogara, E., Kraus, W. Jr. Distributed learning agents in urban traffic control. In: Moura-Pires, F., Abreu, S....
  • Claus, C., Boutilier, C., 1998. The dynamics of reinforcement learning in cooperative multiagent systems. In:...
  • Diakaki, C., Dinopoulou, V., Aboudolas, K., Papageorgiou, M., Ben-Shabat, E., Seider, E., Leibov, A., 2003. Extensions...
  • K. Dresner et al. Multiagent traffic management: a reservation-based intersection control mechanism
  • J. France et al. A multiagent system for optimizing urban traffic
  • Hu, J., Wellman, M.P., 1998. Multiagent reinforcement learning: theoretical framework and an algorithm. In: Proceedings...
  • Hunt, P.B., Robertson, D.I., Bretherton, R.D., Winton, R.I., 1981. SCOOT—a traffic responsive method of coordinating...