Learning in groups of traffic signals
Introduction
Given the increasing demand for mobility in our society and the fact that it is not always possible to provide additional capacity, a more efficient use of the available transportation infrastructure is necessary. This issue relates closely to artificial intelligence (AI) and multiagent systems. AI and multiagent techniques have been used in many stages of the traffic management process. In fact, many actors in a transportation system fit the concept of an autonomous agent very well: the driver, the pedestrian, and the traffic light.
This paper focuses on control strategies as defined in the control loop described by Papageorgiou (2003), which comprises the physical network, its model, the model of demand and disturbances, control devices, surveillance devices, and the control strategy itself. Specifically, control strategies based on learning are emphasized. In order to discuss this relationship, the next section briefly reviews basic concepts of reinforcement learning and summarizes the current discussion on approaches for multiagent reinforcement learning (MARL).
Techniques and methods from control theory are applied to traffic control in a fine-grained way. This leads to the problem that those techniques can be applied only to single intersections or a small number of them. For an arbitrarily big network, the real-time solution of the control loop faces a number of apparently insurmountable difficulties (Papageorgiou, 2003). Similarly, the problems posed by many actors in a multiagent reinforcement learning scenario are inherently more complex than in single-agent reinforcement learning (SARL). These problems arise mainly because no agent lives in a vacuum (Littman, 1994): while one agent is trying to model the environment (other agents included), the other agents are doing the same (or at least reacting). This results in an environment that is inherently non-stationary, so the notions of convergence known from the single-agent case can no longer be guaranteed.
One popular formalism for MARL is based on stochastic games (SGs), which are investigated in game theory and extend the basic Markov decision process (MDP). However, the aforementioned increase in complexity has consequences for this formalization. First, the approaches proposed for general-sum SGs require several assumptions regarding the game structure (agents' knowledge, self-play, etc.), and it is rarely stated what agents must know in order to use a particular approach. These assumptions restrict the convergence results to common-payoff (team) games and other special cases such as zero-sum games. Moreover, the focus is normally on two-agent games and, not infrequently, on two-action stage games. Beyond these cases, an oracle is needed to deal with the problem of equilibrium selection when two or more equilibria exist. This can be implemented in several ways (e.g. using the concept of a focal point), but it would probably require extra communication. Second, despite recent results on formalizing multiagent reinforcement learning using SGs, these formalisms cannot be used for systems of many agents if any flavor of joint action is explicitly considered, unless the obligation of visiting all state-action pairs is relaxed, which has an impact on convergence. The difficulty with a high number of agents arises mainly from the exponential increase in the space of joint actions.
Up to now, these issues have prevented the use of SG-based MARL in real-world problems unless simplifications are made, such as letting each agent learn individually through single-agent approaches. It is well known that this approach is not effective, since agents converge to sub-optimal states. Therefore, partitioning the problem into several smaller multiagent systems may be a good compromise between complete distribution and complete centralization. This line of research has also been pursued by others: Boutilier et al. (1999) propose the decomposition of actions, rewards, and other components of an MDP. In Kok and Vlassis (2006), a sparse cooperative reinforcement learning algorithm is used in which local reward functions depend only on a subset of all possible states and variables.
In traffic networks, where each agent represents a signalized junction, the problems mentioned above may prevent the use of MARL for control optimization, since this is a typical many-agent system in which joint actions do matter. It is not clear whether one must address all agents of the network at once, e.g. in a single-agent learning effort. This does not seem to reflect practice in traffic engineering, not least because control using standard measures (see e.g. Hunt et al., 1981; Diakaki et al., 2003) is computationally demanding as well. Rather, there is a trend in traffic engineering towards decentralization of control, with the network being divided into smaller groups. As shown later in this paper, even small groups pose a challenge for MARL, as the number of joint actions and states grows exponentially.
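To give a feeling for the scale of this growth, the size of the joint action space can be computed directly. The figures below (e.g. three signal plans per agent) are assumptions for illustration only:

```python
def joint_action_space(n_agents: int, actions_per_agent: int) -> int:
    """Number of joint actions when n agents each choose independently."""
    return actions_per_agent ** n_agents

# Assuming each traffic signal agent selects among 3 signal plans:
for n in (2, 5, 10):
    print(n, joint_action_space(n, 3))
```

Even a modest group of 10 junctions with 3 plans each yields 3^10 = 59,049 joint actions, and a table over joint states and joint actions grows correspondingly.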
Thus, alternative approaches to cope with this increase are necessary. Here we propose the use of the standard individual agent-based reinforcement learning approach, with the addition of a supervisor that observes these individuals for a time period, collects joint actions, and records their rewards in a kind of case base. A case is the set of states the agents find themselves in, plus the rewards they get; the solution of the case is the joint action. After that period is over, the supervisor observes the joint state of the agents it controls, looks for the best reward it has observed so far when the agents were in that same joint state, and recommends the corresponding joint action. In this way, the present paper addresses a many-agent system in which the agents are divided into groups that are supervised by further agents having an overview of each group's performance.
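The supervisor just described can be sketched as follows. This is a simplified illustration, not the paper's exact implementation: the class and method names, and the use of summed rewards to rank cases, are our assumptions.

```python
from collections import defaultdict

class Supervisor:
    """Observes a group of agents, stores cases, and recommends joint actions."""

    def __init__(self):
        # Case base: joint state -> list of (joint action, total reward) cases.
        self.case_base = defaultdict(list)

    def record(self, joint_state, joint_action, rewards):
        """During the observation period: store one observed case."""
        self.case_base[joint_state].append((joint_action, sum(rewards)))

    def recommend(self, joint_state):
        """After the observation period: best joint action seen in this state."""
        cases = self.case_base.get(joint_state)
        if not cases:
            return None  # no experience yet; agents keep acting individually
        best_action, _ = max(cases, key=lambda case: case[1])
        return best_action

# Usage: two observed cases in the same joint state of a two-agent group.
sup = Supervisor()
sup.record(("jam", "free"), ("plan_a", "plan_b"), [2.0, 5.0])
sup.record(("jam", "free"), ("plan_c", "plan_b"), [1.0, 3.0])
print(sup.recommend(("jam", "free")))  # -> ('plan_a', 'plan_b')
```

Note that the supervisor never enumerates the joint action space; it only stores the joint actions actually visited, which is what makes the scheme tractable for groups of agents.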
The paper is organized as follows. Section 2 discusses approaches to distributed traffic signal control that are based on AI and multiagent systems. Section 3 focuses on multiagent learning. Section 4 introduces our supervised-learning-based approach, and Section 5 presents the experimental validation, discussing experiments and results. Conclusions and future directions are presented in Section 6.
AI and MAS based approaches to distributed traffic signal control
It is beyond the scope of this paper to review the extensive literature on traffic signal control. The reader is referred to Bazzan (2009) for a survey of opportunities for learning and multiagent systems in traffic control. We note, however, that classical approaches, e.g. those based on optimization via hill climbing (e.g. Robertson, 1969), signal synchronization, and actuated control (e.g. Hunt et al., 1981), can be combined with the hierarchical approach proposed here in Section 4.
Single agent reinforcement learning
Usually, reinforcement learning (RL) problems are modeled as Markov decision processes (MDPs). These are described by a set of states, S, a set of actions, A, a reward function R(s, a) and a probabilistic state transition function T(s, a, s′). An experience tuple ⟨s, a, s′, r⟩ denotes the fact that the agent was in state s, performed action a and ended up in s′ with reward r. The goal of an MDP is to calculate the optimal policy π*, which is a mapping from states to actions such that the expected discounted future reward is maximized.
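As a concrete illustration of this single-agent setting, a minimal tabular Q-learning sketch follows; Q-learning is one standard way to approximate the optimal policy without knowing T and R. The state/action names and parameter values are illustrative assumptions, not taken from the paper:

```python
from collections import defaultdict

class QLearner:
    """Minimal tabular Q-learning for a single agent."""

    def __init__(self, actions, alpha=0.1, gamma=0.9):
        self.actions = list(actions)
        self.alpha = alpha            # learning rate
        self.gamma = gamma            # discount factor
        self.q = defaultdict(float)   # (state, action) -> estimated value

    def update(self, s, a, r, s_next):
        """Q(s,a) += alpha * (r + gamma * max_b Q(s',b) - Q(s,a))."""
        best_next = max(self.q[(s_next, b)] for b in self.actions)
        self.q[(s, a)] += self.alpha * (r + self.gamma * best_next - self.q[(s, a)])

    def greedy_action(self, s):
        """Action with the highest estimated value in state s."""
        return max(self.actions, key=lambda a: self.q[(s, a)])

# One experience tuple <s, a, s', r> updates the table:
ql = QLearner(["keep", "switch"])
ql.update("congested", "switch", 1.0, "free")
print(ql.q[("congested", "switch")])  # -> 0.1
```

In practice an exploration strategy (e.g. epsilon-greedy) is layered on top of `greedy_action`; it is omitted here for brevity.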
Overview
The supervised learning control strategy proposed here, when applied to traffic controllers, is composed of two kinds of agents. Each low-level, local agent controls a junction. In addition, hierarchically superior agents (supervisors or tutors) are in charge of controlling groups containing a small number of low-level agents. The term control is used here in a relaxed sense: supervisors must in fact be seen as facilitators or tutors that observe the traffic situation from a broader perspective.
Simulation infrastructure: ITSUMO
The simulations described in this section were performed using ITSUMO (Silva et al., 2006b), a microscopic traffic simulator based on the Nagel–Schreckenberg cellular automaton (CA) model (Nagel and Schreckenberg, 1992). In short, each road is divided into cells of fixed length. This allows the representation of a road as an array in which vehicles occupy discrete positions. Each vehicle travels with a speed based on the number of cells it may advance without hitting another vehicle.
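One update step of the Nagel–Schreckenberg model can be sketched as follows. This is a simplified single-lane, circular-road version for illustration (not ITSUMO's code): cell values are vehicle speeds, `None` marks an empty cell, and `v_max` and the random-slowdown probability `p` are the standard model parameters.

```python
import random

def nasch_step(road, v_max=5, p=0.3):
    """One parallel Nagel-Schreckenberg update on a circular road."""
    n = len(road)
    new_road = [None] * n
    for i, v in enumerate(road):
        if v is None:
            continue
        # 1. Acceleration towards the maximum speed.
        v = min(v + 1, v_max)
        # 2. Braking: do not advance past the next vehicle ahead.
        gap = 1
        while road[(i + gap) % n] is None and gap <= v:
            gap += 1
        v = min(v, gap - 1)
        # 3. Random slowdown with probability p.
        if v > 0 and random.random() < p:
            v -= 1
        # 4. Movement: all vehicles move simultaneously.
        new_road[(i + v) % n] = v
    return new_road
```

With `p = 0` the update is deterministic, which is convenient for checking the rules; the stochastic slowdown in step 3 is what produces the spontaneous jams the model is known for.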
Conclusion
Multiagent reinforcement learning is inherently more complex than single-agent reinforcement learning because while one agent is trying to model the environment, other agents are doing the same. This results in an environment that is non-stationary. MDP-based formalisms extended to multiagent environments (MMDPs) are an efficient solution only in the case of few agents, few states, and few actions. For a large number of agents, alternative solutions are necessary. In this paper the problem is addressed by dividing the agents into groups, each supervised by a hierarchically superior agent with an overview of the group's performance.
Acknowledgements
Ana Bazzan is partially supported by CNPq and by the Alexander v. Humboldt Foundation; Denise de Oliveira was partially supported by CAPES and CNPq.
References (25)
- Shoham, Y., Powers, R., Grenager, T., 2007. If multi-agent learning is the answer, what is the question? Artif. Intell.
- Balan, G., Luke, S., 2006. History-based traffic control. In: Proc. of the Fifth Int. Joint Conf. on Autonomous Agents and Multiagent Systems (AAMAS).
- Bazzan, A.L.C., 2005. A distributed approach for coordination of traffic signal agents. Autonomous Agents and Multiagent Systems.
- Bazzan, A.L.C., 2009. Opportunities for multiagent systems and multiagent reinforcement learning in traffic control. Autonomous Agents and Multiagent Systems.
- Boutilier, C., Dean, T., Hanks, S., 1999. Decision-theoretic planning: structural assumptions and computational leverage. J. Artif. Intell. Res.
- Camponogara, E., Kraus, W. Jr. Distributed learning agents in urban traffic control. In: Moura-Pires, F., Abreu, S. ...
- Claus, C., Boutilier, C., 1998. The dynamics of reinforcement learning in cooperative multiagent systems. In: ...
- Diakaki, C., Dinopoulou, V., Aboudolas, K., Papageorgiou, M., Ben-Shabat, E., Seider, E., Leibov, A., 2003. Extensions ...
- Dresner, K., Stone, P., 2004. Multiagent traffic management: a reservation-based intersection control mechanism. In: Proc. of the Third Int. Joint Conf. on Autonomous Agents and Multiagent Systems (AAMAS).
- France, J., Ghorbani, A.A., 2003. A multiagent system for optimizing urban traffic.