
Neurocomputing, Volume 233, 12 April 2017, Pages 34-42

Path planning of multi-agent systems in unknown environment with neural kernel smoothing and reinforcement learning

https://doi.org/10.1016/j.neucom.2016.08.108

Abstract

Path planning is a basic task in robot navigation, especially for autonomous robots. It is more complex and difficult for multi-agent systems. Popular reinforcement learning methods cannot solve the path planning problem directly in an unknown environment.

In this paper, the classical multi-agent reinforcement learning algorithm is modified so that it does not require knowledge of unvisited states. Neural networks and kernel smoothing techniques are applied to approximate the greedy actions by estimating the unknown environment. Experimental and simulation results show that the proposed algorithms can generate paths in an unknown environment for multiple agents.

Introduction

A multi-agent system includes several intelligent agents in an environment. Each agent has its own independent behavior and coordinates with the others [34]. An important benefit of using a multi-agent system is that it can model the cooperation found in real-life situations. A multi-agent system can also learn new behaviors, which makes it possible to predict the performance of natural systems [39]. There are many other applications of multi-agent systems, such as robot teams [13], distributed control [16], resource management [27], collaborative decision making [3], and data mining [31].

Path planning generates a path from a starting point to an ending point subject to certain constraints. It is important in several real-life problems, such as mobile robots, logistics, and game design [29]. The path planning problem can be solved by state searching. Single-agent path planning involves a limited number of states in an environment. For multi-agent path planning, many units move simultaneously in the environment, and the dimension of the configuration space grows exponentially [22]. Path planning for multi-agent systems therefore becomes much more difficult [17].

There are several methods for multi-agent path planning. In [16], a high-gain decentralized control law is designed to solve the consensus problem [40]. In [4], [12], a grid-graph method is applied to design a state–task graph. In [37], an undirected graph is used for multi-agent path planning. In [11], a random-tree algorithm is extended to update the paths. Artificial intelligence methods have learning ability and can solve the path planning problem directly [6]. For a single agent, a genetic algorithm is used in [9], kernel smoothing is applied in [23], fuzzy logic is used in [21], and an iterative learning approach is applied in [24].

Reinforcement learning (RL) is the most popular learning method for multi-agent systems [18], [35]. In this setting it is called multi-agent reinforcement learning (MARL) [7]. The objective of MARL is to maximize a reward function defined by the environment and the agents, such that the agents can interact with the environment [18]. At each learning step, the agent senses the environment, takes an action, and transits to a new state of the environment [2]. The quality of each transition is evaluated by a reward function, so the agents learn which action yields the largest reward [32]. In order to generate a good action, RL feedback is needed [10]. The Q-learning algorithm is a popular form of RL feedback [33]. Watkins and Dayan [38] give the Q-learning algorithm for the single-agent case. Abdi et al. [1] propose a multi-agent Q-learning method. Both assume that the environment can be described by a set of states.
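For reference, the single-agent Q-learning rule of Watkins and Dayan [38] updates the state–action value table as follows (written here in standard textbook form with learning rate α, discount factor γ, and received reward r_{t+1}; multi-agent variants such as WoLF-PHC [5] build on this update with a variable learning rate):

    Q(s_t, a_t) ← Q(s_t, a_t) + α [ r_{t+1} + γ max_a Q(s_{t+1}, a) − Q(s_t, a_t) ]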

In an unknown or dynamic environment, if the desired information is available, supervised learning methods can be applied. In [23], neural networks are used to estimate moving obstacles. In [15], the velocity is estimated in the presence of unknown but bounded disturbances. In [13], single-agent path planning is realized by RL and neural networks. In [19], RL is applied to the multi-agent case with a fuzzy logic technique, but it does not include a learning mechanism. However, if the desired information is not available, the above methods do not work.

This paper takes advantage of both unsupervised learning (kernel smoothing) [26] and supervised learning (neural networks) [25]. We combine them with RL to solve multi-agent path planning in an unknown environment. This hybrid algorithm includes two training phases (a minimal code sketch follows the list):

  1. The Win or Learn Fast Policy Hill-Climbing method (WoLF-PHC) [5] is modified. In this stage, the task model consists of the transitions and the reward functions. Each agent does not know the other agents' actions or reward functions. The agents explore the unknown environment and collect state–action information through unsupervised kernel smoothing and the modified WoLF-PHC.

  2. The neural networks are integrated with the RL algorithm. We use the states obtained in Stage 1 as desired values, so that the supervised learning of the neural networks can be applied. Finally, the action (the controller) is given by a greedy policy from the state–action Q-table.
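The following is a minimal, self-contained sketch of these two phases on a hypothetical 5×5 grid world. Plain ε-greedy Q-learning stands in for the modified WoLF-PHC of Stage 1, and a scikit-learn MLP stands in for the neural network of Stage 2; all names (step, GOAL, policy_net) and parameter values are illustrative, not taken from the paper.

# Minimal sketch of the two-phase idea above, on a hypothetical 5x5 grid world.
# Plain epsilon-greedy Q-learning stands in for the modified WoLF-PHC of Stage 1,
# and a scikit-learn MLP stands in for the paper's neural network of Stage 2.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
N_STATES, N_ACTIONS, GOAL = 25, 4, 24      # states 0..24, actions: up/down/left/right

def step(s, a):
    """Toy deterministic transition; reward 1 when the goal cell is reached."""
    r, c = divmod(s, 5)
    if a == 0:   r = max(r - 1, 0)
    elif a == 1: r = min(r + 1, 4)
    elif a == 2: c = max(c - 1, 0)
    else:        c = min(c + 1, 4)
    s_next = 5 * r + c
    return s_next, float(s_next == GOAL)

# Stage 1: explore the (unknown) environment and fill the state-action Q-table.
Q = np.zeros((N_STATES, N_ACTIONS))
alpha, gamma, eps = 0.5, 0.9, 0.2
for episode in range(300):
    s = 0
    for _ in range(50):
        a = rng.integers(N_ACTIONS) if rng.random() < eps else int(Q[s].argmax())
        s_next, reward = step(s, a)
        Q[s, a] += alpha * (reward + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
        if reward > 0:
            break

# Stage 2: supervised learning. The greedy actions extracted from the Stage-1
# Q-table serve as desired values for a small neural network, which can then
# suggest actions for states that were never tabulated.
X = np.array([[s // 5, s % 5] for s in range(N_STATES)], dtype=float)
y = Q.argmax(axis=1)
policy_net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=3000).fit(X, y)
print(policy_net.predict([[0.0, 0.0]]))     # greedy action suggested at the start cell

The point of Stage 2 is that the trained network can generalize the greedy policy to states whose Q-table entries were never filled during exploration, which is where the tabular method alone fails.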

In this paper, we also show that the Q-values converge after a finite number of iterations. Our algorithm is tested with two Khepera mobile robots in a dynamic environment. The experimental results show that our algorithms are simple and effective for multi-agent path planning in an unknown environment.

Section snippets

Reinforcement learning for path planning of multi-agent systems

The current state of the environment observed by the agent is defined as s_t and the action of the agent is defined as a_t. The model of single-agent reinforcement learning is a Markov decision process. It is defined as:

Definition 1

Single-agent RL process in a dynamic environment is

    f_t : S_t × A_t × S_{t−1} → [0, 1]
    ρ_t : S_t × A_t × S_{t−1} → ℝ

where S_t and S_{t−1} are the environment states at times t and t−1, respectively, A_t is the set of agent actions at time t, f_t is the state transition probability function from time t−1 to time t, and

Reinforcement learning with kernel smoothing and neural networks

The RL algorithm needs all states of the agents. If some states are not available, the MARL discussed above does not work. In this section, we use kernel smoothing and neural networks to estimate the unavailable states and then send them to the MARL algorithm.
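As a hedged illustration of this idea, the sketch below uses a Nadaraya–Watson kernel smoother to estimate a missing state value from nearby observed samples before it would be handed to the MARL update; the Gaussian kernel, bandwidth, and sample values are illustrative assumptions rather than the paper's exact design.

# Hedged sketch: a Nadaraya-Watson kernel smoother estimates a missing state
# value from nearby observed samples. The Gaussian kernel and bandwidth h are
# illustrative choices, not necessarily the paper's exact settings.
import numpy as np

def kernel_smooth(x_query, X_obs, y_obs, h=0.5):
    """Weighted average of observed values y_obs, weighted by distance to x_query."""
    d2 = np.sum((X_obs - x_query) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * h ** 2))            # Gaussian kernel weights
    return float(np.dot(w, y_obs) / np.sum(w))

# Example: estimate the state value at an unvisited position (1.5, 1.5).
X_obs = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [2.0, 2.0]])
y_obs = np.array([0.2, 0.4, 0.4, 0.8])
print(kernel_smooth(np.array([1.5, 1.5]), X_obs, y_obs))   # equidistant samples, so ≈ 0.45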

Simulation and experimental results

The objective of the simulations is to check and validate the operation of our proposed method. We also use two mobile robots to compare our algorithms with other techniques.

Conclusion

In this paper, we overcome the difficulty of applying reinforcement learning in an unknown environment. Combining kernel approximation and neural networks with the WoLF-PHC algorithm overcomes the drawbacks of reinforcement learning, such as slow learning speed, high computational cost, and the inability to learn in unknown environments.

The robustness to changes in the environment depends on the approximation and generalization ability of the kernel method. The simulations and the experimental


References (41)

  • J. Abdi et al., Emotional temporal difference Q-learning signals in multi-agent system cooperation: real case studies, IET Intell. Transp. Syst. (2013)
  • I. Arel et al., Reinforcement learning-based multi-agent system for network traffic signal control, IET Intell. Transp. Syst. (2010)
  • D. Barbucha, Search modes for the cooperative multi-agent system solving the vehicle routing problem, Neurocomputing (2012)
  • S. Bhattacharya, M. Likhachev, V. Kumar, Multi-agent path planning with multiple tasks and distance constraints, in: ...
  • M. Bowling et al., Multiagent learning using a variable learning rate, Artif. Intell. (2002)
  • S.T. Brassai et al., Artificial intelligence in the path planning optimization of mobile agent navigation, Proc. Econ. Finance (2012)
  • L. Busoniu et al., A comprehensive survey of multiagent reinforcement learning, IEEE Trans. Syst. Man Cybern. Part C: Appl. Rev. (2008)
  • L. Busoniu, R. Babuska, B. De Schutter, Multi-agent reinforcement learning: an overview, Innovations in MASs and ...
  • Z. Cai et al., Cooperative coevolutionary adaptive genetic algorithm in path planning of cooperative multi-mobile robot systems, J. Intell. Robot. Syst. (2002)
  • V. Cherkassky et al., Learning from Data: Concepts, Theory and Methods (1998)
  • V.R. Desaraju, J.P. How, Decentralized path planning for multi-agent teams in complex environments using ...
  • J. Fu et al., Adaptive consensus tracking of high-order nonlinear multi-agent systems with directed communication graphs, Int. J. Control Autom. Syst. (2014)
  • V. Ganapathy, S. Chin, H. Kusama Joe, Neural Q-learning controller for mobile robot, in: IEEE/ASME International ...
  • V. Ganapathy, S.C. Yun, W.L.D. Lui, Utilization of Webots and Khepera II as a platform for neural Q-learning ...
  • H. Hu et al., Second-order consensus of multi-agent systems with unknown but bounded disturbance, Int. J. Control Autom. Syst. (2013)
  • A. Wei et al., Consensus of linear multi-agent systems subject to actuator saturation, Int. J. Control Autom. Syst. (2013)
  • S.-H. Ji, J.-S. Choi, B.-H. Lee, A computational interactive approach to multi-agent motion planning, Int. J. Control ...
  • L.P. Kaelbling et al., Reinforcement learning: a survey, J. Artif. Intell. Res. (1996)
  • M. Kaya et al., Modular fuzzy-reinforcement learning approach with internal model capabilities for multiagent systems, IEEE Trans. Syst. Man Cybern. Part B: Cybern. (2004)
  • K-Team Corporation, 2013. ...

David Luviano Cruz is a Ph.D. student in the Automatic Control Department at CINVESTAV-IPN. He received his undergraduate degree in Electronic Engineering in 2007 and his M.S. in automatic control in 2010, both from CINVESTAV-IPN. His research interests include learning and interaction in multi-agent systems.

Wen Yu received the B.S. degree from Tsinghua University, Beijing, China, in 1990, and the M.S. and Ph.D. degrees, both in Electrical Engineering, from Northeastern University, Shenyang, China, in 1992 and 1995, respectively. From 1995 to 1996, he served as a Lecturer in the Department of Automatic Control at Northeastern University, Shenyang, China. Since 1996, he has been with the Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional (CINVESTAV-IPN), Mexico City, Mexico, where he is currently a Professor in the Departamento de Control Automático. From 2002 to 2003, he held research positions with the Instituto Mexicano del Petróleo. He was a Senior Visiting Research Fellow with Queen's University Belfast, Belfast, U.K., from 2006 to 2007, and a Visiting Associate Professor with the University of California, Santa Cruz, from 2009 to 2010. Since 2006, he has also held a visiting professorship at Northeastern University in China. Dr. Wen Yu serves as an Associate Editor of Neurocomputing and the Journal of Intelligent and Fuzzy Systems. He is a Member of the Mexican Academy of Sciences.
