Reinforcement based mobile robot navigation in dynamic environment

https://doi.org/10.1016/j.rcim.2010.06.019

Abstract

In this paper, a new approach is developed for solving the problem of mobile robot path planning in an unknown dynamic environment based on Q-learning. Q-learning algorithms have been used widely for solving real-world problems, especially in robotics, since they have been shown to give reliable and efficient solutions thanks to a simple and well-developed theory. However, most researchers who have applied Q-learning to the mobile robot navigation problem dealt with static environments; they avoided dynamic environments because the problem is more complex and has an infinite number of states, which makes training the intelligent agent very difficult. In this paper, the Q-learning algorithm is applied to mobile robot navigation in a dynamic environment by limiting the number of states through a new definition of the state space. This reduces the size of the Q-table and hence increases the speed of the navigation algorithm. The experimental simulation scenarios indicate the strength of the proposed approach for mobile robot navigation in a dynamic environment. The results show that the new approach achieves a high hit rate: the robot reaches its target along a collision-free path in most cases, which is the most desirable feature of any navigation algorithm.

Introduction

In recent years, research and industrial interest has focused on developing smart machines, such as robots, that are able to work under certain conditions for very long periods without any human intervention. This includes performing specific tasks in hazardous and hostile environments. Mobile robots are smart machines that can carry out such tedious tasks. They are used in areas where the robot must navigate and perform a task at the same time, such as service robotics, surveillance, and exploration [1].

Mobile robot navigation in an unknown environment poses two main problems: localization and path planning [5], [6]. Localization is the process of determining the position and orientation of the robot with respect to its surroundings. The robot needs to recognize the objects around it and to classify each one as a target or an obstacle. Many on-board and off-board techniques have been developed to deal with the localization problem, using laser range finders, sonar range finders, ultrasonic sensors, infrared sensors, vision sensors, and GPS. When a larger view of the environment is necessary, a network of cameras has been used.

The other problem is path planning, in which the robot needs to find a collision-free path from its starting point to its end point. To find that path, the robot needs to run a suitable path planning algorithm that can compute the path between any two points [7].

Many researchers have studied the problem of robot path planning with obstacle avoidance, and many solutions have been proposed [8], [9]. Since robot motion in a dynamic field has a certain amount of randomness, owing to the nature of the real world, these solutions do not give accurate results under all conditions. In recent years there has been a shift toward artificial intelligence approaches that improve the robot's autonomy based on accumulated experience. In general, artificial intelligence methods can be computationally less expensive and easier to apply than classical methods.

This research focuses on path planning for a mobile robot moving in a dynamic environment, and a new approach based on the Q-learning algorithm is proposed to solve this problem. Q-learning has several features that make it suitable for mobile robot navigation in a dynamic environment. First, a Q-learning agent is a reinforcement learning [29] agent that has no previous knowledge about its working environment; it learns about the environment by interacting with it. This type of learning agent is called an unsupervised learning agent. Since the mobile robot is assumed to have no previous knowledge about its working environment, a Q-learning agent is a good alternative for solving the navigation problem in a dynamic environment. Secondly, a Q-learning agent is an on-line learning agent: it learns the best action to take at each state by trial and error. It chooses actions randomly at first and estimates the value of taking each action in a specific state; by evaluating every state-action pair it builds a policy for acting in the environment. In the mobile robot navigation problem, in order to find a collision-free path the robot needs to find the best action to take at each state, and it needs to learn this knowledge on line while navigating its environment. Because Q-learning is very simple, it is a very appealing alternative [10].
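
As a minimal illustration of the tabular Q-learning scheme described above, the following Python sketch shows an epsilon-greedy action choice and the one-step Q-table update. The state encoding, action set, and parameter values are illustrative assumptions, not the paper's actual definitions.

```python
import random
from collections import defaultdict

ALPHA = 0.5    # learning rate (illustrative value)
GAMMA = 0.9    # discount factor (illustrative value)
EPSILON = 0.1  # exploration probability (illustrative value)

ACTIONS = ["up", "down", "left", "right"]  # assumed discrete action set

# Q-table: maps (state, action) pairs to estimated long-term value.
Q = defaultdict(float)

def choose_action(state):
    """Epsilon-greedy policy: mostly exploit the best known action,
    occasionally explore a random one."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    """One tabular Q-learning step: move Q(s, a) toward the
    bootstrapped target r + gamma * max_a' Q(s', a')."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```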

Literature review

In the last decade many classical solutions have tried to address the robot path planning problem. The most commonly used is the potential field method [22] and its variants [12], [14], which has been studied extensively. It was introduced in its most common form by Borenstein and Koren [11]. The basic idea behind this method is to fill the robot's environment with a potential field in which the robot is attracted toward the target position and repelled away from obstacles.
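
A minimal sketch of this idea follows; the quadratic attractive and inverse repulsive potentials used here are a common textbook formulation, not necessarily the exact functions of [11], [12], [14], [22], and all gains are illustrative.

```python
import math

K_ATT = 1.0    # attractive gain (illustrative)
K_REP = 100.0  # repulsive gain (illustrative)
D0 = 2.0       # obstacle influence radius (illustrative)

def potential_force(robot, goal, obstacles):
    """Sum an attractive force toward the goal and repulsive forces
    from nearby obstacles; the robot steps along the resulting vector."""
    fx = K_ATT * (goal[0] - robot[0])
    fy = K_ATT * (goal[1] - robot[1])
    for ox, oy in obstacles:
        dx, dy = robot[0] - ox, robot[1] - oy
        d = math.hypot(dx, dy)
        if 0 < d < D0:
            # Repulsion grows sharply as the robot approaches the obstacle.
            mag = K_REP * (1.0 / d - 1.0 / D0) / d**2
            fx += mag * dx / d
            fy += mag * dy / d
    return fx, fy
```

A well-known weakness of this method, and one motivation for learning-based alternatives, is that the attractive and repulsive forces can cancel at local minima, trapping the robot before it reaches the goal.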

Mobile robot path planning using reinforcement learning in the literature

In the proposed approach, robot path planning is solved using Q-learning, a method first introduced by Watkins [18] for learning from delayed rewards and punishments. In the literature there have been many attempts to solve the mobile robot path planning problem using reinforcement learning algorithms. These methods learn the optimal navigation policy by selecting, at each state, the action that produces the maximum cumulative reward.
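
For reference, Watkins' tabular update rule in its standard textbook form, with learning rate $\alpha$ and discount factor $\gamma$ (the notation follows the general reinforcement learning literature rather than this paper's own symbols), is

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]$$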

Smart and Kaelbling [23], [24] used Q-learning for mobile robot navigation.

Assumptions

The initial robot location and the goal are predefined for the robot, and the robot tries to reach the goal along a collision-free path in spite of the presence of obstacles in its surrounding environment. No assumptions are made on the velocity of the robot when it reaches its target, which means that it is a hard-landing robot.

In this paper, it is assumed that the robot is equipped with all the sensors needed to supply it with the sensory data required by the navigation algorithm.

Methodology

In order to apply the Q-learning algorithm, four major parts should be addressed: the working environment, the reward function, the value function, and the adopted policy. In the following subsections, each part is explained in detail.
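
As a hedged illustration of how one of these parts, the reward function, might look for a goal-seeking task with obstacle avoidance (the specific values and the goal/collision predicates below are assumptions for the sketch, not the paper's actual reward design):

```python
# Illustrative reward function for goal-seeking with obstacle avoidance.
R_GOAL = 100.0    # reward for reaching the target (assumed value)
R_CRASH = -100.0  # penalty for colliding with an obstacle (assumed value)
R_STEP = -1.0     # small per-step cost to encourage short paths (assumed)

def reward(reached_goal, collided):
    if reached_goal:
        return R_GOAL
    if collided:
        return R_CRASH
    return R_STEP
```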

Simulation and results

Extensive simulation studies were carried out to train the robot and to demonstrate the effectiveness of the new method. The simulation was implemented in MATLAB. Different scenarios for different situations were implemented, and their results were used to assess the performance of the proposed Q-learning solution.
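
For orientation, a training loop of the kind such studies run might look as follows in Python. This is a sketch only: `env` is a hypothetical simulator exposing `reset()` and `step()`, it reuses the `choose_action`, `update`, and `reward` sketches above, and the paper's actual experiments were implemented in MATLAB.

```python
def train(env, episodes=1000):
    """Run repeated episodes, updating the Q-table after every step."""
    for _ in range(episodes):
        state = env.reset()  # start a new scenario
        done = False
        while not done:
            action = choose_action(state)
            next_state, reached_goal, collided, done = env.step(action)
            r = reward(reached_goal, collided)
            update(state, action, r, next_state)
            state = next_state
```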

Conclusions

In this research a new approach for mobile robot navigation in a dynamic environment was presented, based on the Q-learning algorithm. Q-learning helped solve the motion planning problem without a model of the environment, since the environment is completely unknown to the robot. No prior constraints were assumed about the environment or about the movements of the target or the obstacles. In order to apply this algorithm in a dynamic environment, a new definition of the state space was introduced.

References (34)

  • K. Song et al., Reactive navigation in dynamic environment using a multisensor predictor, IEEE Transactions on Systems, Man, and Cybernetics (1999)
  • S. Russell et al., Reinforcement learning, in: Artificial Intelligence: A Modern Approach (2003)
  • J. Borenstein et al., Real-time obstacle avoidance for fast mobile robots, IEEE Transactions on Systems, Man, and Cybernetics (1989)
  • S. Ge et al., New potential functions for mobile robot path planning, IEEE Transactions on Robotics and Automation (2000)
  • S. Ge et al., Dynamic motion planning for mobile robots using potential field method, Autonomous Robots (2002)
  • N. Tsourveloudis et al., Autonomous vehicle navigation utilizing electrostatic potential fields and fuzzy logic, IEEE Transactions on Robotics and Automation (2001)
  • M. Joo et al., Obstacle avoidance of a mobile robot using hybrid learning approach, IEEE Transactions on Industrial Electronics (2005)