Neurocomputing

Volume 72, Issues 4–6, January 2009, Pages 887-894

Hybridizing evolutionary computation and reinforcement learning for the design of almost universal controllers for autonomous robots

https://doi.org/10.1016/j.neucom.2008.04.058

Abstract

In this paper a hybrid approach to the autonomous motion control of robots in cluttered environments with unknown obstacles is introduced. The efficiency of a hybrid solution is shown by combining the optimization power of evolutionary algorithms with the efficiency of reinforcement learning (RL) in real-time and on-line situations. Experimental results concerning the navigation of an L-shaped robot in a cluttered environment with unknown obstacles are also presented. Such environments impose real-time and on-line constraints well suited to RL algorithms while, at the same time, exhibiting an extremely high-dimensional state space that is usually impractical for RL algorithms but well suited to evolutionary algorithms. The experimental results confirm the validity of the hybrid approach for solving hard real-time, on-line and high-dimensional robot motion planning and control problems, where the RL approach shows some difficulties.

Section snippets

Introduction: How important are representations in robotics?

Recalling the visionary words of Kenneth Craik [1], a representation of the external world (be it called reality or environment) is necessary for an agent to carry out a particular task in a specific world. Obviously, the nature and characteristics of representations depend strongly on the physical nature of the agent itself. In this respect, robot representations are by their very nature computational, while human mind representations, according to a long tradition of…

Guidelines for the design of an optimum (almost universal) controller for autonomous robots by the combination of evolutionary algorithms and RL

As discussed above, current robotic systems require controllers able to solve complex problems in highly uncertain and dynamic environments. The well-known RL paradigm is probably the approach best suited to implementing these controllers, as it is based on the idea of choosing control actions that drive the system from an arbitrary initial state to a desired final state by applying the optimum action available to the robot at each instant of time. RL is…
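As a minimal sketch of the idea behind such controllers (not the authors' implementation; the function names, the epsilon-greedy choice and the tabular representation are illustrative assumptions), a single tabular Q-learning step that drives the system toward a desired state can be written as:

```python
import random

def q_learning_step(Q, state, actions, reward_fn, transition_fn,
                    alpha=0.1, gamma=0.9, epsilon=0.1):
    """One tabular Q-learning step: choose an action (epsilon-greedy),
    observe the reward and next state, and move Q toward the optimal values."""
    if random.random() < epsilon:
        action = random.choice(actions)                               # explore
    else:
        action = max(actions, key=lambda a: Q.get((state, a), 0.0))   # exploit
    next_state = transition_fn(state, action)
    reward = reward_fn(state, action, next_state)
    best_next = max(Q.get((next_state, a), 0.0) for a in actions)
    # Standard temporal-difference update toward reward + discounted best value
    Q[(state, action)] = Q.get((state, action), 0.0) + alpha * (
        reward + gamma * best_next - Q.get((state, action), 0.0))
    return next_state
```

Iterating this step on-line, the controller gradually concentrates value on the actions that lead from any initial state to the goal state.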

A hard motion robot control problem

To illustrate the above discussion we have chosen an interesting autonomous robot planning and motion control problem: a two-link L-shaped robot moving in a cluttered environment with polygonal obstacles (Fig. 2, Fig. 3). The two-link robot has several degrees of freedom. The first are the linear movements of the robot's middle joint along the X and Y Cartesian axes. A rotational movement (φ) is also considered in order to control the robot's orientation in the…

Evolving the table of situations–actions

By following the general method explained in Section 2 our first step for obtaining the final knowledge rule base is the selection and subsequent granulation of the state variables of the particular application considered in this paper.

We have defined the table in such a way that we can efficiently and completely describe the robot's state in a simulated environment with obstacles. To completely describe all the possible states of the robot–environment pair (or, in other words, all the…
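The granulation step can be illustrated with a small sketch. The variable names, ranges and bin counts below are assumptions for illustration only (the paper's exact state variables are defined in the full text): each continuous state variable is coarsened into a few intervals, and the granulated values are composed into one situation index for the situations–actions table.

```python
import math

def granulate(value, lo, hi, n_bins):
    """Map a continuous value in [lo, hi] to one of n_bins intervals."""
    value = min(max(value, lo), hi)          # clamp to the variable's range
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(idx, n_bins - 1)              # value == hi falls in the last bin

def situation(x, y, phi, n_xy=8, n_phi=12):
    """Compose a discrete situation index from granulated state variables
    (here: a hypothetical x, y position and orientation phi)."""
    gx = granulate(x, 0.0, 10.0, n_xy)
    gy = granulate(y, 0.0, 10.0, n_xy)
    gphi = granulate(phi % (2 * math.pi), 0.0, 2 * math.pi, n_phi)
    return (gx * n_xy + gy) * n_phi + gphi   # one index per discrete situation
```

Even with this coarse granulation the table already has 8 × 8 × 12 = 768 entries; adding variables multiplies the count, which is the combinatorial explosion discussed later.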

The transition from innate behavior to knowledge-based behavior by means of on-line experience

In MDP terminology a stationary deterministic policy πd is a policy that commits to a single action choice per state, that is, a mapping πd: S → A from states to actions; πd(s) indicates the action that the agent takes in state s. Hence the table generated in the previous section by means of evolutionary techniques is just a stationary deterministic policy πd which represents the innate behavior of the robot controller. The goal of this development stage is to produce a robot…
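In this terminology the evolved table is simply a lookup from state to action, and the knowledge-based policy reached through on-line experience can be read off a learned value table in the same form. A minimal sketch (the names and the tabular Q representation are illustrative assumptions, not the paper's code):

```python
def greedy_policy(Q, states, actions):
    """Extract a stationary deterministic policy pi_d: S -> A from a
    tabular value estimate Q; ties are broken by action order."""
    return {s: max(actions, key=lambda a: Q.get((s, a), 0.0))
            for s in states}
```

The innate behavior is the dictionary produced by evolution; after on-line learning updates Q, the same extraction yields the refined, knowledge-based deterministic policy.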

Experimental results

We have divided this section into two parts: one concerning a “simple problem” (i.e. a robot motion control problem in which the dimension of the state space is not extremely high) and the other a “complex problem” (i.e. one in which the dimension of the state variables suffers a combinatorial explosion).

Conclusions and further research work

In this paper the advantages and disadvantages of two well-established approaches, reinforcement learning (RL) and evolutionary algorithms (EA), are discussed with regard to their respective performance on a particular hard robot motion control problem. RL presents very attractive features for real-time and on-line applications, although it sometimes runs into difficulties when the dimension of the state variables suffers a combinatorial explosion. On the…


References (15)

  • K. Craik, The Nature of Explanation (1943)
  • E. Thompson, Mind in Life: Biology, Phenomenology, and the Sciences of Mind (2007)
  • F. Varela et al., The Embodied Mind (1991)
  • J. de Lope et al., Integration of reactive utilitarian navigation and topological modeling
  • D. Kortenkamp et al., Artificial Intelligence and Mobile Robots: Case Studies of Successful Robot Systems (1998)
  • R. Brooks, A robust layered control system for a mobile robot, IEEE Journal of Robotics and Automation (1986)
  • R. Arkin, Behaviour-based Robotics (1998)


Darío Maravall (SM’78, M’80) received the MSc in Telecommunication Engineering from the Universidad Politécnica de Madrid in 1978 and the PhD degree at the same university in 1980. From 1980 to 1988 he was Associate Professor at the School of Telecommunication Engineering, Universidad Politécnica de Madrid. In 1988 he was promoted to Full Professor at the Faculty of Computer Science, Universidad Politécnica de Madrid. From 2000 to 2004 he was the Director of the Department of Artificial Intelligence of the Faculty of Computer Science at the Universidad Politécnica de Madrid. His current research interests include computer vision, autonomous robots and computational intelligence. He has published extensively on these subjects and has directed more than 20 funded projects, including a five-year R&D project for the automated inspection of wooden pallets using computer vision techniques and robotic mechanisms, with several operating plants in a number of European countries. As a result of this project he holds a patent issued by the European Patent Office at The Hague, The Netherlands.

Javier de Lope (SM’94, M’98) received the MSc in Computer Science from the Universidad Politécnica de Madrid in 1994 and the PhD degree at the same university in 1998. Currently, he is Associate Professor in the Department of Applied Intelligent Systems at the Universidad Politécnica de Madrid. His current research interest is centered on the study, design and construction of modular robots and multi-robot systems, and in the development of control systems based on soft-computing techniques. He is currently leading a three-year R&D project for developing industrial robotics mechanisms which follow the guidelines of multi-robot systems and reconfigurable robotics. In the past he also worked on projects related to the computer-aided automatic driving by means of external cameras and range sensors and the design and control of humanoid and flying robots.

José Antonio Martín H. received a BS and MS in Computer Science from La Universidad del Zulia (LUZ) in 1992. He is a PhD candidate in Computer Science and Artificial Intelligence at the Universidad Politécnica de Madrid and is about to defend his PhD dissertation on “Studies on adaptive systems with applications in autonomous robots and intelligent agents”. He is also enrolled in the PhD program of “Fundamentals of basic psychology” at the U.N.E.D. University, where he received the Advanced Studies Diploma for “A computational model of the equivalence class formation psychological phenomenon”. Since 2005 he has worked as an Assistant Professor in the Department of Informatic Systems and Computation at the Universidad Complutense de Madrid. His main research areas are neuro-dynamic programming, machine learning and cybernetics.

This work has been partially funded by the Spanish Ministry of Science and Technology, Project: DPI2006-15346-C03-02.
