Hybridizing evolutionary computation and reinforcement learning for the design of almost universal controllers for autonomous robots☆
Section snippets
Introduction: How important are representations in robotics?
Recalling the visionary words of Kenneth Craik [1], a representation of the external world (be it called reality or environment) is necessary for an agent to carry out a particular task in a specific world. Obviously, the nature and characteristics of representations depend strongly on the physical nature of the agent itself. In this respect, robot representations are computational by their very nature, while human mind representations, according to a long tradition of
Guidelines for the design of an optimum (almost universal) controller for autonomous robots by the combination of evolutionary algorithms and RL
As commented above, current robotic systems require controllers able to solve complex problems in highly uncertain and dynamic environments. The well-known RL paradigm is probably the approach best suited to implementing these controllers, as it is based on the idea of choosing control actions that drive the system from an arbitrary initial state to a desired final state by applying the optimum action available to the robot at each instant of time. RL is
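The RL idea sketched above can be illustrated with a minimal tabular Q-learning step. This is a generic sketch, not the paper's actual controller: the state/action/reward functions and all constants are illustrative assumptions.

```python
import random
from collections import defaultdict

# Minimal tabular Q-learning sketch of the RL idea described above:
# at each step the agent applies the action with the highest estimated
# value, driving the system from an initial state toward a desired
# final state. All names and constants are illustrative assumptions.

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1  # learning rate, discount, exploration

def q_learning_step(Q, state, actions, step_fn, reward_fn):
    """One Q-learning update: choose an action (epsilon-greedy),
    observe the next state and reward, and update Q(state, action)."""
    if random.random() < EPSILON:
        action = random.choice(actions)          # explore
    else:
        action = max(actions, key=lambda a: Q[(state, a)])  # exploit
    next_state = step_fn(state, action)
    reward = reward_fn(next_state)
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
    return next_state
```

Repeatedly applying `q_learning_step` over episodes makes the greedy action in each state converge toward the one that leads to the goal.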
A hard motion robot control problem
To illustrate the above discussion we have chosen an interesting autonomous robot planning and motion control problem: a two-link L-shaped robot moving in a cluttered environment with polygonal obstacles (Fig. 2, Fig. 3). The two-link robot has several degrees of freedom. The first are the linear movements of the robot's middle joint along the X and Y Cartesian axes. A rotational movement of the robot is also considered in order to allow controlling the robot's orientation in the
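One plausible way to encode such a configuration is a tuple of the middle-joint position plus the two rotations, with a small discrete action set. This is an assumed sketch; the field names, action labels, and increments are not taken from the paper.

```python
from dataclasses import dataclass

# Hypothetical configuration of a two-link L-shaped robot as described
# above: linear x/y position of the middle joint plus two rotations.
# Field names and the discrete action set are illustrative assumptions.

@dataclass
class RobotConfig:
    x: float       # middle-joint position along X
    y: float       # middle-joint position along Y
    theta: float   # orientation of the whole robot (radians)
    phi: float     # relative angle between the two links (radians)

STEP, TURN = 0.1, 0.1  # illustrative increments per control action

ACTIONS = {
    "x+":   lambda c: RobotConfig(c.x + STEP, c.y, c.theta, c.phi),
    "x-":   lambda c: RobotConfig(c.x - STEP, c.y, c.theta, c.phi),
    "y+":   lambda c: RobotConfig(c.x, c.y + STEP, c.theta, c.phi),
    "y-":   lambda c: RobotConfig(c.x, c.y - STEP, c.theta, c.phi),
    "rot+": lambda c: RobotConfig(c.x, c.y, c.theta + TURN, c.phi),
    "rot-": lambda c: RobotConfig(c.x, c.y, c.theta - TURN, c.phi),
}
```

Each action returns a new configuration, which a planner or RL controller can check against the polygonal obstacles before committing to it.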
Evolving the table of situations–actions
By following the general method explained in Section 2, our first step toward obtaining the final knowledge rule base is the selection, and subsequent granulation, of the state variables of the particular application considered in this paper.
We have defined the table in such a way that we are able to efficiently and completely describe the robot state in a simulated environment with obstacles. To completely describe all the possible states of the robot–environment pair (or, in other words, all the
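The granulation step can be sketched as follows: each continuous state variable is partitioned into a small number of intervals (granules), so every robot–environment state maps to one cell of a finite situations–actions table. Variable names, ranges, and granule counts here are illustrative assumptions, not the paper's actual choices.

```python
# Sketch of granulating continuous state variables into a finite table.

def granulate(value, low, high, n_granules):
    """Map a continuous value in [low, high] to a granule index 0..n-1."""
    value = max(low, min(high, value))  # clamp out-of-range readings
    idx = int((value - low) / (high - low) * n_granules)
    return min(idx, n_granules - 1)     # high endpoint falls in last granule

def situation(x, y, theta, n_pos=10, n_ang=8):
    """Discrete situation: one tuple per cell of the state table."""
    return (granulate(x, 0.0, 10.0, n_pos),
            granulate(y, 0.0, 10.0, n_pos),
            granulate(theta, -3.1416, 3.1416, n_ang))

# The table then holds one action per situation tuple, e.g.:
# table[situation(x, y, theta)] = "x+"
```

The table size is the product of the granule counts, which is why a fine granulation of many state variables leads directly to the combinatorial explosion discussed later.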
The transition from innate behavior to knowledge-based behavior by means of on-line experience
In MDP terminology, a stationary deterministic policy is a policy that commits to a single action choice per state, that is, a mapping π: S → A from states to actions; π(s) indicates the action that the agent takes in state s. Hence the table generated in the previous section by means of evolutionary techniques is just a stationary deterministic policy which represents the innate behavior of the robot controller. The goal of this development stage is to produce a robot
Experimental results
We have divided this section into two parts: one concerning a “simple problem” (i.e., a robot motion control problem in which the dimension of the state space is not extremely high) and another concerning a “complex problem” (i.e., one in which the dimension of the state variables suffers a combinatorial explosion).
Conclusions and further research work
In this paper the advantages and disadvantages of two well-established approaches, reinforcement learning (RL) and evolutionary algorithms (EA), are discussed with regard to their respective performance in the solution of a particular hard robot motion control problem. RL presents very attractive features for real-time and on-line applications, although it sometimes runs into difficulties when the dimension of the state variables suffers a combinatorial explosion. On the
References (15)

- K. Craik, The Nature of Explanation (1943)
- E. Thompson, Mind in Life: Biology, Phenomenology, and the Sciences of Mind (2007)
- F.J. Varela et al., The Embodied Mind (1991)
- Integration of reactive utilitarian navigation and topological modeling
- Artificial Intelligence and Mobile Robots: Case Studies of Successful Robot Systems (1998)
- R.A. Brooks, A robust layered control system for a mobile robot, IEEE Journal of Robotics and Automation (1986)
- R.C. Arkin, Behaviour-based Robotics (1998)
Darío Maravall (SM’78, M’80) received the MSc in Telecommunication Engineering from the Universidad Politécnica de Madrid in 1978 and the PhD degree at the same university in 1980. From 1980 to 1988 he was Associate Professor at the School of Telecommunication Engineering, Universidad Politécnica de Madrid. In 1988 he was promoted to Full Professor at the Faculty of Computer Science, Universidad Politécnica de Madrid. From 2000 to 2004 he was the Director of the Department of Artificial Intelligence of the Faculty of Computer Science at the Universidad Politécnica de Madrid. His current research interests include computer vision, autonomous robots and computational intelligence. He has published extensively on these subjects and has directed more than 20 funded projects, including a five-year R&D project for the automated inspection of wooden pallets using computer vision techniques and robotic mechanisms, with several operating plants in a number of European countries. As a result of this project he holds a patent issued by the European Patent Office at The Hague, The Netherlands.
Javier de Lope (SM’94, M’98) received the MSc in Computer Science from the Universidad Politécnica de Madrid in 1994 and the PhD degree at the same university in 1998. Currently, he is Associate Professor in the Department of Applied Intelligent Systems at the Universidad Politécnica de Madrid. His current research interest is centered on the study, design and construction of modular robots and multi-robot systems, and in the development of control systems based on soft-computing techniques. He is currently leading a three-year R&D project for developing industrial robotics mechanisms which follow the guidelines of multi-robot systems and reconfigurable robotics. In the past he also worked on projects related to the computer-aided automatic driving by means of external cameras and range sensors and the design and control of humanoid and flying robots.
José Antonio Martín H. received a BS and an MS in Computer Science from La Universidad del Zulia (LUZ) in 1992. He is a PhD candidate in Computer Science and Artificial Intelligence at the Universidad Politécnica de Madrid, about to present his final PhD dissertation on “Studies on adaptive systems with applications in autonomous robots and intelligent agents”. He is also in the PhD program “Fundamentals of basic psychology” at the U.N.E.D. University, where he received the Advanced Studies Diploma for “A computational model of the equivalence class formation psychological phenomenon”. Since 2005 he has worked as an Assistant Professor in the Department of Computer Systems and Computation at the Universidad Complutense de Madrid. His main research areas are neuro-dynamic programming, machine learning and cybernetics.
☆ This work has been partially funded by the Spanish Ministry of Science and Technology, Project DPI2006-15346-C03-02.