
01.12.2003 | Original Article

A topological reinforcement learning agent for navigation

Authors: Arthur P. S. Braga, Aluízio F. R. Araújo

Published in: Neural Computing and Applications | Issue 3-4/2003

Abstract

This article proposes a reinforcement learning procedure for mobile robot navigation based on a latent-like learning schema. Latent learning refers to learning that occurs in the absence of reinforcement signals and only becomes apparent once reinforcement is introduced; the concept implies that part of a task can be learned before the agent receives any indication of how to perform it. In the proposed topological reinforcement learning agent (TRLA), a topological map is used to perform the latent learning. Propagating the reinforcement signal through the topological neighborhoods of the map permits the estimation of a value function that, on average, requires fewer trials and fewer updates per trial than six of the main temporal-difference reinforcement learning algorithms: Q-learning, SARSA, Q(λ)-learning, SARSA(λ), Dyna-Q and fast Q(λ)-learning. The RL agents were tested in four environments designed with increasing levels of complexity in the navigation tasks. The tests suggest that the TRLA chooses shorter trajectories (in number of steps) and/or requires fewer value-function updates per trial than the other six reinforcement learning (RL) algorithms.
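To make the propagation idea concrete, the minimal Python sketch below spreads a goal reward through the neighbourhoods of a small topological graph and then acts greedily on the resulting values. The graph, the decay factor gamma, and all function names are illustrative assumptions for this page; this is not the authors' TRLA implementation, which is defined by the equations in the full article.

    import random

    # Toy topological map: node -> neighbouring nodes (graph edges).
    # Illustrative only; in the TRLA the map is learned from experience.
    graph = {
        "A": {"B"},
        "B": {"A", "C", "D"},
        "C": {"B", "G"},
        "D": {"B"},
        "G": {"C"},          # G is the goal node
    }

    def propagate_values(graph, goal, reward=1.0, gamma=0.9):
        """Spread the reinforcement signal from the goal through the
        topological neighbourhoods, discounting by gamma per hop."""
        values = {node: 0.0 for node in graph}
        values[goal] = reward
        frontier = [goal]
        while frontier:
            nxt = []
            for node in frontier:
                for nb in graph[node]:
                    candidate = gamma * values[node]
                    if candidate > values[nb]:   # keep the best estimate seen
                        values[nb] = candidate
                        nxt.append(nb)
            frontier = nxt
        return values

    def greedy_step(values, graph, node):
        """Move to the highest-valued neighbour; ties broken at random."""
        best = max(values[nb] for nb in graph[node])
        return random.choice([nb for nb in graph[node] if values[nb] == best])

    values = propagate_values(graph, "G")
    print(values)                            # e.g. {'A': 0.729, 'B': 0.81, ...}
    print(greedy_step(values, graph, "A"))   # -> 'B'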


Footnotes
1
Hierarchical solutions and generalisation methods are other approaches for accelerating RL that are not mentioned in detail in this text.
 
2
Neuron and node are interchangeable terms used throughout this text.
 
3
The Delaunay triangulation is the dual of the Voronoi diagram [23]. The Voronoi diagram of a set $W = \{w_1, \ldots, w_N\}$ of vectors $w_i \in S$ is given by $N$ Voronoi cells $V_i$, where $V_i$ is formed by the set of states $s \in S$ that are closer to $w_i$ than to any other $w_j \in W$, $j \neq i$. The Delaunay triangulation is the graph obtained by connecting all pairs $w_i, w_j \in W$ whose Voronoi cells share an edge.
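Compactly, the Voronoi cell of a prototype vector can be written as follows (a standard formulation consistent with [23], not a formula reproduced from the paper):

    V_i = \{\, s \in S \;:\; \lVert s - w_i \rVert \le \lVert s - w_j \rVert \quad \forall\, j \neq i \,\}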
 
4
In this work the stimulus represents the spatial position of the agent in an environment.
 
5
One characteristic of a Delaunay triangulation is that a circumscribed circle can be associated with each triangle: none of the vectors used as vertices of the triangulation [8] may lie inside this circle. If a vector does lie inside any such circle, a non-Delaunay edge is connected to that point and should be deleted.
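The standard determinant-based in-circle predicate makes this test concrete; the Python sketch below is a textbook formulation, not code from the paper:

    def in_circumcircle(a, b, c, p):
        """True if point p lies strictly inside the circumcircle of the
        triangle (a, b, c), with vertices given in counter-clockwise order."""
        ax, ay = a[0] - p[0], a[1] - p[1]
        bx, by = b[0] - p[0], b[1] - p[1]
        cx, cy = c[0] - p[0], c[1] - p[1]
        det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
             - (bx * bx + by * by) * (ax * cy - cx * ay)
             + (cx * cx + cy * cy) * (ax * by - bx * ay))
        return det > 0.0

    # The origin lies inside the circumcircle (the unit circle) of this triangle:
    print(in_circumcircle((1, 0), (0, 1), (-1, 0), (0, 0)))  # True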
 
6
A trial is described as a sequence of 4-tuples $(s_t, a_t, r_{t+1}, s_{t+1})$ generated from an initial random state $s_0$ until the agent reaches the goal state.
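In code, such a trial could be generated as in the sketch below; env, policy, and their methods are hypothetical interfaces introduced only for illustration:

    def run_trial(env, policy, s0):
        """One trial: a list of (s_t, a_t, r_{t+1}, s_{t+1}) tuples from the
        initial state s0 until the goal state is reached."""
        trial, s = [], s0
        while not env.is_goal(s):
            a = policy(s)
            s_next, r = env.step(s, a)   # hypothetical environment interface
            trial.append((s, a, r, s_next))
            s = s_next
        return trial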
 
7
If multiple neighbouring nodes share the same highest value, one of them is chosen at random.
 
8
The adopted policy is ε-greedy: the action is selected with probability 1−ε according to Eq. 11 and with probability ε by an exploration strategy.
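A minimal sketch of such a policy over the neighbours of the current node; the greedy branch stands in for Eq. 11 (not reproduced on this page), and ties are broken at random as in footnote 7:

    import random

    def epsilon_greedy(values, neighbours, epsilon=0.1):
        """With probability epsilon explore; otherwise pick the
        highest-valued neighbour, breaking ties at random."""
        if random.random() < epsilon:
            return random.choice(neighbours)          # explore
        best = max(values[n] for n in neighbours)     # exploit
        return random.choice([n for n in neighbours if values[n] == best])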
 
9
The maximum number of possible actions is eight. They are represented in Fig. 1.
 
10
That is, exploratory strategies other than the semi-uniform probability distribution.
 
11
As in [3], in the TRLA the neighbourhood of the node in the topological map nearest to the current state provides all the information needed by the policy (see Eq. 11).
 
References
1.
Arbib MA, Erdi P and Szentagothai J (1998) Neural organization—structure, function and dynamics. Bradford MIT Press, Cambridge, MA
2.
Althoefer K, Krekelberg B, Husmeier D and Seneviratne L (2001) Reinforcement learning in a rule-based navigator for robotic manipulators. Neurocomputing 37:51–70
3.
Atkeson CG, Schaal S (1995) Memory-based neural networks for robot learning. Neurocomputing 9(13):243–269
4.
Blodgett C (1929) The effect of the introduction of reward upon the maze performance of rats. Univ CA Pub Psychol 4:113–134
5.
Dean T, Kaelbling LP, Kirman J and Nicholson A (1995) Planning under time constraints in stochastic domains. Art Intellig 76:35–74
6.
Fritzke B (1994) Growing cell structures—a self-organizing network for unsupervised and supervised learning. Neur Netwks 7(9):1441–1460
7.
Gaussier P, Revel A, Joulain C and Zrehen S (1997) Living in a partially structured environment: how to bypass the limitations of classical reinforcement techniques. Robot Auton Sys 20:225–250
8.
George PL (1991) Automatic mesh generation—application to finite element methods. Wiley, New York
9.
Haykin S (1999) Neural networks—a comprehensive foundation. Prentice Hall, New York
10.
Jockusch J, Ritter H (1999) An instantaneous topological mapping model for correlated stimuli. In: Proceedings of the IJCNN'99, Washington, DC, 10–16 July 1999
11.
Jockusch J (2000) Exploration based on neural networks with applications in manipulator control. Dissertation, University of Bielefeld
12.
Johannet A, Sarda I (1999) Goal-directed behaviours by reinforcement learning. Neurocomputing 28:107–125
13.
Lin L-J (1993) Reinforcement learning for robots using neural networks. Dissertation, Carnegie Mellon University
14.
Kaelbling LP, Littman ML and Moore AW (1996) Reinforcement learning: a survey. J Art Intellig Res 4:237–285
15.
Khatib O (1986) Real-time obstacle avoidance for manipulators and mobile robots. Int J Rob Res 5(1):90–98
16.
Koenig S, Simmons RG (1996) The effect of representation and knowledge on goal-directed exploration with reinforcement learning algorithms. Mach Learn 22:227–250
17.
Kohonen T (1984) Self-organization and associative memory. Springer, Berlin Heidelberg New York
18.
Kohonen T (2001) Self-organizing maps. Springer, Berlin Heidelberg New York
19.
Kortenkamp D, Bonasso RP and Murphy R (1998) Artificial intelligence and mobile robots. AAAI Press/MIT Press, Cambridge, MA
20.
Leonard JJ, Durrant-Whyte HF (1991) Mobile robot localization by tracking geometric beacons. IEEE Trans Robot Automat 7(3):376–382
21.
Linhares A (1998) State-space search strategies gleaned from animal behavior: a traveling salesman experiment. Biol Cybern 78:167–173
22.
Mahadevan S, Connell J (1992) Automatic programming of behavior-based robots using reinforcement learning. Art Intellig 55:311–365
23.
Martinetz T, Schulten K (1994) Topology representing networks. Neur Netwks 7(3):507–522
24.
Moore AW, Atkeson CG (1993) Prioritized sweeping: reinforcement learning with less data and less time. Mach Learn 13:103–130
25.
Muller RU, Stead M and Pach J (1996) The hippocampus as a cognitive graph. J Gen Physiol 7:663–694
26.
27.
O'Keefe J, Dostrovsky J (1971) The hippocampus as a spatial map. Preliminary evidence from unit activity in the freely moving rat. Brain Res 34:171–175
28.
O'Keefe J, Nadel L (1978) The hippocampus as a cognitive map. Clarendon Press, Oxford, UK
29.
O'Keefe J, Burgess N, Donnett JG, Jeffery KJ and Maguire EA (1998) Place cells, navigational accuracy, and the human hippocampus. Phil Trans Roy Soc Lond Series B-Biol Sci 353(1373):1333–1340
30.
Peng J, Williams RJ (1993) Efficient learning and planning within the Dyna framework. Adapt Behav 1(4):437–454
31.
Peng J, Williams RJ (1996) Incremental multi-step Q-learning. Mach Learn 22:283–290
32.
Rummery GA (1995) Problem solving with reinforcement learning. Dissertation, Cambridge University
33.
Sutton RS (1988) Learning to predict by the methods of temporal differences. Mach Learn 3:9–44
34.
Sutton RS (1991) Dyna, an integrated architecture for learning, planning, and reacting. SIGART Bullet 2:160–163
35.
Sutton RS, Barto A (1998) Introduction to reinforcement learning. MIT Press/Bradford Books, Cambridge, MA
36.
Smith JNM (1974) Food searching behavior of two European thrushes—adaptiveness of search patterns. Behaviour 49:1–61
37.
Tchernichovski O, Benjamini Y and Golani I (1998) The dynamics of long-term exploration in rat. Part I—a phase-plane analysis of the relationship between location and velocity. Biol Cybern 78:423–432
38.
Tchernichovski O, Benjamini Y (1998) The dynamics of long-term exploration in rat. Part II—an analytical model of the kinematic structure of rat exploratory behavior. Biol Cybern 78:433–440
39.
Tesauro G (1995) Temporal difference learning and TD-Gammon. Comm ACM 38:58–68
40.
Thrun S, Moeller K and Linden A (1991) Planning with an adaptive world model. In: Touretzky D, Lippmann R (eds) Advances in neural information processing systems (NIPS) 3. Morgan Kaufmann, San Mateo, CA
41.
Thrun SB (1992) Efficient exploration in reinforcement learning. Technical Report CMU-CS-92-102, Carnegie Mellon University
42.
Tolman EC (1948) Cognitive maps in rats and men. Psychol Rev 55:189–208
43.
Tolman EC, Honzik CH (1930) Insight in rats. Univ CA Pub Psychol 4:215–232
44.
Touzet C (1997) Neural reinforcement learning for behaviour synthesis. Robot Auton Sys 22(3–4):251–281
45.
Trullier O, Wiener S, Berthoz A and Meyer JA (1997) Biologically-based artificial navigation systems: review and prospects. Prog Neurobiol 51(5):483–544
46.
Trullier O, Meyer J-A (2000) Animat navigation using a cognitive graph. Biol Cybern 83:271–285
47.
Voicu H, Schmajuk N (2002) Latent learning, shortcuts and detours: a computational model. Behav Process 59:67–86
48.
Xiao J, Michalewicz Z, Zhang L and Trojanowski K (1997) Adaptive evolutionary planner/navigator for mobile robots. IEEE Trans Evolut Comput 1(1):18–28
49.
Watkins CJCH (1989) Learning from delayed rewards. Dissertation, Cambridge University
50.
51.
Wyatt J (1997) Exploration and inference in learning from reinforcement. Dissertation, University of Edinburgh
52.
Zalama E, Gaudiano P and Coronado JL (1995) A real-time, unsupervised neural network for the low-level control of a mobile robot in a nonstationary environment. Neur Netwks 8(1):103–123
53.
Zeller M, Sharma R and Schulten K (1997) Motion planning of a pneumatic robot using a neural network. IEEE Contr Sys Mag 17:89–98
54.
Zelinsky A (1992) A mobile robot exploration algorithm. IEEE Trans Robot Automat 8(6):707–717
Metadata
Title
A topological reinforcement learning agent for navigation
Authors
Arthur P. S. Braga
Aluízio F. R. Araújo
Publication date
01.12.2003
Publisher
Springer-Verlag
Published in
Neural Computing and Applications / Issue 3-4/2003
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-003-0385-9
