
01.12.2003 | Original Article

A topological reinforcement learning agent for navigation

Authors: Arthur P. S. Braga, Aluízio F. R. Araújo

Published in: Neural Computing and Applications | Issue 3-4/2003

Abstract

This article proposes a reinforcement learning procedure for mobile robot navigation based on a latent-like learning schema. Latent learning refers to learning that occurs in the absence of reinforcement signals and only becomes apparent once reinforcement is introduced; the concept implies that part of a task can be learned before the agent receives any indication of how to perform it. In the proposed topological reinforcement learning agent (TRLA), a topological map is used to perform the latent learning. Propagating the reinforcement signal through the topological neighborhoods of the map permits the estimation of a value function that, on average, requires fewer trials and fewer updates per trial than six of the main temporal-difference reinforcement learning algorithms: Q-learning, SARSA, Q(λ)-learning, SARSA(λ), Dyna-Q and fast Q(λ)-learning. The RL agents were tested in four environments designed with increasing levels of complexity in the navigation tasks. The tests suggest that the TRLA chooses shorter trajectories (in number of steps) and/or requires fewer value-function updates per trial than the other six reinforcement learning (RL) algorithms.
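To make the propagation idea concrete, the minimal Python sketch below spreads a goal reward through the neighbourhoods of a small topological graph and then acts greedily on the resulting values. The graph, the decay factor gamma, and all function names are illustrative assumptions for this page; this is not the authors' TRLA implementation, which is defined by the equations in the full article.

    import random

    # Toy topological map: node -> neighbouring nodes (graph edges).
    # Illustrative only; in the TRLA the map is learned from experience.
    graph = {
        "A": {"B"},
        "B": {"A", "C", "D"},
        "C": {"B", "G"},
        "D": {"B"},
        "G": {"C"},          # G is the goal node
    }

    def propagate_values(graph, goal, reward=1.0, gamma=0.9):
        """Spread the reinforcement signal from the goal through the
        topological neighbourhoods, discounting by gamma per hop."""
        values = {node: 0.0 for node in graph}
        values[goal] = reward
        frontier = [goal]
        while frontier:
            nxt = []
            for node in frontier:
                for nb in graph[node]:
                    candidate = gamma * values[node]
                    if candidate > values[nb]:   # keep the best estimate seen
                        values[nb] = candidate
                        nxt.append(nb)
            frontier = nxt
        return values

    def greedy_step(values, graph, node):
        """Move to the highest-valued neighbour; ties broken at random."""
        best = max(values[nb] for nb in graph[node])
        return random.choice([nb for nb in graph[node] if values[nb] == best])

    values = propagate_values(graph, "G")
    print(values)                            # e.g. {'A': 0.729, 'B': 0.81, ...}
    print(greedy_step(values, graph, "A"))   # -> 'B'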


Footnotes
1
Hierarchical solutions and generalisation methods are other approaches for accelerating RL that are not mentioned in detail in this text.
 
2
Neuron and node are interchangeable terms used throughout this text.
 
3
The Delaunay triangulation is the dual of the Voronoi diagram [23]. The Voronoi diagram of a set $W = \{w_1, \ldots, w_N\}$ of vectors $w_i \in S$ is given by $N$ Voronoi cells $V_i$, where $V_i$ is formed by the set of states $s \in S$ that are closer to $w_i$ than to any other $w_j \in W$, $j \neq i$. The Delaunay triangulation is the graph obtained by connecting all pairs $w_i, w_j \in W$ whose Voronoi cells share an edge.
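Compactly, the Voronoi cell of a prototype vector can be written as follows (a standard formulation consistent with [23], not a formula reproduced from the paper):

    V_i = \{\, s \in S \;:\; \lVert s - w_i \rVert \le \lVert s - w_j \rVert \quad \forall\, j \neq i \,\}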
 
4
In this work the stimulus represents the spatial position of the agent in an environment.
 
5
One characteristic of a Delaunay triangulation is that a circumscribed circle can be associated with each triangle: none of the vectors used as vertices of the triangulation [8] may lie inside this circle. If a vector does lie inside any such circle, a non-Delaunay edge is connected to that point and should be deleted.
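The standard determinant-based in-circle predicate makes this test concrete; the Python sketch below is a textbook formulation, not code from the paper:

    def in_circumcircle(a, b, c, p):
        """True if point p lies strictly inside the circumcircle of the
        triangle (a, b, c), with vertices given in counter-clockwise order."""
        ax, ay = a[0] - p[0], a[1] - p[1]
        bx, by = b[0] - p[0], b[1] - p[1]
        cx, cy = c[0] - p[0], c[1] - p[1]
        det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
             - (bx * bx + by * by) * (ax * cy - cx * ay)
             + (cx * cx + cy * cy) * (ax * by - bx * ay))
        return det > 0.0

    # The origin lies inside the circumcircle (the unit circle) of this triangle:
    print(in_circumcircle((1, 0), (0, 1), (-1, 0), (0, 0)))  # True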
 
6
A trial is described as a sequence of 4-tuples $(s_t, a_t, r_{t+1}, s_{t+1})$ generated from an initial random state $s_0$ until the agent reaches the goal state.
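In code, such a trial could be generated as in the sketch below; env, policy, and their methods are hypothetical interfaces introduced only for illustration:

    def run_trial(env, policy, s0):
        """One trial: a list of (s_t, a_t, r_{t+1}, s_{t+1}) tuples from the
        initial state s0 until the goal state is reached."""
        trial, s = [], s0
        while not env.is_goal(s):
            a = policy(s)
            s_next, r = env.step(s, a)   # hypothetical environment interface
            trial.append((s, a, r, s_next))
            s = s_next
        return trial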
 
7
If multiple neighbouring nodes share the same highest value, one of them is chosen at random.
 
8
The adopted policy is ε-greedy: the action is selected with probability 1−ε according to Eq. 11 and with probability ε by an exploration strategy.
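A minimal sketch of such a policy over the neighbours of the current node; the greedy branch stands in for Eq. 11 (not reproduced on this page), and ties are broken at random as in footnote 7:

    import random

    def epsilon_greedy(values, neighbours, epsilon=0.1):
        """With probability epsilon explore; otherwise pick the
        highest-valued neighbour, breaking ties at random."""
        if random.random() < epsilon:
            return random.choice(neighbours)          # explore
        best = max(values[n] for n in neighbours)     # exploit
        return random.choice([n for n in neighbours if values[n] == best])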
 
9
The maximum number of possible actions is eight. They are represented in Fig. 1.
 
10
That is, exploratory strategies other than the semi-uniform probability distribution.
 
11
As in [3], in the TRLA the neighbourhood of the node in the topological map nearest to the current state provides all the information needed by the policy (see Eq. 11).
 
References
1.
Arbib MA, Erdi P and Szentagothai J (1998) Neural organization—structure, function and dynamics. Bradford MIT Press, Cambridge, MA
2.
Althoefer K, Krekelberg B, Husmeier D and Seneviratne L (2001) Reinforcement learning in a rule-based navigator for robotic manipulators. Neurocomputing 37:51–70
3.
Atkeson CG, Schaal S (1995) Memory-based neural networks for robot learning. Neurocomputing 9(13):243–269
4.
Blodgett C (1929) The effect of the introduction of reward upon the maze performance of rats. Univ CA Pub Psychol 4:113–134
5.
Dean T, Kaelbling LP, Kirman J and Nicholson A (1995) Planning under time constraints in stochastic domains. Art Intellig 76:35–74
6.
Fritzke B (1994) Growing cell structures—a self-organizing network for unsupervised and supervised learning. Neur Netwks 7(9):1441–1460
7.
Gaussier P, Revel A, Joulain C and Zrehen S (1997) Living in a partially structured environment: how to bypass the limitations of classical reinforcement techniques. Robot Auton Sys 20:225–250
8.
George PL (1991) Automatic mesh generation—application to finite element methods. Wiley, New York
9.
Haykin S (1999) Neural networks—a comprehensive foundation. Prentice Hall, New York
10.
Jockusch J, Ritter H (1999) An instantaneous topological mapping model for correlated stimuli. In: Proceedings of the IJCNN'99, Washington, DC, 10–16 July 1999
11.
Jockusch J (2000) Exploration based on neural networks with applications in manipulator control. Dissertation, University of Bielefeld
12.
Johannet A, Sarda I (1999) Goal-directed behaviours by reinforcement learning. Neurocomputing 28:107–125
13.
Lin L-J (1993) Reinforcement learning for robots using neural networks. Dissertation, Carnegie Mellon University
14.
Kaelbling LP, Littman ML and Moore AW (1996) Reinforcement learning: a survey. J Art Intellig Res 4:237–285
15.
Khatib O (1986) Real-time obstacle avoidance for manipulators and mobile robots. Int J Rob Res 5(1):90–98
16.
Koenig S, Simmons RG (1996) The effect of representation and knowledge on goal-directed exploration with reinforcement learning algorithms. Mach Learn 22:227–250
17.
Kohonen T (1984) Self-organization and associative memory. Springer, Berlin Heidelberg New York
18.
Kohonen T (2001) Self-organizing maps. Springer, Berlin Heidelberg New York
19.
Kortenkamp D, Bonasso RP and Murphy R (1998) Artificial intelligence and mobile robots. AAAI Press/MIT Press, Cambridge, MA
20.
Leonard JJ, Durrant-Whyte HF (1991) Mobile robot localization by tracking geometric beacons. IEEE Trans Robot Automat 7(3):376–382
21.
Linhares A (1998) State-space search strategies gleaned from animal behavior: a traveling salesman experiment. Biol Cybern 78:167–173
22.
Mahadevan S, Connell J (1992) Automatic programming of behavior-based robots using reinforcement learning. Art Intellig 55:311–365
23.
Martinetz T, Schulten K (1994) Topology representing networks. Neur Netwks 7(3):507–522
24.
Moore AW, Atkeson CG (1993) Prioritized sweeping: reinforcement learning with less data and less time. Mach Learn 13:103–130
25.
Muller RU, Stead M and Pach J (1996) The hippocampus as a cognitive graph. J Gen Physiol 7:663–694
26.
27.
O'Keefe J, Dostrovsky J (1971) The hippocampus as a spatial map. Preliminary evidence from unit activity in the freely moving rat. Brain Res 34:171–175
28.
O'Keefe J, Nadel L (1978) The hippocampus as a cognitive map. Clarendon Press, Oxford, UK
29.
O'Keefe J, Burgess N, Donnett JG, Jeffery KJ and Maguire EA (1998) Place cells, navigational accuracy, and the human hippocampus. Phil Trans Roy Soc Lond Series B-Biol Sci 353(1373):1333–1340
30.
Peng J, Williams RJ (1993) Efficient learning and planning within the Dyna framework. Adapt Behav 1(4):437–454
31.
Peng J, Williams RJ (1996) Incremental multi-step Q-learning. Mach Learn 22:283–290
32.
Rummery GA (1995) Problem solving with reinforcement learning. Dissertation, Cambridge University
33.
Sutton RS (1988) Learning to predict by the methods of temporal differences. Mach Learn 3:9–44
34.
Sutton RS (1991) Dyna, an integrated architecture for learning, planning, and reacting. SIGART Bullet 2:160–163
35.
Sutton RS, Barto A (1998) Introduction to reinforcement learning. MIT Press/Bradford Books, Cambridge, MA
36.
Smith JNM (1974) Food searching behavior of two European thrushes—adaptiveness of search patterns. Behaviour 49:1–61
37.
Tchernichovski O, Benjamini Y and Golani I (1998) The dynamics of long-term exploration in rat. Part I—a phase-plane analysis of the relationship between location and velocity. Biol Cybern 78:423–432
38.
Tchernichovski O, Benjamini Y (1998) The dynamics of long-term exploration in rat. Part II—an analytical model of the kinematic structure of rat exploratory behavior. Biol Cybern 78:433–440
39.
Tesauro G (1995) Temporal difference learning and TD-Gammon. Comm ACM 38:58–68
40.
Thrun S, Moeller K and Linden A (1991) Planning with an adaptive world model. In: Touretzky D, Lippmann R (eds) Advances in neural information processing systems (NIPS) 3. Morgan Kaufmann, San Mateo, CA
41.
Thrun SB (1992) Efficient exploration in reinforcement learning. Technical Report CMU-CS-92-102, Carnegie Mellon University
42.
Tolman EC (1948) Cognitive maps in rats and men. Psychol Rev 55:189–208
43.
Tolman EC, Honzik CH (1930) Insight in rats. Univ CA Pub Psychol 4:215–232
44.
Touzet C (1997) Neural reinforcement learning for behaviour synthesis. Robot Auton Sys 22(3–4):251–281
45.
Trullier O, Wiener S, Berthoz A and Meyer JA (1997) Biologically-based artificial navigation systems: review and prospects. Prog Neurobiol 51(5):483–544
46.
Trullier O, Meyer J-A (2000) Animat navigation using a cognitive graph. Biol Cybern 83:271–285
47.
Voicu H, Schmajuk N (2002) Latent learning, shortcuts and detours: a computational model. Behav Process 59:67–86
48.
Xiao J, Michalewicz Z, Zhang L and Trojanowski K (1997) Adaptive evolutionary planner/navigator for mobile robots. IEEE Trans Evolut Comput 1(1):18–28
49.
Watkins CJCH (1989) Learning from delayed rewards. Dissertation, Cambridge University
50.
51.
Wyatt J (1997) Exploration and inference in learning from reinforcement. Dissertation, University of Edinburgh
52.
Zalama E, Gaudiano P and Coronado JL (1995) A real-time, unsupervised neural network for the low-level control of a mobile robot in a nonstationary environment. Neur Netwks 8(1):103–123
53.
Zeller M, Sharma R and Schulten K (1997) Motion planning of a pneumatic robot using a neural network. IEEE Contr Sys Mag 17:89–98
54.
Zelinsky A (1992) A mobile robot exploration algorithm. IEEE Trans Robot Automat 8(6):707–717
Metadata
Title
A topological reinforcement learning agent for navigation
Authors
Arthur P. S. Braga
Aluízio F. R. Araújo
Publication date
01.12.2003
Publisher
Springer-Verlag
Published in
Neural Computing and Applications / Issue 3-4/2003
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-003-0385-9
