Published in: Neural Computing and Applications 7-8/2013

01.12.2013 | ISNN2012

A hierarchical reinforcement learning approach for optimal path tracking of wheeled mobile robots

Authors: Lei Zuo, Xin Xu, Chunming Liu, Zhenhua Huang


Abstract

Robust motion control is fundamental to autonomous mobile robots. In the past few years, reinforcement learning (RL) has attracted considerable attention in the feedback control of wheeled mobile robots. However, it remains difficult for RL to solve problems with large or continuous state spaces, which are common in robotics. To improve the generalization ability of RL, this paper presents a novel hierarchical RL approach for optimal path tracking of wheeled mobile robots. In the proposed approach, a graph Laplacian-based hierarchical approximate policy iteration (GHAPI) algorithm is developed, in which the basis functions are constructed automatically using the graph Laplacian operator. In GHAPI, the state space of a Markov decision process is divided into several subspaces, and approximate policy iteration is carried out on each subspace. A near-optimal path-tracking control strategy is then obtained by combining GHAPI with proportional-derivative (PD) control. The performance of the proposed approach is evaluated on a P3-AT wheeled mobile robot, and it is demonstrated that the GHAPI-based PD control obtains better near-optimal control policies than previous approaches.
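To make the abstract's two key ingredients concrete, the following is a minimal Python sketch, not the authors' implementation: it builds basis functions from the eigenvectors of a normalized graph Laplacian over sampled states (in the spirit of proto-value functions) and constructs a separate basis per state-space subspace, mirroring the hierarchical decomposition described above. The function names, the kNN graph construction, the Gaussian edge weights, and all parameter values are illustrative assumptions.

```python
# Illustrative sketch only: graph-Laplacian basis construction and
# per-subspace bases, echoing the abstract's description of GHAPI.
# Names, graph construction, and parameters are assumptions.
import numpy as np
from scipy.spatial.distance import cdist

def laplacian_basis(states, k_neighbors=10, num_basis=20):
    """Return the `num_basis` smoothest eigenvectors of the normalized
    graph Laplacian built on a k-nearest-neighbor graph of the sampled
    states; these serve as basis functions for value approximation."""
    n = len(states)
    dist = cdist(states, states)                  # pairwise distances
    w = np.zeros((n, n))
    for i in range(n):                            # symmetric kNN graph
        nbrs = np.argsort(dist[i])[1:k_neighbors + 1]   # skip self
        w[i, nbrs] = np.exp(-dist[i, nbrs] ** 2)  # Gaussian edge weights
        w[nbrs, i] = w[i, nbrs]
    d = w.sum(axis=1)
    d_is = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    lap = np.eye(n) - d_is @ w @ d_is             # L = I - D^-1/2 W D^-1/2
    _, eigvecs = np.linalg.eigh(lap)              # ascending eigenvalues
    return eigvecs[:, :num_basis]                 # smoothest eigenvectors

def subspace_bases(states, labels, **kw):
    """Hierarchical flavor: build a separate Laplacian basis on each
    state-space subspace (here identified by precomputed `labels`)."""
    return {c: laplacian_basis(states[labels == c], **kw)
            for c in np.unique(labels)}

# Example: 200 sampled robot states (x, y, heading) in two subspaces.
states = np.random.rand(200, 3)
labels = (states[:, 0] > 0.5).astype(int)
bases = subspace_bases(states, labels, k_neighbors=8, num_basis=10)
```

In the full approach, approximate policy iteration would then run with these features on each subspace, and the resulting near-optimal policy would be combined with a PD tracking controller; those stages are omitted from this sketch.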


Metadata
Title
A hierarchical reinforcement learning approach for optimal path tracking of wheeled mobile robots
Authors
Lei Zuo
Xin Xu
Chunming Liu
Zhenhua Huang
Publication date
01.12.2013
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 7-8/2013
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-012-1243-4
