Published in: Autonomous Robots 4/2012

01.11.2012

Reinforcement learning to adjust parametrized motor primitives to new situations

Authors: Jens Kober, Andreas Wilhelm, Erhan Oztop, Jan Peters


Abstract

Humans adapt learned movements very quickly to new situations by generalizing behaviors learned in similar situations. In contrast, robots currently often need to re-learn the complete movement. In this paper, we propose a method that learns to generalize parametrized motor plans by adapting a small set of global parameters, called meta-parameters. We employ reinforcement learning to learn the meta-parameters required to deal with the current situation, which is described by states. We introduce an appropriate reinforcement learning algorithm based on a kernelized version of reward-weighted regression. To show its feasibility, we evaluate this algorithm on a toy example and compare it to several previous approaches. Subsequently, we apply the approach to three robot tasks: generalizing throwing movements in darts, hitting movements in table tennis, and ball-throwing movements. The tasks are learned on several different real, physical robots: a Barrett WAM, a BioRob, the JST-ICORP/SARCOS CBi, and a KUKA KR 6.
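
For readers who want a concrete picture of the meta-parameter idea, the following is a minimal Python sketch of a reward-weighted kernel regression in the spirit of the abstract. The Gaussian kernel, the ridge factor lam, the helper names (rbf, predict), and the toy data are all illustrative assumptions, not the authors' implementation.

    import numpy as np

    # Illustrative sketch (not the authors' code): meta-parameter
    # generalization via a reward-weighted kernel regression. Rollouts
    # store (state, meta-parameter, reward); low-reward samples are
    # down-weighted, and the predictive variance drives exploration.

    def rbf(A, B, bw=1.0):
        """Gaussian kernel Gram matrix between row-wise sample sets."""
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * bw ** 2))

    def predict(S, gammas, rewards, s_query, lam=0.5):
        """Mean and exploration variance of the meta-parameter at s_query."""
        K = rbf(S, S)                              # Gram matrix of past states
        C = np.diag(1.0 / rewards)                 # cost = inverse reward
        k = rbf(s_query[None, :], S)[0]            # similarity to past states
        w = np.linalg.solve(K + lam * C, gammas)   # regularized regression
        mean = k @ w
        var = 1.0 + lam - k @ np.linalg.solve(K + lam * C, k)
        return mean, max(var, 1e-9)

    # Toy usage: states are 2-D target positions, the "correct"
    # meta-parameter is their coordinate sum, rewards decay with error.
    rng = np.random.default_rng(0)
    S = rng.uniform(0.0, 1.0, (20, 2))
    gammas = S.sum(axis=1) + 0.1 * rng.standard_normal(20)
    rewards = np.exp(-np.abs(gammas - S.sum(axis=1)))
    mean, var = predict(S, gammas, rewards, np.array([0.5, 0.5]))
    gamma_new = rng.normal(mean, np.sqrt(var))     # exploratory next trial

A learning loop would execute the primitive with gamma_new, observe the reward, and append the new (state, meta-parameter, reward) triple, so predictions tighten where the robot has practiced.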


Footnotes
1
Note that the dynamical systems motor primitives ensure the stability of the movement generation but cannot guarantee the stability of the movement execution (Ijspeert et al. 2002; Schaal et al. 2007).
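
To make the distinction concrete, here is a minimal one-dimensional sketch of a discrete dynamical-systems motor primitive, loosely following Ijspeert et al. (2002); the gains, the phase decay rate, and the toy forcing term are illustrative assumptions. The transformation system is a stable spring-damper attracted to the goal, which is why the generated trajectory always converges; whether the robot can track that trajectory (execution) is a separate question.

    import numpy as np

    # Sketch of a discrete dynamical-systems motor primitive (one DoF):
    #   tau * z' = alpha_z * (beta_z * (g - y) - z) + f(x)   (transformation
    #   tau * y' = z                                          system)
    #   tau * x' = -alpha_x * x                              (canonical system)
    def rollout(y0, g, forcing, tau=1.0, dt=0.002, steps=500,
                alpha_z=25.0, beta_z=6.25, alpha_x=8.0):
        y, z, x = y0, 0.0, 1.0
        traj = []
        for _ in range(steps):
            f = forcing(x) * x * (g - y0)   # forcing vanishes as phase x -> 0
            zd = (alpha_z * (beta_z * (g - y) - z) + f) / tau
            y += z / tau * dt               # Euler step of the three states
            z += zd * dt
            x += -alpha_x * x / tau * dt
            traj.append(y)
        return np.array(traj)

    # Any bounded forcing term only shapes the transient; the generated
    # movement still settles at the goal g = 1.0.
    path = rollout(y0=0.0, g=1.0, forcing=lambda x: 10.0 * np.sin(12.0 * x))
    print(round(path[-1], 3))   # ~1.0: generation is stable by construction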
 
2
The equality $(\boldsymbol{\Phi}^{\mathrm{T}}\mathbf{R}\boldsymbol{\Phi}+\lambda\mathbf{I})^{-1}\boldsymbol{\Phi}^{\mathrm{T}}\mathbf{R}=\boldsymbol{\Phi}^{\mathrm{T}}(\boldsymbol{\Phi}\boldsymbol{\Phi}^{\mathrm{T}}+\lambda\mathbf{R}^{-1})^{-1}$ is straightforward to verify by left- and right-multiplying the non-inverted terms: $\boldsymbol{\Phi}^{\mathrm{T}}\mathbf{R}(\boldsymbol{\Phi}\boldsymbol{\Phi}^{\mathrm{T}}+\lambda\mathbf{R}^{-1})=(\boldsymbol{\Phi}^{\mathrm{T}}\mathbf{R}\boldsymbol{\Phi}+\lambda\mathbf{I})\boldsymbol{\Phi}^{\mathrm{T}}$.
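
For concreteness, a quick numerical spot check of this identity (a sketch; the dimensions, the diagonal reward matrix R, and the ridge factor lam are arbitrary choices):

    import numpy as np

    # Check (Phi^T R Phi + lam*I)^{-1} Phi^T R
    #    == Phi^T (Phi Phi^T + lam*R^{-1})^{-1}
    rng = np.random.default_rng(1)
    n, m, lam = 6, 3, 0.7                     # n samples, m features
    Phi = rng.standard_normal((n, m))
    R = np.diag(rng.uniform(0.5, 2.0, n))     # positive diagonal reward weights

    lhs = np.linalg.solve(Phi.T @ R @ Phi + lam * np.eye(m), Phi.T @ R)
    rhs = Phi.T @ np.linalg.inv(Phi @ Phi.T + lam * np.linalg.inv(R))
    print(np.allclose(lhs, rhs))              # True: both (m, n) sides agree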
 
References
Barto, A., & Mahadevan, S. (2003). Recent advances in hierarchical reinforcement learning. Discrete Event Dynamic Systems, 13(4), 341–379.
Bays, P., & Wolpert, D. (2007). Computational principles of sensorimotor control that minimise uncertainty and variability. Journal of Physiology, 578, 387–396.
Bentivegna, D. C., Ude, A., Atkeson, C. G., & Cheng, G. (2004). Learning to act from observation and practice. International Journal of Humanoid Robotics, 1(4), 585–611.
Bishop, C. M. (2006). Pattern recognition and machine learning. Berlin: Springer.
Caruana, R. (1997). Multitask learning. Machine Learning, 28, 41–75.
Cheng, G., Hyon, S., Morimoto, J., Ude, A., Hale, J. G., Colvin, G., Scroggin, W., & Jacobsen, S. C. (2007). CB: A humanoid research platform for exploring neuroscience. Advanced Robotics, 21(10), 1097–1114.
Dayan, P., & Hinton, G. E. (1997). Using expectation-maximization for reinforcement learning. Neural Computation, 9(2), 271–278.
Doya, K. (2002). Metalearning and neuromodulation. Neural Networks, 15(4–6), 495–506.
Engel, Y., Mannor, S., & Meir, R. (2005). Reinforcement learning with Gaussian processes. In Proc. int. conf. machine learning (pp. 201–208).
Grimes, D. B., & Rao, R. P. N. (2008). Learning nonparametric policies by imitation. In Proc. int. conf. intelligent robots and systems (pp. 2022–2028).
Huber, M., & Grupen, R. (1998). Learning robot control—using control policies as abstract actions. In NIPS’98 workshop: abstraction and hierarchy in reinforcement learning.
Ijspeert, A. J., Nakanishi, J., & Schaal, S. (2002). Learning attractor landscapes for learning motor primitives. In Advances in neural information processing systems (Vol. 15, pp. 1523–1530).
Jaakkola, T., Jordan, M. I., & Singh, S. P. (1993). Convergence of stochastic iterative dynamic programming algorithms. In Advances in neural information processing systems (Vol. 6, pp. 703–710).
Jetchev, N., & Toussaint, M. (2009). Trajectory prediction: learning to map situations to robot trajectories. In Proc. int. conf. machine learning (p. 57).
Kober, J., & Peters, J. (2011a). Learning elementary movements jointly with a higher level task. In Proc. IEEE/RSJ int. conf. intelligent robots and systems (pp. 338–343).
Kober, J., & Peters, J. (2011b). Policy search for motor primitives in robotics. Machine Learning, 84(1–2), 171–203.
Kober, J., Mülling, K., Krömer, O., Lampert, C. H., Schölkopf, B., & Peters, J. (2010a). Movement templates for learning of hitting and batting. In Proc. IEEE int. conf. robotics and automation (pp. 853–858).
Kober, J., Oztop, E., & Peters, J. (2010b). Reinforcement learning to adjust robot movements to new situations. In Proc. robotics: science and systems conf. (pp. 33–40).
Kronander, K., Khansari-Zadeh, M. S., & Billard, A. (2011). Learning to control planar hitting motions in a minigolf-like task. In Proc. IEEE/RSJ int. conf. intelligent robots and systems (pp. 710–717).
Lampariello, R., Nguyen-Tuong, D., Castellini, C., Hirzinger, G., & Peters, J. (2011). Trajectory planning for optimal robot catching in real-time. In Proc. IEEE int. conf. robotics and automation (pp. 3719–3726).
Lawrence, G., Cowan, N., & Russell, S. (2003). Efficient gradient estimation for motor control learning. In Proc. int. conf. uncertainty in artificial intelligence (pp. 354–361).
Lens, T., Kunz, J., Trommer, C., Karguth, A., & von Stryk, O. (2010). BioRob-Arm: A quickly deployable and intrinsically safe, light-weight robot arm for service robotics applications. In 41st international symposium on robotics/6th German conference on robotics (pp. 905–910).
McGovern, A., & Barto, A. G. (2001). Automatic discovery of subgoals in reinforcement learning using diverse density. In Proc. int. conf. machine learning (pp. 361–368).
McGovern, A., Sutton, R. S., & Fagg, A. H. (1997). Roles of macro-actions in accelerating reinforcement learning. In Grace Hopper celebration of women in computing.
Mülling, K., Kober, J., & Peters, J. (2010). Learning table tennis with a mixture of motor primitives. In Proc. IEEE-RAS int. conf. humanoid robots (pp. 411–416).
Mülling, K., Kober, J., & Peters, J. (2011). A biomimetic approach to robot table tennis. Adaptive Behavior, 19(5), 359–376.
Nakanishi, J., Morimoto, J., Endo, G., Cheng, G., Schaal, S., & Kawato, M. (2004). Learning from demonstration and adaptation of biped locomotion. Robotics and Autonomous Systems, 47(2–3), 79–91.
Park, D. H., Hoffmann, H., Pastor, P., & Schaal, S. (2008). Movement reproduction and obstacle avoidance with dynamic movement primitives and potential fields. In Proc. IEEE-RAS int. conf. humanoid robots (pp. 91–98).
Pastor, P., Hoffmann, H., Asfour, T., & Schaal, S. (2009). Learning and generalization of motor skills by learning from demonstration. In Proc. IEEE int. conf. robotics and automation (pp. 1293–1298).
Peters, J., & Schaal, S. (2008a). Learning to control in operational space. The International Journal of Robotics Research, 27(2), 197–212.
Peters, J., & Schaal, S. (2008b). Reinforcement learning of motor skills with policy gradients. Neural Networks, 21(4), 682–697.
Pongas, D., Billard, A., & Schaal, S. (2005). Rapid synchronization and accurate phase-locking of rhythmic motor primitives. In Proc. IEEE/RSJ int. conf. intelligent robots and systems (pp. 2911–2916).
Rasmussen, C. E., & Williams, C. K. (2006). Gaussian processes for machine learning. Cambridge: MIT Press.
Russell, S. (1998). Learning agents for uncertain environments (extended abstract). In Proc. eleventh annual conference on computational learning theory (pp. 101–103). New York: ACM.
Schaal, S., Mohajerian, P., & Ijspeert, A. J. (2007). Dynamics systems vs. optimal control—a unifying view. Progress in Brain Research, 165(1), 425–445.
Schmidt, R., & Wrisberg, C. (2000). Motor learning and performance (2nd edn.). Champaign: Human Kinetics.
Sutton, R., & Barto, A. (1998). Reinforcement learning. Cambridge: MIT Press.
Sutton, R. S., McAllester, D., Singh, S., & Mansour, Y. (1999). Policy gradient methods for reinforcement learning with function approximation. In Advances in neural information processing systems (Vol. 12, pp. 1057–1063).
Ude, A., Gams, A., Asfour, T., & Morimoto, J. (2010). Task-specific generalization of discrete and periodic dynamic movement primitives. IEEE Transactions on Robotics, 26(5), 800–815.
Urbanek, H., Albu-Schäffer, A., & van der Smagt, P. (2004). Learning from demonstration repetitive movements for autonomous service robotics. In Proc. IEEE/RSJ int. conf. intelligent robots and systems (pp. 3495–3500).
Welling, M. (2010). The Kalman filter. Lecture notes.
Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8, 229–256.
Wulf, G. (2007). Attention and motor skill learning. Champaign: Human Kinetics.
Metadata
Title
Reinforcement learning to adjust parametrized motor primitives to new situations
Authors
Jens Kober
Andreas Wilhelm
Erhan Oztop
Jan Peters
Publication date
01.11.2012
Publisher
Springer US
Published in
Autonomous Robots / Issue 4/2012
Print ISSN: 0929-5593
Electronic ISSN: 1573-7527
DOI
https://doi.org/10.1007/s10514-012-9290-3
