Published in: Autonomous Robots 4/2012

01.11.2012

Reinforcement learning to adjust parametrized motor primitives to new situations

Authors: Jens Kober, Andreas Wilhelm, Erhan Oztop, Jan Peters


Abstract

Humans adapt learned movements very quickly to new situations by generalizing behaviors learned in similar situations. In contrast, robots currently often need to re-learn the complete movement. In this paper, we propose a method that learns to generalize parametrized motor plans by adapting a small set of global parameters, called meta-parameters. We employ reinforcement learning to learn the meta-parameters required to deal with the current situation, which is described by states. We introduce an appropriate reinforcement learning algorithm based on a kernelized version of reward-weighted regression. To show its feasibility, we evaluate this algorithm on a toy example and compare it to several previous approaches. Subsequently, we apply the approach to three robot tasks: generalizing throwing movements in darts, hitting movements in table tennis, and ball-throwing movements. The tasks are learned on several different real, physical robots: a Barrett WAM, a BioRob, the JST-ICORP/SARCOS CBi, and a KUKA KR 6.
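
For readers who want a concrete picture of the meta-parameter idea, the following is a minimal Python sketch of a reward-weighted kernel regression in the spirit of the abstract. The Gaussian kernel, the ridge factor lam, the helper names (rbf, predict), and the toy data are all illustrative assumptions, not the authors' implementation.

    import numpy as np

    # Illustrative sketch (not the authors' code): meta-parameter
    # generalization via a reward-weighted kernel regression. Rollouts
    # store (state, meta-parameter, reward); low-reward samples are
    # down-weighted, and the predictive variance drives exploration.

    def rbf(A, B, bw=1.0):
        """Gaussian kernel Gram matrix between row-wise sample sets."""
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * bw ** 2))

    def predict(S, gammas, rewards, s_query, lam=0.5):
        """Mean and exploration variance of the meta-parameter at s_query."""
        K = rbf(S, S)                              # Gram matrix of past states
        C = np.diag(1.0 / rewards)                 # cost = inverse reward
        k = rbf(s_query[None, :], S)[0]            # similarity to past states
        w = np.linalg.solve(K + lam * C, gammas)   # regularized regression
        mean = k @ w
        var = 1.0 + lam - k @ np.linalg.solve(K + lam * C, k)
        return mean, max(var, 1e-9)

    # Toy usage: states are 2-D target positions, the "correct"
    # meta-parameter is their coordinate sum, rewards decay with error.
    rng = np.random.default_rng(0)
    S = rng.uniform(0.0, 1.0, (20, 2))
    gammas = S.sum(axis=1) + 0.1 * rng.standard_normal(20)
    rewards = np.exp(-np.abs(gammas - S.sum(axis=1)))
    mean, var = predict(S, gammas, rewards, np.array([0.5, 0.5]))
    gamma_new = rng.normal(mean, np.sqrt(var))     # exploratory next trial

A learning loop would execute the primitive with gamma_new, observe the reward, and append the new (state, meta-parameter, reward) triple, so predictions tighten where the robot has practiced.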


Footnotes
1
Note that the dynamical systems motor primitives ensure the stability of the movement generation but cannot guarantee the stability of the movement execution (Ijspeert et al. 2002; Schaal et al. 2007).
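
To make the distinction concrete, here is a minimal one-dimensional sketch of a discrete dynamical-systems motor primitive, loosely following Ijspeert et al. (2002); the gains, the phase decay rate, and the toy forcing term are illustrative assumptions. The transformation system is a stable spring-damper attracted to the goal, which is why the generated trajectory always converges; whether the robot can track that trajectory (execution) is a separate question.

    import numpy as np

    # Sketch of a discrete dynamical-systems motor primitive (one DoF):
    #   tau * z' = alpha_z * (beta_z * (g - y) - z) + f(x)   (transformation
    #   tau * y' = z                                          system)
    #   tau * x' = -alpha_x * x                              (canonical system)
    def rollout(y0, g, forcing, tau=1.0, dt=0.002, steps=500,
                alpha_z=25.0, beta_z=6.25, alpha_x=8.0):
        y, z, x = y0, 0.0, 1.0
        traj = []
        for _ in range(steps):
            f = forcing(x) * x * (g - y0)   # forcing vanishes as phase x -> 0
            zd = (alpha_z * (beta_z * (g - y) - z) + f) / tau
            y += z / tau * dt               # Euler step of the three states
            z += zd * dt
            x += -alpha_x * x / tau * dt
            traj.append(y)
        return np.array(traj)

    # Any bounded forcing term only shapes the transient; the generated
    # movement still settles at the goal g = 1.0.
    path = rollout(y0=0.0, g=1.0, forcing=lambda x: 10.0 * np.sin(12.0 * x))
    print(round(path[-1], 3))   # ~1.0: generation is stable by construction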
 
2
The equality $(\boldsymbol{\Phi}^{\mathrm{T}}\mathbf{R}\boldsymbol{\Phi}+\lambda\mathbf{I})^{-1}\boldsymbol{\Phi}^{\mathrm{T}}\mathbf{R}=\boldsymbol{\Phi}^{\mathrm{T}}(\boldsymbol{\Phi}\boldsymbol{\Phi}^{\mathrm{T}}+\lambda\mathbf{R}^{-1})^{-1}$ is straightforward to verify by left- and right-multiplying the non-inverted terms: $\boldsymbol{\Phi}^{\mathrm{T}}\mathbf{R}(\boldsymbol{\Phi}\boldsymbol{\Phi}^{\mathrm{T}}+\lambda\mathbf{R}^{-1})=(\boldsymbol{\Phi}^{\mathrm{T}}\mathbf{R}\boldsymbol{\Phi}+\lambda\mathbf{I})\boldsymbol{\Phi}^{\mathrm{T}}$.
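
For concreteness, a quick numerical spot check of this identity (a sketch; the dimensions, the diagonal reward matrix R, and the ridge factor lam are arbitrary choices):

    import numpy as np

    # Check (Phi^T R Phi + lam*I)^{-1} Phi^T R
    #    == Phi^T (Phi Phi^T + lam*R^{-1})^{-1}
    rng = np.random.default_rng(1)
    n, m, lam = 6, 3, 0.7                     # n samples, m features
    Phi = rng.standard_normal((n, m))
    R = np.diag(rng.uniform(0.5, 2.0, n))     # positive diagonal reward weights

    lhs = np.linalg.solve(Phi.T @ R @ Phi + lam * np.eye(m), Phi.T @ R)
    rhs = Phi.T @ np.linalg.inv(Phi @ Phi.T + lam * np.linalg.inv(R))
    print(np.allclose(lhs, rhs))              # True: both (m, n) sides agree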
 
References
Barto, A., & Mahadevan, S. (2003). Recent advances in hierarchical reinforcement learning. Discrete Event Dynamic Systems, 13(4), 341–379.
Bays, P., & Wolpert, D. (2007). Computational principles of sensorimotor control that minimise uncertainty and variability. Journal of Physiology, 578, 387–396.
Bentivegna, D. C., Ude, A., Atkeson, C. G., & Cheng, G. (2004). Learning to act from observation and practice. International Journal of Humanoid Robotics, 1(4), 585–611.
Bishop, C. M. (2006). Pattern recognition and machine learning. Berlin: Springer.
Caruana, R. (1997). Multitask learning. Machine Learning, 28, 41–75.
Cheng, G., Hyon, S., Morimoto, J., Ude, A., Hale, J. G., Colvin, G., Scroggin, W., & Jacobsen, S. C. (2007). CB: A humanoid research platform for exploring neuroscience. Advanced Robotics, 21(10), 1097–1114.
Dayan, P., & Hinton, G. E. (1997). Using expectation-maximization for reinforcement learning. Neural Computation, 9(2), 271–278.
Doya, K. (2002). Metalearning and neuromodulation. Neural Networks, 15(4–6), 495–506.
Engel, Y., Mannor, S., & Meir, R. (2005). Reinforcement learning with Gaussian processes. In Proc. int. conf. machine learning (pp. 201–208).
Grimes, D. B., & Rao, R. P. N. (2008). Learning nonparametric policies by imitation. In Proc. int. conf. intelligent robots and systems (pp. 2022–2028).
Huber, M., & Grupen, R. (1998). Learning robot control—using control policies as abstract actions. In NIPS’98 workshop: abstraction and hierarchy in reinforcement learning.
Ijspeert, A. J., Nakanishi, J., & Schaal, S. (2002). Learning attractor landscapes for learning motor primitives. In Advances in neural information processing systems (Vol. 15, pp. 1523–1530).
Jaakkola, T., Jordan, M. I., & Singh, S. P. (1993). Convergence of stochastic iterative dynamic programming algorithms. In Advances in neural information processing systems (Vol. 6, pp. 703–710).
Jetchev, N., & Toussaint, M. (2009). Trajectory prediction: learning to map situations to robot trajectories. In Proc. int. conf. machine learning (p. 57).
Kober, J., & Peters, J. (2011a). Learning elementary movements jointly with a higher level task. In Proc. IEEE/RSJ int. conf. intelligent robots and systems (pp. 338–343).
Kober, J., & Peters, J. (2011b). Policy search for motor primitives in robotics. Machine Learning, 84(1–2), 171–203.
Kober, J., Mülling, K., Krömer, O., Lampert, C. H., Schölkopf, B., & Peters, J. (2010a). Movement templates for learning of hitting and batting. In Proc. IEEE int. conf. robotics and automation (pp. 853–858).
Kober, J., Oztop, E., & Peters, J. (2010b). Reinforcement learning to adjust robot movements to new situations. In Proc. robotics: science and systems conf. (pp. 33–40).
Kronander, K., Khansari-Zadeh, M. S., & Billard, A. (2011). Learning to control planar hitting motions in a minigolf-like task. In Proc. IEEE/RSJ int. conf. intelligent robots and systems (pp. 710–717).
Lampariello, R., Nguyen-Tuong, D., Castellini, C., Hirzinger, G., & Peters, J. (2011). Trajectory planning for optimal robot catching in real-time. In Proc. IEEE int. conf. robotics and automation (pp. 3719–3726).
Lawrence, G., Cowan, N., & Russell, S. (2003). Efficient gradient estimation for motor control learning. In Proc. int. conf. uncertainty in artificial intelligence (pp. 354–361).
Lens, T., Kunz, J., Trommer, C., Karguth, A., & von Stryk, O. (2010). BioRob-Arm: A quickly deployable and intrinsically safe, light-weight robot arm for service robotics applications. In 41st international symposium on robotics/6th German conference on robotics (pp. 905–910).
McGovern, A., & Barto, A. G. (2001). Automatic discovery of subgoals in reinforcement learning using diverse density. In Proc. int. conf. machine learning (pp. 361–368).
McGovern, A., Sutton, R. S., & Fagg, A. H. (1997). Roles of macro-actions in accelerating reinforcement learning. In Grace Hopper celebration of women in computing.
Mülling, K., Kober, J., & Peters, J. (2010). Learning table tennis with a mixture of motor primitives. In Proc. IEEE-RAS int. conf. humanoid robots (pp. 411–416).
Mülling, K., Kober, J., & Peters, J. (2011). A biomimetic approach to robot table tennis. Adaptive Behavior, 19(5), 359–376.
Nakanishi, J., Morimoto, J., Endo, G., Cheng, G., Schaal, S., & Kawato, M. (2004). Learning from demonstration and adaptation of biped locomotion. Robotics and Autonomous Systems, 47(2–3), 79–91.
Park, D. H., Hoffmann, H., Pastor, P., & Schaal, S. (2008). Movement reproduction and obstacle avoidance with dynamic movement primitives and potential fields. In Proc. IEEE-RAS int. conf. humanoid robots (pp. 91–98).
Pastor, P., Hoffmann, H., Asfour, T., & Schaal, S. (2009). Learning and generalization of motor skills by learning from demonstration. In Proc. IEEE int. conf. robotics and automation (pp. 1293–1298).
Peters, J., & Schaal, S. (2008a). Learning to control in operational space. The International Journal of Robotics Research, 27(2), 197–212.
Peters, J., & Schaal, S. (2008b). Reinforcement learning of motor skills with policy gradients. Neural Networks, 21(4), 682–697.
Pongas, D., Billard, A., & Schaal, S. (2005). Rapid synchronization and accurate phase-locking of rhythmic motor primitives. In Proc. IEEE/RSJ int. conf. intelligent robots and systems (pp. 2911–2916).
Rasmussen, C. E., & Williams, C. K. (2006). Gaussian processes for machine learning. Cambridge: MIT Press.
Russell, S. (1998). Learning agents for uncertain environments (extended abstract). In Proc. eleventh annual conference on computational learning theory (pp. 101–103). New York: ACM.
Schaal, S., Mohajerian, P., & Ijspeert, A. J. (2007). Dynamics systems vs. optimal control—a unifying view. Progress in Brain Research, 165(1), 425–445.
Schmidt, R., & Wrisberg, C. (2000). Motor learning and performance (2nd edn.). Champaign: Human Kinetics.
Sutton, R., & Barto, A. (1998). Reinforcement learning. Cambridge: MIT Press.
Sutton, R. S., McAllester, D., Singh, S., & Mansour, Y. (1999). Policy gradient methods for reinforcement learning with function approximation. In Advances in neural information processing systems (Vol. 12, pp. 1057–1063).
Ude, A., Gams, A., Asfour, T., & Morimoto, J. (2010). Task-specific generalization of discrete and periodic dynamic movement primitives. IEEE Transactions on Robotics, 26(5), 800–815.
Urbanek, H., Albu-Schäffer, A., & van der Smagt, P. (2004). Learning from demonstration repetitive movements for autonomous service robotics. In Proc. IEEE/RSJ int. conf. intelligent robots and systems (pp. 3495–3500).
Welling, M. (2010). The Kalman filter. Lecture notes.
Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8, 229–256.
Wulf, G. (2007). Attention and motor skill learning. Champaign: Human Kinetics.
Metadata
Title
Reinforcement learning to adjust parametrized motor primitives to new situations
Authors
Jens Kober
Andreas Wilhelm
Erhan Oztop
Jan Peters
Publication date
01.11.2012
Publisher
Springer US
Published in
Autonomous Robots / Issue 4/2012
Print ISSN: 0929-5593
Electronic ISSN: 1573-7527
DOI
https://doi.org/10.1007/s10514-012-9290-3
