2016 | Original Paper | Book Chapter

15. Robot Learning

Authors: Jan Peters, Daniel D. Lee, Jens Kober, Duy Nguyen-Tuong, J. Andrew Bagnell, Stefan Schaal

Published in: Springer Handbook of Robotics

Publisher: Springer International Publishing


Abstract

Machine learning offers robotics a framework and a set of tools for the design of sophisticated, hard-to-engineer behaviors; conversely, the challenges of robotic problems provide inspiration, impact, and validation for developments in robot learning. The relationship between the two disciplines has sufficient promise to be likened to that between physics and mathematics. In this chapter, we attempt to strengthen the links between the two research communities by providing a survey of work in robot learning for learning control and behavior generation in robots. We highlight both key challenges in robot learning and notable successes. We discuss how these contributions tamed the complexity of the domain, and we study the role of algorithms, representations, and prior knowledge in achieving these successes. As a result, a particular focus of our chapter lies on model learning for control and robot reinforcement learning. We demonstrate how machine learning approaches may be profitably applied, and we note throughout open questions and the tremendous potential for future research.


Anhänge
Nur mit Berechtigung zugänglich
Literatur
15.1
Zurück zum Zitat S. Schaal: The new robotics – Towards human-centered machines, HFSP J. Front. Interdiscip. Res, Life Sci. 1(2), 115–126 (2007) S. Schaal: The new robotics – Towards human-centered machines, HFSP J. Front. Interdiscip. Res, Life Sci. 1(2), 115–126 (2007)
15.2
Zurück zum Zitat B.D. Ziebart, A. Maas, J.A. Bagnell, A.K. Dey: Maximum entropy inverse reinforcement learning, AAAI Conf. Artif. Intell. (2008) B.D. Ziebart, A. Maas, J.A. Bagnell, A.K. Dey: Maximum entropy inverse reinforcement learning, AAAI Conf. Artif. Intell. (2008)
15.3
Zurück zum Zitat S. Thrun, W. Burgard, D. Fox: Probabilistic Robotics (MIT, Cambridge 2005)MATH S. Thrun, W. Burgard, D. Fox: Probabilistic Robotics (MIT, Cambridge 2005)MATH
15.4
Zurück zum Zitat B. Apolloni, A. Ghosh, F. Alpaslan, L.C. Jain, S. Patnaik (Eds.): Machine Learning and Robot Perception, Stud. Comput. Intell., Vol. 7 (Springer, Berlin, Heidelberg 2005)MATH B. Apolloni, A. Ghosh, F. Alpaslan, L.C. Jain, S. Patnaik (Eds.): Machine Learning and Robot Perception, Stud. Comput. Intell., Vol. 7 (Springer, Berlin, Heidelberg 2005)MATH
15.5
Zurück zum Zitat O. Jenkins, R. Bodenheimer, R. Peters: Manipulation manifolds: Explorations into uncovering manifolds in sensory-motor spaces, Int. Conf. Dev. Learn. (2006) O. Jenkins, R. Bodenheimer, R. Peters: Manipulation manifolds: Explorations into uncovering manifolds in sensory-motor spaces, Int. Conf. Dev. Learn. (2006)
15.6
Zurück zum Zitat M. Toussaint: Machine learning and robotics, Tutor. Conf. Mach. Learn. (2011) M. Toussaint: Machine learning and robotics, Tutor. Conf. Mach. Learn. (2011)
15.7
Zurück zum Zitat D.P. Bertsekas: Dynamic Programming and Optimal Control (Athena Scientific, Nashua 1995)MATH D.P. Bertsekas: Dynamic Programming and Optimal Control (Athena Scientific, Nashua 1995)MATH
15.8
Zurück zum Zitat R.E. Kalman: When is a linear control system optimal?, J. Basic Eng. 86(1), 51–60 (1964)CrossRef R.E. Kalman: When is a linear control system optimal?, J. Basic Eng. 86(1), 51–60 (1964)CrossRef
15.9
Zurück zum Zitat D. Nguyen-Tuong, J. Peters: Model learning in robotics: A survey, Cogn. Process. 12(4), 319–340 (2011)CrossRef D. Nguyen-Tuong, J. Peters: Model learning in robotics: A survey, Cogn. Process. 12(4), 319–340 (2011)CrossRef
15.10
Zurück zum Zitat J. Kober, D. Bagnell, J. Peters: Reinforcement learning in robotics: A survey, Int. J. Robotics Res. 32(11), 1238–1274 (2013)CrossRef J. Kober, D. Bagnell, J. Peters: Reinforcement learning in robotics: A survey, Int. J. Robotics Res. 32(11), 1238–1274 (2013)CrossRef
15.11
15.12
Zurück zum Zitat J. Ham, Y. Lin, D.D. Lee: Learning nonlinear appearance manifolds for robot localization, Int. Conf. Intell. Robots Syst. (2005) J. Ham, Y. Lin, D.D. Lee: Learning nonlinear appearance manifolds for robot localization, Int. Conf. Intell. Robots Syst. (2005)
15.13
Zurück zum Zitat R.S. Sutton, A.G. Barto: Reinforcement Learning (MIT, Cambridge 1998)MATH R.S. Sutton, A.G. Barto: Reinforcement Learning (MIT, Cambridge 1998)MATH
15.14
Zurück zum Zitat D. Nguyen-Tuong, J. Peters: Model learning with local Gaussian process regression, Adv. Robotics 23(15), 2015–2034 (2009)CrossRef D. Nguyen-Tuong, J. Peters: Model learning with local Gaussian process regression, Adv. Robotics 23(15), 2015–2034 (2009)CrossRef
15.15
Zurück zum Zitat J. Nakanishi, R. Cory, M. Mistry, J. Peters, S. Schaal: Operational space control: A theoretical and emprical comparison, Int. J. Robotics Res. 27(6), 737–757 (2008)CrossRef J. Nakanishi, R. Cory, M. Mistry, J. Peters, S. Schaal: Operational space control: A theoretical and emprical comparison, Int. J. Robotics Res. 27(6), 737–757 (2008)CrossRef
15.16
Zurück zum Zitat F.R. Reinhart, J.J. Steil: Attractor-based computation with reservoirs for online learning of inverse kinematics, Proc. Eur. Symp. Artif. Neural Netw. (2009) F.R. Reinhart, J.J. Steil: Attractor-based computation with reservoirs for online learning of inverse kinematics, Proc. Eur. Symp. Artif. Neural Netw. (2009)
15.17
Zurück zum Zitat J. Ting, M. Kalakrishnan, S. Vijayakumar, S. Schaal: Bayesian kernel shaping for learning control, Adv. Neural Inform. Process. Syst., Vol. 21 (2008) pp. 1673–1680 J. Ting, M. Kalakrishnan, S. Vijayakumar, S. Schaal: Bayesian kernel shaping for learning control, Adv. Neural Inform. Process. Syst., Vol. 21 (2008) pp. 1673–1680
15.18
Zurück zum Zitat J. Steffen, S. Klanke, S. Vijayakumar, H.J. Ritter: Realising dextrous manipulation with structured manifolds using unsupervised kernel regression with structural hints, ICRA 2009 Workshop: Approaches Sens. Learn. Humanoid Robots, Kobe (2009) J. Steffen, S. Klanke, S. Vijayakumar, H.J. Ritter: Realising dextrous manipulation with structured manifolds using unsupervised kernel regression with structural hints, ICRA 2009 Workshop: Approaches Sens. Learn. Humanoid Robots, Kobe (2009)
15.19
Zurück zum Zitat S. Klanke, D. Lebedev, R. Haschke, J.J. Steil, H. Ritter: Dynamic path planning for a 7-dof robot arm, Proc. 2009 IEEE Int. Conf. Intell. Robots Syst. (2006) S. Klanke, D. Lebedev, R. Haschke, J.J. Steil, H. Ritter: Dynamic path planning for a 7-dof robot arm, Proc. 2009 IEEE Int. Conf. Intell. Robots Syst. (2006)
15.20
Zurück zum Zitat A. Angelova, L. Matthies, D. Helmick, P. Perona: Slip prediction using visual information, Proc. Robotics Sci. Syst., Philadelphia (2006) A. Angelova, L. Matthies, D. Helmick, P. Perona: Slip prediction using visual information, Proc. Robotics Sci. Syst., Philadelphia (2006)
15.21
Zurück zum Zitat M. Kalakrishnan, J. Buchli, P. Pastor, S. Schaal: Learning locomotion over rough terrain using terrain templates, IEEE Int. Conf. Intell. Robots Syst. (2009) M. Kalakrishnan, J. Buchli, P. Pastor, S. Schaal: Learning locomotion over rough terrain using terrain templates, IEEE Int. Conf. Intell. Robots Syst. (2009)
15.22
Zurück zum Zitat N. Hawes, J.L. Wyatt, M. Sridharan, M. Kopicki, S. Hongeng, I. Calvert, A. Sloman, G.-J. Kruijff, H. Jacobsson, M. Brenner, D. Skočaj, A. Vrečko, N. Majer, M. Zillich: The playmate system, Cognit. Syst. 8, 367–393 (2010)CrossRef N. Hawes, J.L. Wyatt, M. Sridharan, M. Kopicki, S. Hongeng, I. Calvert, A. Sloman, G.-J. Kruijff, H. Jacobsson, M. Brenner, D. Skočaj, A. Vrečko, N. Majer, M. Zillich: The playmate system, Cognit. Syst. 8, 367–393 (2010)CrossRef
15.23
Zurück zum Zitat D. Skočaj, M. Kristan, A. Vrečko, A. Leonardis, M. Fritz, M. Stark, B. Schiele, S. Hongeng, J.L. Wyatt: Multi-modal learning, Cogn. Syst. 8, 265–309 (2010)CrossRef D. Skočaj, M. Kristan, A. Vrečko, A. Leonardis, M. Fritz, M. Stark, B. Schiele, S. Hongeng, J.L. Wyatt: Multi-modal learning, Cogn. Syst. 8, 265–309 (2010)CrossRef
15.24
Zurück zum Zitat O.J. Smith: A controller to overcome dead-time, Instrum. Soc. Am. J. 6, 28–33 (1959) O.J. Smith: A controller to overcome dead-time, Instrum. Soc. Am. J. 6, 28–33 (1959)
15.25
Zurück zum Zitat K.S. Narendra, A.M. Annaswamy: Stable Adaptive Systems (Prentice Hall, New Jersey 1989)MATH K.S. Narendra, A.M. Annaswamy: Stable Adaptive Systems (Prentice Hall, New Jersey 1989)MATH
15.26
Zurück zum Zitat S. Nicosia, P. Tomei: Model reference adaptive control algorithms for industrial robots, Automatica 20, 635–644 (1984)MATHCrossRef S. Nicosia, P. Tomei: Model reference adaptive control algorithms for industrial robots, Automatica 20, 635–644 (1984)MATHCrossRef
15.27
Zurück zum Zitat J.M. Maciejowski: Predictive Control with Constraints (Prentice Hall, New Jersey 2002)MATH J.M. Maciejowski: Predictive Control with Constraints (Prentice Hall, New Jersey 2002)MATH
15.28
Zurück zum Zitat R.S. Sutton: Dyna, an integrated architecture for learning, planning, and reacting, SIGART Bulletin 2(4), 160–163 (1991)CrossRef R.S. Sutton: Dyna, an integrated architecture for learning, planning, and reacting, SIGART Bulletin 2(4), 160–163 (1991)CrossRef
15.29
Zurück zum Zitat C.G. Atkeson, J. Morimoto: Nonparametric representation of policies and value functions: A trajectory-based approach, Adv. Neural Inform. Process. Syst., Vol. 15 (2002) C.G. Atkeson, J. Morimoto: Nonparametric representation of policies and value functions: A trajectory-based approach, Adv. Neural Inform. Process. Syst., Vol. 15 (2002)
15.30
Zurück zum Zitat A.Y. Ng, A. Coates, M. Diel, V. Ganapathi, J. Schulte, B. Tse, E. Berger, E. Liang: Autonomous inverted helicopter flight via reinforcement learning, Proc. 11th Int. Symp. Exp. Robotics (2004) A.Y. Ng, A. Coates, M. Diel, V. Ganapathi, J. Schulte, B. Tse, E. Berger, E. Liang: Autonomous inverted helicopter flight via reinforcement learning, Proc. 11th Int. Symp. Exp. Robotics (2004)
15.31
Zurück zum Zitat C.E. Rasmussen, M. Kuss: Gaussian processes in reinforcement learning, Adv. Neural Inform. Process. Syst., Vol. 16 (2003) pp. 751–758 C.E. Rasmussen, M. Kuss: Gaussian processes in reinforcement learning, Adv. Neural Inform. Process. Syst., Vol. 16 (2003) pp. 751–758
15.32
Zurück zum Zitat A. Rottmann, W. Burgard: Adaptive autonomous control using online value iteration with Gaussian processes, Proc. IEEE Int. Conf. Robotics Autom. (2009) A. Rottmann, W. Burgard: Adaptive autonomous control using online value iteration with Gaussian processes, Proc. IEEE Int. Conf. Robotics Autom. (2009)
15.33
Zurück zum Zitat J.-J.E. Slotine, W. Li: Applied Nonlinear Control (Prentice Hall, Upper Saddle River 1991)MATH J.-J.E. Slotine, W. Li: Applied Nonlinear Control (Prentice Hall, Upper Saddle River 1991)MATH
15.34
Zurück zum Zitat A. De Luca, P. Lucibello: A general algorithm for dynamic feedback linearization of robots with elastic joints, Proc. IEEE Int. Conf. Robotics Autom. (1998) A. De Luca, P. Lucibello: A general algorithm for dynamic feedback linearization of robots with elastic joints, Proc. IEEE Int. Conf. Robotics Autom. (1998)
15.35
Zurück zum Zitat I. Jordan, D. Rumelhart: Forward models: Supervised learning with a distal teacher, Cognit. Sci. 16, 307–354 (1992)CrossRef I. Jordan, D. Rumelhart: Forward models: Supervised learning with a distal teacher, Cognit. Sci. 16, 307–354 (1992)CrossRef
15.36
Zurück zum Zitat D.M. Wolpert, M. Kawato: Multiple paired forward and inverse models for motor control, Neural Netw. 11, 1317–1329 (1998)CrossRef D.M. Wolpert, M. Kawato: Multiple paired forward and inverse models for motor control, Neural Netw. 11, 1317–1329 (1998)CrossRef
15.37
Zurück zum Zitat M. Kawato: Internal models for motor control and trajectory planning, Curr. Opin. Neurobiol. 9(6), 718–727 (1999)CrossRef M. Kawato: Internal models for motor control and trajectory planning, Curr. Opin. Neurobiol. 9(6), 718–727 (1999)CrossRef
15.38
Zurück zum Zitat D.M. Wolpert, R.C. Miall, M. Kawato: Internal models in the cerebellum, Trends Cogn. Sci. 2(9), 338–347 (1998)CrossRef D.M. Wolpert, R.C. Miall, M. Kawato: Internal models in the cerebellum, Trends Cogn. Sci. 2(9), 338–347 (1998)CrossRef
15.39
Zurück zum Zitat N. Bhushan, R. Shadmehr: Evidence for a forward dynamics model in human adaptive motor control, Adv. Neural Inform. Process. Syst., Vol. 11 (1999) pp. 3–9 N. Bhushan, R. Shadmehr: Evidence for a forward dynamics model in human adaptive motor control, Adv. Neural Inform. Process. Syst., Vol. 11 (1999) pp. 3–9
15.40
Zurück zum Zitat K. Narendra, J. Balakrishnan, M. Ciliz: Adaptation and learning using multiple models, switching and tuning, IEEE Control Syst, Mag. 15(3), 37–51 (1995) K. Narendra, J. Balakrishnan, M. Ciliz: Adaptation and learning using multiple models, switching and tuning, IEEE Control Syst, Mag. 15(3), 37–51 (1995)
15.41
15.42
Zurück zum Zitat M. Haruno, D.M. Wolpert, M. Kawato: Mosaic model for sensorimotor learning and control, Neural Comput. 13(10), 2201–2220 (2001)MATHCrossRef M. Haruno, D.M. Wolpert, M. Kawato: Mosaic model for sensorimotor learning and control, Neural Comput. 13(10), 2201–2220 (2001)MATHCrossRef
15.43
Zurück zum Zitat J. Peters, S. Schaal: Learning to control in operational space, Int. J. Robotics Res. 27(2), 197–212 (2008)CrossRef J. Peters, S. Schaal: Learning to control in operational space, Int. J. Robotics Res. 27(2), 197–212 (2008)CrossRef
15.45
Zurück zum Zitat R.M.C. De Keyser, A.R.V. Cauwenberghe: A self-tuning multistep predictor application, Automatica 17, 167–174 (1980)CrossRef R.M.C. De Keyser, A.R.V. Cauwenberghe: A self-tuning multistep predictor application, Automatica 17, 167–174 (1980)CrossRef
15.46
Zurück zum Zitat S.S. Billings, S. Chen, G. Korenberg: Identification of mimo nonlinear systems using a forward-regression orthogonal estimator, Int. J. Control 49, 2157–2189 (1989)MathSciNetMATHCrossRef S.S. Billings, S. Chen, G. Korenberg: Identification of mimo nonlinear systems using a forward-regression orthogonal estimator, Int. J. Control 49, 2157–2189 (1989)MathSciNetMATHCrossRef
15.47
15.48
Zurück zum Zitat J. Kocijan, R. Murray-Smith, C. Rasmussen, A. Girard: Gaussian process model based predictive control, Proc. Am. Control Conf. (2004) J. Kocijan, R. Murray-Smith, C. Rasmussen, A. Girard: Gaussian process model based predictive control, Proc. Am. Control Conf. (2004)
15.49
Zurück zum Zitat A. Girard, C.E. Rasmussen, J.Q. Candela, R.M. Smith: Gaussian process priors with uncertain inputs application to multiple-step ahead time series forecasting, Adv. Neural Inform. Process. Syst., Vol. 15 (2002) pp. 545–552 A. Girard, C.E. Rasmussen, J.Q. Candela, R.M. Smith: Gaussian process priors with uncertain inputs application to multiple-step ahead time series forecasting, Adv. Neural Inform. Process. Syst., Vol. 15 (2002) pp. 545–552
15.50
Zurück zum Zitat C.G. Atkeson, A. Moore, S. Stefan: Locally weighted learning for control, AI Review 11, 75–113 (1997) C.G. Atkeson, A. Moore, S. Stefan: Locally weighted learning for control, AI Review 11, 75–113 (1997)
15.51
Zurück zum Zitat L. Ljung: System Identification – Theory for the User (Prentice-Hall, New Jersey 2004)MATH L. Ljung: System Identification – Theory for the User (Prentice-Hall, New Jersey 2004)MATH
15.52
Zurück zum Zitat S. Haykin: Neural Networks: A Comprehensive Foundation (Prentice Hall, New Jersey 1999)MATH S. Haykin: Neural Networks: A Comprehensive Foundation (Prentice Hall, New Jersey 1999)MATH
15.53
Zurück zum Zitat J.J. Steil: Backpropagation-decorrelation: Online recurrent learning with O(N) complexity, Proc. Int. Jt. Conf. Neural Netw. (2004) J.J. Steil: Backpropagation-decorrelation: Online recurrent learning with O(N) complexity, Proc. Int. Jt. Conf. Neural Netw. (2004)
15.54
Zurück zum Zitat C.E. Rasmussen, C.K. Williams: Gaussian Processes for Machine Learning (MIT, Cambridge 2006)MATH C.E. Rasmussen, C.K. Williams: Gaussian Processes for Machine Learning (MIT, Cambridge 2006)MATH
15.55
Zurück zum Zitat B. Schölkopf, A. Smola: Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond (MIT, Cambridge 2002) B. Schölkopf, A. Smola: Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond (MIT, Cambridge 2002)
15.56
Zurück zum Zitat K.J. Aström, B. Wittenmark: Adaptive Control (Addison Wesley, Boston 1995)MATH K.J. Aström, B. Wittenmark: Adaptive Control (Addison Wesley, Boston 1995)MATH
15.57
Zurück zum Zitat F.J. Coito, J.M. Lemos: A long-range adaptive controller for robot manipulators, Int. J. Robotics Res. 10, 684–707 (1991)CrossRef F.J. Coito, J.M. Lemos: A long-range adaptive controller for robot manipulators, Int. J. Robotics Res. 10, 684–707 (1991)CrossRef
15.58
Zurück zum Zitat P. Vempaty, K. Cheok, R. Loh: Model reference adaptive control for actuators of a biped robot locomotion, Proc. World Congr. Eng. Comput. Sci. (2009) P. Vempaty, K. Cheok, R. Loh: Model reference adaptive control for actuators of a biped robot locomotion, Proc. World Congr. Eng. Comput. Sci. (2009)
15.59
Zurück zum Zitat J.R. Layne, K.M. Passino: Fuzzy model reference learning control, J. Intell. Fuzzy Syst. 4, 33–47 (1996)CrossRef J.R. Layne, K.M. Passino: Fuzzy model reference learning control, J. Intell. Fuzzy Syst. 4, 33–47 (1996)CrossRef
15.60
Zurück zum Zitat J. Nakanishi, J.A. Farrell, S. Schaal: Composite adaptive control with locally weighted statistical learning, Neural Netw. 18(1), 71–90 (2005)MATHCrossRef J. Nakanishi, J.A. Farrell, S. Schaal: Composite adaptive control with locally weighted statistical learning, Neural Netw. 18(1), 71–90 (2005)MATHCrossRef
15.61
Zurück zum Zitat J.J. Craig: Introduction to Robotics: Mechanics and Control (Prentice Hall, Upper Saddle River 2004) J.J. Craig: Introduction to Robotics: Mechanics and Control (Prentice Hall, Upper Saddle River 2004)
15.62
Zurück zum Zitat M.W. Spong, S. Hutchinson, M. Vidyasagar: Robot Dynamics and Control (Wiley, New York 2006) M.W. Spong, S. Hutchinson, M. Vidyasagar: Robot Dynamics and Control (Wiley, New York 2006)
15.63
Zurück zum Zitat S. Schaal, C.G. Atkeson, S. Vijayakumar: Scalable techniques from nonparametric statistics for real-time robot learning, Appl. Intell. 17(1), 49–60 (2002)MATHCrossRef S. Schaal, C.G. Atkeson, S. Vijayakumar: Scalable techniques from nonparametric statistics for real-time robot learning, Appl. Intell. 17(1), 49–60 (2002)MATHCrossRef
15.64
Zurück zum Zitat H. Cao, Y. Yin, D. Du, L. Lin, W. Gu, Z. Yang: Neural network inverse dynamic online learning control on physical exoskeleton, 13th Int. Conf. Neural Inform. Process. (2006) H. Cao, Y. Yin, D. Du, L. Lin, W. Gu, Z. Yang: Neural network inverse dynamic online learning control on physical exoskeleton, 13th Int. Conf. Neural Inform. Process. (2006)
15.65
Zurück zum Zitat C.G. Atkeson, C.H. An, J.M. Hollerbach: Estimation of inertial parameters of manipulator loads and links, Int. J. Robotics Res. 5(3), 101–119 (1986)CrossRef C.G. Atkeson, C.H. An, J.M. Hollerbach: Estimation of inertial parameters of manipulator loads and links, Int. J. Robotics Res. 5(3), 101–119 (1986)CrossRef
15.66
Zurück zum Zitat E. Burdet, B. Sprenger, A. Codourey: Experiments in nonlinear adaptive control, Int. Conf. Robotics Autom. 1, 537–542 (1997)CrossRef E. Burdet, B. Sprenger, A. Codourey: Experiments in nonlinear adaptive control, Int. Conf. Robotics Autom. 1, 537–542 (1997)CrossRef
15.67
Zurück zum Zitat E. Burdet, A. Codourey: Evaluation of parametric and nonparametric nonlinear adaptive controllers, Robotica 16(1), 59–73 (1998)CrossRef E. Burdet, A. Codourey: Evaluation of parametric and nonparametric nonlinear adaptive controllers, Robotica 16(1), 59–73 (1998)CrossRef
15.68
15.69
Zurück zum Zitat H.D. Patino, R. Carelli, B.R. Kuchen: Neural networks for advanced control of robot manipulators, IEEE Trans. Neural Netw. 13(2), 343–354 (2002)CrossRef H.D. Patino, R. Carelli, B.R. Kuchen: Neural networks for advanced control of robot manipulators, IEEE Trans. Neural Netw. 13(2), 343–354 (2002)CrossRef
15.70
Zurück zum Zitat D. Nguyen-Tuong, J. Peters: Incremental sparsification for real-time online model learning, Neurocomputing 74(11), 1859–1867 (2011)CrossRef D. Nguyen-Tuong, J. Peters: Incremental sparsification for real-time online model learning, Neurocomputing 74(11), 1859–1867 (2011)CrossRef
15.71
Zurück zum Zitat D. Nguyen-Tuong, J. Peters: Using model knowledge for learning inverse dynamics, Proc. IEEE Int. Conf. Robotics Autom. (2010) D. Nguyen-Tuong, J. Peters: Using model knowledge for learning inverse dynamics, Proc. IEEE Int. Conf. Robotics Autom. (2010)
15.72
Zurück zum Zitat S.S. Ge, T.H. Lee, E.G. Tan: Adaptive neural network control of flexible joint robots based on feedback linearization, Int. J. Syst. Sci. 29(6), 623–635 (1998)CrossRef S.S. Ge, T.H. Lee, E.G. Tan: Adaptive neural network control of flexible joint robots based on feedback linearization, Int. J. Syst. Sci. 29(6), 623–635 (1998)CrossRef
15.73
Zurück zum Zitat C.M. Chow, A.G. Kuznetsov, D.W. Clarke: Successive one-step-ahead predictions in multiple model predictive control, Int. J. Control 29, 971–979 (1998)MATH C.M. Chow, A.G. Kuznetsov, D.W. Clarke: Successive one-step-ahead predictions in multiple model predictive control, Int. J. Control 29, 971–979 (1998)MATH
15.74
Zurück zum Zitat M. Kawato: Feedback error learning neural network for supervised motor learning. In: Advanced Neural Computers, ed. by R. Eckmiller (Elsevier, North-Holland, Amsterdam 1990) pp. 365–372 M. Kawato: Feedback error learning neural network for supervised motor learning. In: Advanced Neural Computers, ed. by R. Eckmiller (Elsevier, North-Holland, Amsterdam 1990) pp. 365–372
15.75
Zurück zum Zitat J. Nakanishi, S. Schaal: Feedback error learning and nonlinear adaptive control, Neural Netw. 17(10), 1453–1465 (2004)MATHCrossRef J. Nakanishi, S. Schaal: Feedback error learning and nonlinear adaptive control, Neural Netw. 17(10), 1453–1465 (2004)MATHCrossRef
15.76
Zurück zum Zitat T. Shibata, C. Schaal: Biomimetic gaze stabilization based on feedback-error learning with nonparametric regression networks, Neural Netw. 14(2), 201–216 (2001)CrossRef T. Shibata, C. Schaal: Biomimetic gaze stabilization based on feedback-error learning with nonparametric regression networks, Neural Netw. 14(2), 201–216 (2001)CrossRef
15.77
Zurück zum Zitat H. Miyamoto, M. Kawato, T. Setoyama, R. Suzuki: Feedback-error-learning neural network for trajectory control of a robotic manipulator, Neural Netw. 1(3), 251–265 (1988)CrossRef H. Miyamoto, M. Kawato, T. Setoyama, R. Suzuki: Feedback-error-learning neural network for trajectory control of a robotic manipulator, Neural Netw. 1(3), 251–265 (1988)CrossRef
15.78
Zurück zum Zitat H. Gomi, M. Kawato: Recognition of manipulated objects by motor learning with modular architecture networks, Neural Netw. 6(4), 485–497 (1993)CrossRef H. Gomi, M. Kawato: Recognition of manipulated objects by motor learning with modular architecture networks, Neural Netw. 6(4), 485–497 (1993)CrossRef
15.79
Zurück zum Zitat A. D'Souza, S. Vijayakumar, S. Schaal: Learning inverse kinematics, IEEE Int. Conf. Intell. Robots Syst. (2001) A. D'Souza, S. Vijayakumar, S. Schaal: Learning inverse kinematics, IEEE Int. Conf. Intell. Robots Syst. (2001)
15.80
Zurück zum Zitat S. Vijayakumar, S. Schaal: Locally weighted projection regression: An O(N) algorithm for incremental real time learning in high dimensional space, Proc. 16th Int. Conf. Mach. Learn. (2000) S. Vijayakumar, S. Schaal: Locally weighted projection regression: An O(N) algorithm for incremental real time learning in high dimensional space, Proc. 16th Int. Conf. Mach. Learn. (2000)
15.81
Zurück zum Zitat M. Toussaint, S. Vijayakumar: Learning discontinuities with products-of-sigmoids for switching between local models, Proc. 22nd Int. Conf. Mach. Learn. (2005) M. Toussaint, S. Vijayakumar: Learning discontinuities with products-of-sigmoids for switching between local models, Proc. 22nd Int. Conf. Mach. Learn. (2005)
15.82
Zurück zum Zitat J. Tenenbaum, V. de Silva, J. Langford: A global geometric framework for nonlinear dimensionality reduction, Science 290, 2319–2323 (2000)CrossRef J. Tenenbaum, V. de Silva, J. Langford: A global geometric framework for nonlinear dimensionality reduction, Science 290, 2319–2323 (2000)CrossRef
15.83
Zurück zum Zitat S. Roweis, L. Saul: Nonlinear dimensionality reduction by locally linear embedding, Science 290, 2323 (2000)CrossRef S. Roweis, L. Saul: Nonlinear dimensionality reduction by locally linear embedding, Science 290, 2323 (2000)CrossRef
15.84
Zurück zum Zitat H. Hoffman, S. Schaal, S. Vijayakumar: Local dimensionality reduction for non-parametric regression, Neural Process. Lett. 29(2), 109–131 (2009)CrossRef H. Hoffman, S. Schaal, S. Vijayakumar: Local dimensionality reduction for non-parametric regression, Neural Process. Lett. 29(2), 109–131 (2009)CrossRef
15.85
Zurück zum Zitat S. Thrun, T. Mitchell: Lifelong robot learning, Robotics Auton. Syst. 15, 25–46 (1995)CrossRef S. Thrun, T. Mitchell: Lifelong robot learning, Robotics Auton. Syst. 15, 25–46 (1995)CrossRef
15.86
Zurück zum Zitat Y. Engel, S. Mannor, R. Meir: Sparse online greedy support vector regression, Eur. Conf. Mach. Learn. (2002) Y. Engel, S. Mannor, R. Meir: Sparse online greedy support vector regression, Eur. Conf. Mach. Learn. (2002)
15.87
15.88
Zurück zum Zitat C.E. Rasmussen: Evaluation of Gaussian Processes and Other Methods for Non-Linear Regression (University of Toronto, Toronto 1996) C.E. Rasmussen: Evaluation of Gaussian Processes and Other Methods for Non-Linear Regression (University of Toronto, Toronto 1996)
15.89
Zurück zum Zitat L. Bottou, O. Chapelle, D. DeCoste, J. Weston: Large-Scale Kernel Machines (MIT, Cambridge 2007)CrossRef L. Bottou, O. Chapelle, D. DeCoste, J. Weston: Large-Scale Kernel Machines (MIT, Cambridge 2007)CrossRef
15.90
Zurück zum Zitat J.Q. Candela, C.E. Rasmussen: A unifying view of sparse approximate Gaussian process regression, J. Mach. Learn. Res. 6, 1939–1959 (2005)MathSciNetMATH J.Q. Candela, C.E. Rasmussen: A unifying view of sparse approximate Gaussian process regression, J. Mach. Learn. Res. 6, 1939–1959 (2005)MathSciNetMATH
15.91
Zurück zum Zitat R. Genov, S. Chakrabartty, G. Cauwenberghs: Silicon support vector machine with online learning, Int. J. Pattern Recognit. Articial Intell. 17, 385–404 (2003)CrossRef R. Genov, S. Chakrabartty, G. Cauwenberghs: Silicon support vector machine with online learning, Int. J. Pattern Recognit. Articial Intell. 17, 385–404 (2003)CrossRef
15.92
Zurück zum Zitat S. Vijayakumar, A. D'Souza, S. Schaal: Incremental online learning in high dimensions, Neural Comput. 12(11), 2602–2634 (2005)MathSciNetCrossRef S. Vijayakumar, A. D'Souza, S. Schaal: Incremental online learning in high dimensions, Neural Comput. 12(11), 2602–2634 (2005)MathSciNetCrossRef
15.93
Zurück zum Zitat B. Schölkopf, P. Simard, A. Smola, V. Vapnik: Prior knowledge in support vector kernel, Adv. Neural Inform. Process. Syst., Vol. 10 (1998) pp. 640–646 B. Schölkopf, P. Simard, A. Smola, V. Vapnik: Prior knowledge in support vector kernel, Adv. Neural Inform. Process. Syst., Vol. 10 (1998) pp. 640–646
15.94
Zurück zum Zitat E. Krupka, N. Tishby: Incorporating prior knowledge on features into learning, Int. Conf. Artif. Intell. Stat. (San Juan, Puerto Rico 2007) E. Krupka, N. Tishby: Incorporating prior knowledge on features into learning, Int. Conf. Artif. Intell. Stat. (San Juan, Puerto Rico 2007)
15.95
Zurück zum Zitat A. Smola, T. Friess, B. Schoelkopf: Semiparametric support vector and linear programming machines, Adv. Neural Inform. Process. Syst., Vol. 11 (1999) pp. 585–591 A. Smola, T. Friess, B. Schoelkopf: Semiparametric support vector and linear programming machines, Adv. Neural Inform. Process. Syst., Vol. 11 (1999) pp. 585–591
15.96
Zurück zum Zitat B.J. Kröse, N. Vlassis, R. Bunschoten, Y. Motomura: A probabilistic model for appearance-based robot localization, Image Vis. Comput. 19, 381–391 (2001)CrossRef B.J. Kröse, N. Vlassis, R. Bunschoten, Y. Motomura: A probabilistic model for appearance-based robot localization, Image Vis. Comput. 19, 381–391 (2001)CrossRef
15.97
Zurück zum Zitat M.K. Titsias, N.D. Lawrence: Bayesian Gaussian process latent variable model, Proc. 13th Int. Conf. Artif. Intell. Stat. (2010) M.K. Titsias, N.D. Lawrence: Bayesian Gaussian process latent variable model, Proc. 13th Int. Conf. Artif. Intell. Stat. (2010)
15.98
Zurück zum Zitat R. Jacobs, M. Jordan, S. Nowlan, G.E. Hinton: Adaptive mixtures of local experts, Neural Comput. 3, 79–87 (1991)CrossRef R. Jacobs, M. Jordan, S. Nowlan, G.E. Hinton: Adaptive mixtures of local experts, Neural Comput. 3, 79–87 (1991)CrossRef
15.99
Zurück zum Zitat S. Calinon, F. D'halluin, E. Sauser, D. Caldwell, A. Billard: A probabilistic approach based on dynamical systems to learn and reproduce gestures by imitation, IEEE Robotics Autom. Mag. 17, 44–54 (2010)CrossRef S. Calinon, F. D'halluin, E. Sauser, D. Caldwell, A. Billard: A probabilistic approach based on dynamical systems to learn and reproduce gestures by imitation, IEEE Robotics Autom. Mag. 17, 44–54 (2010)CrossRef
15.100
Zurück zum Zitat V. Treps: A bayesian committee machine, Neural Comput. 12(11), 2719–2741 (2000)CrossRef V. Treps: A bayesian committee machine, Neural Comput. 12(11), 2719–2741 (2000)CrossRef
15.101
Zurück zum Zitat L. Csato, M. Opper: Sparse online Gaussian processes, Neural Comput. 14(3), 641–668 (2002)MATHCrossRef L. Csato, M. Opper: Sparse online Gaussian processes, Neural Comput. 14(3), 641–668 (2002)MATHCrossRef
15.102
Zurück zum Zitat D.H. Grollman, O.C. Jenkins: Sparse incremental learning for interactive robot control policy estimation, IEEE Int. Conf. Robotics Autom., Pasadena (2008) D.H. Grollman, O.C. Jenkins: Sparse incremental learning for interactive robot control policy estimation, IEEE Int. Conf. Robotics Autom., Pasadena (2008)
15.103
Zurück zum Zitat M. Seeger: Gaussian processes for machine learning, Int. J. Neural Syst. 14(2), 69–106 (2004)CrossRef M. Seeger: Gaussian processes for machine learning, Int. J. Neural Syst. 14(2), 69–106 (2004)CrossRef
15.104
Zurück zum Zitat C. Plagemann, S. Mischke, S. Prentice, K. Kersting, N. Roy, W. Burgard: Learning predictive terrain models for legged robot locomotion, Proc. IEEE Int. Conf. Intell. Robots Syst. (2008) C. Plagemann, S. Mischke, S. Prentice, K. Kersting, N. Roy, W. Burgard: Learning predictive terrain models for legged robot locomotion, Proc. IEEE Int. Conf. Intell. Robots Syst. (2008)
15.105
Zurück zum Zitat J. Ko, D. Fox: GP-bayesfilters: Bayesian filtering using Gaussian process prediction and observation models, Auton. Robots 27(1), 75–90 (2009)CrossRef J. Ko, D. Fox: GP-bayesfilters: Bayesian filtering using Gaussian process prediction and observation models, Auton. Robots 27(1), 75–90 (2009)CrossRef
15.106
Zurück zum Zitat J.P. Ferreira, M. Crisostomo, A.P. Coimbra, B. Ribeiro: Simulation control of a biped robot with support vector regression, IEEE Int. Symp. Intell. Signal Process. (2007) J.P. Ferreira, M. Crisostomo, A.P. Coimbra, B. Ribeiro: Simulation control of a biped robot with support vector regression, IEEE Int. Symp. Intell. Signal Process. (2007)
15.107
Zurück zum Zitat R. Pelossof, A. Miller, P. Allen, T. Jebara: An SVM learning approach to robotic grasping, IEEE Int. Conf. Robotics Autom. (2004) R. Pelossof, A. Miller, P. Allen, T. Jebara: An SVM learning approach to robotic grasping, IEEE Int. Conf. Robotics Autom. (2004)
15.108
Zurück zum Zitat J. Ma, J. Theiler, S. Perkins: Accurate on-line support vector regression, Neural Comput. 15, 2683–2703 (2005)MATHCrossRef J. Ma, J. Theiler, S. Perkins: Accurate on-line support vector regression, Neural Comput. 15, 2683–2703 (2005)MATHCrossRef
15.109
Zurück zum Zitat Y. Choi, S.Y. Cheong, N. Schweighofer: Local online support vector regression for learning control, Proc. IEEE Int. Symp. Comput. Intell. Robotics Autom. (2007) Y. Choi, S.Y. Cheong, N. Schweighofer: Local online support vector regression for learning control, Proc. IEEE Int. Symp. Comput. Intell. Robotics Autom. (2007)
15.110
Zurück zum Zitat J.-A. Ting, A. D'Souza, S. Schaal: Bayesian robot system identification with input and output noise, Neural Netw. 24(1), 99–108 (2011)MATHCrossRef J.-A. Ting, A. D'Souza, S. Schaal: Bayesian robot system identification with input and output noise, Neural Netw. 24(1), 99–108 (2011)MATHCrossRef
15.111
Zurück zum Zitat S. Nowlan, G.E. Hinton: Evaluation of adaptive mixtures of competing experts, Adv. Neural Inform. Process. Syst., Vol. 3 (1991) pp. 774–780 S. Nowlan, G.E. Hinton: Evaluation of adaptive mixtures of competing experts, Adv. Neural Inform. Process. Syst., Vol. 3 (1991) pp. 774–780
15.112
Zurück zum Zitat V. Treps: Mixtures of Gaussian processes, Adv. Neural Inform. Process. Syst., Vol. 13 (2001) pp. 654–660 V. Treps: Mixtures of Gaussian processes, Adv. Neural Inform. Process. Syst., Vol. 13 (2001) pp. 654–660
15.113
Zurück zum Zitat C.E. Rasmussen, Z. Ghahramani: Infinite mixtures of Gaussian process experts, Adv. Neural Inform. Process. Syst., Vol. 14 (2002) pp. 881–888 C.E. Rasmussen, Z. Ghahramani: Infinite mixtures of Gaussian process experts, Adv. Neural Inform. Process. Syst., Vol. 14 (2002) pp. 881–888
15.114
Zurück zum Zitat T. Hastie, R. Tibshirani, J. Friedman: The Elements of Statistical Learning (Springer, New York, 2001)MATHCrossRef T. Hastie, R. Tibshirani, J. Friedman: The Elements of Statistical Learning (Springer, New York, 2001)MATHCrossRef
15.115
Zurück zum Zitat W.K. Haerdle, M. Mueller, S. Sperlich, A. Werwatz: Nonparametric and Semiparametric Models (Springer, New York 2004)CrossRef W.K. Haerdle, M. Mueller, S. Sperlich, A. Werwatz: Nonparametric and Semiparametric Models (Springer, New York 2004)CrossRef
15.116
Zurück zum Zitat D.J. MacKay: A practical Bayesian framework for back-propagation networks, Computation 4(3), 448–472 (1992) D.J. MacKay: A practical Bayesian framework for back-propagation networks, Computation 4(3), 448–472 (1992)
15.117
Zurück zum Zitat R.M. Neal: Bayesian Learning for Neural Networks, Lecture Notes in Statistics, Vol. 118 (Springer, New York 1996)MATHCrossRef R.M. Neal: Bayesian Learning for Neural Networks, Lecture Notes in Statistics, Vol. 118 (Springer, New York 1996)MATHCrossRef
15.118
Zurück zum Zitat B. Schölkopf, A.J. Smola, R. Williamson, P.L. Bartlett: New support vector algorithms, Neural Comput. 12(5), 1207–1245 (2000)CrossRef B. Schölkopf, A.J. Smola, R. Williamson, P.L. Bartlett: New support vector algorithms, Neural Comput. 12(5), 1207–1245 (2000)CrossRef
15.119
Zurück zum Zitat C. Plagemann, K. Kersting, P. Pfaff, W. Burgard: Heteroscedastic Gaussian process regression for modeling range sensors in mobile robotics, Snowbird Learn. Workshop (2007) C. Plagemann, K. Kersting, P. Pfaff, W. Burgard: Heteroscedastic Gaussian process regression for modeling range sensors in mobile robotics, Snowbird Learn. Workshop (2007)
15.120
Zurück zum Zitat W.S. Cleveland, C.L. Loader: Smoothing by local regression: Principles and methods. In: Statistical Theory and Computational Aspects of Smoothing, ed. by W. Härdle, M.G. Schimele (Physica, Heidelberg 1996) W.S. Cleveland, C.L. Loader: Smoothing by local regression: Principles and methods. In: Statistical Theory and Computational Aspects of Smoothing, ed. by W. Härdle, M.G. Schimele (Physica, Heidelberg 1996)
15.121
Zurück zum Zitat J. Fan, I. Gijbels: Local Polynomial Modelling and Its Applications (Chapman Hall, New York 1996)MATH J. Fan, I. Gijbels: Local Polynomial Modelling and Its Applications (Chapman Hall, New York 1996)MATH
15.122
Zurück zum Zitat J. Fan, I. Gijbels: Data driven bandwidth selection in local polynomial fitting, J. R. Stat. Soc. 57(2), 371–394 (1995)MATH J. Fan, I. Gijbels: Data driven bandwidth selection in local polynomial fitting, J. R. Stat. Soc. 57(2), 371–394 (1995)MATH
15.123
Zurück zum Zitat A. Moore, M.S. Lee: Efficient algorithms for minimizing cross validation error, Proc. 11th Int. Conf. Mach. Learn. (1994) A. Moore, M.S. Lee: Efficient algorithms for minimizing cross validation error, Proc. 11th Int. Conf. Mach. Learn. (1994)
15.124
Zurück zum Zitat A. Moore: Fast, robust adaptive control by learning only forward models, Adv. Neural Inform. Process. Syst., Vol. 4 (1992) pp. 571–578 A. Moore: Fast, robust adaptive control by learning only forward models, Adv. Neural Inform. Process. Syst., Vol. 4 (1992) pp. 571–578
15.125
Zurück zum Zitat C.G. Atkeson, A.W. Moore, S. Schaal: Locally weighted learning for control, Artif. Intell. Rev. 11, 75–113 (1997)CrossRef C.G. Atkeson, A.W. Moore, S. Schaal: Locally weighted learning for control, Artif. Intell. Rev. 11, 75–113 (1997)CrossRef
15.126
Zurück zum Zitat G. Tevatia, S. Schaal: Efficient Inverse Kinematics Algorithms for High-Dimensional Movement Systems (University of Southern California, Los Angeles 2008) G. Tevatia, S. Schaal: Efficient Inverse Kinematics Algorithms for High-Dimensional Movement Systems (University of Southern California, Los Angeles 2008)
15.127
Zurück zum Zitat C.G. Atkeson, A.W. Moore, S. Schaal: Locally weighted learning, Artif. Intell. Rev. 11(1–5), 11–73 (1997)CrossRef C.G. Atkeson, A.W. Moore, S. Schaal: Locally weighted learning, Artif. Intell. Rev. 11(1–5), 11–73 (1997)CrossRef
15.128
Zurück zum Zitat N.U. Edakunni, S. Schaal, S. Vijayakumar: Kernel carpentry for online regression using randomly varying coefficient model, Proc. 20th Int. Jt. Conf. Artif. Intell. (2007) N.U. Edakunni, S. Schaal, S. Vijayakumar: Kernel carpentry for online regression using randomly varying coefficient model, Proc. 20th Int. Jt. Conf. Artif. Intell. (2007)
15.129
Zurück zum Zitat D.H. Jacobson, D.Q. Mayne: Differential Dynamic Programming (American Elsevier, New York 1973)MATH D.H. Jacobson, D.Q. Mayne: Differential Dynamic Programming (American Elsevier, New York 1973)MATH
15.130
Zurück zum Zitat C.G. Atkeson, S. Schaal: Robot learning from demonstration, Proc. 14th Int. Conf. Mach. Learn. (1997) C.G. Atkeson, S. Schaal: Robot learning from demonstration, Proc. 14th Int. Conf. Mach. Learn. (1997)
15.131
Zurück zum Zitat J. Morimoto, G. Zeglin, C.G. Atkeson: Minimax differential dynamic programming: Application to a biped walking robot, Proc. 2009 IEEE Int. Conf. Intell. Robots Syst. (2003) J. Morimoto, G. Zeglin, C.G. Atkeson: Minimax differential dynamic programming: Application to a biped walking robot, Proc. 2009 IEEE Int. Conf. Intell. Robots Syst. (2003)
15.132
Zurück zum Zitat P. Abbeel, A. Coates, M. Quigley, A.Y. Ng: An application of reinforcement learning to aerobatic helicopter flight, Adv. Neural Inform. Process. Syst., Vol. 19 (2007) pp. 1–8 P. Abbeel, A. Coates, M. Quigley, A.Y. Ng: An application of reinforcement learning to aerobatic helicopter flight, Adv. Neural Inform. Process. Syst., Vol. 19 (2007) pp. 1–8
15.133
Zurück zum Zitat P.W. Glynn: Likelihood ratio gradient estimation: An overview, Proc. Winter Simul. Conf. (1987) P.W. Glynn: Likelihood ratio gradient estimation: An overview, Proc. Winter Simul. Conf. (1987)
15.134
Zurück zum Zitat A.Y. Ng, M. Jordan: Pegasus: A policy search method for large MDPs and POMDPs, Proc. 16th Conf. Uncertain. Artif. Intell. (2000) A.Y. Ng, M. Jordan: Pegasus: A policy search method for large MDPs and POMDPs, Proc. 16th Conf. Uncertain. Artif. Intell. (2000)
15.135
Zurück zum Zitat B.M. Akesson, H.T. Toivonen: A neural network model predictive controller, J. Process Control 16(9), 937–946 (2006)CrossRef B.M. Akesson, H.T. Toivonen: A neural network model predictive controller, J. Process Control 16(9), 937–946 (2006)CrossRef
15.136
Zurück zum Zitat D. Gu, H. Hu: Predictive control for a car-like mobile robot, Robotics Auton. Syst. 39, 73–86 (2002)CrossRef D. Gu, H. Hu: Predictive control for a car-like mobile robot, Robotics Auton. Syst. 39, 73–86 (2002)CrossRef
15.137
Zurück zum Zitat E.A. Wan, A.A. Bogdanov: Model predictive neural control with applications to a 6 DOF helicopter model, Proc. Am. Control Conf. (2001) E.A. Wan, A.A. Bogdanov: Model predictive neural control with applications to a 6 DOF helicopter model, Proc. Am. Control Conf. (2001)
15.138
Zurück zum Zitat O. Khatib: A unified approach for motion and force control of robot manipulators: The operational space formulation, J. Robotics Autom. 3(1), 43–53 (1987)CrossRef O. Khatib: A unified approach for motion and force control of robot manipulators: The operational space formulation, J. Robotics Autom. 3(1), 43–53 (1987)CrossRef
15.139
Zurück zum Zitat J. Peters, M. Mistry, F.E. Udwadia, J. Nakanishi, S. Schaal: A unifying methodology for robot control with redundant dofs, Auton. Robots 24(1), 1–12 (2008)CrossRef J. Peters, M. Mistry, F.E. Udwadia, J. Nakanishi, S. Schaal: A unifying methodology for robot control with redundant dofs, Auton. Robots 24(1), 1–12 (2008)CrossRef
15.140
Zurück zum Zitat C. Salaun, V. Padois, O. Sigaud: Control of redundant robots using learned models: An operational space control approach, Proc. IEEE Int. Conf. Intell. Robots Syst. (2009) C. Salaun, V. Padois, O. Sigaud: Control of redundant robots using learned models: An operational space control approach, Proc. IEEE Int. Conf. Intell. Robots Syst. (2009)
15.141
Zurück zum Zitat F.R. Reinhart, J.J. Steil: Recurrent neural associative learning of forward and inverse kinematics for movement generation of the redundant PA-10 robot, Symp. Learn. Adapt. Behav. Robotics Syst. (2008) F.R. Reinhart, J.J. Steil: Recurrent neural associative learning of forward and inverse kinematics for movement generation of the redundant PA-10 robot, Symp. Learn. Adapt. Behav. Robotics Syst. (2008)
15.142
Zurück zum Zitat J.Q. Candela, C.E. Rasmussen, C.K. Williams: Large Scale Kernel Machines (MIT, Cambridge 2007) J.Q. Candela, C.E. Rasmussen, C.K. Williams: Large Scale Kernel Machines (MIT, Cambridge 2007)
15.143
Zurück zum Zitat S. Ben-David, R. Schuller: Exploiting task relatedness for multiple task learning, Proc. Conf. Learn. Theory (2003) S. Ben-David, R. Schuller: Exploiting task relatedness for multiple task learning, Proc. Conf. Learn. Theory (2003)
15.144
Zurück zum Zitat I. Tsochantaridis, T. Joachims, T. Hofmann, Y. Altun: Large margin methods for structured and interdependent output variables, J. Mach. Learn. Res. 6, 1453–1484 (2005)MathSciNetMATH I. Tsochantaridis, T. Joachims, T. Hofmann, Y. Altun: Large margin methods for structured and interdependent output variables, J. Mach. Learn. Res. 6, 1453–1484 (2005)MathSciNetMATH
15.145
Zurück zum Zitat O. Chapelle, B. Schölkopf, A. Zien: Semi-Supervised Learning (MIT, Cambridge 2006)CrossRef O. Chapelle, B. Schölkopf, A. Zien: Semi-Supervised Learning (MIT, Cambridge 2006)CrossRef
15.146
Zurück zum Zitat J.D. Lafferty, A. McCallum, F.C.N. Pereira: Conditional random fields: Probabilistic models for segmenting and labeling sequence data, Proc. 18th Int. Conf. Mach. Learn. (2001) J.D. Lafferty, A. McCallum, F.C.N. Pereira: Conditional random fields: Probabilistic models for segmenting and labeling sequence data, Proc. 18th Int. Conf. Mach. Learn. (2001)
15.147
Zurück zum Zitat K. Muelling, J. Kober, O. Kroemer, J. Peters: Learning to select and generalize striking movements in robot table tennis, Int. J. Robotics Res. 32(3), 263–279 (2012)CrossRef K. Muelling, J. Kober, O. Kroemer, J. Peters: Learning to select and generalize striking movements in robot table tennis, Int. J. Robotics Res. 32(3), 263–279 (2012)CrossRef
15.148
Zurück zum Zitat S. Mahadevan, J. Connell: Automatic programming of behavior-based robots using reinforcement learning, Artif. Intell. 55(2/3), 311–365 (1992)CrossRef S. Mahadevan, J. Connell: Automatic programming of behavior-based robots using reinforcement learning, Artif. Intell. 55(2/3), 311–365 (1992)CrossRef
15.149
Zurück zum Zitat V. Gullapalli, J.A. Franklin, H. Benbrahim: Acquiring robot skills via reinforcement learning, IEEE Control Syst. Mag. 14(1), 13–24 (1994)CrossRef V. Gullapalli, J.A. Franklin, H. Benbrahim: Acquiring robot skills via reinforcement learning, IEEE Control Syst. Mag. 14(1), 13–24 (1994)CrossRef
15.150
Zurück zum Zitat J.A. Bagnell, J.C. Schneider: Autonomous helicopter control using reinforcement learning policy search methods, IEEE Int. Conf. Robotics Autom. (2001) J.A. Bagnell, J.C. Schneider: Autonomous helicopter control using reinforcement learning policy search methods, IEEE Int. Conf. Robotics Autom. (2001)
15.151
Zurück zum Zitat S. Schaal: Learning from demonstration, Adv. Neural Inform. Process. Syst., Vol. 9 (1996) pp. 1040–1046 S. Schaal: Learning from demonstration, Adv. Neural Inform. Process. Syst., Vol. 9 (1996) pp. 1040–1046
15.152
Zurück zum Zitat W. B. Powell: AI, OR and Control Theory: A Rosetta Stone for Stochastic Optimization, Tech. Rep. (Princeton University, Princeton 2012) W. B. Powell: AI, OR and Control Theory: A Rosetta Stone for Stochastic Optimization, Tech. Rep. (Princeton University, Princeton 2012)
15.153
Zurück zum Zitat C.G. Atkeson: Nonparametric model-based reinforcement learning, Adv. Neural Inform. Process. Syst., Vol. 10 (1998) pp. 1008–1014 C.G. Atkeson: Nonparametric model-based reinforcement learning, Adv. Neural Inform. Process. Syst., Vol. 10 (1998) pp. 1008–1014
15.154
Zurück zum Zitat A. Coates, P. Abbeel, A.Y. Ng: Apprenticeship learning for helicopter control, Communication ACM 52(7), 97–105 (2009)CrossRef A. Coates, P. Abbeel, A.Y. Ng: Apprenticeship learning for helicopter control, Communication ACM 52(7), 97–105 (2009)CrossRef
15.155
Zurück zum Zitat R.S. Sutton, A.G. Barto, R.J. Williams: Reinforcement learning is direct adaptive optimal control, Am. Control Conf. (1991) R.S. Sutton, A.G. Barto, R.J. Williams: Reinforcement learning is direct adaptive optimal control, Am. Control Conf. (1991)
15.156
Zurück zum Zitat A.D. Laud: Theory and Application of Reward Shaping in Reinforcement Learning (University of Illinois, Urbana-Champaign 2004) A.D. Laud: Theory and Application of Reward Shaping in Reinforcement Learning (University of Illinois, Urbana-Champaign 2004)
15.157
Zurück zum Zitat M.P. Deisenrot, C.E. Rasmussen: PILCO: A model-based and data-efficient approach to policy search, 28th Int. Conf. Mach. Learn. (2011) M.P. Deisenrot, C.E. Rasmussen: PILCO: A model-based and data-efficient approach to policy search, 28th Int. Conf. Mach. Learn. (2011)
15.158
Zurück zum Zitat H. Miyamoto, S. Schaal, F. Gandolfo, H. Gomi, Y. Koike, R. Osu, E. Nakano, Y. Wada, M. Kawato: A Kendama learning robot based on bidirectional theory, Neural Netw. 9(8), 1281–1302 (1996)CrossRef H. Miyamoto, S. Schaal, F. Gandolfo, H. Gomi, Y. Koike, R. Osu, E. Nakano, Y. Wada, M. Kawato: A Kendama learning robot based on bidirectional theory, Neural Netw. 9(8), 1281–1302 (1996)CrossRef
15.159
Zurück zum Zitat N. Kohl, P. Stone: Policy gradient reinforcement learning for fast quadrupedal locomotion, IEEE Int. Conf. Robotics Autom. (2004) N. Kohl, P. Stone: Policy gradient reinforcement learning for fast quadrupedal locomotion, IEEE Int. Conf. Robotics Autom. (2004)
15.160
Zurück zum Zitat R. Tedrake, T.W. Zhang, H.S. Seung: Learning to walk in 20 minutes, Yale Workshop Adapt. Learn. Syst. (2005) R. Tedrake, T.W. Zhang, H.S. Seung: Learning to walk in 20 minutes, Yale Workshop Adapt. Learn. Syst. (2005)
15.161
Zurück zum Zitat J. Peters, S. Schaal: Reinforcement learning of motor skills with policy gradients, Neural Netw. 21(4), 682–697 (2008)CrossRef J. Peters, S. Schaal: Reinforcement learning of motor skills with policy gradients, Neural Netw. 21(4), 682–697 (2008)CrossRef
15.162
Zurück zum Zitat J. Peters, S. Schaal: Natural actor-critic, Neurocomputing 71(7–9), 1180–1190 (2008)CrossRef J. Peters, S. Schaal: Natural actor-critic, Neurocomputing 71(7–9), 1180–1190 (2008)CrossRef
15.163
Zurück zum Zitat J. Kober, J. Peters: Policy search for motor primitives in robotics, Adv. Neural Inform. Process. Syst., Vol. 21 (2009) pp. 849–856 J. Kober, J. Peters: Policy search for motor primitives in robotics, Adv. Neural Inform. Process. Syst., Vol. 21 (2009) pp. 849–856
15.164
Zurück zum Zitat M.P. Deisenroth, C.E. Rasmussen, D. Fox: Learning to control a low-cost manipulator using data-efficient reinforcement learning. In: Robotics: Science and Systems VII, ed. by H. Durrand-Whyte, N. Roy, P. Abbeel (MIT, Cambridge 2011) M.P. Deisenroth, C.E. Rasmussen, D. Fox: Learning to control a low-cost manipulator using data-efficient reinforcement learning. In: Robotics: Science and Systems VII, ed. by H. Durrand-Whyte, N. Roy, P. Abbeel (MIT, Cambridge 2011)
15.165
Zurück zum Zitat L.P. Kaelbling, M.L. Littman, A.W. Moore: Reinforcement learning: A survey, J. Artif. Intell. Res. 4, 237–285 (1996)CrossRef L.P. Kaelbling, M.L. Littman, A.W. Moore: Reinforcement learning: A survey, J. Artif. Intell. Res. 4, 237–285 (1996)CrossRef
15.166
Zurück zum Zitat M.E. Lewis, M.L. Puterman: The Handbook of Markov Decision Processes: Methods and Applications (Kluwer, Dordrecht 2001) pp. 89–111 M.E. Lewis, M.L. Puterman: The Handbook of Markov Decision Processes: Methods and Applications (Kluwer, Dordrecht 2001) pp. 89–111
15.167
Zurück zum Zitat J. Peters, S. Vijayakumar, S. Schaal: Linear Quadratic Regulation as Benchmark for Policy Gradient Methods, Technical Report (University of Southern California, Los Angeles 2004) J. Peters, S. Vijayakumar, S. Schaal: Linear Quadratic Regulation as Benchmark for Policy Gradient Methods, Technical Report (University of Southern California, Los Angeles 2004)
15.168
Zurück zum Zitat R.E. Bellman: Dynamic Programming (Princeton Univ. Press, Princeton 1957)MATH R.E. Bellman: Dynamic Programming (Princeton Univ. Press, Princeton 1957)MATH
15.169
Zurück zum Zitat R.S. Sutton, D. McAllester, S.P. Singh, Y. Mansour: Policy gradient methods for reinforcement learning with function approximation, Adv. Neural Inform. Process. Syst., Vol. 12 (1999) pp. 1057–1063 R.S. Sutton, D. McAllester, S.P. Singh, Y. Mansour: Policy gradient methods for reinforcement learning with function approximation, Adv. Neural Inform. Process. Syst., Vol. 12 (1999) pp. 1057–1063
15.170
Zurück zum Zitat T. Jaakkola, M.I. Jordan, S.P. Singh: Convergence of stochastic iterative dynamic programming algorithms, Adv. Neural Inform. Process. Syst., Vol. 6 (1993) pp. 703–710 T. Jaakkola, M.I. Jordan, S.P. Singh: Convergence of stochastic iterative dynamic programming algorithms, Adv. Neural Inform. Process. Syst., Vol. 6 (1993) pp. 703–710
15.172
Zurück zum Zitat D.E. Kirk: Optimal Control Theory (Prentice-Hall, Englewood Cliffs 1970) D.E. Kirk: Optimal Control Theory (Prentice-Hall, Englewood Cliffs 1970)
15.173
Zurück zum Zitat A. Schwartz: A reinforcement learning method for maximizing undiscounted rewards, Int. Conf. Mach. Learn. (1993) A. Schwartz: A reinforcement learning method for maximizing undiscounted rewards, Int. Conf. Mach. Learn. (1993)
15.174
Zurück zum Zitat C.G. Atkeson, S. Schaal: Robot learning from demonstration, Int. Conf. Mach. Learn. (1997) C.G. Atkeson, S. Schaal: Robot learning from demonstration, Int. Conf. Mach. Learn. (1997)
15.175
Zurück zum Zitat J. Peters, K. Muelling, Y. Altun: Relative entropy policy search, Natl. Conf. Artif. Intell. (2010) J. Peters, K. Muelling, Y. Altun: Relative entropy policy search, Natl. Conf. Artif. Intell. (2010)
15.176
G. Endo, J. Morimoto, T. Matsubara, J. Nakanishi, G. Cheng: Learning CPG-based biped locomotion with a policy gradient method: Application to a humanoid robot, Int. J. Robotics Res. 27(2), 213–228 (2008)
15.177
F. Guenter, M. Hersch, S. Calinon, A. Billard: Reinforcement learning for imitating constrained reaching movements, Adv. Robotics 21(13), 1521–1544 (2007)
15.178
J.Z. Kolter, A.Y. Ng: Policy search via the signed derivative, Robotics Sci. Syst. V, Seattle (2009)
15.179
A.Y. Ng, H.J. Kim, M.I. Jordan, S. Sastry: Autonomous helicopter flight via reinforcement learning, Adv. Neural Inform. Process. Syst., Vol. 16 (2004) pp. 799–806
15.180
J.W. Roberts, L. Moret, J. Zhang, R. Tedrake: From motor to interaction learning in robots, Stud. Comput. Intell. 264, 293–309 (2010)
15.181
R. Tedrake: Stochastic policy gradient reinforcement learning on a simple 3D biped, IEEE/RSJ Int. Conf. Intell. Robots Syst. (2004)
15.182
F. Stulp, E. Theodorou, M. Kalakrishnan, P. Pastor, L. Righetti, S. Schaal: Learning motion primitive goals for robust manipulation, IEEE/RSJ Int. Conf. Intell. Robots Syst. (2011)
15.183
M. Strens, A. Moore: Direct policy search using paired statistical tests, Int. Conf. Mach. Learn. (2001)
15.184
A.Y. Ng, A. Coates, M. Diel, V. Ganapathi, J. Schulte, B. Tse, E. Berger, E. Liang: Autonomous inverted helicopter flight via reinforcement learning, Int. Symp. Exp. Robotics (2004)
15.185
T. Geng, B. Porr, F. Wörgötter: Fast biped walking with a reflexive controller and real-time policy searching, Adv. Neural Inform. Process. Syst., Vol. 18 (2006) pp. 427–434
15.186
N. Mitsunaga, C. Smith, T. Kanda, H. Ishiguro, N. Hagita: Robot behavior adaptation for human-robot interaction based on policy gradient reinforcement learning, IEEE/RSJ Int. Conf. Intell. Robots Syst. (2005)
15.187
M. Sato, Y. Nakamura, S. Ishii: Reinforcement learning for biped locomotion, Int. Conf. Artif. Neural Netw. (2002)
15.188
R.Y. Rubinstein, D.P. Kroese: The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation and Machine Learning (Springer, New York 2004)
15.189
D.E. Goldberg: Genetic Algorithms (Addison-Wesley, New York 1989)
15.190
J.T. Betts: Practical Methods for Optimal Control Using Nonlinear Programming, Adv. Design Control, Vol. 3 (SIAM, Philadelphia 2001)
15.191
R.J. Williams: Simple statistical gradient-following algorithms for connectionist reinforcement learning, Mach. Learn. 8, 229–256 (1992)
15.192
P. Dayan, G.E. Hinton: Using expectation-maximization for reinforcement learning, Neural Comput. 9(2), 271–278 (1997)
15.193
N. Vlassis, M. Toussaint, G. Kontes, S. Piperidis: Learning model-free robot control by a Monte Carlo EM algorithm, Auton. Robots 27(2), 123–130 (2009)
15.194
J. Kober, E. Oztop, J. Peters: Reinforcement learning to adjust robot movements to new situations, Proc. Robotics Sci. Syst. Conf. (2010)
15.195
E.A. Theodorou, J. Buchli, S. Schaal: Reinforcement learning of motor skills in high dimensions: A path integral approach, IEEE Int. Conf. Robotics Autom. (2010)
15.196
J.A. Bagnell, A.Y. Ng, S. Kakade, J. Schneider: Policy search by dynamic programming, Adv. Neural Inform. Process. Syst., Vol. 16 (2003) pp. 831–838
15.197
T. Kollar, N. Roy: Trajectory optimization using reinforcement learning for map exploration, Int. J. Robotics Res. 27(2), 175–197 (2008)
15.198
D. Lizotte, T. Wang, M. Bowling, D. Schuurmans: Automatic gait optimization with Gaussian process regression, Int. Jt. Conf. Artif. Intell. (2007)
15.199
S. Kuindersma, R. Grupen, A.G. Barto: Learning dynamic arm motions for postural recovery, IEEE-RAS Int. Conf. Humanoid Robots (2011)
15.200
M. Tesch, J.G. Schneider, H. Choset: Using response surfaces and expected improvement to optimize snake robot gait parameters, IEEE/RSJ Int. Conf. Intell. Robots Syst. (2011)
15.201
S.-J. Yi, B.-T. Zhang, D. Hong, D.D. Lee: Learning full body push recovery control for small humanoid robots, IEEE Proc. Int. Conf. Robotics Autom. (2011)
15.202
J.A. Boyan, A.W. Moore: Generalization in reinforcement learning: Safely approximating the value function, Adv. Neural Inform. Process. Syst., Vol. 7 (1995) pp. 369–376
15.203
S. Kakade, J. Langford: Approximately optimal approximate reinforcement learning, Int. Conf. Mach. Learn. (2002)
15.204
E. Greensmith, P.L. Bartlett, J. Baxter: Variance reduction techniques for gradient estimates in reinforcement learning, J. Mach. Learn. Res. 5, 1471–1530 (2004)
15.205
M.T. Rosenstein, A.G. Barto: Reinforcement learning with supervision by a stable controller, Am. Control Conf. (2004)
15.206
J.N. Tsitsiklis, B. Van Roy: An analysis of temporal-difference learning with function approximation, IEEE Trans. Autom. Control 42(5), 674–690 (1997)
15.207
J.Z. Kolter, A.Y. Ng: Regularization and feature selection in least-squares temporal difference learning, Int. Conf. Mach. Learn. (2009)
15.208
L.C. Baird, H. Klopf: Reinforcement Learning with High-Dimensional Continuous Actions, Technical Report WL-TR-93-1147 (Wright-Patterson Air Force Base, Dayton 1993)
15.209
G.D. Konidaris, S. Osentoski, P. Thomas: Value function approximation in reinforcement learning using the Fourier basis, AAAI Conf. Artif. Intell. (2011)
15.210
J. Peters, K. Muelling, J. Kober, D. Nguyen-Tuong, O. Kroemer: Towards motor skill learning for robotics, Int. Symp. Robotics Res. (2010)
15.211
L. Buşoniu, R. Babuška, B. de Schutter, D. Ernst: Reinforcement Learning and Dynamic Programming Using Function Approximators (CRC, Boca Raton 2010)
15.212
A.G. Barto, S. Mahadevan: Recent advances in hierarchical reinforcement learning, Discret. Event Dyn. Syst. 13(4), 341–379 (2003)
15.213
S. Hart, R. Grupen: Learning generalizable control programs, IEEE Trans. Auton. Mental Dev. 3(3), 216–231 (2011)
15.214
J.G. Schneider: Exploiting model uncertainty estimates for safe dynamic control learning, Adv. Neural Inform. Process. Syst., Vol. 9 (1997) pp. 1047–1053
15.215
J.A. Bagnell: Learning Decisions: Robustness, Uncertainty, and Approximation. Dissertation (Robotics Institute, Carnegie Mellon University, Pittsburgh 2004)
15.216
T.M. Moldovan, P. Abbeel: Safe exploration in Markov decision processes, 29th Int. Conf. Mach. Learn. (2012)
15.217
T. Hester, M. Quinlan, P. Stone: RTMBA: A real-time model-based reinforcement learning architecture for robot control, IEEE Int. Conf. Robotics Autom. (2012)
15.219
C.G. Atkeson: Using local trajectory optimizers to speed up global optimization in dynamic programming, Adv. Neural Inform. Process. Syst., Vol. 6 (1994) pp. 663–670
15.220
J. Kober, J. Peters: Policy search for motor primitives in robotics, Mach. Learn. 84(1/2), 171–203 (2010)
15.220
S. Russell: Learning agents for uncertain environments (extended abstract), Conf. Comput. Learn. Theory (1998)
15.221
P. Abbeel, A.Y. Ng: Apprenticeship learning via inverse reinforcement learning, Int. Conf. Mach. Learn. (2004)
15.222
N.D. Ratliff, J.A. Bagnell, M.A. Zinkevich: Maximum margin planning, Int. Conf. Mach. Learn. (2006)
15.223
R.L. Keeney, H. Raiffa: Decisions with Multiple Objectives: Preferences and Value Tradeoffs (Wiley, New York 1976)
15.224
N. Ratliff, D. Bradley, J.A. Bagnell, J. Chestnutt: Boosting structured prediction for imitation learning, Adv. Neural Inform. Process. Syst., Vol. 19 (2006) pp. 1153–1160
15.225
D. Silver, J.A. Bagnell, A. Stentz: High performance outdoor navigation from overhead data using imitation learning. In: Robotics: Science and Systems, Vol. IV, ed. by O. Brock, J. Trinkle, F. Ramos (MIT, Cambridge 2008)
15.226
D. Silver, J.A. Bagnell, A. Stentz: Learning from demonstration for autonomous navigation in complex unstructured terrain, Int. J. Robotics Res. 29(12), 1565–1592 (2010)
15.227
N. Ratliff, J.A. Bagnell, S. Srinivasa: Imitation learning for locomotion and manipulation, IEEE-RAS Int. Conf. Humanoid Robots (2007)
15.228
J.Z. Kolter, P. Abbeel, A.Y. Ng: Hierarchical apprenticeship learning with application to quadruped locomotion, Adv. Neural Inform. Process. Syst., Vol. 20 (2007) pp. 769–776
15.229
J. Sorg, S.P. Singh, R.L. Lewis: Reward design via online gradient ascent, Adv. Neural Inform. Process. Syst., Vol. 23 (2010) pp. 2190–2198
15.230
M. Zucker, J.A. Bagnell: Reinforcement planning: RL for optimal planners, IEEE Proc. Int. Conf. Robotics Autom. (2012)
15.231
H. Benbrahim, J.S. Doleac, J.A. Franklin, O.G. Selfridge: Real-time learning: A ball on a beam, Int. Jt. Conf. Neural Netw. (1992)
15.232
B. Nemec, M. Zorko, L. Zlajpah: Learning of a ball-in-a-cup playing robot, Int. Workshop Robotics, Alpe-Adria-Danube Region (2010)
15.233
M. Tokic, W. Ertel, J. Fessler: The crawler, a class room demonstrator for reinforcement learning, Int. Fla. Artif. Intell. Res. Soc. Conf. (2009)
15.234
H. Kimura, T. Yamashita, S. Kobayashi: Reinforcement learning of walking behavior for a four-legged robot, IEEE Conf. Decis. Control (2001)
15.235
R.A. Willgoss, J. Iqbal: Reinforcement learning of behaviors in mobile robots using noisy infrared sensing, Aust. Conf. Robotics Autom. (1999)
15.236
L. Paletta, G. Fritz, F. Kintzler, J. Irran, G. Dorffner: Perception and developmental learning of affordances in autonomous robots, Lect. Notes Comput. Sci. 4667, 235–250 (2007)
15.237
C. Kwok, D. Fox: Reinforcement learning for sensing strategies, IEEE/RSJ Int. Conf. Intell. Robots Syst. (2004)
15.238
T. Yasuda, K. Ohkura: A reinforcement learning technique with an adaptive action generator for a multi-robot system, Int. Conf. Simul. Adapt. Behav. (2008)
15.239
J.H. Piater, S. Jodogne, R. Detry, D. Kraft, N. Krüger, O. Kroemer, J. Peters: Learning visual representations for perception-action systems, Int. J. Robotics Res. 30(3), 294–307 (2011)
15.240
M. Asada, S. Noda, S. Tawaratsumida, K. Hosoda: Purposive behavior acquisition for a real robot by vision-based reinforcement learning, Mach. Learn. 23(2/3), 279–303 (1996)
15.241
M. Huber, R.A. Grupen: A feedback control structure for on-line learning tasks, Robotics Auton. Syst. 22(3/4), 303–315 (1997)
15.242
P. Fidelman, P. Stone: Learning ball acquisition on a physical robot, Int. Symp. Robotics Autom. (2004)
15.243
V. Soni, S.P. Singh: Reinforcement learning of hierarchical skills on the Sony AIBO robot, Int. Conf. Dev. Learn. (2006)
15.244
B. Nemec, M. Tamošiunaitė, F. Wörgötter, A. Ude: Task adaptation through exploration and action sequencing, IEEE-RAS Int. Conf. Humanoid Robots (2009)
15.245
M.J. Matarić: Reinforcement learning in the multi-robot domain, Auton. Robots 4, 73–83 (1997)
15.246
M.J. Matarić: Reward functions for accelerated learning, Int. Conf. Mach. Learn. (ICML) (1994)
15.247
R. Platt, R.A. Grupen, A.H. Fagg: Improving grasp skills using schema structured learning, Int. Conf. Dev. Learn. (2006)
15.248
M. Dorigo, M. Colombetti: Robot Shaping: Developing Situated Agents Through Learning, Technical Report (International Computer Science Institute, Berkeley 1993)
15.249
G.D. Konidaris, S. Kuindersma, R. Grupen, A.G. Barto: Autonomous skill acquisition on a mobile manipulator, AAAI Conf. Artif. Intell. (2011)
15.250
G.D. Konidaris, S. Kuindersma, R. Grupen, A.G. Barto: Robot learning from demonstration by constructing skill trees, Int. J. Robotics Res. 31(3), 360–375 (2012)
15.251
A. Cocora, K. Kersting, C. Plagemann, W. Burgard, L. de Raedt: Learning relational navigation policies, IEEE/RSJ Int. Conf. Intell. Robots Syst. (2006)
15.252
D. Katz, Y. Pyuro, O. Brock: Learning to manipulate articulated objects in unstructured environments using a grounded relational representation. In: Robotics: Science and Systems, Vol. IV, ed. by O. Brock, J. Trinkle, F. Ramos (MIT, Cambridge 2008)
15.253
C.H. An, C.G. Atkeson, J.M. Hollerbach: Model-Based Control of a Robot Manipulator (MIT Press, Cambridge 1988)
15.254
C. Gaskett, L. Fletcher, A. Zelinsky: Reinforcement learning for a vision based mobile robot, IEEE/RSJ Int. Conf. Intell. Robots Syst. (2000)
15.255
Y. Duan, B. Cui, H. Yang: Robot navigation based on fuzzy RL algorithm, Int. Symp. Neural Netw. (2008)
15.256
H. Benbrahim, J.A. Franklin: Biped dynamic walking using reinforcement learning, Robotics Auton. Syst. 22(3/4), 283–302 (1997)
15.257
W.D. Smart, L. Pack Kaelbling: A framework for reinforcement learning on real robots, Natl. Conf. Artif. Intell./Innov. Appl. Artif. Intell. (1998)
15.258
D.C. Bentivegna: Learning from Observation Using Primitives (Georgia Institute of Technology, Atlanta 2004)
15.259
A. Rottmann, C. Plagemann, P. Hilgers, W. Burgard: Autonomous blimp control using model-free reinforcement learning in a continuous state and action space, IEEE/RSJ Int. Conf. Intell. Robots Syst. (2007)
15.260
K. Gräve, J. Stückler, S. Behnke: Learning motion skills from expert demonstrations and own experience using Gaussian process regression, Jt. Int. Symp. Robotics (ISR) Ger. Conf. Robotics (ROBOTIK) (2010)
15.261
O. Kroemer, R. Detry, J. Piater, J. Peters: Active learning using mean shift optimization for robot grasping, IEEE/RSJ Int. Conf. Intell. Robots Syst. (2009)
15.262
O. Kroemer, R. Detry, J. Piater, J. Peters: Combining active learning and reactive control for robot grasping, Robotics Auton. Syst. 58(9), 1105–1116 (2010)
15.263
T. Tamei, T. Shibata: Policy gradient learning of cooperative interaction with a robot using user's biological signals, Int. Conf. Neural Inf. Process. (2009)
15.264
A.J. Ijspeert, J. Nakanishi, S. Schaal: Learning attractor landscapes for learning motor primitives, Adv. Neural Inform. Process. Syst., Vol. 15 (2003) pp. 1547–1554
15.265
S. Schaal, P. Mohajerian, A.J. Ijspeert: Dynamics systems vs. optimal control – A unifying view, Prog. Brain Res. 165(1), 425–445 (2007)
15.266
H.-I. Lin, C.-C. Lai: Learning collision-free reaching skill from primitives, IEEE/RSJ Int. Conf. Intell. Robots Syst. (2012)
15.267
J. Kober, B. Mohler, J. Peters: Learning perceptual coupling for motor primitives, IEEE/RSJ Int. Conf. Intell. Robots Syst. (2008)
15.268
S. Bitzer, M. Howard, S. Vijayakumar: Using dimensionality reduction to exploit constraints in reinforcement learning, Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (2010)
15.269
J. Buchli, F. Stulp, E. Theodorou, S. Schaal: Learning variable impedance control, Int. J. Robotics Res. 30(7), 820–833 (2011)
15.270
P. Pastor, M. Kalakrishnan, S. Chitta, E. Theodorou, S. Schaal: Skill learning and task outcome prediction for manipulation, IEEE Int. Conf. Robotics Autom. (2011)
15.271
M. Kalakrishnan, L. Righetti, P. Pastor, S. Schaal: Learning force control policies for compliant manipulation, IEEE/RSJ Int. Conf. Intell. Robots Syst. (2011)
15.272
D.C. Bentivegna, C.G. Atkeson, G. Cheng: Learning from observation and practice using behavioral primitives: Marble maze, 11th Int. Symp. Robotics Res. (2004)
15.273
F. Kirchner: Q-learning of complex behaviours on a six-legged walking machine, EUROMICRO Workshop Adv. Mobile Robots (1997)
15.274
J. Morimoto, K. Doya: Acquisition of stand-up behavior by a real robot using hierarchical reinforcement learning, Robotics Auton. Syst. 36(1), 37–51 (2001)
15.275
J.-Y. Donnart, J.-A. Meyer: Learning reactive and planning rules in a motivationally autonomous animat, Syst. Man Cybern. B 26(3), 381–395 (1996)
15.276
C. Daniel, G. Neumann, J. Peters: Learning concurrent motor skills in versatile solution spaces, IEEE/RSJ Int. Conf. Intell. Robots Syst. (2012)
15.277
E.C. Whitman, C.G. Atkeson: Control of instantaneously coupled systems applied to humanoid walking, IEEE-RAS Int. Conf. Humanoid Robots (2010)
15.278
X. Huang, J. Weng: Novelty and reinforcement learning in the value system of developmental robots, 2nd Int. Workshop Epigenetic Robotics Model. Cognit. Dev. Robotic Syst. (2002)
15.279
M. Pendrith: Reinforcement learning in situated agents: Some theoretical problems and practical solutions, Eur. Workshop Learn. Robots (1999)
15.280
B. Wang, J.W. Li, H. Liu: A heuristic reinforcement learning for robot approaching objects, IEEE Conf. Robotics Autom. Mechatron. (2006)
15.281
L.P. Kaelbling: Learning in Embedded Systems (Stanford University, Stanford 1990)
15.282
R.S. Sutton: Integrated architectures for learning, planning, and reacting based on approximating dynamic programming, Int. Conf. Mach. Learn. (1990)
15.283
A.W. Moore, C.G. Atkeson: Prioritized sweeping: Reinforcement learning with less data and less time, Mach. Learn. 13(1), 103–130 (1993)
15.284
J. Peng, R.J. Williams: Incremental multi-step Q-learning, Mach. Learn. 22(1), 283–290 (1996)
15.285
N. Jakobi, P. Husbands, I. Harvey: Noise and the reality gap: The use of simulation in evolutionary robotics, 3rd Eur. Conf. Artif. Life (1995)
Metadata
Title
Robot Learning
Authors
Jan Peters
Daniel D. Lee
Jens Kober
Duy Nguyen-Tuong
J. Andrew Bagnell
Stefan Schaal
Copyright Year
2016
Publisher
Springer International Publishing
DOI
https://doi.org/10.1007/978-3-319-32552-1_15