2016 | OriginalPaper | Chapter

15. Robot Learning

Authors: Jan Peters, Daniel D. Lee, Jens Kober, Duy Nguyen-Tuong, J. Andrew Bagnell, Stefan Schaal

Published in: Springer Handbook of Robotics

Publisher: Springer International Publishing


Abstract

Machine learning offers robotics a framework and a set of tools for designing sophisticated, hard-to-engineer behaviors; conversely, the challenges of robotic problems provide inspiration, impact, and validation for developments in robot learning. The relationship between the two disciplines holds enough promise to be likened to that between physics and mathematics. In this chapter, we attempt to strengthen the links between the two research communities by surveying work on learning control and behavior generation in robots. We highlight key challenges in robot learning as well as notable successes, discuss how contributions have tamed the complexity of the domain, and study the role of algorithms, representations, and prior knowledge in achieving these successes. A particular focus of the chapter therefore lies on model learning for control and robot reinforcement learning. We demonstrate how machine learning approaches can be profitably applied to these problems, and we note throughout both open questions and the tremendous potential for future research.
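To make the chapter's two focal topics concrete, the sketch below illustrates model learning for control in miniature: it fits a forward dynamics model from sampled transitions and then uses that model for control via random-shooting model-predictive planning. This is a minimal illustration, not code from the chapter; the toy pendulum dynamics and all names (`true_dynamics`, `learned_model`, `plan_action`) are assumptions made for the example.

```python
import numpy as np

# Toy pendulum-like system standing in for the "true" robot dynamics,
# which the learner is only allowed to sample, not inspect.
def true_dynamics(x, u, dt=0.05):
    theta, omega = x
    domega = -9.81 * np.sin(theta) + u  # gravity torque plus motor torque
    return np.array([theta + dt * omega, omega + dt * domega])

rng = np.random.default_rng(0)

# Model learning: fit a linear forward model x' ~ W @ [x, u, 1]
# from randomly sampled transitions using ridge regression.
X, Y = [], []
for _ in range(2000):
    x = rng.uniform(-1.0, 1.0, size=2)
    u = rng.uniform(-2.0, 2.0)
    X.append(np.concatenate([x, [u, 1.0]]))
    Y.append(true_dynamics(x, u))
X, Y = np.asarray(X), np.asarray(Y)
W = np.linalg.solve(X.T @ X + 1e-6 * np.eye(4), X.T @ Y).T

def learned_model(x, u):
    return W @ np.concatenate([x, [u, 1.0]])

# Model-based control: random-shooting MPC that rolls out candidate
# action sequences through the learned model and executes the first
# action of the cheapest sequence.
def plan_action(x, horizon=20, n_candidates=200):
    best_u, best_cost = 0.0, np.inf
    for _ in range(n_candidates):
        seq = rng.uniform(-2.0, 2.0, size=horizon)
        state, cost = x, 0.0
        for u in seq:
            state = learned_model(state, u)
            cost += np.sum(state ** 2)  # quadratic cost around [0, 0]
        if cost < best_cost:
            best_u, best_cost = seq[0], cost
    return best_u

x = np.array([0.5, 0.0])  # start displaced from the target state
for _ in range(100):
    x = true_dynamics(x, plan_action(x))
print("final state:", x)  # the state should have been driven toward [0, 0]
```

Swapping the ridge regression for Gaussian process regression, or the shooting planner for a learned policy updated by policy search, recovers in spirit the two families of methods the chapter surveys.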


15.198
go back to reference D. Lizotte, T. Wang, M. Bowling, D. Schuurmans: Automatic gait optimization with Gaussian process regression, Int. Jt. Conf. Artif. Intell. (2007) D. Lizotte, T. Wang, M. Bowling, D. Schuurmans: Automatic gait optimization with Gaussian process regression, Int. Jt. Conf. Artif. Intell. (2007)
15.199
go back to reference S. Kuindersma, R. Grupen, A.G. Barto: Learning dynamic arm motions for postural recovery, IEEE-RAS Int. Conf. Humanoid Robots (2011) S. Kuindersma, R. Grupen, A.G. Barto: Learning dynamic arm motions for postural recovery, IEEE-RAS Int. Conf. Humanoid Robots (2011)
15.200
go back to reference M. Tesch, J.G. Schneider, H. Choset: Using response surfaces and expected improvement to optimize snake robot gait parameters, IEEE/RSJ Int. Conf. Intell. Robots Syst. (2011) M. Tesch, J.G. Schneider, H. Choset: Using response surfaces and expected improvement to optimize snake robot gait parameters, IEEE/RSJ Int. Conf. Intell. Robots Syst. (2011)
15.201
go back to reference S.-J. Yi, B.-T. Zhang, D. Hong, D.D. Lee: Learning full body push recovery control for small humanoid robots, IEEE Proc. Int. Conf. Robotics Autom. (2011) S.-J. Yi, B.-T. Zhang, D. Hong, D.D. Lee: Learning full body push recovery control for small humanoid robots, IEEE Proc. Int. Conf. Robotics Autom. (2011)
15.202
go back to reference J.A. Boyan, A.W. Moore: Generalization in reinforcement learning: Safely approximating the value function, Adv. Neural Inform. Process. Syst., Vol. 7 (1995) pp. 369–376 J.A. Boyan, A.W. Moore: Generalization in reinforcement learning: Safely approximating the value function, Adv. Neural Inform. Process. Syst., Vol. 7 (1995) pp. 369–376
15.203
go back to reference S. Kakade, J. Langford: Approximately optimal approximate reinforcement learning, Int. Conf. Mach. Learn. (2002) S. Kakade, J. Langford: Approximately optimal approximate reinforcement learning, Int. Conf. Mach. Learn. (2002)
15.204
go back to reference E. Greensmith, P.L. Bartlett, J. Baxter: Variance reduction techniques for gradient estimates in reinforcement learning, J. Mach. Learn. Res. 5, 1471–1530 (2004)MathSciNetMATH E. Greensmith, P.L. Bartlett, J. Baxter: Variance reduction techniques for gradient estimates in reinforcement learning, J. Mach. Learn. Res. 5, 1471–1530 (2004)MathSciNetMATH
15.205
go back to reference M.T. Rosenstein, A.G. Barto: Reinforcement learning with supervision by a stable controller, Am. Control Conf. (2004) M.T. Rosenstein, A.G. Barto: Reinforcement learning with supervision by a stable controller, Am. Control Conf. (2004)
15.206
go back to reference J.N. Tsitsiklis, B. Van Roy: An analysis of temporal-difference learning with function approximation, IEEE Trans. Autom. Control 42(5), 674–690 (1997)MathSciNetMATHCrossRef J.N. Tsitsiklis, B. Van Roy: An analysis of temporal-difference learning with function approximation, IEEE Trans. Autom. Control 42(5), 674–690 (1997)MathSciNetMATHCrossRef
15.207
go back to reference J.Z. Kolter, A.Y. Ng: Regularization and feature selection in least-squares temporal difference learning, Int. Conf. Mach. Learn. (2009) J.Z. Kolter, A.Y. Ng: Regularization and feature selection in least-squares temporal difference learning, Int. Conf. Mach. Learn. (2009)
15.208
go back to reference L.C. Baird, H. Klopf: Reinforcement Learning with High-Dimensional Continuous Actions, Technical Report WL-TR-93-1147 (Wright-Patterson Air Force Base, Dayton 1993)CrossRef L.C. Baird, H. Klopf: Reinforcement Learning with High-Dimensional Continuous Actions, Technical Report WL-TR-93-1147 (Wright-Patterson Air Force Base, Dayton 1993)CrossRef
15.209
go back to reference G.D. Konidaris, S. Osentoski, P. Thomas: Value function approximation in reinforcement learning using the Fourier basis, AAAI Conf. Artif. Intell. (2011) G.D. Konidaris, S. Osentoski, P. Thomas: Value function approximation in reinforcement learning using the Fourier basis, AAAI Conf. Artif. Intell. (2011)
15.210
go back to reference J. Peters, K. Muelling, J. Kober, D. Nguyen-Tuong, O. Kroemer: Towards motor skill learning for robotics, Int. Symp. Robotics Res. (2010) J. Peters, K. Muelling, J. Kober, D. Nguyen-Tuong, O. Kroemer: Towards motor skill learning for robotics, Int. Symp. Robotics Res. (2010)
15.211
go back to reference L. Buşoniu, R. Babuška, B. de Schutter, D. Ernst: Reinforcement Learning and Dynamic Programming Using Function Approximators (CRC, Boca Raton 2010)MATH L. Buşoniu, R. Babuška, B. de Schutter, D. Ernst: Reinforcement Learning and Dynamic Programming Using Function Approximators (CRC, Boca Raton 2010)MATH
15.212
go back to reference A.G. Barto, S. Mahadevan: Recent advances in hierarchical reinforcement learning, Discret. Event Dyn. Syst. 13(4), 341–379 (2003)MathSciNetMATHCrossRef A.G. Barto, S. Mahadevan: Recent advances in hierarchical reinforcement learning, Discret. Event Dyn. Syst. 13(4), 341–379 (2003)MathSciNetMATHCrossRef
15.213
go back to reference S. Hart, R. Grupen: Learning generalizable control programs, IEEE Trans. Auton. Mental Dev. 3(3), 216–231 (2011)CrossRef S. Hart, R. Grupen: Learning generalizable control programs, IEEE Trans. Auton. Mental Dev. 3(3), 216–231 (2011)CrossRef
15.214
go back to reference J.G. Schneider: Exploiting model uncertainty estimates for safe dynamic control learning, Adv. Neural Inform. Process. Syst., Vol. 9 (1997) pp. 1047–1053 J.G. Schneider: Exploiting model uncertainty estimates for safe dynamic control learning, Adv. Neural Inform. Process. Syst., Vol. 9 (1997) pp. 1047–1053
15.215
go back to reference J.A. Bagnell: Learning Decisions: Robustness, Uncertainty, and Approximation. Dissertation (Robotics Institute, Carnegie Mellon University, Pittsburgh 2004) J.A. Bagnell: Learning Decisions: Robustness, Uncertainty, and Approximation. Dissertation (Robotics Institute, Carnegie Mellon University, Pittsburgh 2004)
15.216
go back to reference T.M. Moldovan, P. Abbeel: Safe exploration in markov decision processes, 29th Int. Conf. Mach. Learn. (2012) T.M. Moldovan, P. Abbeel: Safe exploration in markov decision processes, 29th Int. Conf. Mach. Learn. (2012)
15.217
go back to reference T. Hester, M. Quinlan, P. Stone: RTMBA: A real-time model-based reinforcement learning architecture for robot control, IEEE Int. Conf. Robotics Autom. (2012) T. Hester, M. Quinlan, P. Stone: RTMBA: A real-time model-based reinforcement learning architecture for robot control, IEEE Int. Conf. Robotics Autom. (2012)
15.218
go back to reference C.G. Atkeson: Using local trajectory optimizers to speed up global optimization in dynamic programming, Adv. Neural Inform. Process. Syst., Vol. 6 (1994) pp. 663–670 C.G. Atkeson: Using local trajectory optimizers to speed up global optimization in dynamic programming, Adv. Neural Inform. Process. Syst., Vol. 6 (1994) pp. 663–670
15.219
go back to reference J. Kober, J. Peters: Policy search for motor primitives in robotics, Mach. Learn. 84(1/2), 171–203 (2010)MathSciNetMATH J. Kober, J. Peters: Policy search for motor primitives in robotics, Mach. Learn. 84(1/2), 171–203 (2010)MathSciNetMATH
15.220
go back to reference S. Russell: Learning agents for uncertain environments (extended abstract), Conf. Comput. Learn. Theory (1989) S. Russell: Learning agents for uncertain environments (extended abstract), Conf. Comput. Learn. Theory (1989)
15.221
go back to reference P. Abbeel, A.Y. Ng: Apprenticeship learning via inverse reinforcement learning, Int. Conf. Mach. Learn. (2004) P. Abbeel, A.Y. Ng: Apprenticeship learning via inverse reinforcement learning, Int. Conf. Mach. Learn. (2004)
15.222
go back to reference N.D. Ratliff, J.A. Bagnell, M.A. Zinkevich: Maximum margin planning, Int. Conf. Mach. Learn. (2006) N.D. Ratliff, J.A. Bagnell, M.A. Zinkevich: Maximum margin planning, Int. Conf. Mach. Learn. (2006)
15.223
go back to reference R.L. Keeney, H. Raiffa: Decisions with Multiple Objectives: Preferences and Value Tradeoffs (Wiley, New York 1976)MATH R.L. Keeney, H. Raiffa: Decisions with Multiple Objectives: Preferences and Value Tradeoffs (Wiley, New York 1976)MATH
15.224
go back to reference N. Ratliff, D. Bradley, J.A. Bagnell, J. Chestnutt: Boosting structured prediction for imitation learning, Adv. Neural Inform. Process. Syst., Vol. 19 (2006) pp. 1153–1160 N. Ratliff, D. Bradley, J.A. Bagnell, J. Chestnutt: Boosting structured prediction for imitation learning, Adv. Neural Inform. Process. Syst., Vol. 19 (2006) pp. 1153–1160
15.225
go back to reference D. Silver, J.A. Bagnell, A. Stentz: High performance outdoor navigation from overhead data using imitation learning. In: Robotics: Science and Systems, Vol. IV, ed. by O. Brock, J. Trinkle, F. Ramos (MIT, Cambridge 2008) D. Silver, J.A. Bagnell, A. Stentz: High performance outdoor navigation from overhead data using imitation learning. In: Robotics: Science and Systems, Vol. IV, ed. by O. Brock, J. Trinkle, F. Ramos (MIT, Cambridge 2008)
15.226
go back to reference D. Silver, J.A. Bagnell, A. Stentz: Learning from demonstration for autonomous navigation in complex unstructured terrain, Int. J. Robotics Res. 29(12), 1565–1592 (2010)CrossRef D. Silver, J.A. Bagnell, A. Stentz: Learning from demonstration for autonomous navigation in complex unstructured terrain, Int. J. Robotics Res. 29(12), 1565–1592 (2010)CrossRef
15.227
go back to reference N. Ratliff, J.A. Bagnell, S. Srinivasa: Imitation learning for locomotion and manipulation, IEEE-RAS Int. Conf. Humanoid Robots (2007) N. Ratliff, J.A. Bagnell, S. Srinivasa: Imitation learning for locomotion and manipulation, IEEE-RAS Int. Conf. Humanoid Robots (2007)
15.228
go back to reference J.Z. Kolter, P. Abbeel, A.Y. Ng: Hierarchical apprenticeship learning with application to quadruped locomotion, Adv. Neural Inform. Process. Syst., Vol. 20 (2007) pp. 769–776 J.Z. Kolter, P. Abbeel, A.Y. Ng: Hierarchical apprenticeship learning with application to quadruped locomotion, Adv. Neural Inform. Process. Syst., Vol. 20 (2007) pp. 769–776
15.229
go back to reference J. Sorg, S.P. Singh, R.L. Lewis: Reward design via online gradient ascent, Adv. Neural Inform. Process. Syst., Vol. 23 (2010) pp. 2190–2198 J. Sorg, S.P. Singh, R.L. Lewis: Reward design via online gradient ascent, Adv. Neural Inform. Process. Syst., Vol. 23 (2010) pp. 2190–2198
15.230
go back to reference M. Zucker, J.A. Bagnell: Reinforcement planning: RL for optimal planners, IEEE Proc. Int. Conf. Robotics Autom. (2012) M. Zucker, J.A. Bagnell: Reinforcement planning: RL for optimal planners, IEEE Proc. Int. Conf. Robotics Autom. (2012)
15.231
go back to reference H. Benbrahim, J.S. Doleac, J.A. Franklin, O.G. Selfridge: Real-time learning: A ball on a beam, Int. Jt. Conf. Neural Netw. (1992) H. Benbrahim, J.S. Doleac, J.A. Franklin, O.G. Selfridge: Real-time learning: A ball on a beam, Int. Jt. Conf. Neural Netw. (1992)
15.232
go back to reference B. Nemec, M. Zorko, L. Zlajpah: Learning of a ball-in-a-cup playing robot, Int. Workshop Robotics, Alpe-Adria-Danube Region (2010) B. Nemec, M. Zorko, L. Zlajpah: Learning of a ball-in-a-cup playing robot, Int. Workshop Robotics, Alpe-Adria-Danube Region (2010)
15.233
go back to reference M. Tokic, W. Ertel, J. Fessler: The crawler, a class room demonstrator for reinforcement learning, Int. Fla. Artif. Intell. Res. Soc. Conf. (2009) M. Tokic, W. Ertel, J. Fessler: The crawler, a class room demonstrator for reinforcement learning, Int. Fla. Artif. Intell. Res. Soc. Conf. (2009)
15.234
go back to reference H. Kimura, T. Yamashita, S. Kobayashi: Reinforcement learning of walking behavior for a four-legged robot, IEEE Conf. Decis. Control (2001) H. Kimura, T. Yamashita, S. Kobayashi: Reinforcement learning of walking behavior for a four-legged robot, IEEE Conf. Decis. Control (2001)
15.235
go back to reference R.A. Willgoss, J. Iqbal: Reinforcement learning of behaviors in mobile robots using noisy infrared sensing, Aust. Conf. Robotics Autom. (1999) R.A. Willgoss, J. Iqbal: Reinforcement learning of behaviors in mobile robots using noisy infrared sensing, Aust. Conf. Robotics Autom. (1999)
15.236
go back to reference L. Paletta, G. Fritz, F. Kintzler, J. Irran, G. Dorffner: Perception and developmental learning of affordances in autonomous robots, Lect. Notes Comput. Sci. 4667, 235–250 (2007)CrossRef L. Paletta, G. Fritz, F. Kintzler, J. Irran, G. Dorffner: Perception and developmental learning of affordances in autonomous robots, Lect. Notes Comput. Sci. 4667, 235–250 (2007)CrossRef
15.237
go back to reference C. Kwok, D. Fox: Reinforcement learning for sensing strategies, IEEE/RSJ Int. Conf. Intell. Robots Syst. (2004) C. Kwok, D. Fox: Reinforcement learning for sensing strategies, IEEE/RSJ Int. Conf. Intell. Robots Syst. (2004)
15.238
go back to reference T. Yasuda, K. Ohkura: A reinforcement learning technique with an adaptive action generator for a multi-robot system, Int. Conf. Simul. Adapt. Behav. (2008) T. Yasuda, K. Ohkura: A reinforcement learning technique with an adaptive action generator for a multi-robot system, Int. Conf. Simul. Adapt. Behav. (2008)
15.239
go back to reference J.H. Piater, S. Jodogne, R. Detry, D. Kraft, N. Krüger, O. Kroemer, J. Peters: Learning visual representations for perception-action systems, Int. J. Robotics Res. 30(3), 294–307 (2011)MATHCrossRef J.H. Piater, S. Jodogne, R. Detry, D. Kraft, N. Krüger, O. Kroemer, J. Peters: Learning visual representations for perception-action systems, Int. J. Robotics Res. 30(3), 294–307 (2011)MATHCrossRef
15.240
go back to reference M. Asada, S. Noda, S. Tawaratsumida, K. Hosoda: Purposive behavior acquisition for a real robot by vision-based reinforcement learning, Mach. Learn. 23(2/3), 279–303 (1996)CrossRef M. Asada, S. Noda, S. Tawaratsumida, K. Hosoda: Purposive behavior acquisition for a real robot by vision-based reinforcement learning, Mach. Learn. 23(2/3), 279–303 (1996)CrossRef
15.241
go back to reference M. Huber, R.A. Grupen: A feedback control structure for on-line learning tasks, Robotics Auton. Syst. 22(3/4), 303–315 (1997)CrossRef M. Huber, R.A. Grupen: A feedback control structure for on-line learning tasks, Robotics Auton. Syst. 22(3/4), 303–315 (1997)CrossRef
15.242
go back to reference P. Fidelman, P. Stone: Learning ball acquisition on a physical robot, Int. Symp. Robotics Autom. (2004) P. Fidelman, P. Stone: Learning ball acquisition on a physical robot, Int. Symp. Robotics Autom. (2004)
15.243
go back to reference V. Soni, S.P. Singh: Reinforcement learning of hierarchical skills on the Sony AIBO robot, Int. Conf. Dev. Learn. (2006) V. Soni, S.P. Singh: Reinforcement learning of hierarchical skills on the Sony AIBO robot, Int. Conf. Dev. Learn. (2006)
15.244
go back to reference B. Nemec, M. Tamošiunaitė, F. Wörgötter, A. Ude: Task adaptation through exploration and action sequencing, IEEE-RAS Int. Conf. Humanoid Robots (2009) B. Nemec, M. Tamošiunaitė, F. Wörgötter, A. Ude: Task adaptation through exploration and action sequencing, IEEE-RAS Int. Conf. Humanoid Robots (2009)
15.245
go back to reference M.J. Matarić: Reinforcement learning in the multi-robot domain, Auton. Robots 4, 73–83 (1997)CrossRef M.J. Matarić: Reinforcement learning in the multi-robot domain, Auton. Robots 4, 73–83 (1997)CrossRef
15.246
go back to reference M.J. Matarić: Reward functions for accelerated learning, Int. Conf. Mach. Learn. (ICML) (1994) M.J. Matarić: Reward functions for accelerated learning, Int. Conf. Mach. Learn. (ICML) (1994)
15.247
go back to reference R. Platt, R.A. Grupen, A.H. Fagg: Improving grasp skills using schema structured learning, Int. Conf. Dev. Learn. (2006) R. Platt, R.A. Grupen, A.H. Fagg: Improving grasp skills using schema structured learning, Int. Conf. Dev. Learn. (2006)
15.248
go back to reference M. Dorigo, M. Colombetti: Robot Shaping: Developing Situated Agents Through Learning, Technical Report (International Computer Science Institute, Berkeley 1993) M. Dorigo, M. Colombetti: Robot Shaping: Developing Situated Agents Through Learning, Technical Report (International Computer Science Institute, Berkeley 1993)
15.249
go back to reference G.D. Konidaris, S. Kuindersma, R. Grupen, A.G. Barto: Autonomous skill acquisition on a mobile manipulator, AAAI Conf. Artif. Intell. (2011) G.D. Konidaris, S. Kuindersma, R. Grupen, A.G. Barto: Autonomous skill acquisition on a mobile manipulator, AAAI Conf. Artif. Intell. (2011)
15.250
go back to reference G.D. Konidaris, S. Kuindersma, R. Grupen, A.G. Barto: Robot learning from demonstration by constructing skill trees, Int. J. Robotics Res. 31(3), 360–375 (2012)CrossRef G.D. Konidaris, S. Kuindersma, R. Grupen, A.G. Barto: Robot learning from demonstration by constructing skill trees, Int. J. Robotics Res. 31(3), 360–375 (2012)CrossRef
15.251
go back to reference A. Cocora, K. Kersting, C. Plagemann, W. Burgard, L. de Raedt: Learning relational navigation policies, IEEE/RSJ Int. Conf. Intell. Robots Syst. (2006) A. Cocora, K. Kersting, C. Plagemann, W. Burgard, L. de Raedt: Learning relational navigation policies, IEEE/RSJ Int. Conf. Intell. Robots Syst. (2006)
15.252
go back to reference D. Katz, Y. Pyuro, O. Brock: Learning to manipulate articulated objects in unstructured environments using a grounded relational representation. In: Robotics: Science and Systems, Vol. IV, ed. by O. Brock, J. Trinkle, F. Ramos (MIT, Cambridge 2008) D. Katz, Y. Pyuro, O. Brock: Learning to manipulate articulated objects in unstructured environments using a grounded relational representation. In: Robotics: Science and Systems, Vol. IV, ed. by O. Brock, J. Trinkle, F. Ramos (MIT, Cambridge 2008)
15.253
go back to reference C.H. An, C.G. Atkeson, J.M. Hollerbach: Model-Based Control of a Robot Manipulator (MIT, Press, Cambridge 1988) C.H. An, C.G. Atkeson, J.M. Hollerbach: Model-Based Control of a Robot Manipulator (MIT, Press, Cambridge 1988)
15.254
go back to reference C. Gaskett, L. Fletcher, A. Zelinsky: Reinforcement learning for a vision based mobile robot, IEEE/RSJ Int. Conf. Intell. Robots Syst. (2000) C. Gaskett, L. Fletcher, A. Zelinsky: Reinforcement learning for a vision based mobile robot, IEEE/RSJ Int. Conf. Intell. Robots Syst. (2000)
15.255
go back to reference Y. Duan, B. Cui, H. Yang: Robot navigation based on fuzzy RL algorithm, Int. Symp. Neural Netw. (2008) Y. Duan, B. Cui, H. Yang: Robot navigation based on fuzzy RL algorithm, Int. Symp. Neural Netw. (2008)
15.256
go back to reference H. Benbrahim, J.A. Franklin: Biped dynamic walking using reinforcement learning, Robotics Auton. Syst. 22(3/4), 283–302 (1997)CrossRef H. Benbrahim, J.A. Franklin: Biped dynamic walking using reinforcement learning, Robotics Auton. Syst. 22(3/4), 283–302 (1997)CrossRef
15.257
go back to reference W.D. Smart, L. Pack Kaelbling: A framework for reinforcement learning on real robots, Natl. Conf. Artif. Intell./Innov. Appl. Artif. Intell. (1989) W.D. Smart, L. Pack Kaelbling: A framework for reinforcement learning on real robots, Natl. Conf. Artif. Intell./Innov. Appl. Artif. Intell. (1989)
15.258
go back to reference D.C. Bentivegna: Learning from Observation Using Primitives (Georgia Institute of Technology, Atlanta 2004) D.C. Bentivegna: Learning from Observation Using Primitives (Georgia Institute of Technology, Atlanta 2004)
15.259
go back to reference A. Rottmann, C. Plagemann, P. Hilgers, W. Burgard: Autonomous blimp control using model-free reinforcement learning in a continuous state and action space, IEEE/RSJ Int. Conf. Intell. Robots Syst. (2007) A. Rottmann, C. Plagemann, P. Hilgers, W. Burgard: Autonomous blimp control using model-free reinforcement learning in a continuous state and action space, IEEE/RSJ Int. Conf. Intell. Robots Syst. (2007)
15.260
go back to reference K. Gräve, J. Stückler, S. Behnke: Learning motion skills from expert demonstrations and own experience using Gaussian process regression, Jt. Int. Symp. Robotics (ISR) Ger. Conf. Robotics (ROBOTIK) (2010) K. Gräve, J. Stückler, S. Behnke: Learning motion skills from expert demonstrations and own experience using Gaussian process regression, Jt. Int. Symp. Robotics (ISR) Ger. Conf. Robotics (ROBOTIK) (2010)
15.261
go back to reference O. Kroemer, R. Detry, J. Piater, J. Peters: Active learning using mean shift optimization for robot grasping, IEEE/RSJ Int. Conf. Intell. Robots Syst. (2009) O. Kroemer, R. Detry, J. Piater, J. Peters: Active learning using mean shift optimization for robot grasping, IEEE/RSJ Int. Conf. Intell. Robots Syst. (2009)
15.262
go back to reference O. Kroemer, R. Detry, J. Piater, J. Peters: Combining active learning and reactive control for robot grasping, Robotics Auton. Syst. 58(9), 1105–1116 (2010)CrossRef O. Kroemer, R. Detry, J. Piater, J. Peters: Combining active learning and reactive control for robot grasping, Robotics Auton. Syst. 58(9), 1105–1116 (2010)CrossRef
15.263
go back to reference T. Tamei, T. Shibata: Policy gradient learning of cooperative interaction with a robot using user's biological signals, Int. Conf. Neural Inf. Process. (2009) T. Tamei, T. Shibata: Policy gradient learning of cooperative interaction with a robot using user's biological signals, Int. Conf. Neural Inf. Process. (2009)
15.264
go back to reference A.J. Ijspeert, J. Nakanishi, S. Schaal: Learning attractor landscapes for learning motor primitives, Adv. Neural Inform. Process. Syst., Vol. 15 (2003) pp. 1547–1554 A.J. Ijspeert, J. Nakanishi, S. Schaal: Learning attractor landscapes for learning motor primitives, Adv. Neural Inform. Process. Syst., Vol. 15 (2003) pp. 1547–1554
15.265
go back to reference S. Schaal, P. Mohajerian, A.J. Ijspeert: Dynamics systems vs. optimal control – A unifying view, Prog. Brain Res. 165(1), 425–445 (2007)CrossRef S. Schaal, P. Mohajerian, A.J. Ijspeert: Dynamics systems vs. optimal control – A unifying view, Prog. Brain Res. 165(1), 425–445 (2007)CrossRef
15.266
go back to reference H.-I. Lin, C.-C. Lai: Learning collision-free reaching skill from primitives, IEEE/RSJ Int. Conf. Intell. Robots Syst. (2012) H.-I. Lin, C.-C. Lai: Learning collision-free reaching skill from primitives, IEEE/RSJ Int. Conf. Intell. Robots Syst. (2012)
15.267
go back to reference J. Kober, B. Mohler, J. Peters: Learning perceptual coupling for motor primitives, IEEE/RSJ Int. Conf. Intell. Robots Syst. (2008) J. Kober, B. Mohler, J. Peters: Learning perceptual coupling for motor primitives, IEEE/RSJ Int. Conf. Intell. Robots Syst. (2008)
15.268
go back to reference S. Bitzer, M. Howard, S. Vijayakumar: Using dimensionality reduction to exploit constraints in reinforcement learning, Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (2010) S. Bitzer, M. Howard, S. Vijayakumar: Using dimensionality reduction to exploit constraints in reinforcement learning, Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (2010)
15.269
go back to reference J. Buchli, F. Stulp, E. Theodorou, S. Schaal: Learning variable impedance control, Int. J. Robotics Res. 30(7), 820–833 (2011)CrossRef J. Buchli, F. Stulp, E. Theodorou, S. Schaal: Learning variable impedance control, Int. J. Robotics Res. 30(7), 820–833 (2011)CrossRef
15.270
go back to reference P. Pastor, M. Kalakrishnan, S. Chitta, E. Theodorou, S. Schaal: Skill learning and task outcome prediction for manipulation, IEEE Int. Conf. Robotics Autom. (2011) P. Pastor, M. Kalakrishnan, S. Chitta, E. Theodorou, S. Schaal: Skill learning and task outcome prediction for manipulation, IEEE Int. Conf. Robotics Autom. (2011)
15.271
go back to reference M. Kalakrishnan, L. Righetti, P. Pastor, S. Schaal: Learning force control policies for compliant manipulation, IEEE/RSJ Int. Conf. Intell. Robots Syst. (2011) M. Kalakrishnan, L. Righetti, P. Pastor, S. Schaal: Learning force control policies for compliant manipulation, IEEE/RSJ Int. Conf. Intell. Robots Syst. (2011)
15.272
go back to reference D.C. Bentivegna, C.G. Atkeson, G. Cheng: Learning from observation and practice using behavioral primitives: Marble maze, 11th Int. Symp. Robotics Res. (2004) D.C. Bentivegna, C.G. Atkeson, G. Cheng: Learning from observation and practice using behavioral primitives: Marble maze, 11th Int. Symp. Robotics Res. (2004)
15.273
go back to reference F. Kirchner: Q-learning of complex behaviours on a six-legged walking machine, EUROMICRO Workshop Adv. Mobile Robots (1997) F. Kirchner: Q-learning of complex behaviours on a six-legged walking machine, EUROMICRO Workshop Adv. Mobile Robots (1997)
15.274
go back to reference J. Morimoto, K. Doya: Acquisition of stand-up behavior by a real robot using hierarchical reinforcement learning, Robotics Auton. Syst. 36(1), 37–51 (2001)MATHCrossRef J. Morimoto, K. Doya: Acquisition of stand-up behavior by a real robot using hierarchical reinforcement learning, Robotics Auton. Syst. 36(1), 37–51 (2001)MATHCrossRef
15.275
go back to reference J.-Y. Donnart, J.-A. Meyer: Learning reactive and planning rules in a motivationally autonomous animat, Syst. Man Cybern. B 26(3), 381–395 (1996)CrossRef J.-Y. Donnart, J.-A. Meyer: Learning reactive and planning rules in a motivationally autonomous animat, Syst. Man Cybern. B 26(3), 381–395 (1996)CrossRef
15.276
go back to reference C. Daniel, G. Neumann, J. Peters: Learning concurrent motor skills in versatile solution spaces, IEEE/RSJ Int. Conf. Intell. Robots Syst. (2012) C. Daniel, G. Neumann, J. Peters: Learning concurrent motor skills in versatile solution spaces, IEEE/RSJ Int. Conf. Intell. Robots Syst. (2012)
15.277
go back to reference E.C. Whitman, C.G. Atkeson: Control of instantaneously coupled systems applied to humanoid walking, IEEE-RAS Int. Conf. Humanoid Robots (2010) E.C. Whitman, C.G. Atkeson: Control of instantaneously coupled systems applied to humanoid walking, IEEE-RAS Int. Conf. Humanoid Robots (2010)
15.278
go back to reference X. Huang, J. Weng: Novelty and reinforcement learning in the value system of developmental robots, 2nd Int. Workshop Epigenetic Robotics Model. Cognit. Dev. Robotic Syst. (2002) X. Huang, J. Weng: Novelty and reinforcement learning in the value system of developmental robots, 2nd Int. Workshop Epigenetic Robotics Model. Cognit. Dev. Robotic Syst. (2002)
15.279
go back to reference M. Pendrith: Reinforcement learning in situated agents: Some theoretical problems and practical solutions, Eur. Workshop Learn. Robots (1999) M. Pendrith: Reinforcement learning in situated agents: Some theoretical problems and practical solutions, Eur. Workshop Learn. Robots (1999)
15.280
go back to reference B. Wang, J.W. Li, H. Liu: A heuristic reinforcement learning for robot approaching objects, IEEE Conf. Robotics Autom. Mechatron. (2006) B. Wang, J.W. Li, H. Liu: A heuristic reinforcement learning for robot approaching objects, IEEE Conf. Robotics Autom. Mechatron. (2006)
15.281
go back to reference L.P. Kaelbling: Learning in Embedded Systems (Stanford University, Stanford 1990) L.P. Kaelbling: Learning in Embedded Systems (Stanford University, Stanford 1990)
15.282
go back to reference R.S. Sutton: Integrated architectures for learning, planning, and reacting based on approximating dynamic programming, Int. Conf. Mach. Learn. (1990) R.S. Sutton: Integrated architectures for learning, planning, and reacting based on approximating dynamic programming, Int. Conf. Mach. Learn. (1990)
15.283
go back to reference A.W. Moore, C.G. Atkeson: Prioritized sweeping: Reinforcement learning with less data and less time, Mach. Learn. 13(1), 103–130 (1993) A.W. Moore, C.G. Atkeson: Prioritized sweeping: Reinforcement learning with less data and less time, Mach. Learn. 13(1), 103–130 (1993)
15.284
go back to reference J. Peng, R.J. Williams: Incremental multi-step Q-learning, Mach. Learn. 22(1), 283–290 (1996) J. Peng, R.J. Williams: Incremental multi-step Q-learning, Mach. Learn. 22(1), 283–290 (1996)
15.285
go back to reference N. Jakobi, P. Husbands, I. Harvey: Noise and the reality gap: The use of simulation in evolutionary robotics, 3rd Eur. Conf. Artif. Life (1995) N. Jakobi, P. Husbands, I. Harvey: Noise and the reality gap: The use of simulation in evolutionary robotics, 3rd Eur. Conf. Artif. Life (1995)
Metadata
Title: Robot Learning
Authors: Jan Peters, Daniel D. Lee, Jens Kober, Duy Nguyen-Tuong, J. Andrew Bagnell, Stefan Schaal
Copyright Year: 2016
Publisher: Springer International Publishing
DOI: https://doi.org/10.1007/978-3-319-32552-1_15