A survey of robot learning from demonstration
Introduction
The problem of learning a mapping between world state and actions lies at the heart of many robotics applications. This mapping, also called a policy, enables a robot to select an action based upon its current world state. The development of policies by hand is often very challenging and as a result machine learning techniques have been applied to policy development. In this survey, we examine a particular approach to policy learning, Learning from Demonstration (LfD).
Within LfD, a policy is learned from examples, or demonstrations, provided by a teacher. We define examples as sequences of state–action pairs that are recorded during the teacher’s demonstration of the desired robot behavior. LfD algorithms utilize this dataset of examples to derive a policy that reproduces the demonstrated behavior. This approach to obtaining a policy is in contrast to other techniques in which a policy is learned from experience, for example building a policy based on data acquired through exploration, as in Reinforcement Learning [1]. We note that a policy derived under LfD is necessarily defined only in those states encountered, and for those corresponding actions taken, during the example executions.
In this article, we present a survey of recent work within the LfD community, focusing specifically on robotic applications. We segment the LfD learning problem into two fundamental phases: gathering the examples, and deriving a policy from such examples. Based on our identification of the defining features of these techniques, we contribute a comprehensive survey and categorization of existing LfD approaches. Though LfD has been applied to a variety of robotics problems, to our knowledge there exists no established structure for concretely placing work within the larger community. In general, approaches are appropriately contrasted with similar or seminal research, but their relation to the remainder of the field remains largely unaddressed. Establishing these relations is further complicated by the use of real world robotic platforms, whose physical details may vary greatly between implementations that nevertheless employ fundamentally identical learning techniques, or vice versa. A categorical structure therefore aids in comparative assessments among applications, as well as in identifying open areas for future research. In contributing our categorization of current approaches, we aim to lay the foundations for such a structure.
For the remainder of this section we motivate the application of LfD to robotics, and present a formal definition of the LfD problem. Section 2 presents the key design decisions for an LfD system. Methods for gathering demonstration examples are the focus of Section 3, where the various approaches to teacher demonstration and data recording are discussed. Section 4 examines the core techniques for policy derivation within LfD, followed in Section 5 by methods for improving robot performance beyond the capabilities of the teacher examples. To conclude, we identify and discuss open areas of research for future work in Section 6 and summarize the article with Section 7.
The presence of robots within society is becoming ever more prevalent. Whether an exploration rover in space, a robot soccer player or a recreational robot for the home, successful autonomous robot operation requires robust control algorithms. Non-robotics-experts may be increasingly presented with opportunities to interact with robots, and it is reasonable to expect that they have ideas about what a robot should do, and therefore what sort of behaviors these control algorithms should produce. A natural, and practical, extension of having this knowledge is to actually develop the desired control algorithm. Currently, however, policy development is a complex process restricted to experts within the field.
Traditional approaches to robot control model domain dynamics and derive mathematically-based policies. Though theoretically well-founded, these approaches depend heavily upon the accuracy of the world model. Not only does this model require considerable expertise to develop, but approximations such as linearization are often introduced for computational tractability, thereby degrading performance. Other approaches, such as Reinforcement Learning, guide policy learning by providing reward feedback about the desirability of visiting particular states. Defining a function to provide this reward, however, is known to be difficult and itself requires considerable expertise. Furthermore, building the policy requires gathering information by visiting states to receive rewards, which is non-trivial for a robot learner executing actual actions in the real world.
Considering these challenges, LfD has many attractive points for both learner and teacher. LfD formulations typically do not require expert knowledge of the domain dynamics, which removes performance brittleness resulting from model simplifications. The absence of this expert domain knowledge requirement also opens policy development to non-robotics-experts, satisfying a need that increases as robots become more commonplace. Furthermore, demonstration has the attractive feature of being an intuitive medium for communication from humans, who already use demonstration to teach other humans. Demonstration also has the practical feature of focusing the dataset to areas of the state–space actually encountered during task execution.
LfD can be seen as a subset of Supervised Learning. In Supervised Learning the agent is presented with labeled training data and learns an approximation to the function which produced the data. Within LfD, this training dataset is composed of example executions of the task by a demonstration teacher (Fig. 1, top).
We formally construct the LfD problem as follows. The world consists of states S and actions A, with the mapping between states by way of actions being defined by a probabilistic transition function T(s′|s, a) : S × A × S → [0, 1]. We assume that the state is not fully observable. The learner instead has access to observed state z ∈ Z, through the mapping M : S → Z. A policy π : Z → A selects actions based on observations of the world state. A single cycle of policy execution at time t is shown in Fig. 1 (bottom).
The set A ranges from containing low-level motions to high-level behaviors. For some simulated world applications, state may be fully transparent, in which case M is the identity mapping and Z = S. For all other applications state is not fully transparent and must be observed, for example through sensors in the real world. For succinctness, throughout the text we will use “state” interchangeably with “observed state.” It should be assumed, however, that state is always the observed state, unless explicitly noted otherwise. This assumption will be reinforced by use of the notation z ∈ Z throughout the text.
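The single policy-execution cycle just described (observe z = M(s), select a = π(z), transition under T) can be sketched in code. Every concrete function below is a toy stand-in chosen for illustration, not part of the survey's formalism:

```python
import random

def M(s):
    """Observation mapping S -> Z; here a noisy reading of a 1-D state."""
    return s + random.gauss(0.0, 0.01)

def pi(z):
    """Policy Z -> A; here a trivial threshold rule on the observation."""
    return 1.0 if z < 5.0 else -1.0

def T(s, a):
    """Probabilistic transition function; here additive with small noise."""
    return s + a + random.gauss(0.0, 0.01)

random.seed(0)
s = 0.0
for _ in range(3):     # three cycles of policy execution
    z = M(s)           # observe the (hidden) world state
    a = pi(z)          # select an action from the observation
    s = T(s, a)        # world transitions to the next state
print(s)               # close to 3.0 after three +1 steps
```

The learner never touches s directly; it acts only through the observation z, which is the distinction the observability assumption above encodes.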
Throughout the teacher execution, states and selected actions are recorded. We represent a demonstration d_j ∈ D formally as k_j pairs of observations and actions: d_j = {(z_i^j, a_i^j)}, i = 0 … k_j. These demonstrations set LfD apart from other learning approaches. The set D of demonstrations is made available to the learner. The policy π derived from this dataset enables the learner to select an action a based on the current observed state z.
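Concretely, a demonstration can be stored as a sequence of (observation, action) pairs, and the dataset D as a collection of such sequences. A minimal sketch (all names here are hypothetical, chosen only for illustration):

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Observation = Tuple[float, ...]   # an observed state z in Z
Action = str                      # a discrete action a in A

@dataclass
class Demonstration:
    """One teacher execution: the recorded (z, a) pairs."""
    pairs: List[Tuple[Observation, Action]] = field(default_factory=list)

    def record(self, z: Observation, a: Action) -> None:
        self.pairs.append((z, a))

# The dataset D is simply the set of recorded demonstrations.
d1 = Demonstration()
d1.record((0.0, 0.0), "forward")
d1.record((1.0, 0.0), "turn_left")
D = [d1]

# Flattening D yields supervised training examples mapping z -> a.
examples = [(z, a) for d in D for (z, a) in d.pairs]
print(len(examples))  # 2
```

The flattened examples are exactly the labeled training data of the Supervised Learning view above: observations play the role of inputs, demonstrated actions the role of labels.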
Before continuing, we pause to place the intents of this survey within the context of previous LfD literature. The aim of this survey is to review the broad topic of LfD, to provide a categorization that highlights differences between approaches, and to identify research areas within LfD that have not yet been explored.
We begin with a comment on terminology. Demonstration-based learning techniques are described by a variety of terms within the published literature, including Learning by Demonstration (LbD), Learning from Demonstration (LfD), Programming by Demonstration (PbD), Learning by Experienced Demonstrations, Assembly Plan from Observation, Learning by Showing, Learning by Watching, Learning from Observation, behavioral cloning, imitation and mimicry. While the definitions for some of these terms, such as imitation, have been loosely borrowed from other sciences, the overall use of these terms is often inconsistent or contradictory across articles.
Within this article, we refer to the general category of algorithms in which a policy is derived based on demonstrated data as Learning from Demonstration (LfD). Within this category, we further distinguish between approaches by their various characteristics, as outlined in Section 2, such as the source of the demonstrations and the learning techniques applied. Subsequent sections introduce terms used to characterize algorithmic differences. Due to the already contradictory use of terms in the existing literature, our definitions will not always agree with those of other publications. Our intent, however, is not for others in the field to adopt the terminology presented here, but rather to provide a consistent set of definitions that highlight distinctions between techniques.
Regarding a categorization for approaches, we note that many legitimate criteria could be used to subdivide LfD research. For example, one proposed categorization considers the broad spectrum of who, what, when and how to imitate, or subsets thereof [2], [3]. Our review aims to focus on the specifics of implementation. We therefore categorize approaches according to the computational formulations and techniques required to implement an LfD system.
To conclude, readers may also find useful other related surveys of the LfD research area. In particular, the book Imitation in Animals and Artifacts [4] provides an interdisciplinary overview of research in imitation learning, presenting leading work from neuroscience, psychology and linguistics as well as computer science. A narrower focus is presented in the chapter “Robot Programming by Demonstration” [2] within the book Handbook of Robotics. This work particularly highlights techniques which may augment or combine with traditional LfD, such as giving the teacher an active role during learning. By contrast, our focus is to provide a categorical structure for LfD approaches, in addition to presenting the specifics of implementation. We do refer the reader to this chapter for a more comprehensive historical overview of LfD, as the scope of our survey is restricted to recently published literature. Additional reviews that cover specific sub-areas of LfD research in detail are highlighted throughout the article.
Design choices
There are certain aspects of LfD which are common among all applications to date. One is the fact that a teacher demonstrates execution of a desired behavior. Another is that the learner is provided with a set of these demonstrations, and from them derives a policy able to reproduce the demonstrated behavior.
However, the developer still faces many design choices when developing a new LfD system. Some of these decisions, such as the choice of a discrete or continuous action representation, may be dictated by the task domain, while others are left to the discretion of the developer.
Gathering examples: How the dataset is built
In this section, we discuss various techniques for executing and recording demonstrations. The LfD dataset is composed of state–action pairs recorded during teacher executions of the desired behavior. Exactly how they are recorded, and what the teacher uses as a platform for the execution, varies greatly across approaches. Examples range from sensors on the robot learner recording its own actions as it is passively teleoperated by the teacher, to a camera recording a human teacher as she performs the task with her own body.
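The teleoperation-style recording mentioned here amounts to a simple loop: at each timestep the robot's own sensors supply the observed state while the human teacher supplies the action. A minimal sketch, in which sense and teacher_command are hypothetical stand-ins for a sensor interface and a joystick input:

```python
def sense(t):
    """Stand-in for the robot's onboard sensor reading at timestep t."""
    return (float(t), 0.0)

def teacher_command(t):
    """Stand-in for the teleoperating teacher's command at timestep t."""
    return "forward" if t < 3 else "stop"

# Record one demonstration as a sequence of (observation, action) pairs.
demonstration = []
for t in range(5):
    z = sense(t)              # observed state, from the learner's sensors
    a = teacher_command(t)    # action, chosen by the human teacher
    demonstration.append((z, a))

print(demonstration[-1])  # ((4.0, 0.0), 'stop')
```

Because the learner records through its own sensors while being teleoperated, the resulting pairs are already expressed in the learner's own state and action spaces, which is one reason this recording method is attractive.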
Deriving a policy: The source of the state to action mapping
Given a dataset of state–action examples that have been acquired using one of the methods described in the previous section, we now discuss methods for deriving a policy using this data. LfD has seen the development of three core approaches to deriving policies from demonstration data, as summarized in Fig. 2. Learning a policy can involve simply learning an approximation to the state–action mapping (mapping function), learning a model of the world dynamics and deriving a policy from this model (system model), or learning plans that associate preconditions and postconditions with each action (plans).
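The mapping-function approach is the most direct of the three: approximate z → a straight from the demonstrated pairs. As one minimal, hedged illustration (the dataset values are invented, and nearest-neighbor is just one of many function approximators used in the literature):

```python
import math

# Demonstration dataset of (observed state, action) pairs (toy values).
dataset = [
    ((0.0, 0.0), "forward"),
    ((1.0, 0.0), "forward"),
    ((2.0, 2.0), "turn_left"),
]

def policy(z):
    """Mapping-function approach: approximate z -> a directly, here with
    1-nearest-neighbor classification over the demonstrated states."""
    _, action = min(dataset, key=lambda pair: math.dist(pair[0], z))
    return action

print(policy((0.2, 0.1)))  # forward
print(policy((1.9, 1.8)))  # turn_left
```

A system-model approach would instead fit a transition model to the same pairs and plan through it, and a plan-based approach would recover action pre- and postconditions; all three consume the same demonstration dataset.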
Limitations of the demonstration dataset
LfD systems are inherently linked to the information provided in the demonstration dataset. As a result, learner performance is heavily limited by the quality of this information. In this article we identify two distinct causes for poor learner performance within LfD frameworks and survey the techniques that have been developed to address each limitation. The first cause, discussed in Section 5.1, is due to dataset sparsity, or the existence of areas of the state space that have not been demonstrated. The second, discussed in Section 5.2, is poor quality of the demonstration data itself, resulting for example from suboptimal teacher performance.
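Dataset sparsity can be made concrete with a simple distance test: a query state far from every demonstrated state lies in an undemonstrated region, where any action the derived policy selects is unsupported by data. A minimal sketch (the states and threshold are invented for illustration):

```python
import math

# Observed states that appeared somewhere in the demonstration dataset.
demonstrated_states = [(0.0, 0.0), (1.0, 0.0), (2.0, 2.0)]

def is_sparse(z, threshold=0.5):
    """Flag states far from every demonstrated state: the policy has no
    demonstration data there, so its action choice is unsupported."""
    nearest = min(math.dist(z, s) for s in demonstrated_states)
    return nearest > threshold

print(is_sparse((0.1, 0.1)))   # False: near demonstrated data
print(is_sparse((5.0, 5.0)))   # True: undemonstrated region
```

Checks of this flavor are one way a learner can detect when it has left the demonstrated portion of the state space and, for instance, request further demonstration.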
Future directions
As highlighted by the discussion in the previous sections, current approaches to LfD address a wide variety of problems under many different conditions and assumptions. In this section, we aim to highlight several promising areas of LfD research that have received limited attention, ranging from data representation to issues of system robustness and evaluation metrics.
Conclusion
In this article we have presented a comprehensive survey of Learning from Demonstration (LfD) techniques employed to address the robotics control problem. LfD has the attractive characteristics of being an intuitive communication medium for human teachers and of opening control algorithm development to non-robotics-experts. Additionally, LfD complements many traditional policy learning techniques, offering a solution to some of the weaknesses in traditional approaches. Consequently, LfD has grown into a distinct and active area of robotics research.
Acknowledgements
This research is partly sponsored by the Boeing Corporation under Grant No. CMU-BA-GTA-1, BBNT Solutions under subcontract No. 950008572, via prime Air Force contract No. SA-8650-06-C-7606, and the Qatar Foundation for Education, Science and Community Development. The views and conclusions contained in this document are solely those of the authors.
The authors would like to thank J. Andrew Bagnell and Darrin Bentivegna for feedback on the content and scope of this article.
References (105)
- et al., Discriminative and adaptive imitation in uni-manual and bi-manual tasks, Robotics and Autonomous Systems (special issue: The Social Mechanisms of Robot Programming by Demonstration), 2006
- et al., Using perspective taking to learn from ambiguous demonstrations, Robotics and Autonomous Systems (special issue: The Social Mechanisms of Robot Programming by Demonstration), 2006
- et al., Mobile robot programming using natural language, Robotics and Autonomous Systems, 2002
- et al., Interaction rule learning with a human partner based on an imitation faculty with a simple visuo-motor mapping, Robotics and Autonomous Systems (special issue: The Social Mechanisms of Robot Programming by Demonstration), 2006
- et al., Robots that imitate humans, Trends in Cognitive Sciences, 2002
- et al., Learning from demonstration and adaptation of biped locomotion, Robotics and Autonomous Systems, 2004
- et al., Robust trajectory learning and approximation for robot programming by demonstration, Robotics and Autonomous Systems (special issue: The Social Mechanisms of Robot Programming by Demonstration), 2006
- et al., Learning human arm movements by imitation: Evaluation of biologically inspired connectionist architecture, Robotics and Autonomous Systems, 2001
- et al., Programming full-body movements for humanoid robots by observation, Robotics and Autonomous Systems, 2004
- et al., A cognitive framework for imitation learning, Robotics and Autonomous Systems (special issue: The Social Mechanisms of Robot Programming by Demonstration), 2006
- Situated robot learning for multi-modal instruction and imitation of grasping, Robotics and Autonomous Systems (special issue: Robot Learning by Demonstration)
- Hierarchical attentive multiple models for execution and recognition of actions, Robotics and Autonomous Systems (special issue: The Social Mechanisms of Robot Programming by Demonstration)
- A computational model of intention reading in imitation, Robotics and Autonomous Systems (special issue: The Social Mechanisms of Robot Programming by Demonstration)
- Natural actor-critic, Neurocomputing
- Reinforcement learning: An introduction
- Robot programming by demonstration
- Computational approaches to motor learning by imitation, Philosophical Transactions: Biological Sciences
- Programing by demonstration: Coping with suboptimal teaching actions, The International Journal of Robotics Research
- A Bayesian model of imitation in infants and robots
- Correcting and improving imitation models of humans for robosoccer agents, Evolutionary Computation
- Supervised actor-critic reinforcement learning
- A posture sequence learning system for an anthropomorphic robotic hand, Robotics and Autonomous Systems
- Visual learning by imitation with motor representations, IEEE Transactions on Systems, Man, and Cybernetics, Part B
- A multi-agent system for programming robots by human demonstration, Integrated Computer-Aided Engineering
Brenna D. Argall is a Ph.D. candidate in the Robotics Institute at Carnegie Mellon University. Prior to graduate school, she held a Computational Biology position in the Laboratory of Brain and Cognition, at the National Institutes of Health, while investigating visualization techniques for neural fMRI data. Argall received an M.S. in Robotics in 2006, and in 2002 a B.S. in Mathematics, both from Carnegie Mellon. Her research interests focus upon machine learning techniques to develop and improve robot control systems, under the guidance of a human teacher.
Sonia Chernova is a Ph.D. student in the Computer Science Department at Carnegie Mellon University. She received her undergraduate degree in Computer Science and robotics from Carnegie Mellon University in 2003. Her research interests include learning and interaction in robotic systems.
Manuela Veloso is Herbert A. Simon Professor of Computer Science at Carnegie Mellon University. She received a licenciatura in Electrical Engineering in 1980, and an M.Sc. in Electrical and Computer Engineering in 1984 from the Instituto Superior Tecnico in Lisbon. She earned her Ph.D. in Computer Science from Carnegie Mellon in 1992. Veloso researches in planning, control learning, and execution algorithms, in particular for multi-robot teams. With her students, Veloso has developed teams of robot soccer agents, which have been RoboCup world champions several times. She is a Fellow of AAAI, the Association for the Advancement of Artificial Intelligence, an IEEE Senior member, and the President Elect (2008) of the International RoboCup Federation.
Brett Browning is a Senior Systems Scientist in Carnegie Mellon University’s School of Computer Science, where he has been a faculty member of the Robotics Institute since 2002. Prior to that, he was a postdoctoral fellow at Carnegie Mellon working with Manuela Veloso. Browning received his Ph.D. from the University of Queensland in 2000, and a B.Electrical Engineer and B.Sc (Math) from the same institution in 1996. His research interests are on robot autonomy and in particular real-time robot perception, applied machine learning, and teamwork.