Apprenticeship learning via inverse reinforcement learning

ABSTRACT
We consider learning in a Markov decision process where we are not explicitly given a reward function, but where instead we can observe an expert demonstrating the task that we want to learn to perform. This setting is useful in applications (such as the task of driving) where it may be difficult to write down an explicit reward function specifying exactly how different desiderata should be traded off. We think of the expert as trying to maximize a reward function that is expressible as a linear combination of known features, and give an algorithm for learning the task demonstrated by the expert. Our algorithm is based on using "inverse reinforcement learning" to try to recover the unknown reward function. We show that our algorithm terminates in a small number of iterations, and that even though we may never recover the expert's reward function, the policy output by the algorithm will attain performance close to that of the expert, where performance is measured with respect to the expert's unknown reward function.
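The abstract only summarizes the method; the full paper casts it as matching "feature expectations" μ(π) = E[Σ_t γ^t φ(s_t)], with the reward modeled as R(s) = w·φ(s) for unknown weights w. Below is a minimal, illustrative Python/NumPy sketch of the simpler "projection" variant of the algorithm. Every name here (feature_expectations, solve_mdp, the parameters) is our own choice for exposition, and solve_mdp stands in for whatever RL solver the reader supplies; this is a sketch of the idea, not the authors' code.

```python
import numpy as np

def feature_expectations(P_pi, phi, s0, gamma, horizon=1000):
    """Estimate mu(pi) = E[sum_t gamma^t phi(s_t) | s_0] by rolling the state
    distribution forward. P_pi: (S, S) transition matrix induced by the policy;
    phi: (S, k) feature map; s0: start-state index."""
    d = np.zeros(P_pi.shape[0])
    d[s0] = 1.0                              # point mass on the start state
    mu, g = np.zeros(phi.shape[1]), 1.0
    for _ in range(horizon):
        mu += g * (d @ phi)                  # accumulate discounted expected features
        d, g = d @ P_pi, g * gamma
    return mu

def apprenticeship_learning(mu_expert, solve_mdp, k, eps=1e-4, max_iter=100):
    """Projection variant. `solve_mdp(w)` is an assumed, caller-supplied
    subroutine returning the feature expectations mu(pi) of a policy that is
    optimal for the reward R(s) = w . phi(s); k is the number of features."""
    mu_bar = solve_mdp(np.random.randn(k))   # mu of an arbitrary first policy
    for _ in range(max_iter):
        w = mu_expert - mu_bar               # candidate reward weights
        t = np.linalg.norm(w)                # margin: remaining distance to mu_E
        if t <= eps:                         # expert matched to within eps
            break
        mu = solve_mdp(w)                    # RL step under the current reward guess
        d = mu - mu_bar
        if d @ d == 0.0:                     # degenerate step: no progress possible
            break
        mu_bar = mu_bar + (d @ w) / (d @ d) * d   # project mu_E onto segment [mu_bar, mu]
    return w, t
```

Each iteration moves μ̄ closer to μ_E, so the margin t decreases monotonically; once t ≤ ε, a policy (in the paper, a convex mixture of the iterates) whose feature expectations lie within ε of μ_E is within ε of the expert's value under every reward with ||w||₂ ≤ 1, which is how the performance guarantee stated in the abstract can hold even if the true reward is never recovered.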