DOI: 10.1145/1015330.1015430
Article

Apprenticeship learning via inverse reinforcement learning

Published: 04 July 2004

ABSTRACT

We consider learning in a Markov decision process where we are not explicitly given a reward function, but where instead we can observe an expert demonstrating the task that we want to learn to perform. This setting is useful in applications (such as the task of driving) where it may be difficult to write down an explicit reward function specifying exactly how different desiderata should be traded off. We think of the expert as trying to maximize a reward function that is expressible as a linear combination of known features, and give an algorithm for learning the task demonstrated by the expert. Our algorithm is based on using "inverse reinforcement learning" to try to recover the unknown reward function. We show that our algorithm terminates in a small number of iterations, and that even though we may never recover the expert's reward function, the policy output by the algorithm will attain performance close to that of the expert, where performance is measured with respect to the expert's unknown reward function.


• Published in

  ICML '04: Proceedings of the twenty-first international conference on Machine learning
  July 2004, 934 pages
  ISBN: 1581138385
  DOI: 10.1145/1015330
  Conference Chair: Carla Brodley
  Copyright © 2004 Author
  Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 140 of 548 submissions, 26%
