skip to main content
10.1145/1566374.1566417acmconferencesArticle/Chapter ViewAbstractPublication PagesecConference Proceedingsconference-collections
research-article

Policy teaching through reward function learning

Published:06 July 2009Publication History

ABSTRACT

Policy teaching considers a Markov Decision Process setting in which an interested party aims to influence an agent's decisions by providing limited incentives. In this paper, we consider the specific objective of inducing a pre-specified desired policy. We examine both the case in which the agent's reward function is known and unknown to the interested party, presenting a linear program for the former case and formulating an active, indirect elicitation method for the latter. We provide conditions for logarithmic convergence, and present a polynomial time algorithm that ensures logarithmic convergence with arbitrarily high probability. We also offer practical elicitation heuristics that can be formulated as linear programs, and demonstrate their effectiveness on a policy teaching problem in a simulated ad-network setting. We extend our methods to handle partial observations and partial target policies, and provide a game-theoretic interpretation of our methods for handling strategic agents.

References

  1. M. Babaioff, M. Feldman, and N. Nisan. Combinatorial agency. In Proc. 7th ACM Conference on Electronic Commerce (EC-06), pages 18--28, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. M. Babaioff, M. Feldman, and N. Nisan. Mixed strategies in combinatorial agency. In Proc. 2nd Int. Workshop on Internet and Network Economics (WINE'06), 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. D. Bergemann and J. Valimaki. Learning and strategic pricing. Econometrica, 64(5):1125--49, September 1996.Google ScholarGoogle ScholarCross RefCross Ref
  4. D. Bertsimas and S. Vempala. Solving convex programs by random walks. Journal of the ACM, 51(4):540--556, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. P. Bolton and M. Dewatripont. Contract Theory. MIT Press, 2005.Google ScholarGoogle Scholar
  6. U. Chajewska, D. Koller, and D. Ormoneit. Learning an agent's utility function by observing behavior. In Proc. 18th International Conf. on Machine Learning, pages 35--42, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. J. Chuang, M. Feldman, and M. Babaioff. Incentives in peer-to-peer systems. In N. Nisan, T. Roughgarden, E. Tardos, and V. Vazirani, editors, Algorithmic Game Theory. Cambridge University Press, 2007.Google ScholarGoogle Scholar
  8. M. Feldman, J. Chuang, I. Stoica, and S. Shenker. Hidden-action in multi-hop routing. In Proc. Sixth ACM Conference on Electronic Commerce (EC'05), pages 117--126, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. B. Grunbaum. Partitions of mass-distributions and of convex bodies by hyperplanes. Pacific Journal of Mathematics, 10(4):1257--1261, 1960.Google ScholarGoogle ScholarCross RefCross Ref
  10. N. Immorlica, K. Jain, and M. Mahdian. Game-theoretic aspects of designing hyperlink structures. In Proc. 2nd Int. Workshop on Internet and Network Economics (WINE'06), pages 150--161, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. M.O. Jackson. Mechanism theory. In U. Derigs, editor, The Encyclopedia of Life Support Systems. EOLSS Publishers, 2003.Google ScholarGoogle Scholar
  12. G. Keller and S. Rady. Optimal experimentation in a changing environment. Review of Economic Studies, 66(3):475--507, July 1999.Google ScholarGoogle ScholarCross RefCross Ref
  13. J.-J. Laffont and D. Martimort. The Theory of Incentives: The Principal-Agent Model. Princeton University Press, 2001.Google ScholarGoogle Scholar
  14. D. Monderer and M. Tennenholtz. K-implementation. In EC '03: Proceedings of the 4th ACM conference on Electronic commerce, pages 19--28, New York, NY, USA, 2003. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. A.Y. Ng and S. Russell. Algorithms for inverse reinforcement learning. In ICML '00: Proceedings of the Seventeenth International Conference on Machine Learning, pages 663--670, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. M.L. Puterman. Markov decision processes: Discrete stochastic dynamic programming. John Wiley&Sons, New York, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. L.A. Rademacher. Approximating the centroid is hard. In SCG '07: Proceedings of the twenty-third annual symposium on Computational geometry, pages 302--305, New York, NY, USA, 2007. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. R. Vanderbei. Linear programming: foundations and extensions. Springer, 3rd edition, 2008.Google ScholarGoogle Scholar
  19. H. Varian. Revealed preference. In M. Szenberg, editor, Samuelsonian Economics and the 21st Century. Oxford University Press, 2003.Google ScholarGoogle Scholar
  20. H. Zhang, Y. Chen, and D. Parkes. A general approach to environment design with one agent. In Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence (IJCAI-09), 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. H. Zhang and D. Parkes. Value-based policy teaching with active indirect elicitation. In Proceedings of the Twenty-Third National Conference on Artificial Intelligence (AAAI-2008), 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. H. Zhang and S. Zenios. A dynamic principal-agent model with hidden information: Sequential optimality through truthful state revelation. Operations Research, 56(3):681--696, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Policy teaching through reward function learning

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        EC '09: Proceedings of the 10th ACM conference on Electronic commerce
        July 2009
        376 pages
        ISBN:9781605584584
        DOI:10.1145/1566374
        • General Chair:
        • John Chuang,
        • Program Chairs:
        • Lance Fortnow,
        • Pearl Pu

        Copyright © 2009 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 6 July 2009

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article

        Acceptance Rates

        Overall Acceptance Rate664of2,389submissions,28%

        Upcoming Conference

        EC '24
        The 25th ACM Conference on Economics and Computation
        July 8 - 11, 2024
        New Haven , CT , USA

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader