ABSTRACT
Policy teaching considers a Markov Decision Process setting in which an interested party aims to influence an agent's decisions by providing limited incentives. In this paper, we consider the specific objective of inducing a pre-specified desired policy. We examine both the case in which the agent's reward function is known to the interested party and the case in which it is unknown, presenting a linear program for the former and formulating an active, indirect elicitation method for the latter. We provide conditions for logarithmic convergence, and present a polynomial time algorithm that ensures logarithmic convergence with arbitrarily high probability. We also offer practical elicitation heuristics that can be formulated as linear programs, and demonstrate their effectiveness on a policy teaching problem in a simulated ad-network setting. We extend our methods to handle partial observations and partial target policies, and provide a game-theoretic interpretation of our methods for handling strategic agents.
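The known-reward case described above admits a direct formulation: choose nonnegative incentives that make the target policy optimal for the agent at minimum total cost. The following is a minimal sketch of one such linear program, solved with `scipy.optimize.linprog`. It is an illustration under assumptions, not the paper's exact formulation: the two-state MDP, the reward values, the optimality margin `eps`, and the restriction of incentives to the target action are all invented here for concreteness.

```python
import numpy as np
from scipy.optimize import linprog

# Toy MDP: 2 states, 2 actions (illustrative values only).
gamma, eps = 0.9, 1e-3          # discount factor, strict-optimality margin
P = np.array([[[0.9, 0.1],      # P[a, s, s']: transition probabilities
               [0.1, 0.9]],
              [[0.1, 0.9],
               [0.9, 0.1]]])
R = np.array([[1.0, 0.0],       # R[s, a]: the agent's own reward
              [0.0, 1.0]])
pi = [1, 0]                     # target policy (opposes the agent's default)
n = 2

# Decision variables x = [V(0), V(1), Delta(0), Delta(1)]:
# V(s) is the agent's value under the target policy with incentives,
# Delta(s) >= 0 is the incentive paid when the agent follows pi in s.
c = np.array([0.0, 0.0, 1.0, 1.0])   # objective: minimize total incentive

# Bellman equalities for the target policy:
#   V(s) - gamma * P[pi[s], s] @ V - Delta(s) = R[s, pi[s]]
A_eq = np.zeros((n, 2 * n))
b_eq = np.zeros(n)
for s in range(n):
    A_eq[s, :n] = np.eye(n)[s] - gamma * P[pi[s], s]
    A_eq[s, n + s] = -1.0
    b_eq[s] = R[s, pi[s]]

# Optimality with margin: for every deviation a != pi[s],
#   R[s, a] + gamma * P[a, s] @ V + eps <= V(s)
rows, rhs = [], []
for s in range(n):
    for a in range(n):
        if a == pi[s]:
            continue
        row = np.zeros(2 * n)
        row[:n] = gamma * P[a, s] - np.eye(n)[s]
        rows.append(row)
        rhs.append(-(R[s, a] + eps))

res = linprog(c, A_ub=np.array(rows), b_ub=np.array(rhs),
              A_eq=A_eq, b_eq=b_eq,
              bounds=[(None, None)] * n + [(0, None)] * n)
print(res.success, res.fun)  # feasible; total incentive is strictly positive
```

Because the incentives enter the agent's Bellman equation linearly, minimizing total payments subject to the target policy's optimality constraints is a linear program. In the example the target policy directly opposes the agent's immediate rewards, so the minimum total incentive is strictly positive. When the reward function is unknown, this formulation no longer applies directly, which is what motivates the paper's active, indirect elicitation approach.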
- M. Babaioff, M. Feldman, and N. Nisan. Combinatorial agency. In Proc. 7th ACM Conference on Electronic Commerce (EC-06), pages 18--28, 2006.
- M. Babaioff, M. Feldman, and N. Nisan. Mixed strategies in combinatorial agency. In Proc. 2nd Int. Workshop on Internet and Network Economics (WINE'06), 2006.
- D. Bergemann and J. Välimäki. Learning and strategic pricing. Econometrica, 64(5):1125--1149, September 1996.
- D. Bertsimas and S. Vempala. Solving convex programs by random walks. Journal of the ACM, 51(4):540--556, 2004.
- P. Bolton and M. Dewatripont. Contract Theory. MIT Press, 2005.
- U. Chajewska, D. Koller, and D. Ormoneit. Learning an agent's utility function by observing behavior. In Proc. 18th International Conf. on Machine Learning, pages 35--42, 2001.
- J. Chuang, M. Feldman, and M. Babaioff. Incentives in peer-to-peer systems. In N. Nisan, T. Roughgarden, E. Tardos, and V. Vazirani, editors, Algorithmic Game Theory. Cambridge University Press, 2007.
- M. Feldman, J. Chuang, I. Stoica, and S. Shenker. Hidden-action in multi-hop routing. In Proc. 6th ACM Conference on Electronic Commerce (EC'05), pages 117--126, 2005.
- B. Grünbaum. Partitions of mass-distributions and of convex bodies by hyperplanes. Pacific Journal of Mathematics, 10(4):1257--1261, 1960.
- N. Immorlica, K. Jain, and M. Mahdian. Game-theoretic aspects of designing hyperlink structures. In Proc. 2nd Int. Workshop on Internet and Network Economics (WINE'06), pages 150--161, 2006.
- M.O. Jackson. Mechanism theory. In U. Derigs, editor, The Encyclopedia of Life Support Systems. EOLSS Publishers, 2003.
- G. Keller and S. Rady. Optimal experimentation in a changing environment. Review of Economic Studies, 66(3):475--507, July 1999.
- J.-J. Laffont and D. Martimort. The Theory of Incentives: The Principal-Agent Model. Princeton University Press, 2001.
- D. Monderer and M. Tennenholtz. K-implementation. In EC '03: Proceedings of the 4th ACM Conference on Electronic Commerce, pages 19--28, New York, NY, USA, 2003. ACM.
- A.Y. Ng and S. Russell. Algorithms for inverse reinforcement learning. In ICML '00: Proceedings of the Seventeenth International Conference on Machine Learning, pages 663--670, 2000.
- M.L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, New York, 1994.
- L.A. Rademacher. Approximating the centroid is hard. In SCG '07: Proceedings of the Twenty-Third Annual Symposium on Computational Geometry, pages 302--305, New York, NY, USA, 2007. ACM.
- R. Vanderbei. Linear Programming: Foundations and Extensions. Springer, 3rd edition, 2008.
- H. Varian. Revealed preference. In M. Szenberg, editor, Samuelsonian Economics and the 21st Century. Oxford University Press, 2003.
- H. Zhang, Y. Chen, and D. Parkes. A general approach to environment design with one agent. In Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence (IJCAI-09), 2009.
- H. Zhang and D. Parkes. Value-based policy teaching with active indirect elicitation. In Proceedings of the Twenty-Third National Conference on Artificial Intelligence (AAAI-2008), 2008.
- H. Zhang and S. Zenios. A dynamic principal-agent model with hidden information: Sequential optimality through truthful state revelation. Operations Research, 56(3):681--696, 2008.
Index Terms
- Policy teaching through reward function learning