research-article

Policy teaching through reward function learning

Authors:
Haoqi Zhang

Harvard University, Cambridge, MA, USA

Harvard University, Cambridge, MA, USA
View Profile

,
David C. Parkes

Harvard University, Cambridge, MA, USA

Harvard University, Cambridge, MA, USA
View Profile

,
Yiling Chen

Harvard University, Cambridge, MA, USA

Harvard University, Cambridge, MA, USA
View Profile

EC '09: Proceedings of the 10th ACM conference on Electronic commerceJuly 2009Pages 295–304https://doi.org/10.1145/1566374.1566417

Published:06 July 2009Publication History

EC '09: Proceedings of the 10th ACM conference on Electronic commerce

Pages 295–304

ABSTRACT

Policy teaching considers a Markov Decision Process setting in which an interested party aims to influence an agent's decisions by providing limited incentives. In this paper, we consider the specific objective of inducing a pre-specified desired policy. We examine both the case in which the agent's reward function is known and unknown to the interested party, presenting a linear program for the former case and formulating an active, indirect elicitation method for the latter. We provide conditions for logarithmic convergence, and present a polynomial time algorithm that ensures logarithmic convergence with arbitrarily high probability. We also offer practical elicitation heuristics that can be formulated as linear programs, and demonstrate their effectiveness on a policy teaching problem in a simulated ad-network setting. We extend our methods to handle partial observations and partial target policies, and provide a game-theoretic interpretation of our methods for handling strategic agents.

References

M. Babaioff, M. Feldman, and N. Nisan. Combinatorial agency. In Proc. 7th ACM Conference on Electronic Commerce (EC-06), pages 18--28, 2006. Google ScholarDigital Library
M. Babaioff, M. Feldman, and N. Nisan. Mixed strategies in combinatorial agency. In Proc. 2nd Int. Workshop on Internet and Network Economics (WINE'06), 2006. Google ScholarDigital Library
D. Bergemann and J. Valimaki. Learning and strategic pricing. Econometrica, 64(5):1125--49, September 1996.Google ScholarCross Ref
D. Bertsimas and S. Vempala. Solving convex programs by random walks. Journal of the ACM, 51(4):540--556, 2004. Google ScholarDigital Library
P. Bolton and M. Dewatripont. Contract Theory. MIT Press, 2005.Google Scholar
U. Chajewska, D. Koller, and D. Ormoneit. Learning an agent's utility function by observing behavior. In Proc. 18th International Conf. on Machine Learning, pages 35--42, 2001. Google ScholarDigital Library
J. Chuang, M. Feldman, and M. Babaioff. Incentives in peer-to-peer systems. In N. Nisan, T. Roughgarden, E. Tardos, and V. Vazirani, editors, Algorithmic Game Theory. Cambridge University Press, 2007.Google Scholar
M. Feldman, J. Chuang, I. Stoica, and S. Shenker. Hidden-action in multi-hop routing. In Proc. Sixth ACM Conference on Electronic Commerce (EC'05), pages 117--126, 2005. Google ScholarDigital Library
B. Grunbaum. Partitions of mass-distributions and of convex bodies by hyperplanes. Pacific Journal of Mathematics, 10(4):1257--1261, 1960.Google ScholarCross Ref
N. Immorlica, K. Jain, and M. Mahdian. Game-theoretic aspects of designing hyperlink structures. In Proc. 2nd Int. Workshop on Internet and Network Economics (WINE'06), pages 150--161, 2006. Google ScholarDigital Library
M.O. Jackson. Mechanism theory. In U. Derigs, editor, The Encyclopedia of Life Support Systems. EOLSS Publishers, 2003.Google Scholar
G. Keller and S. Rady. Optimal experimentation in a changing environment. Review of Economic Studies, 66(3):475--507, July 1999.Google ScholarCross Ref
J.-J. Laffont and D. Martimort. The Theory of Incentives: The Principal-Agent Model. Princeton University Press, 2001.Google Scholar
D. Monderer and M. Tennenholtz. K-implementation. In EC '03: Proceedings of the 4th ACM conference on Electronic commerce, pages 19--28, New York, NY, USA, 2003. ACM. Google ScholarDigital Library
A.Y. Ng and S. Russell. Algorithms for inverse reinforcement learning. In ICML '00: Proceedings of the Seventeenth International Conference on Machine Learning, pages 663--670, 2000. Google ScholarDigital Library
M.L. Puterman. Markov decision processes: Discrete stochastic dynamic programming. John Wiley&Sons, New York, 1994. Google ScholarDigital Library
L.A. Rademacher. Approximating the centroid is hard. In SCG '07: Proceedings of the twenty-third annual symposium on Computational geometry, pages 302--305, New York, NY, USA, 2007. ACM. Google ScholarDigital Library
R. Vanderbei. Linear programming: foundations and extensions. Springer, 3rd edition, 2008.Google Scholar
H. Varian. Revealed preference. In M. Szenberg, editor, Samuelsonian Economics and the 21st Century. Oxford University Press, 2003.Google Scholar
H. Zhang, Y. Chen, and D. Parkes. A general approach to environment design with one agent. In Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence (IJCAI-09), 2009. Google ScholarDigital Library
H. Zhang and D. Parkes. Value-based policy teaching with active indirect elicitation. In Proceedings of the Twenty-Third National Conference on Artificial Intelligence (AAAI-2008), 2008. Google ScholarDigital Library
H. Zhang and S. Zenios. A dynamic principal-agent model with hidden information: Sequential optimality through truthful state revelation. Operations Research, 56(3):681--696, 2008. Google ScholarDigital Library

Index Terms

Policy teaching through reward function learning
1. Applied computing
  1. Law, social and behavioral sciences
    1. Economics
2. Theory of computation
  1. Design and analysis of algorithms

Recommendations

Learning by teaching SimStudent: an initial classroom baseline study comparing with cognitive tutor
AIED'11: Proceedings of the 15th international conference on Artificial intelligence in education

This paper describes an application of a machine-learning agent, SimStudent, as a teachable peer learner that allows a student to learn by teaching. SimStudent has been integrated into APLUS (Artificial Peer Learning environment Using SimStudent), an on-...
Read More
A blended learning Approach to teaching foreign policy: Student experiences of learning through face-to-face and online discussion and their relationship to academic performance

This article presents research on students' experiences of learning through a blend of face-to-face and online discussion. The participants in our study were students enrolled in a foreign policy course at a major Australian university. Students' ...
Read More
Teaching and learning with MOOCs: computing academics' perspectives and engagement
ITiCSE '14: Proceedings of the 2014 conference on Innovation & technology in computer science education

During the past two years, Massive Open Online Courses (MOOCs) have created wide interest in the academic world raising both enthusiasm for new opportunities for universities and many concerns for the future of university education. The discussion has ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
EC '09: Proceedings of the 10th ACM conference on Electronic commerce
July 2009
376 pages
ISBN:9781605584584
DOI:10.1145/1566374
General Chair:
John Chuang
University of California at Berkeley, USA
,
Program Chairs:
Lance Fortnow
Northwestern University, USA
,
Pearl Pu
Ecole Polytechnique Fédérale de Lausanne, Switzerland
Copyright © 2009 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 6 July 2009
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
active indirect elicitation
environment design
policy teaching
preference elicitation
preference learning
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate664of2,389submissions,28%
Upcoming Conference
EC '24

Sponsor:

sigecom

The 25th ACM Conference on Economics and Computation

July 8 - 11, 2024

New Haven , CT , USA
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 21
  Total Citations
  View Citations
- 311
  Total Downloads
- Downloads (Last 12 months)17
- Downloads (Last 6 weeks)2
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Policy teaching through reward function learning

EC '09: Proceedings of the 10th ACM conference on Electronic commerce

ABSTRACT

References

Cited By

Index Terms

Recommendations

Learning by teaching SimStudent: an initial classroom baseline study comparing with cognitive tutor

A blended learning Approach to teaching foreign policy: Student experiences of learning through face-to-face and online discussion and their relationship to academic performance

Teaching and learning with MOOCs: computing academics' perspectives and engagement