ABSTRACT
A value learning system has an incentive to follow shutdown instructions, assuming the shutdown instruction provides information (in the technical sense) about which actions lead to valuable outcomes. However, this assumption is not robust to model mis-specification (e.g., in the case of programmer errors). We demonstrate this by presenting Supervised POMDP scenarios in which errors in the parameterized reward function remove the incentive to follow shutdown commands. These difficulties parallel those discussed by Soares et al. (2015) in their paper on corrigibility. We argue that it is important to consider systems that follow shutdown commands under some weaker set of assumptions (e.g., that one small verified module is correctly implemented, as opposed to an entire prior probability distribution and/or parameterized reward function). We discuss some difficulties with simple attempts to attain these sorts of guarantees in a value learning framework.
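The incentive described above can be sketched in a toy off-switch game in the style of Hadfield-Menell et al. (2017). The model, numbers, and function names below are illustrative assumptions for this sketch, not the paper's formal Supervised POMDP setting: a robot compares acting immediately (expected utility E[u] under its belief about the human's utility u) against deferring to a human who shuts it down whenever u is negative (expected utility E[max(u, 0)]). Deferring is strictly better exactly when the robot's belief puts mass on u < 0; a mis-specified reward model that is wrongly certain u > 0 leaves the robot with no incentive to respect the switch.

```python
import random

def policy(belief_samples):
    """Choose between acting immediately and deferring to the off switch.

    belief_samples: samples of the human's utility u drawn from the
    robot's (possibly mis-specified) belief.

    Deferring: the human shuts the robot down whenever the true u is
    negative, so the robot's expected payoff is E[max(u, 0)].
    Acting:    the robot bypasses the switch and gets E[u].
    """
    n = len(belief_samples)
    act = sum(belief_samples) / n
    defer = sum(max(u, 0.0) for u in belief_samples) / n
    # max(u, 0) >= u pointwise, so deferring strictly wins only if the
    # belief assigns probability to u < 0; otherwise the incentive vanishes.
    return "defer" if defer > act else "act"

random.seed(0)

# Well-specified belief: genuine uncertainty about the human's utility.
uncertain = [random.gauss(0.5, 1.0) for _ in range(10_000)]

# Mis-specified belief: the reward model is (wrongly) certain that u = 0.5.
certain = [0.5] * 10_000

print(policy(uncertain))  # -> defer: the shutdown signal carries information
print(policy(certain))    # -> act: no incentive left to follow shutdown
```

The point of the sketch is the abstract's contrast: the shutdown incentive rests entirely on the belief being well-specified, which is exactly the assumption that programmer errors can break.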
REFERENCES
- Stuart Armstrong. 2010. Utility Indifference. Technical Report 2010-1. Oxford: Future of Humanity Institute, University of Oxford.
- Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, and Stuart Russell. 2017. The Off-Switch Game. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17). 220-227.
- Smitha Milli, Dylan Hadfield-Menell, Anca Dragan, and Stuart Russell. 2017. Should Robots be Obedient? arXiv preprint arXiv:1705.09990.
- Nate Soares, Benja Fallenstein, Eliezer Yudkowsky, and Stuart Armstrong. 2015. Corrigibility. In 1st International Workshop on AI and Ethics at AAAI-2015.
Index Terms
- Incorrigibility in the CIRL Framework