In reinforcement learning (RL) problems [Sutton and Barto (1998)], learning agents execute sequential actions with the goal of maximizing a reward signal, which may be time-delayed. For example, an agent could learn to play a game by being told whether it wins or loses, without ever being told the “correct” action. The RL framework has gained popularity with the development of algorithms capable of mastering increasingly complex problems. However, when RL agents begin learning from scratch, mastering difficult tasks is often slow or infeasible, and thus a significant amount of current research in RL focuses on improving the speed of learning by exploiting domain expertise with varying amounts of human-provided knowledge. Common approaches include deconstructing the task into a hierarchy of subtasks (e.g., MAXQ [Dietterich (2000)]), learning over temporally abstract actions (e.g., using the options framework [Sutton et al. (1999)]) rather than simple one-step actions, and abstracting over the state space (e.g., via function approximation [Sutton and Barto (1998)]) so agents may efficiently generalize experience.
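The delayed-reward setting described above can be made concrete with a minimal sketch: a tabular Q-learning agent (not a method from this text; all names and parameters are illustrative) on a small chain task where the only reward arrives at the terminal state. The one-step temporal-difference backup propagates that delayed reward back toward earlier states, so the agent learns which actions are "correct" despite never being told directly.

```python
import random

# Illustrative sketch: tabular Q-learning on a 5-state chain.
# Reward 1.0 is given only on reaching the final state, so the
# agent must learn from a time-delayed reward signal.
N_STATES = 5          # states 0..4; state 4 is terminal
ACTIONS = [0, 1]      # 0 = left, 1 = right
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1  # hypothetical hyperparameters

def step(state, action):
    """Deterministic chain dynamics; reward only at the terminal state."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy action selection.
            if rng.random() < EPS:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: q[s][act])
            s2, r, done = step(s, a)
            # One-step TD backup: the delayed terminal reward is
            # gradually propagated backward through the Q-values.
            q[s][a] += ALPHA * (r + GAMMA * max(q[s2]) - q[s][a])
            s = s2
    return q

q = train()
```

After training, the greedy policy moves right in every non-terminal state, even though the agent was never told that "right" is correct, only whether it eventually reached the rewarding state.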