Temporally extended actions are usually effective in speeding up reinforcement learning. In this paper we present a mechanism for automatically constructing such actions, expressed as options, in a finite Markov Decision Process (MDP). To do this, we compute a bisimulation metric between the states of a small MDP and the states of a large MDP, which is the one we want to solve. This metric is then used to completely define a set of options for the large MDP. We demonstrate empirically that our approach improves the speed of reinforcement learning and is largely insensitive to parameter tuning.
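For concreteness, a standard form of the bisimulation metric on a finite MDP (following Ferns, Panangaden, and Precup, 2004) is the fixed point of the operator below; the weighting constants $c_R$, $c_T$ and the cross-MDP reading sketched here are illustrative assumptions, not necessarily the exact variant used in this paper:

\[
d(s, t) \;=\; \max_{a \in A} \Big( c_R \, \big| R(s, a) - R(t, a) \big| \;+\; c_T \, T_K(d)\big( P(\cdot \mid s, a),\, P(\cdot \mid t, a) \big) \Big),
\]

where $T_K(d)$ denotes the Kantorovich (Wasserstein) distance between the transition distributions, measured under the metric $d$, and $c_R + c_T \le 1$ ensures the operator is a contraction (a common choice is $c_T = \gamma$). In the cross-MDP setting described above, $s$ would range over the states of the small MDP and $t$ over the states of the large MDP, so small values of $d(s, t)$ identify large-MDP states that behave like $s$.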