28-10-2023
Using Curiosity for an Even Representation of Tasks in Continual Offline Reinforcement Learning
Authors:
Pankayaraj Pathmanathan, Natalia Díaz-Rodríguez, Javier Del Ser
Published in:
Cognitive Computation, Issue 1/2024
Abstract
In this work, we investigate how curiosity can be used in replay buffers to improve offline multi-task continual reinforcement learning when tasks, which are defined by non-stationarity in the environment, are unlabeled and not evenly exposed to the learner over time. In particular, we investigate curiosity both as a tool for detecting task boundaries and as a priority metric for retaining old transition tuples, and we use each of these two roles to propose a different buffer. First, we propose a Hybrid Reservoir Buffer with Task Separation (HRBTS), in which curiosity is used to detect task boundaries that are unknown because the problem is task-agnostic. Second, we propose a Hybrid Curious Buffer (HCB), which uses curiosity as a priority metric for retaining old transition tuples. We show that these buffers, combined with standard reinforcement learning algorithms, alleviate the catastrophic forgetting suffered by state-of-the-art replay buffers when the agent’s exposure to tasks is uneven over time. We evaluate catastrophic forgetting and the efficiency of the proposed buffers against recent approaches such as the Hybrid Reservoir Buffer (HRB) and the Multi-Time Scale Replay Buffer (MTR) in three continual reinforcement learning settings. These settings are defined by how often the agent encounters the same task, how long each encounter lasts, and how different new tasks are from old ones (i.e., how large the task drift is). The three settings are: (1) prolonged task encounters with substantial task drift and no task re-visitation; (2) frequent, short-lived task encounters with substantial task drift and task re-visitation; and (3) task encounters at every timestep with small task drift and task re-visitation. Experiments were conducted on classical control tasks and the Metaworld environment.
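The two roles of curiosity described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every name in it (`CuriosityBoundaryDetector`, `HybridCuriousBuffer`, `ratio`, `momentum`, `fifo_size`, `long_term_size`) is hypothetical. It assumes that curiosity is a precomputed per-transition scalar (for example, the prediction error of a learned forward dynamics model), that a task boundary shows up as a jump in mean batch curiosity above a running baseline, and that "hybrid" means a short-term FIFO queue whose evicted entries compete for a slot in a long-term store.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical sketch, not the paper's code. Curiosity is assumed to be a
# precomputed scalar per transition (e.g. forward-model prediction error).


@dataclass
class Transition:
    state: tuple
    action: int
    reward: float
    next_state: tuple
    curiosity: float


class CuriosityBoundaryDetector:
    """Flags a task boundary when mean batch curiosity jumps above a running baseline."""

    def __init__(self, ratio: float = 2.0, momentum: float = 0.9):
        self.ratio = ratio        # assumed spike threshold, relative to the baseline
        self.momentum = momentum  # smoothing factor of the exponential moving average
        self.baseline = None

    def update(self, curiosities: list[float]) -> bool:
        mean_c = sum(curiosities) / len(curiosities)
        if self.baseline is None:  # the first batch only initialises the baseline
            self.baseline = mean_c
            return False
        if mean_c > self.ratio * self.baseline:
            self.baseline = mean_c  # reset so the new task's level becomes the reference
            return True
        self.baseline = self.momentum * self.baseline + (1 - self.momentum) * mean_c
        return False


class HybridCuriousBuffer:
    """Short-term FIFO plus a long-term store that keeps the most curious transitions."""

    def __init__(self, fifo_size: int, long_term_size: int):
        self.fifo = deque(maxlen=fifo_size)  # assumes fifo_size >= 1
        self.long_term: list[Transition] = []
        self.long_term_size = long_term_size

    def add(self, t: Transition) -> None:
        if len(self.fifo) == self.fifo.maxlen:
            # The oldest FIFO entry is about to drop out; offer it to the long-term store.
            self._retain(self.fifo[0])
        self.fifo.append(t)  # a full deque discards its oldest entry automatically

    def _retain(self, t: Transition) -> None:
        if len(self.long_term) < self.long_term_size:
            self.long_term.append(t)
            return
        # Replace the least curious stored transition if the candidate is more curious.
        i = min(range(len(self.long_term)), key=lambda k: self.long_term[k].curiosity)
        if t.curiosity > self.long_term[i].curiosity:
            self.long_term[i] = t


det = CuriosityBoundaryDetector(ratio=2.0)
batches = [[1.0] * 4, [1.1] * 4, [5.0] * 4, [5.2] * 4]
flags = [det.update(b) for b in batches]
print(flags)  # [False, False, True, False]

buf = HybridCuriousBuffer(fifo_size=2, long_term_size=2)
for i, c in enumerate([0.5, 0.1, 0.9, 0.3, 0.7]):
    buf.add(Transition((i,), 0, 0.0, (i + 1,), c))
print([t.curiosity for t in buf.long_term])  # [0.5, 0.9]
print([t.curiosity for t in buf.fifo])       # [0.3, 0.7]
```

Resetting the baseline after a detected boundary stops the detector from firing on every batch while curiosity stays elevated on the new task, and the moving-average baseline lets it tolerate slow drift. Both are choices made for this sketch, not details reported in the abstract.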
Experiments show that the proposed replay buffers are more robust to catastrophic forgetting than existing approaches in all settings except the third (task encounters at every timestep with small task drift and task re-visitation). In that scenario curiosity is always high, so it is not a useful measure for either proposed buffer. The proposed buffers are therefore not universally better than other approaches across all types of continual learning (CL) settings, which opens an avenue for further research.
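The limitation above can be illustrated with a small, hypothetical calculation that does not reproduce the paper's analysis. Suppose transitions are sampled with probability proportional to their curiosity. The normalized Shannon entropy of those probabilities then measures how much the priority actually discriminates between transitions: a value of 1.0 means sampling is uniform. The function name `normalized_priority_entropy` and the curiosity values are invented for this example.

```python
import math


def normalized_priority_entropy(curiosities: list[float]) -> float:
    """Entropy of curiosity-proportional sampling, divided by its maximum log(n).

    Assumes at least two strictly positive curiosity values.
    """
    total = sum(curiosities)
    probs = [c / total for c in curiosities]
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))


spiky = [0.1, 0.1, 0.1, 5.0]         # one transition is clearly more novel
uniform_high = [5.0, 5.1, 4.9, 5.0]  # curiosity stays high everywhere
print(round(normalized_priority_entropy(spiky), 3))         # 0.202
print(round(normalized_priority_entropy(uniform_high), 3))  # 1.0
```

When curiosity is high for every transition, the priorities are close to uniform, so prioritizing by curiosity behaves almost like uniform sampling and carries little information about which transitions to keep.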