Using Curiosity for an Even Representation of Tasks in Continual Offline Reinforcement Learning

Authors: Pankayaraj Pathmanathan, Natalia Díaz-Rodríguez, Javier Del Ser

Published in: Cognitive Computation | Issue 1/2024 | Published: 28-10-2023


Abstract

In this work, we investigate how curiosity can be applied to replay buffers to improve offline multi-task continual reinforcement learning when tasks, which are defined by the non-stationarity in the environment, are unlabeled and not evenly exposed to the learner over time. In particular, we investigate the use of curiosity both as a tool for task boundary detection and as a priority metric for retaining old transition tuples, and we use these two roles to propose two different buffers. First, we propose a Hybrid Reservoir Buffer with Task Separation (HRBTS), where curiosity is used to detect task boundaries that are not known due to the task-agnostic nature of the problem. Second, by using curiosity as a priority metric for retaining old transition tuples, we propose a Hybrid Curious Buffer (HCB). We ultimately show that these buffers, in conjunction with regular reinforcement learning algorithms, can be used to alleviate the catastrophic forgetting suffered by state-of-the-art replay buffers when the agent’s exposure to tasks is not equal over time. We evaluate catastrophic forgetting and the efficiency of our proposed buffers against recent works such as the Hybrid Reservoir Buffer (HRB) and the Multi-Time Scale Replay Buffer (MTR) in three different continual reinforcement learning settings. These settings are defined by how many times the agent encounters the same task, how long each encounter lasts, and how different new tasks are from old ones (i.e., how large the task drift is). The three settings are: (1) prolonged task encounters with substantial task drift and no task re-visitation; (2) frequent, short-lived task encounters with substantial task drift and task re-visitation; and (3) task encounters at every timestep with small task drift and task re-visitation. Experiments were conducted on classical control tasks and the Meta-World environment. They show that our proposed replay buffers display better immunity to catastrophic forgetting than existing works in all settings except the third, where tasks change at every timestep with small drift and are re-visited. In that scenario curiosity is always high and therefore ceases to be a useful measure for either proposed buffer, so the buffers are not universally better than other approaches across all types of continual learning settings, which opens an avenue for further research.
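
To make the two uses of curiosity described above concrete, the sketch below illustrates the general idea, not the authors' implementation: curiosity is estimated as the prediction error of a forward dynamics model, then used (a) to flag task boundaries when it spikes above its recent average and (b) as a retention priority when a fixed-capacity buffer must evict a transition. All class names, thresholds, and the z-score boundary rule are illustrative assumptions.

```python
# Minimal sketch of curiosity-driven boundary detection and curiosity-prioritised
# retention. Not the paper's HRBTS/HCB code; names and thresholds are assumptions.
import random
import numpy as np
import torch
import torch.nn as nn


class ForwardModel(nn.Module):
    """Predicts the next state from (state, action); its error serves as 'curiosity'."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def curiosity(self, s, a, s_next):
        # Per-transition mean squared prediction error.
        pred = self.net(torch.cat([s, a], dim=-1))
        return ((pred - s_next) ** 2).mean(dim=-1)


class CuriosityPrioritisedBuffer:
    """Fixed-capacity buffer: when full, evict the least curious transition so
    novel (rarely seen) experience stays represented."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []       # stored transition tuples
        self.priority = []   # curiosity value at insertion time

    def add(self, transition, curiosity):
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priority.append(curiosity)
        else:
            i = int(np.argmin(self.priority))
            if curiosity > self.priority[i]:
                self.data[i], self.priority[i] = transition, curiosity

    def sample(self, batch_size):
        return random.sample(self.data, min(batch_size, len(self.data)))


def detect_boundary(curiosity_history, window=50, z_thresh=3.0):
    """Flag a task boundary when the latest curiosity is far above the recent mean
    (a simple z-score rule; the window and threshold are illustrative)."""
    if len(curiosity_history) < window + 1:
        return False
    recent = np.asarray(curiosity_history[-window - 1:-1])
    mu, sigma = recent.mean(), recent.std() + 1e-8
    return (curiosity_history[-1] - mu) / sigma > z_thresh
```

As the abstract notes, a priority rule of this kind breaks down when the task drifts slightly at every timestep: curiosity stays uniformly high, so neither the eviction rule nor the boundary test discriminates between tasks.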


Metadata
Title
Using Curiosity for an Even Representation of Tasks in Continual Offline Reinforcement Learning
Authors
Pankayaraj Pathmanathan
Natalia Díaz-Rodríguez
Javier Del Ser
Publication date
28-10-2023
Publisher
Springer US
Published in
Cognitive Computation / Issue 1/2024
Print ISSN: 1866-9956
Electronic ISSN: 1866-9964
DOI
https://doi.org/10.1007/s12559-023-10213-9