Published in: Cognitive Computation 1/2024

28-10-2023

Using Curiosity for an Even Representation of Tasks in Continual Offline Reinforcement Learning

Authors: Pankayaraj Pathmanathan, Natalia Díaz-Rodríguez, Javier Del Ser


Abstract

In this work, we investigate the use of curiosity on replay buffers to improve offline multi-task continual reinforcement learning when tasks, which are defined by the non-stationarity in the environment, are unlabeled and not evenly exposed to the learner over time. In particular, we investigate the use of curiosity both as a tool for task boundary detection and as a priority metric for retaining old transition tuples, and we use these two roles to propose two different buffers. Firstly, we propose a Hybrid Reservoir Buffer with Task Separation (HRBTS), where curiosity is used to detect task boundaries that are not known due to the task-agnostic nature of the problem. Secondly, we propose a Hybrid Curious Buffer (HCB), which uses curiosity as a priority metric for retaining old transition tuples. We ultimately show that these buffers, in conjunction with regular reinforcement learning algorithms, can be used to alleviate the catastrophic forgetting suffered by state-of-the-art replay buffers when the agent’s exposure to tasks is not equal over time. We evaluate catastrophic forgetting and the efficiency of our proposed buffers against recent works such as the Hybrid Reservoir Buffer (HRB) and the Multi-Time Scale Replay Buffer (MTR) in three different continual reinforcement learning settings. These settings are defined by how many times the agent encounters the same task, how long each encounter lasts, and how different new tasks are from the old ones (i.e., how large the task drift is). The three settings are: (1) a prolonged task encounter with substantial task drift and no task re-visitation; (2) frequent, short-lived task encounters with substantial task drift and task re-visitation; and (3) task encounters at every timestep with small task drift and task re-visitation. Experiments were conducted on classical control tasks and the Meta-World environment. They show that our proposed replay buffers display better immunity to catastrophic forgetting than existing works in all but the third setting (every-timestep task encounters with small task drift and task re-visitation). In that scenario, curiosity remains persistently high and is therefore not a useful measure for either proposed buffer, so they are not universally better than other approaches across all types of continual learning (CL) settings, thereby opening up an avenue for further research.
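
To make the two roles of curiosity described above concrete, the following is a minimal Python sketch under stated assumptions, not the authors' implementation: curiosity is taken to be the prediction error of a learned forward-dynamics model, and the buffer class, eviction rule, and spike-based boundary heuristic are illustrative stand-ins for the HCB and HRBTS mechanisms.

# Illustrative sketch only (not the paper's code). Assumptions:
# curiosity = forward-model prediction error; eviction keeps the most
# "curious" transitions; a task boundary is flagged on a curiosity spike.
import random
import numpy as np


def curiosity_score(forward_model, state, action, next_state):
    """Curiosity as the prediction error of a learned forward-dynamics model."""
    predicted = forward_model(state, action)
    return float(np.linalg.norm(predicted - next_state))


def detect_task_boundary(curiosity_history, window=100, factor=3.0):
    """Heuristic boundary detector: flag a boundary when the latest curiosity
    greatly exceeds its recent running average (a stand-in for the
    task-separation trigger used by HRBTS)."""
    if len(curiosity_history) <= window:
        return False
    recent = curiosity_history[-(window + 1):-1]
    return curiosity_history[-1] > factor * (sum(recent) / window)


class CuriosityPrioritizedBuffer:
    """Toy replay buffer that, once full, keeps the highest-curiosity
    transitions (the HCB idea of curiosity as a retention priority)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.storage = []  # list of (transition, curiosity) pairs

    def add(self, transition, curiosity):
        if len(self.storage) < self.capacity:
            self.storage.append((transition, curiosity))
            return
        # Replace the least curious stored transition if the new one scores higher.
        idx = min(range(len(self.storage)), key=lambda i: self.storage[i][1])
        if curiosity > self.storage[idx][1]:
            self.storage[idx] = (transition, curiosity)

    def sample(self, batch_size):
        return random.sample(self.storage, min(batch_size, len(self.storage)))

In practice, such a retention rule would be combined with a reservoir-style component for recent data (the "hybrid" aspect named in the paper); the exact scoring and eviction policies above are assumptions made only for illustration.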


Metadata
Title
Using Curiosity for an Even Representation of Tasks in Continual Offline Reinforcement Learning
Authors
Pankayaraj Pathmanathan
Natalia Díaz-Rodríguez
Javier Del Ser
Publication date
28-10-2023
Publisher
Springer US
Published in
Cognitive Computation / Issue 1/2024
Print ISSN: 1866-9956
Electronic ISSN: 1866-9964
DOI
https://doi.org/10.1007/s12559-023-10213-9
