
2020 | Chapter

10. Hierarchical Reinforcement Learning

Author: Yanhua Huang

Published in: Deep Reinforcement Learning

Publisher: Springer Singapore


Abstract

In this chapter, we introduce hierarchical reinforcement learning, a class of methods that improves learning performance by constructing and exploiting the underlying structure of cognition and the decision-making process. We first introduce the background and the two primary categories of hierarchical reinforcement learning: the options framework and feudal reinforcement learning. We then give a detailed introduction to typical algorithms in these categories, including the strategic attentive writer, option-critic, and feudal networks. Finally, we close the chapter with a summary of recent work on hierarchical reinforcement learning.
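To make the idea of temporal abstraction in the options framework concrete, below is a minimal illustrative sketch, not code from the chapter. Following Sutton et al. (1999), an option bundles an initiation set, an intra-option policy, and a termination condition; the agent picks an option at the high level and follows its low-level policy until it terminates. The names (Option, run_episode, choose_option) are hypothetical, and the environment is assumed to expose the common reset/step interface.

import random
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Option:
    """A temporally extended action: where it can start, how it acts, when it stops."""
    initiation: Callable[[Any], bool]    # I(s): can this option be started in state s?
    policy: Callable[[Any], int]         # pi_o(s): intra-option (low-level) action
    termination: Callable[[Any], float]  # beta_o(s): probability of terminating in s

def run_episode(env, options, choose_option, gamma=0.99):
    """Run one episode, choosing among options at the high level and
    executing the chosen option's intra-option policy until it terminates."""
    state, total_return, discount = env.reset(), 0.0, 1.0
    done = False
    while not done:
        # High-level choice among options whose initiation set contains `state`.
        # (Assumes at least one option is always available.)
        available = [o for o in options if o.initiation(state)]
        option = choose_option(state, available)
        # Low-level execution until the option terminates or the episode ends.
        while not done:
            action = option.policy(state)
            state, reward, done, _ = env.step(action)
            total_return += discount * reward
            discount *= gamma
            if random.random() < option.termination(state):
                break
    return total_return

The stochastic termination probability beta_o(s) is what turns the high-level decision problem into a semi-MDP: option durations vary, so high-level credit assignment operates over multi-step transitions rather than single actions.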


Literature
Andrychowicz M, Wolski F, Ray A, Schneider J, Fong R, Welinder P, McGrew B, Tobin J, Abbeel OP, Zaremba W (2017) Hindsight experience replay. In: Advances in neural information processing systems, pp 5048–5058
Bacon PL, Harb J, Precup D (2017) The option-critic architecture. In: Thirty-first AAAI conference on artificial intelligence
Barto AG, Mahadevan S (2003) Recent advances in hierarchical reinforcement learning. Discrete Event Dyn Syst 13(1–2):41–77
Beattie C, Leibo JZ, Teplyashin D, Ward T, Wainwright M, Küttler H, Lefrancq A, Green S, Valdés V, Sadik A, et al (2016) DeepMind Lab. Preprint. arXiv:1612.03801
Bellemare MG, Naddaf Y, Veness J, Bowling M (2013) The arcade learning environment: an evaluation platform for general agents. J Artif Intell Res 47:253–279
Bhatti S, Desmaison A, Miksik O, Nardelli N, Siddharth N, Torr PH (2016) Playing Doom with SLAM-augmented deep reinforcement learning. Preprint. arXiv:1612.00380
Da Silva B, Konidaris G, Barto A (2012) Learning parameterized skills. Preprint. arXiv:1206.6398
Dayan P (1993) Improving generalization for temporal difference learning: the successor representation. Neural Comput 5(4):613–624
Dayan P, Hinton GE (1993) Feudal reinforcement learning. In: Advances in neural information processing systems, pp 271–278
Dietterich TG (1998) The MAXQ method for hierarchical reinforcement learning. In: Proceedings of the international conference on machine learning (ICML), vol 98, Citeseer, pp 118–126
Dietterich TG (2000) Hierarchical reinforcement learning with the MAXQ value function decomposition. J Artif Intell Res 13:227–303
Duan Y, Chen X, Houthooft R, Schulman J, Abbeel P (2016) Benchmarking deep reinforcement learning for continuous control. In: International conference on machine learning, pp 1329–1338
Florensa C, Duan Y, Abbeel P (2017) Stochastic neural networks for hierarchical reinforcement learning. Preprint. arXiv:1704.03012
Frans K, Ho J, Chen X, Abbeel P, Schulman J (2017) Meta learning shared hierarchies. Preprint. arXiv:1710.09767
Gregor K, Danihelka I, Graves A, Rezende DJ, Wierstra D (2015) Stochastic backpropagation and approximate inference in deep generative models. In: Proceedings of the international conference on machine learning (ICML)
Haarnoja T, Hartikainen K, Abbeel P, Levine S (2018) Latent space policies for hierarchical reinforcement learning. Preprint. arXiv:1804.02808
Harutyunyan A, Vrancx P, Bacon PL, Precup D, Nowe A (2018) Learning with options that terminate off-policy. In: Thirty-second AAAI conference on artificial intelligence
Hausknecht MJ (2000) Temporal abstraction in reinforcement learning. PhD thesis
Hauskrecht M, Meuleau N, Kaelbling LP, Dean T, Boutilier C (1998) Hierarchical solution of Markov decision processes using macro-actions. In: Proceedings of the fourteenth conference on uncertainty in artificial intelligence. Morgan Kaufmann Publishers, Burlington, pp 220–229
Heess N, Wayne G, Tassa Y, Lillicrap T, Riedmiller M, Silver D (2016) Learning and transfer of modulated locomotor controllers. Preprint. arXiv:1610.05182
Kaelbling LP (1993) Hierarchical learning in stochastic domains: preliminary results. In: Proceedings of the tenth international conference on machine learning (ICML), vol 951, pp 167–173
Kempka M, Wydmuch M, Runc G, Toczek J, Jaśkowski W (2016) ViZDoom: a Doom-based AI research platform for visual reinforcement learning. In: 2016 IEEE conference on computational intelligence and games (CIG). IEEE, Piscataway, pp 1–8
Konda VR, Tsitsiklis JN (2000) Actor-critic algorithms. In: Advances in neural information processing systems, pp 1008–1014
Konidaris G, Barto AG (2009) Skill discovery in continuous reinforcement learning domains using skill chaining. In: Advances in neural information processing systems, pp 1015–1023
Kulkarni TD, Narasimhan K, Saeedi A, Tenenbaum J (2016) Hierarchical deep reinforcement learning: integrating temporal abstraction and intrinsic motivation. In: Advances in neural information processing systems, pp 3675–3683
Levine S, Pastor P, Krizhevsky A, Ibarz J, Quillen D (2018) Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection. Int J Robot Res 37(4–5):421–436
Levy A, Platt R, Saenko K (2018) Hierarchical reinforcement learning with hindsight. Preprint. arXiv:1805.08180
Machado MC, Bellemare MG, Bowling M (2017) A Laplacian framework for option discovery in reinforcement learning. In: Proceedings of the 34th international conference on machine learning, vol 70, JMLR.org, pp 2295–2304
Mnih V, Heess N, Graves A, et al (2014) Recurrent models of visual attention. In: Advances in neural information processing systems, pp 2204–2212
Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G, Petersen S, Beattie C, Sadik A, Antonoglou I, King H, Kumaran D, Wierstra D, Legg S, Hassabis D (2015) Human-level control through deep reinforcement learning. Nature 518:529–533
Mnih V, Badia AP, Mirza M, Graves A, Lillicrap T, Harley T, Silver D, Kavukcuoglu K (2016) Asynchronous methods for deep reinforcement learning. In: International conference on machine learning (ICML), pp 1928–1937
Nachum O, Gu SS, Lee H, Levine S (2018) Data-efficient hierarchical reinforcement learning. In: Advances in neural information processing systems, pp 3303–3313
Parr R, Russell SJ (1998a) Reinforcement learning with hierarchies of machines. In: Advances in neural information processing systems, pp 1043–1049
Parr RE, Russell S (1998b) Hierarchical control and learning for Markov decision processes. University of California, Berkeley
Riemer M, Liu M, Tesauro G (2018) Learning abstract options. In: Advances in neural information processing systems, pp 10424–10434
Sahni H, Kumar S, Tejani F, Schroecker Y, Isbell C (2017) State space decomposition and subgoal creation for transfer in deep reinforcement learning. Preprint. arXiv:1705.08997
Schaul T, Horgan D, Gregor K, Silver D (2015) Universal value function approximators. In: International conference on machine learning, pp 1312–1320
Schulman J (2016) Optimizing expectations: from deep reinforcement learning to stochastic computation graphs. PhD thesis, UC Berkeley
Schulman J, Levine S, Abbeel P, Jordan M, Moritz P (2015) Trust region policy optimization. In: International conference on machine learning (ICML), pp 1889–1897
Sharma S, Lakshminarayanan AS, Ravindran B (2017) Learning to repeat: fine grained action repetition for deep reinforcement learning. Preprint. arXiv:1702.06054
Silver D, Huang A, Maddison CJ, Guez A, Sifre L, Van Den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M, et al (2016) Mastering the game of Go with deep neural networks and tree search. Nature 529:484–489
Silver D, Hubert T, Schrittwieser J, Antonoglou I, Lai M, Guez A, Lanctot M, Sifre L, Kumaran D, Graepel T, et al (2017) Mastering chess and shogi by self-play with a general reinforcement learning algorithm. Preprint. arXiv:1712.01815
Sutton RS, Precup D, Singh S (1999) Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning. Artif Intell 112(1–2):181–211
Tamar A, Wu Y, Thomas G, Levine S, Abbeel P (2016) Value iteration networks. In: Advances in neural information processing systems, pp 2154–2162
Tessler C, Givony S, Zahavy T, Mankowitz DJ, Mannor S (2017) A deep hierarchical approach to lifelong learning in Minecraft. In: Thirty-first AAAI conference on artificial intelligence
Vezhnevets A, Mnih V, Osindero S, Graves A, Vinyals O, Agapiou J, et al (2016) Strategic attentive writer for learning macro-actions. In: Advances in neural information processing systems, pp 3486–3494
Vezhnevets AS, Osindero S, Schaul T, Heess N, Jaderberg M, Silver D, Kavukcuoglu K (2017) Feudal networks for hierarchical reinforcement learning. In: Proceedings of the 34th international conference on machine learning, vol 70, JMLR.org, pp 3540–3549
Vinyals O, Babuschkin I, Czarnecki WM, Mathieu M, Dudzik A, Chung J, Choi DH, Powell R, Ewalds T, Georgiev P, et al (2019) Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575(7782):350–354
Metadata
Title: Hierarchical Reinforcement Learning
Author: Yanhua Huang
Copyright Year: 2020
Publisher: Springer Singapore
DOI: https://doi.org/10.1007/978-981-15-4095-0_10
