Published in: Neural Processing Letters 1/2023

09.06.2022

Variational Diversity Maximization for Hierarchical Skill Discovery

Authors: Yingnan Zhao, Peng Liu, Wei Zhao, Xianglong Tang


Abstract

Hierarchical Reinforcement Learning (HRL) has led to rapid progress on structured exploration and on solving challenging tasks. In HRL, the agent plans with skills instead of primitive actions, which effectively shortens the task horizon and reduces problem complexity. Most work on skill discovery focuses on finding diverse skills. However, existing methods fail to increase the diversity of the states visited while the agent performs a skill. In this paper, "Variational Diversity Maximization" (VIM) is proposed to address this problem. VIM encourages the agent to maximize an information-theoretic objective: the entropy of states conditioned on skills. The agent therefore explores more of the environment when performing skills, increasing the chance of finding the optimal policy. Maximizing this conditional entropy directly is intractable; VIM approximates it through the reconstruction error of a conditional variational autoencoder, solving the problem elegantly. Besides this entropy, the mutual information between states and skills is also maximized to discover diverse skills, as in prior methods. Furthermore, a novel method is proposed to measure the diversity of skills efficiently. Experimental results suggest that VIM allows the agent to learn exploratory skills in an unsupervised way, and that the agent achieves strong performance on challenging tasks with these learned skills. Moreover, the proposed method can easily be combined with other planning algorithms to solve complicated tasks.
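The abstract's core idea, using the reconstruction error of a conditional VAE as a proxy for the conditional entropy of states given skills, can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: the linear "encoder" and "decoder" weights, the dimensions, and the names `cvae_reconstruction_error` and `intrinsic_reward` are all hypothetical stand-ins for trained networks.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, SKILL_DIM, LATENT_DIM = 4, 3, 2

# Toy linear weights standing in for a trained conditional VAE.
W_enc = rng.normal(size=(STATE_DIM + SKILL_DIM, LATENT_DIM))
W_dec = rng.normal(size=(LATENT_DIM + SKILL_DIM, STATE_DIM))

def cvae_reconstruction_error(state, skill):
    """Encode (state, skill) to a latent, decode (latent, skill) back to a
    state estimate, and return the squared reconstruction error."""
    latent = np.concatenate([state, skill]) @ W_enc
    state_hat = np.concatenate([latent, skill]) @ W_dec
    return float(np.sum((state - state_hat) ** 2))

def intrinsic_reward(state, skill, scale=0.1):
    # States the skill-conditioned model reconstructs poorly are "surprising"
    # under that skill; rewarding them pushes the policy toward visiting a
    # broader set of states per skill, i.e. toward higher H(S | Z).
    return scale * cvae_reconstruction_error(state, skill)

state = rng.normal(size=STATE_DIM)
skill = np.eye(SKILL_DIM)[0]  # one-hot skill index
r = intrinsic_reward(state, skill)
```

In an actual training loop this reward term would be added to the skill-discovery objective alongside the mutual-information term the abstract mentions.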


Metadata
Title
Variational Diversity Maximization for Hierarchical Skill Discovery
Authors
Yingnan Zhao
Peng Liu
Wei Zhao
Xianglong Tang
Publication date
09.06.2022
Publisher
Springer US
Published in
Neural Processing Letters / Issue 1/2023
Print ISSN: 1370-4621
Elektronische ISSN: 1573-773X
DOI
https://doi.org/10.1007/s11063-022-10912-8
