Published in: Neural Processing Letters 1/2023

09.06.2022

Variational Diversity Maximization for Hierarchical Skill Discovery

Authors: Yingnan Zhao, Peng Liu, Wei Zhao, Xianglong Tang


Abstract

Hierarchical Reinforcement Learning (HRL) has led to rapid progress on structured exploration and on solving challenging tasks. In HRL, the agent plans with skills instead of primitive actions, which effectively shortens the task horizon and reduces problem complexity. Most work on skill discovery focuses on finding diverse skills. However, existing methods fail to increase the diversity of the states visited while the agent performs a skill. In this paper, "Variational Diversity Maximization" (VIM) is proposed to address this problem. VIM encourages the agent to maximize an information-theoretic objective: the entropy of states conditioned on skills. The agent therefore explores more of the environment when performing skills, increasing the chance of finding the optimal policy. Maximizing this conditional entropy directly is intractable; VIM approximates it through the reconstruction error of a conditional variational autoencoder, solving the problem elegantly. Besides this entropy, the mutual information between states and skills is also maximized to discover diverse skills, as in prior methods. Furthermore, a novel method is proposed to measure the diversity of skills efficiently. Experimental results suggest that VIM allows the agent to learn exploratory skills in an unsupervised way, and that the agent achieves strong performance on challenging tasks with these learned skills. Moreover, the proposed method can easily be combined with other planning algorithms to solve complicated tasks.
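The abstract's core idea, using the reconstruction error of a conditional VAE as a proxy for the conditional entropy of states given skills, can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: the linear "encoder" and "decoder" weights, the dimensions, and the names `cvae_reconstruction_error` and `intrinsic_reward` are all hypothetical stand-ins for trained networks.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, SKILL_DIM, LATENT_DIM = 4, 3, 2

# Toy linear weights standing in for a trained conditional VAE.
W_enc = rng.normal(size=(STATE_DIM + SKILL_DIM, LATENT_DIM))
W_dec = rng.normal(size=(LATENT_DIM + SKILL_DIM, STATE_DIM))

def cvae_reconstruction_error(state, skill):
    """Encode (state, skill) to a latent, decode (latent, skill) back to a
    state estimate, and return the squared reconstruction error."""
    latent = np.concatenate([state, skill]) @ W_enc
    state_hat = np.concatenate([latent, skill]) @ W_dec
    return float(np.sum((state - state_hat) ** 2))

def intrinsic_reward(state, skill, scale=0.1):
    # States the skill-conditioned model reconstructs poorly are "surprising"
    # under that skill; rewarding them pushes the policy toward visiting a
    # broader set of states per skill, i.e. toward higher H(S | Z).
    return scale * cvae_reconstruction_error(state, skill)

state = rng.normal(size=STATE_DIM)
skill = np.eye(SKILL_DIM)[0]  # one-hot skill index
r = intrinsic_reward(state, skill)
```

In an actual training loop this reward term would be added to the skill-discovery objective alongside the mutual-information term the abstract mentions.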


Metadata
Title
Variational Diversity Maximization for Hierarchical Skill Discovery
Authors
Yingnan Zhao
Peng Liu
Wei Zhao
Xianglong Tang
Publication date
09.06.2022
Publisher
Springer US
Published in
Neural Processing Letters / Issue 1/2023
Print ISSN: 1370-4621
Elektronische ISSN: 1573-773X
DOI
https://doi.org/10.1007/s11063-022-10912-8
