Published in: Intelligent Service Robotics 2/2022

01-03-2022 | Original Research Paper

Deep latent-space sequential skill chaining from incomplete demonstrations

Authors: Minjae Kang, Songhwai Oh


Abstract

Imitation learning is a methodology that trains an agent using demonstrations from skilled experts, without external rewards. However, for a complex task with a long horizon, it is challenging to obtain demonstrations that exactly match the desired task. In general, humans can easily decompose a complex task into a sequence of simple tasks. If a person gives an agent such an ordering of simple tasks, a skill sequence can be found efficiently by learning the corresponding skills. However, independently trained low-level skills (simple tasks) are incompatible with one another, so they cannot be executed in sequence without additional refinement. In this context, we propose a method that creates a skill chain by connecting independently learned skills. To connect two consecutive low-level policies, a new policy, defined as a bridge skill, must be learned. Training a bridge skill requires a well-designed reward function, but in the real world only a sparse reward, given according to the success of the overall task, is available. To address this issue, we introduce a novel latent-distance reward function derived from fragmented demonstrations. We also use binary classifiers to determine whether the skill that follows can be performed from the current state. As a result, a skill chain formed from incomplete demonstrations can successfully perform complex tasks that require executing multiple skills in sequence. In our experiments, we solve manipulation tasks with RGB-D images as input in a Baxter simulator implemented in MuJoCo. We verify that skill chains can be trained successfully from incomplete data, and we confirm that the proposed latent-distance reward makes training considerably more efficient and stable. We also perform block stacking with a real Baxter robot in a simplified setup.
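The abstract names two mechanisms whose interplay a short sketch can make concrete: a dense latent-distance reward for training bridge skills, and binary classifiers that gate when the next skill in the chain may start. The Python below is a minimal illustration under assumptions, not the authors' implementation; `encode`, `env`, `skills`, `bridges`, and `can_start` are hypothetical stand-ins for the learned latent encoder, the environment, the low-level skill policies, the bridge policies, and the per-skill initiation classifiers.

import numpy as np

# All names here are hypothetical stand-ins, not the paper's API:
# `encode` plays the role of the learned latent-space encoder,
# `can_start[i]` the binary classifier deciding whether skill i can be
# executed from the current state, and `bridges[i]` the bridge policy
# trained to reach that skill's initiation states.

def latent_distance_reward(encode, state, goal_state, scale=1.0):
    """Dense reward for bridge-skill training: negative distance between
    the current state and a demonstrated goal state, measured in the
    learned latent space rather than in raw observation space."""
    z = np.asarray(encode(state))
    z_goal = np.asarray(encode(goal_state))
    return -scale * float(np.linalg.norm(z - z_goal))

def run_skill_chain(env, skills, bridges, can_start, max_steps=200):
    """Execute skills in the given order. Before each skill, its
    classifier is queried; if the current state is rejected, the bridge
    policy runs until the classifier accepts or the episode ends."""
    state = env.reset()
    for i, skill in enumerate(skills):
        # Bridge into the initiation set of skill i if necessary.
        while not can_start[i](state):
            state, _, done, _ = env.step(bridges[i](state))
            if done:
                return False  # episode ended before reaching skill i
        # Roll out the low-level skill itself.
        for _ in range(max_steps):
            state, _, done, _ = env.step(skill(state))
            if done:
                return i == len(skills) - 1  # success only if last skill
    return True

The point the sketch captures is that the reward is computed in latent space, so fragmented demonstrations only need to supply goal states rather than full task trajectories, while the classifiers decide when a bridge policy has finished its job and control can hand over to the next skill.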


Metadata
Title
Deep latent-space sequential skill chaining from incomplete demonstrations
Authors
Minjae Kang
Songhwai Oh
Publication date
01-03-2022
Publisher
Springer Berlin Heidelberg
Published in
Intelligent Service Robotics / Issue 2/2022
Print ISSN: 1861-2776
Electronic ISSN: 1861-2784
DOI
https://doi.org/10.1007/s11370-021-00409-z
