Published in: Intelligent Service Robotics 2/2022

01-03-2022 | Original Research Paper

Deep latent-space sequential skill chaining from incomplete demonstrations

Authors: Minjae Kang, Songhwai Oh


Abstract

Imitation learning is a methodology that trains an agent using demonstrations from skilled experts, without external rewards. However, for a complex task with a long horizon, it is challenging to obtain demonstrations that exactly match the desired task. In general, humans can easily decompose a complex task into a sequence of simple tasks. If a person gives an agent such an ordering of simple tasks, a skill sequence can be found efficiently by learning the corresponding skills. However, independently trained low-level skills (simple tasks) are incompatible with one another, so they cannot be executed in sequence without additional refinement. In this context, we propose a method that creates a skill chain by connecting independently learned skills. To connect two consecutive low-level policies, a new policy, defined as a bridge skill, must be learned. Training a bridge skill requires a well-designed reward function, but in the real world only a sparse reward, given according to the success of the overall task, is available. To address this issue, we introduce a novel latent-distance reward function derived from fragmented demonstrations. We also use binary classifiers to determine whether the skill that follows can be performed from the current state. As a result, a skill chain formed from incomplete demonstrations can successfully perform complex tasks that require executing multiple skills in sequence. In our experiments, we solve manipulation tasks with RGB-D images as input in a Baxter simulator implemented in MuJoCo. We verify that skill chains can be trained successfully from incomplete data, and we confirm that the proposed latent-distance reward makes training considerably more efficient and stable. We also perform block stacking with a real Baxter robot in a simplified setup.
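The abstract names two mechanisms whose interplay a short sketch can make concrete: a dense latent-distance reward for training bridge skills, and binary classifiers that gate when the next skill in the chain may start. The Python below is a minimal illustration under assumptions, not the authors' implementation; `encode`, `env`, `skills`, `bridges`, and `can_start` are hypothetical stand-ins for the learned latent encoder, the environment, the low-level skill policies, the bridge policies, and the per-skill initiation classifiers.

import numpy as np

# All names here are hypothetical stand-ins, not the paper's API:
# `encode` plays the role of the learned latent-space encoder,
# `can_start[i]` the binary classifier deciding whether skill i can be
# executed from the current state, and `bridges[i]` the bridge policy
# trained to reach that skill's initiation states.

def latent_distance_reward(encode, state, goal_state, scale=1.0):
    """Dense reward for bridge-skill training: negative distance between
    the current state and a demonstrated goal state, measured in the
    learned latent space rather than in raw observation space."""
    z = np.asarray(encode(state))
    z_goal = np.asarray(encode(goal_state))
    return -scale * float(np.linalg.norm(z - z_goal))

def run_skill_chain(env, skills, bridges, can_start, max_steps=200):
    """Execute skills in the given order. Before each skill, its
    classifier is queried; if the current state is rejected, the bridge
    policy runs until the classifier accepts or the episode ends."""
    state = env.reset()
    for i, skill in enumerate(skills):
        # Bridge into the initiation set of skill i if necessary.
        while not can_start[i](state):
            state, _, done, _ = env.step(bridges[i](state))
            if done:
                return False  # episode ended before reaching skill i
        # Roll out the low-level skill itself.
        for _ in range(max_steps):
            state, _, done, _ = env.step(skill(state))
            if done:
                return i == len(skills) - 1  # success only if last skill
    return True

The point the sketch captures is that the reward is computed in latent space, so fragmented demonstrations only need to supply goal states rather than full task trajectories, while the classifiers decide when a bridge policy has finished its job and control can hand over to the next skill.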


Metadata
Title
Deep latent-space sequential skill chaining from incomplete demonstrations
Authors
Minjae Kang
Songhwai Oh
Publication date
01-03-2022
Publisher
Springer Berlin Heidelberg
Published in
Intelligent Service Robotics / Issue 2/2022
Print ISSN: 1861-2776
Electronic ISSN: 1861-2784
DOI
https://doi.org/10.1007/s11370-021-00409-z
