Published in: Neural Computing and Applications 7/2018

Open Access | 04-12-2017 | S.I.: EANN 2016

Deep imitation learning for 3D navigation tasks

Authors: Ahmed Hussein, Eyad Elyan, Mohamed Medhat Gaber, Chrisina Jayne


Abstract

Deep learning techniques have shown success in learning from raw high-dimensional data in various applications. While deep reinforcement learning has recently gained popularity as a method for training intelligent agents, the use of deep learning in imitation learning remains scarcely explored. Imitation learning can be an efficient way to teach intelligent agents by providing a set of demonstrations to learn from. However, generalizing to situations that are not represented in the demonstrations can be challenging, especially in 3D environments. In this paper, we propose a deep imitation learning method for learning navigation tasks from demonstrations in a 3D environment. The supervised policy is refined using active learning in order to generalize to unseen situations. This approach is compared to two popular deep reinforcement learning techniques: deep Q-networks (DQN) and asynchronous advantage actor-critic (A3C). The proposed method, like the reinforcement learning methods, employs deep convolutional neural networks and learns directly from raw visual input. Methods for combining learning from demonstrations and learning from experience are also investigated; this combination aims to join the generalization ability of learning from experience with the efficiency of learning by imitation. The proposed methods are evaluated on four navigation tasks in a 3D simulated environment. Navigation tasks are a typical problem relevant to many real applications; they pose the challenge of requiring demonstrations of long trajectories to reach the target while providing only delayed (usually terminal) rewards to the agent. The experiments show that the proposed method successfully learns navigation tasks from raw visual input, whereas the learning-from-experience methods fail to learn an effective policy. Moreover, active learning is shown to significantly improve the performance of the initially learned policy using only a small number of active samples.
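As a concrete illustration of the pipeline the abstract outlines, the sketch below shows a convolutional policy cloned from demonstrations and then refined with actively queried samples. This is a minimal PyTorch sketch, not the authors' implementation: the DQN-style convolutional stack, the 84x84 grayscale frame-stack input, and the entropy-based query rule are assumptions made here for illustration.

```python
# Minimal sketch of the two stages described in the abstract: supervised
# behavioral cloning on demonstrations, then an active-learning query step.
# Architecture and hyperparameters are illustrative assumptions, not the
# paper's exact configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PolicyCNN(nn.Module):
    """Maps raw visual input directly to logits over discrete actions."""
    def __init__(self, n_actions: int, in_channels: int = 4):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, 32, kernel_size=8, stride=4)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=4, stride=2)
        self.conv3 = nn.Conv2d(64, 64, kernel_size=3, stride=1)
        self.fc = nn.Linear(64 * 7 * 7, 512)  # 84x84 input -> 7x7 feature map
        self.out = nn.Linear(512, n_actions)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = F.relu(self.conv3(x))
        x = F.relu(self.fc(x.flatten(1)))
        return self.out(x)  # action logits

def clone_behavior(policy, demo_batches, epochs=10, lr=1e-4):
    """Supervised stage: fit the policy to (frame stack, expert action) pairs."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(epochs):
        for frames, actions in demo_batches:
            loss = F.cross_entropy(policy(frames), actions)
            opt.zero_grad()
            loss.backward()
            opt.step()

def query_active_samples(policy, states, k=32):
    """Active-learning stage: select the k states the policy is least certain
    about (highest predictive entropy) for the demonstrator to label. The
    entropy criterion is one common choice, assumed here for illustration."""
    with torch.no_grad():
        probs = F.softmax(policy(states), dim=1)
        entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1)
    return entropy.topk(k).indices  # indices of states to send for labeling
```

In the setting described above, the states returned by query_active_samples would be presented to the demonstrator, and the newly labeled state-action pairs appended to the demonstration set before the policy is retrained.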


Metadata

Title: Deep imitation learning for 3D navigation tasks
Authors: Ahmed Hussein, Eyad Elyan, Mohamed Medhat Gaber, Chrisina Jayne
Publication date: 04-12-2017
Publisher: Springer London
Published in: Neural Computing and Applications, Issue 7/2018
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI: https://doi.org/10.1007/s00521-017-3241-z
