Published in: Neural Computing and Applications 7/2018

Open Access | 04.12.2017 | S.I.: EANN 2016

Deep imitation learning for 3D navigation tasks

Authors: Ahmed Hussein, Eyad Elyan, Mohamed Medhat Gaber, Chrisina Jayne


Abstract

Deep learning techniques have shown success in learning from raw high-dimensional data in various applications. While deep reinforcement learning has recently gained popularity as a method to train intelligent agents, the use of deep learning in imitation learning has been scarcely explored. Imitation learning can be an efficient way to teach intelligent agents by providing a set of demonstrations to learn from. However, generalizing to situations that are not represented in the demonstrations can be challenging, especially in 3D environments. In this paper, we propose a deep imitation learning method to learn navigation tasks from demonstrations in a 3D environment. The supervised policy is refined using active learning in order to generalize to unseen situations. This approach is compared to two popular deep reinforcement learning techniques: deep Q-networks (DQN) and asynchronous advantage actor-critic (A3C). The proposed method, as well as the reinforcement learning methods, employs deep convolutional neural networks and learns directly from raw visual input. Methods for combining learning from demonstrations and learning from experience are also investigated; this combination aims to join the generalization ability of learning by experience with the efficiency of learning by imitation. The proposed methods are evaluated on four navigation tasks in a simulated 3D environment. Navigation tasks are a typical problem relevant to many real applications. They pose the challenge of requiring demonstrations of long trajectories to reach the target, while providing the agent only with delayed (usually terminal) rewards. The experiments show that the proposed method can successfully learn navigation tasks from raw visual input, whereas the learning-from-experience methods fail to learn an effective policy. Moreover, active learning is shown to significantly improve the performance of the initially learned policy using a small number of active samples.
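The pipeline the abstract describes, supervised learning of a convolutional policy from demonstration frames followed by an active-learning refinement step, can be illustrated with a short sketch. The code below is not the authors' implementation: the network architecture, the 84x84 input resolution, the optimizer settings, and the confidence threshold for querying the demonstrator are all illustrative assumptions.

```python
# Minimal sketch (not the paper's code) of behavioural cloning from raw
# visual input plus an uncertainty-based active-learning query step.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CNNPolicy(nn.Module):
    """Convolutional policy mapping a raw RGB frame to discrete action logits."""
    def __init__(self, n_actions: int):
        super().__init__()
        # Assumed architecture; the paper's exact layer sizes may differ.
        self.conv1 = nn.Conv2d(3, 32, kernel_size=8, stride=4)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=4, stride=2)
        self.conv3 = nn.Conv2d(64, 64, kernel_size=3, stride=1)
        self.fc1 = nn.Linear(64 * 7 * 7, 512)  # 84x84 input -> 7x7 feature map
        self.fc2 = nn.Linear(512, n_actions)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = F.relu(self.conv3(x))
        x = F.relu(self.fc1(x.flatten(1)))
        return self.fc2(x)

def train_on_demonstrations(policy, frames, actions, epochs=10, lr=1e-4):
    """Behavioural cloning: supervised learning on (frame, action) pairs."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(epochs):
        loss = F.cross_entropy(policy(frames), actions)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return policy

def uncertain_states(policy, frames, threshold=0.5):
    """Active-learning query: flag frames where the policy's confidence
    (max softmax probability) falls below an assumed threshold, so a
    demonstrator can be asked to label them."""
    with torch.no_grad():
        probs = F.softmax(policy(frames), dim=1)
        conf, _ = probs.max(dim=1)
    return (conf < threshold).nonzero(as_tuple=True)[0]

if __name__ == "__main__":
    # Synthetic stand-ins for demonstration data: 84x84 RGB frames, 4 actions.
    frames = torch.randn(32, 3, 84, 84)
    actions = torch.randint(0, 4, (32,))
    policy = train_on_demonstrations(CNNPolicy(n_actions=4), frames, actions)
    print("states to query:", uncertain_states(policy, frames).tolist())
```

One plausible refinement loop, consistent with the abstract's description, would show the flagged frames to the demonstrator, add the newly labeled pairs to the demonstration set, and retrain the policy, repeating until few states fall below the confidence threshold.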


Metadata
Title
Deep imitation learning for 3D navigation tasks
Authors
Ahmed Hussein
Eyad Elyan
Mohamed Medhat Gaber
Chrisina Jayne
Publication date
04.12.2017
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 7/2018
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-017-3241-z
