Published in: Neural Computing and Applications 7/2018

Open Access | 04.12.2017 | S.I.: EANN 2016

Deep imitation learning for 3D navigation tasks

Authors: Ahmed Hussein, Eyad Elyan, Mohamed Medhat Gaber, Chrisina Jayne


Abstract

Deep learning techniques have shown success in learning from raw high-dimensional data in various applications. While deep reinforcement learning has recently gained popularity as a method to train intelligent agents, the use of deep learning in imitation learning has been scarcely explored. Imitation learning can be an efficient way to teach intelligent agents by providing a set of demonstrations to learn from. However, generalizing to situations that are not represented in the demonstrations can be challenging, especially in 3D environments. In this paper, we propose a deep imitation learning method to learn navigation tasks from demonstrations in a 3D environment. The supervised policy is refined using active learning in order to generalize to unseen situations. This approach is compared to two popular deep reinforcement learning techniques: deep Q-networks (DQN) and asynchronous advantage actor-critic (A3C). The proposed method, as well as the reinforcement learning methods, employs deep convolutional neural networks and learns directly from raw visual input. Methods for combining learning from demonstrations and learning from experience are also investigated; this combination aims to join the generalization ability of learning by experience with the efficiency of learning by imitation. The proposed methods are evaluated on four navigation tasks in a simulated 3D environment. Navigation tasks are a typical problem relevant to many real applications. They pose the challenge of requiring demonstrations of long trajectories to reach the target, while providing the agent only with delayed (usually terminal) rewards. The experiments show that the proposed method can successfully learn navigation tasks from raw visual input, whereas the learning-from-experience methods fail to learn an effective policy. Moreover, active learning is shown to significantly improve the performance of the initially learned policy using a small number of active samples.
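The pipeline the abstract describes, supervised learning of a convolutional policy from demonstration frames followed by an active-learning refinement step, can be illustrated with a short sketch. The code below is not the authors' implementation: the network architecture, the 84x84 input resolution, the optimizer settings, and the confidence threshold for querying the demonstrator are all illustrative assumptions.

```python
# Minimal sketch (not the paper's code) of behavioural cloning from raw
# visual input plus an uncertainty-based active-learning query step.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CNNPolicy(nn.Module):
    """Convolutional policy mapping a raw RGB frame to discrete action logits."""
    def __init__(self, n_actions: int):
        super().__init__()
        # Assumed architecture; the paper's exact layer sizes may differ.
        self.conv1 = nn.Conv2d(3, 32, kernel_size=8, stride=4)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=4, stride=2)
        self.conv3 = nn.Conv2d(64, 64, kernel_size=3, stride=1)
        self.fc1 = nn.Linear(64 * 7 * 7, 512)  # 84x84 input -> 7x7 feature map
        self.fc2 = nn.Linear(512, n_actions)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = F.relu(self.conv3(x))
        x = F.relu(self.fc1(x.flatten(1)))
        return self.fc2(x)

def train_on_demonstrations(policy, frames, actions, epochs=10, lr=1e-4):
    """Behavioural cloning: supervised learning on (frame, action) pairs."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(epochs):
        loss = F.cross_entropy(policy(frames), actions)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return policy

def uncertain_states(policy, frames, threshold=0.5):
    """Active-learning query: flag frames where the policy's confidence
    (max softmax probability) falls below an assumed threshold, so a
    demonstrator can be asked to label them."""
    with torch.no_grad():
        probs = F.softmax(policy(frames), dim=1)
        conf, _ = probs.max(dim=1)
    return (conf < threshold).nonzero(as_tuple=True)[0]

if __name__ == "__main__":
    # Synthetic stand-ins for demonstration data: 84x84 RGB frames, 4 actions.
    frames = torch.randn(32, 3, 84, 84)
    actions = torch.randint(0, 4, (32,))
    policy = train_on_demonstrations(CNNPolicy(n_actions=4), frames, actions)
    print("states to query:", uncertain_states(policy, frames).tolist())
```

One plausible refinement loop, consistent with the abstract's description, would show the flagged frames to the demonstrator, add the newly labeled pairs to the demonstration set, and retrain the policy, repeating until few states fall below the confidence threshold.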


Metadata
Title
Deep imitation learning for 3D navigation tasks
Authors
Ahmed Hussein
Eyad Elyan
Mohamed Medhat Gaber
Chrisina Jayne
Publication date
04.12.2017
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 7/2018
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-017-3241-z
