Improving Target-driven Visual Navigation with Attention on 3D Spatial Relationships

Authors: Yunlian Lyu, Yimin Shi, Xianggang Zhang

Published in: Neural Processing Letters | Issue 5/2022

Published: 23-03-2022


Abstract

Embodied Artificial Intelligence has become popular in recent years, shifting the field's focus from static internet images to active settings in which an embodied agent perceives and acts within 3D environments. In this paper, we study Target-driven Visual Navigation (TDVN) in 3D indoor scenes using deep reinforcement learning. Generalization is a long-standing challenge in TDVN: the agent is expected to transfer knowledge learned in training domains to unseen domains. To address this issue, we propose a model that combines visual features with relational graph features to learn the navigation policy. Graph convolutional networks are used to obtain the graph features, which encode spatial relations between objects. We also adopt a Target Skill Extension module that generates sub-targets, allowing the agent to learn from its failures. For evaluation, we perform experiments in the AI2-THOR environment. Experimental results show that our proposed model outperforms the baselines under various metrics.
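A minimal sketch can make the graph branch concrete. The Python code below applies a single graph-convolution layer in the style of Kipf and Welling's GCN, pools the resulting node features into one graph feature, and concatenates it with a visual embedding to form the policy input. All sizes here (20 objects, 32-d node features, a 512-d visual vector) and the random inputs are illustrative assumptions, not the authors' implementation.

import numpy as np

def gcn_layer(adj, feats, weight):
    """One GCN layer: H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W)."""
    a_hat = adj + np.eye(adj.shape[0])               # add self-loops
    deg_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))  # diagonal of D^{-1/2}
    norm_adj = a_hat * deg_inv_sqrt[:, None] * deg_inv_sqrt[None, :]
    return np.maximum(norm_adj @ feats @ weight, 0.0)  # ReLU activation

rng = np.random.default_rng(0)
num_objects, node_dim = 20, 32                       # hypothetical sizes
adj = (rng.random((num_objects, num_objects)) > 0.7).astype(float)
adj = np.maximum(adj, adj.T)                         # symmetric spatial-relation graph
node_feats = rng.standard_normal((num_objects, node_dim))
w = 0.1 * rng.standard_normal((node_dim, node_dim))

graph_feat = gcn_layer(adj, node_feats, w).mean(axis=0)   # pool nodes into one vector
visual_feat = rng.standard_normal(512)                    # stand-in for a CNN image embedding
policy_input = np.concatenate([visual_feat, graph_feat])  # joint input to the navigation policy
print(policy_input.shape)                                 # (544,)

In practice, the adjacency matrix would encode the observed 3D spatial relationships between detected objects and the visual embedding would come from a pretrained CNN; the concatenated vector then feeds the reinforcement-learning policy.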


Metadata
Title: Improving Target-driven Visual Navigation with Attention on 3D Spatial Relationships
Authors: Yunlian Lyu, Yimin Shi, Xianggang Zhang
Publication date: 23-03-2022
Publisher: Springer US
Published in: Neural Processing Letters, Issue 5/2022
Print ISSN: 1370-4621
Electronic ISSN: 1573-773X
DOI: https://doi.org/10.1007/s11063-022-10796-8
