Double Graph Attention Networks for Visual Semantic Navigation

Authors: Yunlian Lyu, Mohammad Sadegh Talebi

Published in: Neural Processing Letters, Issue 7/2023 (09-03-2023)
Abstract

Artificial Intelligence (AI) based on knowledge graphs has been explored as a route to human-like intelligence: thinking, learning, and logical reasoning. It holds great promise for making AI-based systems not only intelligent but also knowledgeable. In this paper, we investigate knowledge-graph-based visual semantic navigation using deep reinforcement learning, where an agent reasons about actions toward targets specified by text words in indoor scenes. The agent perceives its surroundings through egocentric RGB views and learns via trial and error. The fundamental problem of visual navigation is efficient learning across different targets and scenes. To address it, we propose DGVN, a spatial attention model with knowledge graphs that combines semantic information about observed objects with spatial information about their locations. Our spatial attention model is built on interactions between a 3D global graph and local graphs. The two graphs encode the spatial relationships between objects and are expected to guide policy search effectively. With the knowledge graph and its robust feature representation using graph convolutional networks, we demonstrate that our agent infers a more plausible attention mechanism for decision-making. Under several experimental metrics, our attention model achieves superior navigation performance in the AI2-THOR environment.
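The graph convolutional encoding mentioned in the abstract can be illustrated with the standard GCN propagation rule. The sketch below is illustrative only, not the paper's code: the toy adjacency matrix, feature sizes, and random inputs are stand-ins for the knowledge-graph inputs.

# Illustrative sketch (not the paper's implementation): one layer of the
# standard GCN propagation rule H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W),
# producing relational features for knowledge-graph nodes.
import numpy as np

def gcn_layer(A, H, W):
    """One graph convolution with self-loops and symmetric normalization."""
    A_hat = A + np.eye(A.shape[0])                # adjacency plus self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # D^{-1/2} A_hat D^{-1/2}
    return np.maximum(A_norm @ H @ W, 0.0)        # ReLU activation

# Toy knowledge graph: 4 objects, binary relations, 5-dim word embeddings.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 0],
              [1, 0, 0, 0]], dtype=float)
H = np.random.randn(4, 5)        # initial node features (e.g. word vectors)
W = np.random.randn(5, 3)        # learnable layer weights
print(gcn_layer(A, H, W).shape)  # -> (4, 3): per-object relational features

Stacking two or three such layers lets each object's feature aggregate information from its multi-hop neighborhood in the graph, which is what makes the representation useful for attention over related objects.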


Footnotes
1
One can instead consider a weighted graph, whose adjacency matrix A has weighted entries: each element of A then indicates how significant the corresponding relation is. We leave extensions to weighted graphs as future work.
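As a hypothetical illustration of that weighted variant (the paper itself uses binary A):

# Hypothetical example: entry A[i, j] encodes how significant the relation
# between objects i and j is, rather than just 0/1 as in the binary case.
import numpy as np

A = np.array([[0.0, 0.9, 0.1],
              [0.9, 0.0, 0.4],
              [0.1, 0.4, 0.0]])  # symmetric, relation strengths in [0, 1]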
 
2
For a set \(A\), \(\Delta_A\) denotes the set of probability distributions over \(A\).
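Concretely, for a finite set \(A\):
\[
\Delta_A = \Big\{\, p \in [0,1]^{|A|} \;:\; \sum_{a \in A} p(a) = 1 \,\Big\}.
\]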
 
3
LSTMs are a particular type of Recurrent Neural Network (RNN) capable of remembering and using information about previous inputs. An LSTM can retain or discard information in its cell state through specialized units called gates. In this manner, LSTMs capture long-term dependencies and connect information from the past to the present.
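For reference, the standard LSTM update, with sigmoid gates \(f_t, i_t, o_t\) and cell state \(c_t\), is:
\[
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f), \quad
i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i), \quad
o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o),\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh(W_c x_t + U_c h_{t-1} + b_c), \qquad
h_t = o_t \odot \tanh(c_t),
\end{aligned}
\]
where \(f_t\) controls what is forgotten from the cell state, \(i_t\) what new information is written, and \(o_t\) what is exposed as the hidden state \(h_t\).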
 
4
We note that A3C differs from A2C in the asynchronous part: multiple agents run in parallel instead of one, each updating the shared network periodically and asynchronously.
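A minimal sketch of that asynchronous pattern follows; the gradients are placeholders, not the paper's training loop, and the environment interaction is elided.

# Sketch of the A3C-style asynchronous update pattern: several workers pull
# the shared weights, compute gradients independently, and apply them to the
# shared parameter vector without waiting for each other.
import threading
import numpy as np

shared_params = np.zeros(8)   # shared policy/value network weights
lock = threading.Lock()       # guards the shared update

def worker(worker_id, steps=200, lr=0.01):
    rng = np.random.default_rng(worker_id)
    for _ in range(steps):
        local_params = shared_params.copy()          # pull latest weights
        # ... roll out a private environment copy with local_params here ...
        grad = rng.normal(size=local_params.shape)   # placeholder gradient
        with lock:
            shared_params[:] = shared_params - lr * grad  # in-place shared update

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("final shared params:", shared_params)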
 
Metadata

Title: Double Graph Attention Networks for Visual Semantic Navigation
Authors: Yunlian Lyu, Mohammad Sadegh Talebi
Publication date: 09-03-2023
Publisher: Springer US
Published in: Neural Processing Letters, Issue 7/2023
Print ISSN: 1370-4621
Electronic ISSN: 1573-773X
DOI: https://doi.org/10.1007/s11063-023-11190-8
