Double Graph Attention Networks for Visual Semantic Navigation

Authors: Yunlian Lyu, Mohammad Sadegh Talebi

Published in: Neural Processing Letters, Issue 7/2023 (09-03-2023)
Abstract

Artificial Intelligence (AI) based on knowledge graphs has been explored as a route to human-like intelligence: thinking, learning, and logical reasoning. It holds great promise for making AI-based systems not only intelligent but also knowledgeable. In this paper, we investigate knowledge-graph-based visual semantic navigation using deep reinforcement learning, where an agent reasons about actions toward targets specified by text words in indoor scenes. The agent perceives its surroundings through egocentric RGB views and learns via trial and error. The fundamental problem of visual navigation is efficient learning across different targets and scenes. To address it, we propose DGVN, a spatial attention model with knowledge graphs that combines semantic information about observed objects with spatial information about their locations. Our spatial attention model is built on interactions between a 3D global graph and local graphs. The two graphs encode the spatial relationships between objects and are expected to guide policy search effectively. With the knowledge graph and its robust feature representation using graph convolutional networks, we demonstrate that our agent infers a more plausible attention mechanism for decision-making. Under several experimental metrics, our attention model achieves superior navigation performance in the AI2-THOR environment.
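The graph convolutional encoding mentioned in the abstract can be illustrated with the standard GCN propagation rule. The sketch below is illustrative only, not the paper's code: the toy adjacency matrix, feature sizes, and random inputs are stand-ins for the knowledge-graph inputs.

# Illustrative sketch (not the paper's implementation): one layer of the
# standard GCN propagation rule H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W),
# producing relational features for knowledge-graph nodes.
import numpy as np

def gcn_layer(A, H, W):
    """One graph convolution with self-loops and symmetric normalization."""
    A_hat = A + np.eye(A.shape[0])                # adjacency plus self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # D^{-1/2} A_hat D^{-1/2}
    return np.maximum(A_norm @ H @ W, 0.0)        # ReLU activation

# Toy knowledge graph: 4 objects, binary relations, 5-dim word embeddings.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 0],
              [1, 0, 0, 0]], dtype=float)
H = np.random.randn(4, 5)        # initial node features (e.g. word vectors)
W = np.random.randn(5, 3)        # learnable layer weights
print(gcn_layer(A, H, W).shape)  # -> (4, 3): per-object relational features

Stacking two or three such layers lets each object's feature aggregate information from its multi-hop neighborhood in the graph, which is what makes the representation useful for attention over related objects.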


Footnotes
1
One can instead consider a weighted graph, whose adjacency matrix A has weighted entries: each element of A then indicates how significant the corresponding relation is. We leave extensions to weighted graphs as future work.
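As a hypothetical illustration of that weighted variant (the paper itself uses binary A):

# Hypothetical example: entry A[i, j] encodes how significant the relation
# between objects i and j is, rather than just 0/1 as in the binary case.
import numpy as np

A = np.array([[0.0, 0.9, 0.1],
              [0.9, 0.0, 0.4],
              [0.1, 0.4, 0.0]])  # symmetric, relation strengths in [0, 1]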
 
2
For a set \(A\), \(\Delta_A\) denotes the set of probability distributions over \(A\).
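Concretely, for a finite set \(A\):
\[
\Delta_A = \Big\{\, p \in [0,1]^{|A|} \;:\; \sum_{a \in A} p(a) = 1 \,\Big\}.
\]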
 
3
LSTMs are a particular type of Recurrent Neural Network (RNN) capable of remembering and using information about previous inputs. An LSTM can retain or discard information in its cell state through specialized units called gates. In this manner, LSTMs capture long-term dependencies and connect information from the past to the present.
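For reference, the standard LSTM update, with sigmoid gates \(f_t, i_t, o_t\) and cell state \(c_t\), is:
\[
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f), \quad
i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i), \quad
o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o),\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh(W_c x_t + U_c h_{t-1} + b_c), \qquad
h_t = o_t \odot \tanh(c_t),
\end{aligned}
\]
where \(f_t\) controls what is forgotten from the cell state, \(i_t\) what new information is written, and \(o_t\) what is exposed as the hidden state \(h_t\).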
 
4
We note that A3C differs from A2C in the asynchronous part: multiple agents run in parallel instead of one, each updating the shared network periodically and asynchronously.
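A minimal sketch of that asynchronous pattern follows; the gradients are placeholders, not the paper's training loop, and the environment interaction is elided.

# Sketch of the A3C-style asynchronous update pattern: several workers pull
# the shared weights, compute gradients independently, and apply them to the
# shared parameter vector without waiting for each other.
import threading
import numpy as np

shared_params = np.zeros(8)   # shared policy/value network weights
lock = threading.Lock()       # guards the shared update

def worker(worker_id, steps=200, lr=0.01):
    rng = np.random.default_rng(worker_id)
    for _ in range(steps):
        local_params = shared_params.copy()          # pull latest weights
        # ... roll out a private environment copy with local_params here ...
        grad = rng.normal(size=local_params.shape)   # placeholder gradient
        with lock:
            shared_params[:] = shared_params - lr * grad  # in-place shared update

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("final shared params:", shared_params)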
 
Metadata

Title: Double Graph Attention Networks for Visual Semantic Navigation
Authors: Yunlian Lyu, Mohammad Sadegh Talebi
Publication date: 09-03-2023
Publisher: Springer US
Published in: Neural Processing Letters, Issue 7/2023
Print ISSN: 1370-4621
Electronic ISSN: 1573-773X
DOI: https://doi.org/10.1007/s11063-023-11190-8
