2025 | OriginalPaper | Chapter

Prioritized Semantic Learning for Zero-Shot Instance Navigation

Authors: Xinyu Sun, Lizhao Liu, Hongyan Zhi, Ronghe Qiu, Junwei Liang

Published in: Computer Vision – ECCV 2024

Publisher: Springer Nature Switzerland

Abstract

We study zero-shot instance navigation, in which the agent navigates to a specific object without using object annotations for training. Previous object navigation approaches apply the image-goal navigation (\(\texttt{ImageNav}\)) task (go to the location of an image) for pretraining and transfer the agent to achieve object goals using a vision-language model. However, these approaches suffer from semantic neglect, where the model fails to learn meaningful semantic alignments. In this paper, we propose a Prioritized Semantic Learning (PSL) method to improve the semantic understanding ability of navigation agents. Specifically, a semantic-enhanced PSL agent is proposed, and a prioritized semantic training strategy is introduced to select goal images that exhibit clear semantic supervision and to relax the reward function from strict exact-view matching. At inference time, a semantic expansion inference scheme is designed to preserve the same granularity level of the goal semantics as in training. Furthermore, for the popular HM3D environment, we present an Instance Navigation (\(\texttt{InstanceNav}\)) task that requires navigating to a specific object instance with detailed descriptions, as opposed to the Object Navigation (\(\texttt{ObjectNav}\)) task, where the goal is defined merely by the object category. Our PSL agent outperforms the previous state of the art by 66% on zero-shot \(\texttt{ObjectNav}\) in terms of success rate and is also superior on the new \(\texttt{InstanceNav}\) task. Code will be released at https://github.com/XinyuSun/PSL-InstanceNav.
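The abstract names two training-time mechanisms: prioritizing goal images that carry clear semantic supervision, and relaxing the reward away from strict exact-view matching. The sketch below is only an illustration of those two ideas under stated assumptions, not the authors' released implementation: the embed_image/embed_text encoders, the similarity threshold, and the reward constants are hypothetical stand-ins for whatever PSL actually uses.

```python
# Illustrative sketch only -- NOT the PSL authors' code. It assumes a CLIP-style
# encoder pair passed in as callables (embed_image, embed_text) and made-up
# thresholds/constants, to show (1) selecting goal images with clear semantic
# supervision and (2) a distance-based reward instead of exact-view matching.
import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))


def prioritize_goal_images(goal_images, category_text, embed_image, embed_text,
                           sim_threshold=0.3):
    """Keep only goal views whose image embedding aligns clearly with the
    goal-category text embedding (a proxy for 'clear semantic supervision')."""
    text_emb = embed_text(category_text)
    return [img for img in goal_images
            if cosine(embed_image(img), text_emb) >= sim_threshold]


def relaxed_reward(agent_pos, goal_pos, prev_dist=None,
                   success_radius=1.0, success_bonus=2.5, step_penalty=0.01):
    """Reward based on geodesic/Euclidean progress toward the goal position,
    granting success within a radius, rather than matching the exact goal view.
    Returns (reward, current_distance) so the caller can carry prev_dist over."""
    dist = float(np.linalg.norm(np.asarray(agent_pos) - np.asarray(goal_pos)))
    progress = 0.0 if prev_dist is None else prev_dist - dist
    bonus = success_bonus if dist <= success_radius else 0.0
    return progress + bonus - step_penalty, dist
```

As a usage sketch, the filtered goal images would feed the agent's goal encoder during pretraining, while relaxed_reward would be called once per step with the previous step's distance to reward progress instead of view alignment.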


Footnotes
1. We consider that the Canny operator undermines the semantics of the image.
Metadata
Title: Prioritized Semantic Learning for Zero-Shot Instance Navigation
Authors: Xinyu Sun, Lizhao Liu, Hongyan Zhi, Ronghe Qiu, Junwei Liang
Copyright Year: 2025
DOI: https://doi.org/10.1007/978-3-031-73254-6_10
