
2024 | Original Paper | Book Chapter

P2M2-Net: Part-Aware Prompt-Guided Multimodal Point Cloud Completion

Authors: Linlian Jiang, Pan Chen, Ye Wang, Tieru Wu, Rui Ma

Published in: Computer-Aided Design and Computer Graphics

Publisher: Springer Nature Singapore


Abstract

Inferring missing regions from severely occluded point clouds is highly challenging, especially for 3D shapes with rich geometric and structural detail, where the unknown parts are inherently ambiguous. Existing approaches either learn a one-to-one mapping in a supervised manner or train a generative model to synthesize the missing points for 3D point cloud shape completion. These methods, however, offer no control over the completion process, and their results are either deterministic or exhibit uncontrolled diversity. Inspired by prompt-driven data generation and editing, we propose P2M2-Net, a novel prompt-guided point cloud completion framework that enables more controllable and more diverse shape completion. Given an input partial point cloud and a text prompt describing part-aware information such as the semantics and structure of the missing region, our Transformer-based completion network efficiently fuses the multimodal features and generates diverse results that follow the prompt guidance. We train P2M2-Net on a new large-scale PartNet-Prompt dataset and conduct extensive experiments on two challenging shape completion benchmarks. Quantitative and qualitative results show the efficacy of incorporating prompts for more controllable, part-aware point cloud completion and generation.
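The abstract describes fusing point-cloud features with text-prompt features in a Transformer-based network. The chapter's actual architecture is not reproduced here; as a rough, illustrative sketch of the general mechanism such fusion typically relies on, the snippet below implements scaled dot-product cross-attention in NumPy, with point tokens as queries and prompt tokens as keys/values. All names, dimensions, and the residual fusion step are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def cross_attention(point_feats, text_feats):
    """Attend point tokens (queries) over prompt tokens (keys/values).

    point_feats: (N_pts, D) features of the partial point cloud
    text_feats:  (N_tok, D) features of the text prompt
    returns:     (N_pts, D) text-conditioned features per point token
    """
    d_k = text_feats.shape[-1]
    scores = point_feats @ text_feats.T / np.sqrt(d_k)  # (N_pts, N_tok)
    scores -= scores.max(axis=-1, keepdims=True)        # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over prompt tokens
    return weights @ text_feats

# Toy example: 4 point tokens and 3 prompt tokens, 8-dim features.
rng = np.random.default_rng(0)
pts = rng.normal(size=(4, 8))
txt = rng.normal(size=(3, 8))
fused = pts + cross_attention(pts, txt)  # residual fusion of the two modalities
print(fused.shape)  # (4, 8)
```

In a full model, learned query/key/value projections and multiple heads would replace the raw dot products, and the fused tokens would feed a decoder that generates the missing points.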


Metadata
Title
P2M2-Net: Part-Aware Prompt-Guided Multimodal Point Cloud Completion
Authors
Linlian Jiang
Pan Chen
Ye Wang
Tieru Wu
Rui Ma
Copyright year
2024
Publisher
Springer Nature Singapore
DOI
https://doi.org/10.1007/978-981-99-9666-7_23
