
2024 | Book

Computer-Aided Design and Computer Graphics

18th International Conference, CAD/Graphics 2023, Shanghai, China, August 19–21, 2023, Proceedings

Edited by: Shi-Min Hu, Yiyu Cai, Paul Rosin

Publisher: Springer Nature Singapore

Book Series: Lecture Notes in Computer Science


About this Book

This book constitutes the proceedings of the CCF 18th International Conference on Computer-Aided Design and Computer Graphics, CAD/Graphics 2023, which took place in Shanghai, China, during August 19–21, 2023.
The 23 full papers included in this book were carefully reviewed and selected from 169 submissions. They focus on topics such as computer graphics and CAD, 3D Printing and Computational Fabrication, 3D Vision, Bio-CAD and Nano-CAD, Computer Animation, Deep Learning for Graphics, Geometric Modeling, Geometry Processing, Rendering, Virtual Reality, Augmented Reality, Visualization, and more.

Table of Contents

Frontmatter
Unsupervised 3D Articulated Object Correspondences with Part Approximation and Shape Refinement
Abstract
Reconstructing 3D human shapes with high-quality geometry as well as dense correspondences is important for many applications. Template-fitting-based methods can generate meshes with the desired properties but have difficulty capturing high-quality details and accurate poses. The main challenge is that the models exhibit apparent discrepancies across different poses. Directly learning the large-scale displacement of each point to account for differently posed shapes is prone to artifacts and does not generalize well. Statistical-representation-based methods can avoid artifacts by restricting human shapes to a limited shape expression space, but this also makes it difficult to produce shape details. In this work, we propose a coarse-to-fine method that addresses the problem by dividing it into part approximation and shape refinement in an unsupervised manner. Our basic observation is that the poses of human parts account for most articulated shape variations and benefit pose generalization. Moreover, geometric details can be easily fitted once the part poses are estimated. At the coarse-fitting stage, we propose a part approximation network that transforms a template to fit inputs via a set of pose parameters. For refinement, we propose a shape refinement network to fit shape details. Qualitative and quantitative studies on several datasets demonstrate that our method performs better than other unsupervised methods.
Junqi Diao, Haiyong Jiang, Feilong Yan, Yong Zhang, Jinhui Luan, Jun Xiao
MsF-HigherHRNet: Multi-scale Feature Fusion for Human Pose Estimation in Crowded Scenes
Abstract
To address occlusion and human scale variation in crowded scenes, we propose a Multi-scale Fusion HigherHRNet (MsF-HigherHRNet) based on HigherHRNet, which integrates Residual Feature Augmentation (RFA) and a Double Refinement Feature Pyramid Network (DRFPN). Firstly, the introduction of RFA further enriches the semantic information of the multi-scale feature maps. Secondly, drawing on the design ideas of spatial attention mechanisms and deformable convolution, we propose the DRFPN. When the feature maps are fed into the DRFPN, the occlusion problem in crowded scenes is effectively mitigated. Experimental results on the CrowdPose dataset, under the same experimental environment and image resolution, show that the average accuracy of MsF-HigherHRNet is 69.7%, which is 1.7% higher than that of HigherHRNet under the same configuration, and the model exhibits better robustness.
Cuihong Yu, Cheng Han, Qi Zhang, Chao Zhang
FFANet: Dual Attention-Based Flow Field Aware Network for 3D Grid Classification and Segmentation
Abstract
Deep learning-based approaches for three-dimensional (3D) grid understanding and processing tasks have been extensively studied in recent years. Despite great success in various scenarios, existing approaches fail to effectively utilize the velocity information in the flow field, so the extracted features struggle to meet the actual requirements of post-processing tasks. To fully integrate the structural information in the 3D grid with the velocity information, this paper constructs a flow-field-aware network (FFANet) for 3D grid classification and segmentation tasks. The main innovations include: (i) using the self-attention mechanism to build a multi-scale feature learning network that learns the distribution features of the velocity field and the structural features at different scales in the 3D flow field grid, generating a global feature with more discriminative representation information; (ii) constructing a fine-grained semantic learning network based on a co-attention mechanism that adaptively learns the weight matrix between the above two features to enhance the effective semantic utilization of the global feature; (iii) according to the practical requirements of post-processing in numerical simulation, we design two downstream tasks: 1) surface grid identification and 2) feature edge extraction. The experimental results show that FFANet compares favourably with existing 3D mesh data analysis approaches in terms of accuracy (Acc) and intersection-over-union (IoU).
Jiakang Deng, De Xing, Cheng Chen, Yongguo Han, Jianqiang Chen
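The co-attention idea in the FFANet abstract above (adaptively learning a weight matrix between velocity-field features and structure features) can be sketched generically. The following is a minimal illustration, not the paper's module; the single linear affinity layer and the additive fusion are assumptions.

```python
# Generic co-attention sketch (not the FFANet module): learn a weight matrix
# between velocity-field features and structure features and use it to enrich
# one feature set with context from the other.
import torch
import torch.nn as nn

class CoAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.affinity = nn.Linear(dim, dim, bias=False)  # learned affinity projection

    def forward(self, feat_vel, feat_struct):
        # feat_vel: (B, N, D) velocity features; feat_struct: (B, M, D) structure features.
        weights = torch.softmax(
            self.affinity(feat_vel) @ feat_struct.transpose(1, 2), dim=-1)  # (B, N, M)
        return feat_vel + weights @ feat_struct  # velocity tokens enriched with structure
```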
A Lightweight Model for Feature Points Recognition of Tool Path Based on Deep Learning
Abstract
We propose a novel lightweight deep learning-based method that efficiently recognizes feature points with significantly shorter preprocessing time. Our method encodes cutter-location (CL) points as matrices and stores them as text files. We develop a neural network with an Encoder-Decoder architecture, named EDFP-Net, which takes the encoding matrices as input, extracts deeper features using the Encoder, and recognizes feature points using the Decoder. Our experiments on industrial parts demonstrate the superior efficiency of our method.
Shuo-Peng Chen, Hong-Yu Ma, Li-Yong Shen, Chun-Ming Yuan
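To make the Encoder-Decoder idea above concrete, here is a minimal PyTorch sketch of a network that maps an encoded CL-point matrix to per-point classification logits. The channel sizes, depth, and number of output classes are placeholders, not the actual EDFP-Net configuration.

```python
# Minimal sketch (not the authors' EDFP-Net): an Encoder-Decoder over a matrix
# encoding of cutter-location (CL) points, predicting per-point feature labels.
import torch
import torch.nn as nn

class TinyEncoderDecoder(nn.Module):
    def __init__(self, in_ch=1, hidden=32, num_classes=2):
        super().__init__()
        # Encoder: downsample the CL-point matrix and extract deeper features.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden * 2, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Decoder: upsample back to the input resolution and classify each point.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(hidden * 2, hidden, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(hidden, num_classes, 4, stride=2, padding=1),
        )

    def forward(self, x):                      # x: (B, 1, H, W) encoded CL points
        return self.decoder(self.encoder(x))   # (B, num_classes, H, W) per-point logits

logits = TinyEncoderDecoder()(torch.randn(1, 1, 64, 64))
```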
Image Fusion Based on Feature Decoupling and Proportion Preserving
Abstract
Image fusion is a widely used technique for generating a new image by combining information from multiple input images. However, existing image fusion algorithms are often domain-specific, which limits their generalization ability and processing capacity. In this paper, we propose a fast unified fusion network, called FDF, based on feature decoupling and the preservation of intensity and gradient feature proportions. FDF is an end-to-end network that can perform multiple image fusion tasks. We first decouple the features of the source images into intensity features and texture features and then fuse them along intensity and gradient paths. To improve generalization ability, we design a unified loss function that can adapt to different fusion tasks. We evaluate FDF on three image fusion tasks, namely visible and infrared image fusion, multi-exposure image fusion, and medical image fusion. Our experimental results show that FDF outperforms state-of-the-art methods in terms of visual effects and multiple quantitative metrics. The proposed method has the potential to be applied to other image fusion tasks and domains, making it a promising approach for future research. Overall, FDF provides a fast and unified solution for image fusion that can significantly improve the efficiency and effectiveness of image fusion applications.
Bin Fang, Ran Yi, Lizhuang Ma
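As a loose illustration of the intensity-and-gradient proportioning described in the FDF abstract above, the sketch below computes a generic fusion loss with separate intensity and gradient terms. The max-based targets and the weights are assumptions; the actual FDF loss is more elaborate.

```python
# Illustrative only: a generic intensity-plus-gradient fusion loss in the spirit
# of the abstract above; the exact FDF loss and proportion weights are assumptions.
import torch
import torch.nn.functional as F

def sobel_grad(img):
    # Per-pixel gradient magnitude via fixed Sobel kernels (img: B x 1 x H x W).
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3).to(img)
    ky = kx.transpose(2, 3)
    gx = F.conv2d(img, kx, padding=1)
    gy = F.conv2d(img, ky, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)

def fusion_loss(fused, src_a, src_b, w_int=1.0, w_grad=1.0):
    # Intensity path: keep the fused intensity close to the stronger source.
    loss_int = F.l1_loss(fused, torch.max(src_a, src_b))
    # Gradient path: keep the fused texture close to the sharper source.
    loss_grad = F.l1_loss(sobel_grad(fused),
                          torch.max(sobel_grad(src_a), sobel_grad(src_b)))
    return w_int * loss_int + w_grad * loss_grad
```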
An Irregularly Shaped Plane Layout Generation Method with Boundary Constraints
Abstract
We propose a novel method that automatically generates outdoor building layouts from given boundary constraints. It effectively handles the irregular shapes that occur in practical application scenarios, where boundary and building outlines are composed not only of horizontal and vertical lines but also of oblique lines. The proposed method is a two-stage process that uses a Graph Neural Network (GNN) to generate the location and minimum external polygon of each building. The GNN utilizes a pre-defined relative location diagram and the given boundary. A Generative Adversarial Network (GAN) is then used to generate building outlines that fit the boundary within the minimum external polygon area. Our method effectively handles diverse and complex outdoor building layouts, as evidenced by its superior performance on the Huizhou traditional village dataset. Both qualitative and quantitative evaluations demonstrate that our method outperforms current GNN-based layout methods in terms of realism and diversity.
Xiang Wang, Lin Li, Liang He, Xiaoping Liu
Influence of the Printing Direction on the Surface Appearance in Multi-material Fused Filament Fabrication
Abstract
Multi-material fused filament fabrication (FFF) offers the ability to print 3D objects with very diverse surface appearances. However, control of the surface appearance is largely a matter of trial and error unless the employed materials are very similar and very translucent, so we can think of them as blending together. When the multiple materials are fused into one filament in a diamond hotend extruder but do not blend, the resulting surface appearance depends on the printing direction. We explore how this leads to milli-scale colorations as a function of the printing direction. By having preferable printing directions, it is possible to exploit the limited color blending of this nozzle with multiple inlets and one outlet and further enhance particular color effects, such as goniochromatism. We present a framework based on both experimental and computational fluid dynamics analysis for controlling the extrusion process and the coloration of the surface according to preferable printing directions and mixing ratios with the aim of enabling fused filament fabrication of intricate surface appearances.
Riccardo Tonello, Md. Tusher Mollah, Kenneth Weiss, Jon Spangenberg, Are Strandlie, David Bue Pedersen, Jeppe Revall Frisvad
StrongOC-SORT: Make Observation-Centric SORT More Robust
Abstract
Multi-object tracking (MOT) becomes a challenging task when non-linear motion and occlusion cause problems such as contaminated appearance, inaccurate positions, and disturbed tracks. Despite great progress made by current trackers, their performance still needs improvement because their components cannot adapt to these challenges. In this work, we propose a new method, StrongOC-SORT, which exploits the observation-centric nature of the tracker and four new modules to tackle these challenges more effectively. Specifically, we design an IoU-ReID Fusion module to minimize disruptions from rapid changes in direction. Moreover, we develop Dynamic Embedding and Observation Expansion modules, which respectively prevent track embeddings from being contaminated by detection noise and address the issue of slight overlap between observations after long periods without observations. Lastly, we propose an Active State module to provide discriminative tracks for association on DanceTrack. Our proposed method achieves state-of-the-art performance on DanceTrack and MOT20 with 63.4 HOTA and 64.1 HOTA, respectively, while providing competitive performance on MOT17 with the best IDF1 and AssA. The experimental results demonstrate the robustness and effectiveness of StrongOC-SORT under occlusion and non-linear motion.
Yanhui Sun, Zhangjin Huang
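The IoU-ReID Fusion module above combines geometric and appearance cues for association. A simplified sketch of that general idea, using an assumed linear weighting (alpha) rather than the paper's formulation, might look like this:

```python
# Sketch of a fused IoU + appearance association cost (weights and form are
# assumptions, not the StrongOC-SORT formulation), followed by Hungarian matching.
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(box_a, box_b):
    # Boxes as (x1, y1, x2, y2).
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def fused_assignment(track_boxes, det_boxes, track_embs, det_embs, alpha=0.5):
    # Cosine similarity between track and detection embeddings (rows normalized).
    t = track_embs / np.linalg.norm(track_embs, axis=1, keepdims=True)
    d = det_embs / np.linalg.norm(det_embs, axis=1, keepdims=True)
    app_sim = t @ d.T
    iou_sim = np.array([[iou(tb, db) for db in det_boxes] for tb in track_boxes])
    cost = -(alpha * iou_sim + (1 - alpha) * app_sim)   # lower cost = better match
    return linear_sum_assignment(cost)                  # matched (track, det) indices
```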
Semi-direct Sparse Odometry with Robust and Accurate Pose Estimation for Dynamic Scenes
Abstract
The localization accuracy and robustness of visual odometry systems designed for static scenes can be significantly degraded in complex real-world environments with moving objects. This paper addresses the problem by proposing a semi-direct sparse visual odometry (SDSO) method designed for dynamic scenes. With the aid of pixel-level semantic information, the system can not only eliminate dynamic points but also construct more accurate photometric errors for subsequent optimization. To obtain an accurate and robust camera pose in dynamic scenes, we propose a dual-error optimization strategy that minimizes the reprojection and photometric errors consecutively. The proposed method has been extensively evaluated on public datasets such as the TUM dynamic dataset and the KITTI dataset. The results demonstrate the effectiveness of our method in terms of localization accuracy and robustness compared with both the original direct sparse odometry (DSO) method and state-of-the-art methods for dynamic scenes.
Wufan Wang, Lei Zhang
Parallel Dense Vision Transformer and Augmentation Network for Occluded Person Re-identification
Abstract
Occluded person re-identification (ReID) is a challenging computer vision task whose goal is to identify specific pedestrians in occluded scenes across different devices. Some existing methods mainly focus on developing effective data augmentation and representation learning techniques to improve the performance of person ReID systems. However, existing data augmentation strategies cannot make full use of the information in the training data to accurately simulate occlusion scenarios, resulting in poor generalization ability. Additionally, recent Vision Transformer (ViT)-based methods have been shown to be beneficial for occluded person ReID owing to their powerful representation learning ability, but they tend to ignore the information fusion between different levels of features. To alleviate these two issues, we propose an improved ViT-based framework called Parallel Dense Vision Transformer and Augmentation Network (PDANet) to extract discriminative and robust features. We first design a parallel data augmentation strategy based on random stripe erasure to enrich the diversity of input samples and better cover real scenes through various processing methods, improving the generalization ability of the model by learning the relationships between these different samples. We then develop a Densely Connected Vision Transformer (DCViT) module for feature encoding, which strengthens feature propagation and improves learning effectiveness by establishing connections between different layers. Experimental results demonstrate that the proposed method outperforms existing methods on both occluded-person and holistic-person ReID benchmarks.
Chuxia Yang, Wanshu Fan, Ziqi Wei, Xin Yang, Qiang Zhang, Dongsheng Zhou
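The random stripe erasure augmentation mentioned above can be illustrated with a short sketch; the number of stripes, their orientation, and the mean-value fill are assumptions, not the PDANet settings.

```python
# A minimal random-stripe-erasure augmentation, assuming horizontal stripes and
# a mean-value fill; the PDANet strategy and its parameters may differ.
import torch

def random_stripe_erase(img, num_stripes=2, stripe_frac=0.1):
    # img: (C, H, W) tensor; erase `num_stripes` horizontal bands, each of height
    # stripe_frac * H, simulating partial occlusion of a pedestrian.
    c, h, w = img.shape
    out = img.clone()
    band = max(1, int(stripe_frac * h))
    for _ in range(num_stripes):
        top = torch.randint(0, h - band + 1, (1,)).item()
        out[:, top:top + band, :] = img.mean()   # fill with the global mean value
    return out
```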
Spatial-Temporal Consistency Constraints for Chinese Sign Language Synthesis
Abstract
Video-splicing-based sign language synthesis focuses on splicing word- or sentence-level sign language videos to produce new sign language videos. However, directly splicing or combining video clips may result in abrupt jumps in the synthesized video. To this end, this paper proposes a novel spatial-temporal consistency constraints (STCC) approach for sign synthesis, which enhances the authenticity and acceptability of the synthesized video by generating intermediate transition frames. First, we use cubic Bézier curves to generate the human pose keypoints of transition frames by modeling motion trajectories. Then, we use a hierarchical attention generative adversarial network to generate smooth transition frames based on the generated pose and source image. Finally, we validate the effectiveness of the proposed STCC framework on two public Chinese sign language datasets. The visual comparison with existing transition frame generation methods shows that our STCC approach offers realistic textures, smooth motion, and high comprehensibility in the synthesized video.
Liqing Gao, Peidong Liu, Liang Wan, Wei Feng
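The cubic Bézier step in the STCC pipeline above can be sketched as follows: interpolate each pose keypoint between the last frame of one clip and the first frame of the next. The control-point placement (linear thirds) and the number of transition frames are illustrative assumptions, not the paper's settings.

```python
# Illustrative cubic Bezier interpolation of 2D pose keypoints for transition frames.
import numpy as np

def bezier_transition(p_start, p_end, num_frames=8):
    # p_start, p_end: (K, 2) arrays of K keypoints in the last/first spliced frames.
    p0, p3 = p_start, p_end
    p1 = p0 + (p3 - p0) / 3.0            # hypothetical control points (linear thirds)
    p2 = p0 + 2.0 * (p3 - p0) / 3.0
    frames = []
    for t in np.linspace(0.0, 1.0, num_frames + 2)[1:-1]:   # interior frames only
        pt = ((1 - t) ** 3 * p0 + 3 * (1 - t) ** 2 * t * p1
              + 3 * (1 - t) * t ** 2 * p2 + t ** 3 * p3)
        frames.append(pt)
    return np.stack(frames)              # (num_frames, K, 2) keypoints per frame
```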
An Easy-to-Build Modular Robot Implementation of Chain-Based Physical Transformation for STEM Education
Abstract
The physically realizable transformation between multiple 3D objects has attracted considerable attention recently because it has numerous potential applications in a variety of industries. In this paper, we present EasySRRobot, a low-cost, easy-to-build self-reconfigurable modular robot that realizes automatic transformation across different configurations and overcomes the limitation of existing transformation methods, which require manual involvement. All on-board components in EasySRRobot are off-the-shelf, and all support structures and shells are 3D printed, so that even novice users can build it at home. In addition, we propose an algorithm that automatically finds an optimal design for the interior structure and demonstrate the result by comparison with two other feasible designs. Thirty modules were fabricated with the aid of 3D printing, and the motions of two configurations (snake and wheel) were realized, which demonstrates the capability and effectiveness of the proposed EasySRRobot. We further explored the effect of EasySRRobot on spatial ability, a skill that is crucial for STEM education. The results indicate that interacting with EasySRRobot can effectively improve performance on the transformation task, suggesting that it might improve mental rotation skills and other aspects of spatial ability.
Minjing Yu, Ting Liu, Jeffrey Too Chuan Tan, Yong-Jin Liu
Skeleton-Based Human Action Recognition via Multi-Knowledge Flow Embedding Hierarchically Decomposed Graph Convolutional Network
Abstract
Skeleton-based action recognition has great potential and extensive application scenarios, such as virtual reality and human-robot interaction, owing to its robustness under complex backgrounds and different viewing angles. Recent approaches convert skeleton sequences into spatial-temporal graphs and adopt graph convolutional networks to extract features. Multi-modality recognition and attention mechanisms have also been proposed to boost accuracy. However, complex feature extraction modules and multi-stream ensembles increase computational complexity significantly. Thus, most existing methods fail to meet lightweight industrial requirements, while lightweight methods are unable to produce sufficiently accurate results. To tackle this problem, we propose a multi-knowledge flow embedding graph convolutional network, which achieves high accuracy while remaining lightweight. We first construct multiple knowledge flows by extracting diverse features from different hierarchically decomposed graphs. Each knowledge flow not only contains information on the target class but also stores rich information about non-target classes. Inspired by knowledge distillation, we design a novel multi-knowledge flow embedding module, which effectively embeds the knowledge into a student model without increasing model complexity. Moreover, the student model can be enhanced dramatically by learning simultaneously from complementary knowledge flows. Extensive experiments on authoritative datasets demonstrate that our approach outperforms the state of the art with significantly lower computational complexity.
Yanqiu Li, Yanan Liu, Hao Zhang, Shouzheng Sun, Dan Xu
Color-Correlated Texture Synthesis for Hybrid Indoor Scenes
Abstract
We introduce an automated pipeline for synthesizing texture maps in complex indoor scenes. With a style sample or color palette as input, our pipeline predicts a theme color for each room using a GAN-based method before generating texture maps using combinatorial optimization. We consider constraints on material selection, color correlation, and color palette matching. Our experiments show the pipeline’s ability to produce pleasing and harmonious textures for diverse layouts. We also contribute an interior furniture texture dataset with 4,337 texture images.
Yu He, Yi-Han Jin, Ying-Tian Liu, Bao-Li Lu, Ge Yu
Metaballs-Based Real-Time Elastic Object Simulation via Projective Dynamics
Abstract
In this paper we present a novel approach for real-time elastic object simulation. In our framework, an elastic object is represented by a hybrid model that couples adaptively distributed metaballs with a triangular surface mesh. We use the centers of the metaballs as physical particles and apply Projective Dynamics for deformation simulation. To produce more realistic simulations, we propose a new Projective Dynamics constraint for metaballs that governs the elastic behavior using a deformation gradient approximation for each metaball. The metaballs generated by the self-adaptive method better preserve the geometric feature details of the model, which enhances the model resolution near the surface. Furthermore, we propose a GPU-based surface skinning method for coupling the triangular mesh with the metaballs. Our skinning method integrates the skinning process into the rendering pipeline and enables fast skinning of large-scale surface meshes. The experimental results show that the proposed method obtains more plausible visual effects while achieving real-time deformation of 3D models.
Runze Yang, Shi Chen, Gang Xu, Shanshan Gao, Yuanfeng Zhou
NeRF Synthesis with Shading Guidance
Abstract
The emerging Neural Radiance Field (NeRF) shows great potential in representing 3D scenes, rendering photo-realistic images from novel views given only sparse input views. However, utilizing NeRF to reconstruct real-world scenes requires images from many different viewpoints, which limits its practical application. This problem can be even more pronounced for large scenes. In this paper, we introduce a new task called NeRF synthesis, which utilizes the structural content of a NeRF exemplar to construct a new radiance field of large size. We propose a two-phase method for synthesizing new scenes that are continuous in geometry and appearance. We also propose a boundary constraint method to synthesize scenes of arbitrary size without artifacts. Specifically, the lighting effects of synthesized scenes are controlled using shading guidance instead of decoupling the scene. The proposed method generates high-quality results with consistent geometry and appearance, even for scenes with complex lighting. It can even synthesize new scenes on curved surfaces with arbitrary lighting effects, which enhances the practicality of our proposed NeRF synthesis approach.
Chenbin Li, Yu Xin, Gaoyi Liu, Xiang Zeng, Ligang Liu
Multi-scale Hybrid Transformer Network with Grouped Convolutional Embedding for Automatic Cephalometric Landmark Detection
Abstract
Detection of anatomical landmarks in lateral cephalometric images is critical for orthodontic and orthognathic surgery. However, the field faces the challenge of developing automatic cephalometric detection methods that are both precise and cost-effective while detecting as many landmarks as possible. Although current deep learning-based approaches attain high accuracy, they have limitations in detecting landmarks that lack distinct texture features, such as certain soft tissue landmarks, and they identify fewer landmarks overall. To address these limitations, we propose a novel multi-scale deep learning network that simultaneously detects more landmarks with high accuracy, balances model size and performance, and improves accuracy in identifying soft tissue landmarks. Firstly, we exploit a hybrid encoder that combines a CNN and a Swin Transformer, extracting features from the input image at different scales and fusing them. Additionally, we use grouped 1D convolutional layers for efficient feature embedding, reducing model parameters while preserving model features. Finally, our method achieves very high accuracy and efficiency on both public and private datasets, particularly in detecting more soft tissue landmarks with less distinct texture features.
Fuli Wu, Lijie Chen, Bin Feng, Pengyi Hao
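The parameter saving from grouping 1D convolutional layers, as mentioned above, is easy to verify; the channel counts below are placeholders rather than the paper's configuration.

```python
# With groups=g, each Conv1d filter sees only in_channels/g input channels,
# cutting the convolution weights by a factor of g.
import torch.nn as nn

dense_embed   = nn.Conv1d(256, 256, kernel_size=3, padding=1)            # 256*256*3 weights + 256 biases
grouped_embed = nn.Conv1d(256, 256, kernel_size=3, padding=1, groups=8)  # 8x fewer conv weights

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(dense_embed), count(grouped_embed))   # 196864 vs 24832 parameters
```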
ZDL: Zero-Shot Degradation Factor Learning for Robust and Efficient Image Enhancement
Abstract
In recent years, many learning-based image enhancement methods have shown excellent performance. However, these methods rely heavily on labeled training data and are limited by the data distribution and application scenario. To address these limitations, inspired by Hadamard theory, we propose Zero-shot Degradation Factor Learning (ZDL) for robust and efficient image enhancement, which can also be extended to various harsh scenarios. Specifically, we first design a degradation factor estimation network based on Hadamard theory, which estimates the degradation factors of the images to be enhanced. Then, we propose a new learning strategy that introduces controlled model perturbations; by synthesizing additional data and exploiting the inherent connections between different data, it enhances the image relying solely on the input image, without requiring any other reference. Extensive quantitative and qualitative experimental results demonstrate the superiority of the proposed method, and ablation studies verify the effectiveness of our carefully designed learning strategy.
Hao Yang, Haijia Sun, Qianyu Zhou, Ran Yi, Lizhuang Ma
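Read loosely, a Hadamard (element-wise) degradation model writes the observed image as the clean image multiplied element-wise by a degradation factor map. Under that assumption (the abstract does not spell out the exact formulation), enhancement reduces to estimating the factor and dividing it out, as in the sketch below; the factor-estimation network itself is omitted.

```python
# Loose illustration of an element-wise (Hadamard) degradation model, not the ZDL network:
#   observed = clean * factor   =>   clean ≈ observed / factor
import torch

def enhance(observed, factor, eps=1e-3):
    # observed, factor: (B, C, H, W); factor assumed in (0, 1], e.g. predicted by a
    # small CNN ending in a sigmoid (that estimation network is not shown here).
    return torch.clamp(observed / (factor + eps), 0.0, 1.0)
```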
Self-supervised Contrastive Feature Refinement for Few-Shot Class-Incremental Learning
Abstract
Few-Shot Class-Incremental Learning (FSCIL) aims to learn novel classes incrementally from few data points without forgetting old classes. It is very hard to capture the underlying patterns and traits of the few-shot classes. To meet these challenges, we propose a Self-supervised Contrastive Feature Refinement (SCFR) framework that tackles the FSCIL problem from three aspects. Firstly, we employ a self-supervised learning framework to make the network learn richer representations and promote feature refinement. Meanwhile, we design virtual classes to improve the model's robustness and generalization during the training process. To prevent catastrophic forgetting, we add Gaussian noise to the prototypes of encountered classes to recall the distribution of known classes and maintain stability in the embedding space. SCFR offers a systematic solution that effectively mitigates catastrophic forgetting and over-fitting. Experiments on widely recognized datasets, including CUB200, miniImageNet, and CIFAR100, show better performance than other mainstream works.
Shengjin Ma, Wang Yuan, Yiting Wang, Xin Tan, Zhizhong Zhang, Lizhuang Ma
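The Gaussian-noise prototype mechanism described above can be sketched as a simple replay step: pseudo-features are drawn around each stored prototype and mixed into incremental training. The noise scale and sample count are assumptions, not the SCFR settings.

```python
# Minimal sketch of prototype replay with Gaussian noise to recall old-class
# distributions without stored exemplars.
import torch

def replay_from_prototypes(prototypes, num_samples=16, sigma=0.1):
    # prototypes: (num_old_classes, D) mean embeddings of previously seen classes,
    # assumed to be labeled 0..num_old_classes-1.
    labels = torch.arange(prototypes.size(0)).repeat_interleave(num_samples)
    feats = prototypes.repeat_interleave(num_samples, dim=0)
    feats = feats + sigma * torch.randn_like(feats)   # jitter around each prototype
    return feats, labels    # pseudo-features and labels mixed into incremental training
```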
SRSSIS: Super-Resolution Screen Space Irradiance Sampling for Lightweight Collaborative Web3D Rendering Architecture
Abstract
In traditional collaborative rendering architecture, the front-end computes direct lighting, which imposes certain performance requirements on the front-end devices. To further reduce the front-end load in complex 3D scenes, we propose a Super-resolution Screen Space Irradiance Sampling technique (SRSSIS), which is applied to our designed architecture, a lightweight collaborative rendering system built on Web3D. In our system, the back-end samples low-resolution screen-space irradiance, while the front-end applies our SRSSIS technique to reconstruct high-resolution, high-quality images. We also introduce frame interpolation into the architecture to further reduce the back-end load and the transmission frequency. Moreover, we propose a self-adaptive sampling strategy to improve the robustness of super-resolution. Our experiments show that, under ideal conditions, our reconstruction performance is comparable to the DLSS and FSR real-time super-resolution technologies. The bandwidth consumption of our system ranges from 8% to 66% of pixel streaming at different super-resolution rates, while the back-end’s computational cost is approximately 33% to 46% of pixel streaming at different super-resolution rates.
Huzhiyuan Long, Yufan Yang, Chang Liu, Jinyuan Jia
QuadSampling: A Novel Sampling Method for Remote Implicit Neural 3D Reconstruction Based on Quad-Tree
Abstract
Implicit neural representations have shown potential advantages in 3D reconstruction. However, implicit neural 3D reconstruction methods require high-performance graphics computing power, which limits their application on low-power platforms. A remote 3D reconstruction framework can be employed to address this issue, but the sampling method needs to be further improved.
We present a novel sampling method, QuadSampling, for remote implicit neural 3D reconstruction. By hierarchically sampling pixels within blocks with larger loss values, QuadSampling yields a larger average loss and helps the neural learning process by better representing the shape of regions with different loss values. Thus, under the same amount of transmission, QuadSampling obtains a more accurate and complete implicit neural representation of the scene. Extensive evaluations show that, compared with prior methods (i.e., random sampling and active sampling), our QuadSampling framework improves accuracy by up to 4% and the completion ratio by about 1–2%.
Xu-Qiang Hu, Yu-Ping Wang
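A minimal sketch of loss-driven quad-tree sampling in the spirit of QuadSampling is shown below; the split criterion, budget allocation, and stopping rule are assumptions, not the paper's exact procedure. Blocks with larger summed loss receive proportionally more pixel samples.

```python
# Sketch: recursively split a per-pixel loss map into quadrants and allocate the
# sampling budget in proportion to each quadrant's total loss.
import numpy as np

def quad_sample(loss_map, budget, min_size=8):
    h, w = loss_map.shape
    # Stop subdividing: sample `budget` pixels uniformly inside this block.
    if budget <= 1 or min(h, w) <= min_size:
        ys = np.random.randint(0, h, size=budget)
        xs = np.random.randint(0, w, size=budget)
        return list(zip(ys, xs))
    # Split into four quadrants and give each a budget proportional to its loss.
    hh, hw = h // 2, w // 2
    quads = [(0, hh, 0, hw), (0, hh, hw, w), (hh, h, 0, hw), (hh, h, hw, w)]
    sums = np.array([loss_map[y0:y1, x0:x1].sum() for y0, y1, x0, x1 in quads]) + 1e-9
    shares = np.floor(budget * sums / sums.sum()).astype(int)
    shares[0] += budget - shares.sum()          # keep the total budget exact
    samples = []
    for (y0, y1, x0, x1), b in zip(quads, shares):
        if b > 0:
            sub = quad_sample(loss_map[y0:y1, x0:x1], b, min_size)
            samples += [(y0 + y, x0 + x) for y, x in sub]
    return samples
```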
RAGT: Learning Robust Features for Occluded Human Pose and Shape Estimation with Attention-Guided Transformer
Abstract
3D human pose and shape estimation from monocular images is a fundamental task in computer vision, but it is highly ill-posed and challenging due to occlusion. Occlusion can be caused by other objects that block parts of the body from being visible in the image. When occlusion occurs, the image features become incomplete and ambiguous, leading to inaccurate or even wrong predictions. In this paper, we propose a novel method, named RAGT, that handles occlusion robustly and recovers the complete 3D pose and shape of humans. Our study focuses on achieving robust feature representation for human pose and shape estimation in the presence of occlusion. To this end, we introduce a dual-branch architecture that learns incorporation weights from visible parts to occluded parts and suppression weights to inhibit the integration of background features. To further improve the quality of the visible and occluded maps, we leverage pseudo ground-truth maps generated by DensePose for pixel-level supervision. Additionally, we propose a novel transformer-based module called COAT (Contextual Occlusion-Aware Transformer) to effectively incorporate visible features into occluded regions. The COAT module is guided by an Occlusion-Guided Attention Loss (OGAL), designed to explicitly encourage the COAT module to fuse more important and relevant features that are semantically and spatially closer to the occluded regions. We conduct experiments on various benchmarks and demonstrate the robustness of RAGT to different kinds of occluded scenes, both quantitatively and qualitatively.
Ziqing Li, Yang Li, Shaohui Lin
P2M2-Net: Part-Aware Prompt-Guided Multimodal Point Cloud Completion
Abstract
Inferring missing regions from severely occluded point clouds is highly challenging, especially for 3D shapes with rich geometric and structural details, where inherent ambiguities exist for the unknown parts. Existing approaches either learn a one-to-one mapping in a supervised manner or train a generative model to synthesize the missing points for the completion of 3D point cloud shapes. These methods, however, lack controllability over the completion process, and the results are either deterministic or exhibit uncontrolled diversity. Inspired by prompt-driven data generation and editing, we propose a novel prompt-guided point cloud completion framework, coined P2M2-Net, to enable more controllable and more diverse shape completion. Given an input partial point cloud and a text prompt describing part-aware information such as the semantics and structure of the missing region, our Transformer-based completion network can efficiently fuse the multimodal features and generate diverse results following the prompt guidance. We train P2M2-Net on a new large-scale PartNet-Prompt dataset and conduct extensive experiments on two challenging shape completion benchmarks. Quantitative and qualitative results show the efficacy of incorporating prompts for more controllable part-aware point cloud completion and generation.
Linlian Jiang, Pan Chen, Ye Wang, Tieru Wu, Rui Ma
Backmatter
Metadata
Title
Computer-Aided Design and Computer Graphics
Edited by
Shi-Min Hu
Yiyu Cai
Paul Rosin
Copyright Year
2024
Publisher
Springer Nature Singapore
Electronic ISBN
978-981-9996-66-7
Print ISBN
978-981-9996-65-0
DOI
https://doi.org/10.1007/978-981-99-9666-7
