1 Introduction
2 Related Work
2.1 3D Shape Completion and Single-View 3D Reconstruction
2.1.1 3D Shape Completion
2.1.2 Single-View 3D Reconstruction
2.2 Shape Models
2.3 Amortized Inference
3 Method
3.1 Problem Formulation
3.2 Shape Prior
3.2.1 Variational Auto-Encoder (VAE)
3.2.2 Denoising VAE (DVAE)
3.3 Shape Inference
3.4 Practical Considerations
3.4.1 Encouraging Variety
3.4.2 Handling Noise
4 Experiments
4.1 Data

| | Synthetic: SN-clean/-noisy | Synthetic: ModelNet | Real: KITTI | Real: Kinect |
|---|---|---|---|---|
| Training/test sets (#Shapes for shape prior, #Views for shape inference) | | | | |
| #Shapes | 500/100 | 1000/200 | – | – |
| #Views | 5000/1000 | 10,000/2000 | 8442/9194 | 30/10 |
| Observed voxels in % (\(<\mathbf{5}\%\)) and resolutions: Low = \(24 \times 54 \times 24\)/\(32^3\); Medium = \(32 \times 72 \times 32\)/\(48^3\); High = \(48 \times 108 \times 48\)/\(64^3\) | | | | |
| Low | 7.66/3.86 | 9.71 | 6.79 | 0.87 |
| Medium | 6.1/2.13 | 8.74 | 5.24 | – |
| High | 2.78/0.93 | 8.28 | 3.44 | – |

4.1.1 ShapeNet
4.1.2 KITTI
4.1.3 ModelNet
4.1.4 Kinect
4.2 Evaluation
4.3 Architectures and Training
4.4 Baselines
4.4.1 Data-Driven Approaches

| Supervision in % | Method | SN-clean: Ham \(\downarrow\) | SN-clean: IoU \(\uparrow\) | SN-clean: Acc [vx] \(\downarrow\) | SN-clean: Comp [vx] \(\downarrow\) | SN-noisy: Ham \(\downarrow\) | SN-noisy: IoU \(\uparrow\) | SN-noisy: Acc [vx] \(\downarrow\) | SN-noisy: Comp [vx] \(\downarrow\) | KITTI: Comp [m] \(\downarrow\) |
|---|---|---|---|---|---|---|---|---|---|---|
| Low resolution: \(24 \times 54 \times 24\) voxels; * independent of resolution | | | | | | | | | | |
| (shape prior) | DVAE | 0.019 | 0.885 | 0.283 | 0.527 | (same shape prior as on SN-clean) | | | | |
| 100 | Dai et al. (2017) (Dai17) | 0.021 | 0.872 | 0.321 | 0.564 | 0.027 | 0.836 | 0.391 | 0.633 | 0.128 |
| | Sup | 0.026 | 0.841 | 0.409 | 0.607 | 0.028 | 0.833 | 0.407 | 0.637 | 0.091 |
| \(<7.7\) | Naïve | 0.067 | 0.596 | 0.999 | 1.335 | 0.064 | 0.609 | 0.941 | 1.29 | – |
| | Mean | 0.052 | 0.697 | 0.79 | 0.938 | 0.052 | 0.696 | 0.79 | 0.938 | – |
| | ML | 0.04 | 0.756 | 0.637 | 0.8 | 0.041 | 0.755 | 0.625 | 0.829 | (too slow) |
| | *Gupta et al. (2015) (ICP) | (mesh only) | (mesh only) | 0.534 | 0.503 | (mesh only) | (mesh only) | 7.551 | 6.372 | (too slow) |
| | *Engelmann et al. (2016) (Eng16) | (mesh only) | (mesh only) | 1.235 | 1.237 | (mesh only) | (mesh only) | 1.974 | 1.312 | 0.13 |
| | dAML | 0.034 | 0.784 | 0.532 | 0.741 | 0.036 | 0.772 | 0.557 | 0.76 | (see AML) |
| | AML | 0.034 | 0.779 | 0.549 | 0.753 | 0.036 | 0.771 | 0.57 | 0.761 | 0.12 |
| Low resolution: \(24 \times 54 \times 24\) voxels; Multiple, \(k > 1\) Fused Views | | | | | | | | | | |
| 100 | Dai et al. (2017) (Dai17), \(k = 5\) | 0.012 | 0.924 | 0.214 | 0.436 | 0.018 | 0.887 | 0.278 | 0.491 | n/a |
| | Sup, \(k = 5\) | 0.022 | 0.866 | 0.336 | 0.566 | 0.024 | 0.86 | 0.331 | 0.573 | n/a |
| \(<16\) | AML, \(k = 2\) | 0.032 | 0.794 | 0.489 | 0.695 | 0.034 | 0.79 | 0.52 | 0.725 | n/a |
| \(<24\) | AML, \(k = 3\) | 0.031 | 0.809 | 0.471 | 0.667 | 0.031 | 0.81 | 0.493 | 0.67 | n/a |
| \(<40\) | AML, \(k = 5\) | 0.031 | 0.804 | 0.502 | 0.686 | 0.035 | 0.799 | 0.523 | 0.7 | n/a |
| Medium resolution: \(32 \times 72 \times 32\) voxels | | | | | | | | | | |
| (shape prior) | DVAE | 0.019 | 0.877 | 0.24 | 0.47 | (same shape prior as on SN-clean) | | | | |
| 100 | Dai et al. (2017) (Dai17) | 0.02 | 0.869 | 0.399 | 0.674 | 0.026 | 0.83 | 0.51 | 0.767 | 0.074 |
| | Sup | 0.027 | 0.834 | 0.498 | 0.789 | 0.029 | 0.815 | 0.571 | 0.843 | 0.09 |
| \(\le 6.1\) | AML | 0.031 | 0.788 | 0.415 | 0.584 | 0.036 | 0.766 | 0.721 | 0.953 | 0.083 |
| High resolution: \(48 \times 108 \times 48\) voxels | | | | | | | | | | |
| (shape prior) | DVAE | 0.018 | 0.87 | 0.272 | 0.434 | (same shape prior as on SN-clean) | | | | |
| 100 | Dai et al. (2017) (Dai17) | 0.017 | 0.88 | 0.517 | 0.827 | 0.054 | 0.664 | 1.559 | 2.067 | 0.066 |
| | Sup | 0.023 | 0.843 | 0.677 | 1.032 | 0.052 | 0.674 | 1.52 | 1.981 | 0.091 |
| \(<3.5\) | AML | 0.028 | 0.796 | 0.433 | 0.579 | 0.045 | 0.659 | 1.4 | 1.957 | 0.078 |


| Supervision in % | Method | Bathtub: Ham \(\downarrow\) | Bathtub: IoU \(\uparrow\) | Chair: Ham \(\downarrow\) | Chair: IoU \(\uparrow\) | Chair: Acc [vx] \(\downarrow\) | Chair: Comp [vx] \(\downarrow\) | Desk: Ham \(\downarrow\) | Desk: IoU \(\uparrow\) | Table: Ham \(\downarrow\) | Table: IoU \(\uparrow\) | ModelNet10: Ham \(\downarrow\) | ModelNet10: IoU \(\uparrow\) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Low resolution: \(32^3\) voxels; * independent of resolution | | | | | | | | | | | | | |
| (shape prior) | DVAE | 0.015 | 0.699 | 0.025 | 0.517 | 0.884 | 0.72 | 0.028 | 0.555 | 0.011 | 0.608 | 0.023 | 0.714 |
| 100 | Dai et al. (2017) (Dai17) | 0.022 | 0.59 | 0.019 | 0.61 | 0.663 | 0.671 | 0.027 | 0.568 | 0.011 | 0.648 | 0.03 | 0.646 |
| | Sup | 0.023 | 0.618 | 0.03 | 0.478 | 0.873 | 0.813 | 0.036 | 0.458 | 0.017 | 0.497 | 0.038 | 0.589 |
| \(<10\) | *Gupta et al. (2015) (ICP) | (mesh only) | (mesh only) | (mesh only) | (mesh only) | 1.483 | 0.89 | (mesh only) | (mesh only) | (mesh only) | (mesh only) | (mesh only) | (mesh only) |
| | ML | 0.028 | 0.503 | 0.033 | 0.414 | 1.489 | 1.065 | 0.048 | 0.323 | 0.029 | 0.318 | (too slow) | (too slow) |
| | AML | 0.026 | 0.503 | 0.033 | 0.373 | 1.088 | 0.785 | 0.041 | 0.389 | 0.018 | 0.423 | 0.04 | 0.509 |
| Medium resolution: \(48^3\) voxels | | | | | | | | | | | | | |
| (shape prior) | DVAE | 0.014 | 0.671 | 0.021 | 0.491 | 0.748 | 0.697 | 0.025 | 0.525 | 0.01 | 0.548 | | |
| 100 | Dai et al. (2017) (Dai17) | 0.018 | 0.609 | 0.016 | 0.576 | 0.513 | 0.508 | 0.023 | 0.532 | 0.008 | 0.65 | | |
| \(<9\) | AML | 0.024 | 0.459 | 0.029 | 0.347 | 1.025 | 0.805 | 0.034 | 0.361 | 0.015 | 0.384 | | |
| High resolution: \(64^3\) voxels | | | | | | | | | | | | | |
| (shape prior) | DVAE | 0.014 | 0.644 | 0.02 | 0.474 | 0.702 | 0.705 | 0.024 | 0.506 | 0.009 | 0.548 | | |
| 100 | Dai et al. (2017) (Dai17) | 0.018 | 0.54 | 0.016 | 0.548 | 0.47 | 0.53 | 0.021 | 0.525 | 0.007 | 0.673 | | |
| \(<9\) | AML | 0.023 | 0.46 | 0.026 | 0.333 | 0.893 | 0.852 | 0.042 | 0.31 | 0.012 | 0.407 | | |