1 Introduction
- We introduce a novel autoencoder-like network architecture for GANs, which achieves state-of-the-art results in tasks such as 3D face representation, generation, and translation.
- We introduce a novel training framework for GANs, especially tailored for 3D facial data.
- We introduce a novel process for generating realistic 3D facial data, retaining the high-frequency details of the 3D face.
2 3D Face Representations for Deep Nets
3 3DFaceGAN
3.1 Objective Function
3.2 Training Procedure
Part | Input \(\rightarrow\) Output shape | Layer information |
---|---|---|
Encoder | (h, w, 3) \(\rightarrow\) (h, w, n) | CONV-(Cn, K3x3, S1, P1), ELU |
 | (h, w, n) \(\rightarrow\) (\(\frac{h}{2}\), \(\frac{w}{2}\), 2n) | CONV-BLOCK-(Cn, C2n, K3x3, S1, P1), AvgPool2D(K2x2, S2) |
 | (\(\frac{h}{2}\), \(\frac{w}{2}\), 2n) \(\rightarrow\) (\(\frac{h}{4}\), \(\frac{w}{4}\), 3n) | CONV-BLOCK-(C2n, C3n, K3x3, S1, P1), AvgPool2D(K2x2, S2) |
 | (\(\frac{h}{4}\), \(\frac{w}{4}\), 3n) \(\rightarrow\) (\(\frac{h}{8}\), \(\frac{w}{8}\), 4n) | CONV-BLOCK-(C3n, C4n, K3x3, S1, P1), AvgPool2D(K2x2, S2) |
 | (\(\frac{h}{8}\), \(\frac{w}{8}\), 4n) \(\rightarrow\) (\(\frac{h}{16}\), \(\frac{w}{16}\), 5n) | CONV-BLOCK-(C4n, C5n, K3x3, S1, P1), AvgPool2D(K2x2, S2) |
 | (\(\frac{h}{16}\), \(\frac{w}{16}\), 5n) \(\rightarrow\) (\(\frac{h}{32}\), \(\frac{w}{32}\), 6n) | CONV-BLOCK-(C5n, C6n, K3x3, S1, P1), AvgPool2D(K2x2, S2) |
 | (\(\frac{h}{32}\), \(\frac{w}{32}\), 6n) \(\rightarrow\) (\(\frac{h}{32}\), \(\frac{w}{32}\), 6n) | CONV-BLOCK-(C6n, C6n, K3x3, S1, P1) |
\(\hbox {Bottleneck}_1\) | (\(\frac{h}{32}\times \frac{w}{32}\times 6n\)) \(\rightarrow\) n | Fully connected |
\(\hbox {Bottleneck}_2\) | n \(\rightarrow\) (\(\frac{h}{32}\times \frac{w}{32}\times n\)) | Fully connected |
Decoder | (\(\frac{h}{32}\), \(\frac{w}{32}\), n) \(\rightarrow\) (\(\frac{h}{16}\), \(\frac{w}{16}\), n) | DECONV-BLOCK-(Cn, Cn, K3x3, S1, P1), UpNN(SF2) |
 | (\(\frac{h}{16}\), \(\frac{w}{16}\), n) \(\rightarrow\) (\(\frac{h}{8}\), \(\frac{w}{8}\), n) | DECONV-BLOCK-(Cn, Cn, K3x3, S1, P1), UpNN(SF2) |
 | (\(\frac{h}{8}\), \(\frac{w}{8}\), n) \(\rightarrow\) (\(\frac{h}{4}\), \(\frac{w}{4}\), n) | DECONV-BLOCK-(Cn, Cn, K3x3, S1, P1), UpNN(SF2) |
 | (\(\frac{h}{4}\), \(\frac{w}{4}\), n) \(\rightarrow\) (\(\frac{h}{2}\), \(\frac{w}{2}\), n) | DECONV-BLOCK-(Cn, Cn, K3x3, S1, P1), UpNN(SF2) |
 | (\(\frac{h}{2}\), \(\frac{w}{2}\), n) \(\rightarrow\) (h, w, n) | DECONV-BLOCK-(Cn, Cn, K3x3, S1, P1), UpNN(SF2) |
 | (h, w, n) \(\rightarrow\) (h, w, n) | DECONV-BLOCK-(Cn, Cn, K3x3, S1, P1) |
 | (h, w, n) \(\rightarrow\) (h, w, 3) | DECONV-(Cn, C3, K3x3, S1, P1), Tanh |
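The shape bookkeeping in the table can be checked with a small script. The following is a minimal sketch, not the authors' implementation: it only traces tensor shapes and assumes that each pooling stage halves the spatial resolution, that each nearest-neighbour upsampling stage doubles it (so the decoder climbs from \(\frac{h}{32}\) back to h in five steps), and that h and w are divisible by 32. The function name `trace_shapes`, the stage labels, and the example values are hypothetical.

```python
from typing import List, Tuple

Shape = Tuple[int, ...]


def trace_shapes(h: int, w: int, n: int) -> List[Tuple[str, Shape]]:
    """Return (stage label, output shape) pairs following the architecture table.

    Stage labels are illustrative only; they do not come from the paper.
    """
    if h % 32 or w % 32:
        raise ValueError("h and w must be divisible by 32 (five 2x poolings)")

    stages: List[Tuple[str, Shape]] = [("input", (h, w, 3))]

    # Encoder: the first 3x3 convolution keeps the resolution and lifts 3 -> n channels.
    stages.append(("enc_conv", (h, w, n)))

    # Five conv blocks, each followed by 2x2 average pooling with stride 2.
    # Channels grow linearly: 2n, 3n, 4n, 5n, 6n.
    ch, cw = h, w
    for k in range(2, 7):
        ch, cw = ch // 2, cw // 2
        stages.append((f"enc_block_{k - 1}", (ch, cw, k * n)))

    # Last encoder block: no pooling, channels stay at 6n.
    stages.append(("enc_block_6", (ch, cw, 6 * n)))

    # Bottleneck_1: flatten (h/32 * w/32 * 6n) and project to an n-vector.
    stages.append(("bottleneck_1", (n,)))

    # Bottleneck_2: project back to h/32 * w/32 * n values, viewed as a feature map.
    stages.append(("bottleneck_2", (ch, cw, n)))

    # Decoder: five blocks, each followed by nearest-neighbour upsampling (scale 2).
    for k in range(1, 6):
        ch, cw = ch * 2, cw * 2
        stages.append((f"dec_block_{k}", (ch, cw, n)))

    # Final block at full resolution, then a 3-channel output with Tanh.
    stages.append(("dec_block_6", (ch, cw, n)))
    stages.append(("dec_out", (ch, cw, 3)))
    return stages


if __name__ == "__main__":
    # Example values only; the table leaves h, w and n symbolic.
    for label, shape in trace_shapes(256, 256, 128):
        print(f"{label:14s} {shape}")
```

With these example values the first fully connected layer would take 8 × 8 × 768 = 49152 inputs, which illustrates why the table places the bottleneck only after five pooling stages.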
3.3 3D Face Generation
3.4 3DFaceGAN for Multi-Label 3D Data
Method | Mean | Std | AUC | Failure rate (%) |
---|---|---|---|---|
3DFaceGAN | 0.0031 | ± 0.0028 | 0.741 | 1.42e−7 |
CoMA | 0.0038 | ± 0.0037 | 0.716 | 3.66e−7 |
PCA | 0.0040 | ± 0.0040 | 0.711 | 0.91e−6 |
PGAN | 0.0041 | ± 0.0041 | 0.705 | 1.22e−6 |
4 Experiments
4.1 Databases
4.1.1 The Hi-Lo Database
4.1.2 4DFAB Database
4.2 Data Preprocessing
Method | AUC | Failure rate (%) |
---|---|---|
3DFaceGAN | 0.741 | 1.42e−7 |
3DFaceGAN_V3 | 0.736 | 2.62e−7 |
3DFaceGAN_V2 | 0.704 | 3.15e−6 |
Baseline (AE) | 0.697 | 4.24e−6 |
- 11.013 mm for a UV size of 128,
- 0.164 mm for a UV size of 256,
- 0.013 mm for a UV size of 512.
4.3 Training
4.4 3D Face Representation
4.4.1 Baseline Models
4.4.2 Vanilla Autoencoder (AE)
4.4.3 Convolutional Mesh Autoencoder (CoMA)
4.4.4 Principal Component Analysis (PCA)
4.4.5 Progressive GAN (PGAN)
4.4.6 Error Metric
4.4.7 Ablation Study
Method | AUC | Failure rate (%) |
---|---|---|
3DFaceGAN | 0.827 | 5.49e−6 |
pix2pixHD | 0.760 | 5.18e−5 |
pix2pix | 0.757 | 1.81e−5 |
Denoising CoMA | 0.742 | 2.41e−4 |
Method | AUC | Failure rate (%) |
---|---|---|
3DFaceGAN | 0.827 | 5.49e−6 |
3DFaceGAN_V3 | 0.819 | 8.70e−6 |
3DFaceGAN_V2 | 0.794 | 1.38e−5 |
Baseline (Denoising AE) | 0.758 | 1.95e−5 |
4.5 3D Face Translation
4.5.1 Baseline Models
4.5.2 Denoising Vanilla Autoencoder (Denoising AE)
4.5.3 Denoising Convolutional Mesh Autoencoder (Denoising CoMA)
4.5.4 pix2pix
4.5.5 pix2pixHD
4.5.6 Error Metric
4.5.7 Ablation Study
4.6 Multi-Label 3D Face Translation
4.7 Cross-Dataset Attribute Transfer
4.8 3D Face Generation
Method | Mean | Std |
---|---|---|
3DFaceGAN | 1.28 | ±0.183 |
CoMA | 1.40 | ±0.205 |
PCA | 1.43 | ±0.232 |
PGAN | 1.79 | ±0.189 |