1 Introduction
- We propose an end-to-end unsupervised 2D image-based 3D model retrieval framework built on ViT, dubbed the transformer-based 3D model retrieval network (T3DRN), whose distinctive property is mining informative view features to guide the whole retrieval process.
- We propose a novel module, termed the shared view-guided attentive module (SVAM), which can be easily integrated into T3DRN and attends to the most informative views during 3D model feature training.
- Qualitative and quantitative results on challenging unsupervised 2D image-based 3D model retrieval datasets show that our method outperforms state-of-the-art methods.
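The section above describes SVAM only at a high level. As an illustration of the general idea of view-guided attentive aggregation, the following minimal sketch pools per-view feature vectors into a single 3D-model descriptor using softmax attention against a shared query vector. The function names, the dot-product scoring, and the `query` vector are illustrative assumptions, not the authors' exact design.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def view_attentive_pooling(view_feats, query):
    """Aggregate per-view features into one model-level descriptor.

    view_feats : list of per-view feature vectors (equal length)
    query      : shared query vector scoring each view's relevance

    Each view is scored by its dot product with the query; the scores
    are softmax-normalized into attention weights, and the pooled
    descriptor is the weighted sum of the view features.
    """
    scores = [sum(q * v for q, v in zip(query, f)) for f in view_feats]
    weights = softmax(scores)
    dim = len(view_feats[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, view_feats))
              for d in range(dim)]
    return pooled, weights
```

In an actual transformer-based design, the scoring and aggregation would be learned (e.g., with projection matrices), but the weighting-then-pooling structure is the same.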
2 Related work
2.1 3D model retrieval
2.2 Domain adaptation
2.3 Transformer
3 Methodology
3.1 Problem statement
3.2 Transformer-based 3D model retrieval network
3.3 Shared view-guided attentive module
3.4 Training and optimization details
4 Experimental results and discussion
4.1 Datasets and evaluation metrics
4.1.1 Dataset
4.1.2 Evaluation metrics
4.2 Implementation details
4.3 Quantitative results and analysis
4.3.1 Comparative methods
4.3.2 Quantitative results with other methods and analysis
| Methods | NN | FT | ST | F | DCG | ANMRR | AUC |
|---|---|---|---|---|---|---|---|
| AlexNet [52] | 0.424 | 0.323 | 0.469 | 0.099 | 0.345 | 0.667 | – |
| DANN [10] | 0.650 | 0.505 | 0.643 | 0.112 | 0.542 | 0.474 | – |
| JAN [30] | 0.446 | 0.343 | 0.495 | 0.085 | 0.363 | 0.647 | – |
| JGSA [12] | 0.612 | 0.443 | 0.599 | 0.116 | 0.473 | 0.541 | – |
| MEDA [11] | 0.430 | 0.344 | 0.501 | 0.046 | 0.361 | 0.646 | – |
| MSTN [54] | 0.789 | 0.622 | 0.779 | 0.154 | 0.657 | 0.358 | 0.557 |
| DLEA [29] | 0.764 | 0.558 | 0.716 | 0.143 | 0.597 | 0.421 | – |
| HIFA [53] | 0.778 | 0.618 | 0.768 | 0.151 | 0.654 | 0.362 | – |
| Ours | 0.801 | 0.632 | 0.787 | 0.155 | 0.667 | 0.348 | 0.569 |
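The table reports standard retrieval metrics without defining them. As a reference sketch, the snippet below computes NN, FT, and ST for a single query under the common convention that NN is precision at rank 1, FT is recall within the top C results (C being the number of gallery models in the query's class), and ST is recall within the top 2C; the function name and the assumption that the query itself is absent from the gallery (the cross-domain setting) are mine, not taken from the paper.

```python
def retrieval_metrics(ranked_labels, query_label, class_size):
    """NN, FT and ST for one query.

    ranked_labels : gallery class labels sorted by descending similarity
    query_label   : ground-truth class of the query
    class_size    : number of gallery models in the query's class (C)
    """
    nn = 1.0 if ranked_labels[0] == query_label else 0.0
    hits = lambda k: sum(1 for lab in ranked_labels[:k] if lab == query_label)
    ft = hits(class_size) / class_size        # recall in top C
    st = hits(2 * class_size) / class_size    # recall in top 2C
    return nn, ft, st
```

Dataset-level scores such as those in the table would be the mean of these per-query values over all queries.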
4.4 Ablation studies
4.4.1 The effectiveness of SVAM
| Methods | NN | FT | ST | F | DCG | ANMRR | AUC |
|---|---|---|---|---|---|---|---|
| T3DRN-SVAM | 0.790 | 0.629 | 0.775 | 0.153 | 0.658 | 0.359 | 0.559 |
| T3DRN | 0.801 | 0.632 | 0.787 | 0.155 | 0.667 | 0.348 | 0.569 |
4.4.2 The effectiveness of the balance coefficient \(\lambda\)
| \(\lambda\) | NN | FT | ST | F | DCG | ANMRR | AUC |
|---|---|---|---|---|---|---|---|
| 0.1 | 0.776 | 0.605 | 0.742 | 0.142 | 0.643 | 0.369 | 0.537 |
| 0.2 | 0.781 | 0.610 | 0.759 | 0.147 | 0.658 | 0.359 | 0.559 |
| 0.3 | 0.801 | 0.632 | 0.787 | 0.155 | 0.667 | 0.348 | 0.569 |
| 0.4 | 0.794 | 0.625 | 0.776 | 0.153 | 0.651 | 0.360 | 0.553 |
| 0.5 | 0.790 | 0.621 | 0.761 | 0.144 | 0.645 | 0.367 | 0.541 |
4.4.3 The effectiveness of the perturbation coefficient \(\alpha\)
| \(\alpha\) | NN | FT | ST | F | DCG | ANMRR | AUC |
|---|---|---|---|---|---|---|---|
| 0.2 | 0.801 | 0.632 | 0.787 | 0.155 | 0.667 | 0.348 | 0.569 |
| 0.3 | 0.786 | 0.618 | 0.765 | 0.149 | 0.653 | 0.369 | 0.548 |
| 0.4 | 0.770 | 0.607 | 0.761 | 0.137 | 0.643 | 0.375 | 0.543 |