2022 | OriginalPaper | Book chapter

A Complementary Fusion Strategy for RGB-D Face Recognition

Authors: Haoyuan Zheng, Weihang Wang, Fei Wen, Peilin Liu

Published in: MultiMedia Modeling

Publisher: Springer International Publishing

Abstract

RGB-D face recognition (FR) with low-quality depth maps has recently come to play an important role in biometric identification. The intrinsic geometric properties and shape cues captured by depth information significantly improve FR robustness to illumination and pose variations. However, most existing multi-modal fusion methods cannot learn complementary features or establish correlations between different facial features. In this paper, we propose a Complementary Multi-Modal Fusion Transformer (CMMF-Trans) network that fuses the modalities in a complementary way while preserving modality-specific properties. In addition, the proposed tokenization and self-attention modules encourage the Transformer to capture long-range dependencies that supplement the local representations of facial regions. We test our model on two public datasets, Lock3DFace and IIIT-D, which contain challenging variations in pose, occlusion, expression and illumination, and our strategy achieves state-of-the-art performance on both. A further contribution of this work is a new, challenging RGB-D FR dataset that covers additional difficult scenarios such as mask occlusion and backlight shadow.
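
The abstract does not describe the CMMF-Trans architecture in detail, so the following is only a generic sketch of the idea it names: fusing two modalities so that each complements the other while its own features are preserved. It uses plain scaled dot-product cross-attention with a residual connection. The function names `cross_attention` and `complementary_fuse` are hypothetical, the learned projections of a real Transformer are omitted, and nothing here should be read as the authors' actual method.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: every query token attends over all key tokens.

    Assumption: no learned Q/K/V projections, to keep the sketch short.
    """
    dim = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        out.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return out


def complementary_fuse(rgb_tokens, depth_tokens):
    """Hypothetical fusion step: each modality queries the other one.

    The attended context is added residually, so the original
    modality-specific token is kept and only supplemented.
    """
    rgb_ctx = cross_attention(rgb_tokens, depth_tokens, depth_tokens)
    depth_ctx = cross_attention(depth_tokens, rgb_tokens, rgb_tokens)
    rgb_out = [[a + b for a, b in zip(t, c)] for t, c in zip(rgb_tokens, rgb_ctx)]
    depth_out = [[a + b for a, b in zip(t, c)] for t, c in zip(depth_tokens, depth_ctx)]
    return rgb_out, depth_out


if __name__ == "__main__":
    rgb = [[1.0, 0.0], [0.0, 1.0]]   # two toy RGB tokens
    depth = [[0.5, 0.5]]             # one toy depth token
    fused_rgb, fused_depth = complementary_fuse(rgb, depth)
    print(fused_rgb)
    print(fused_depth)
```

The token counts of the two streams stay unchanged, which is one simple way to keep the modalities separate after fusion. A real model would add learned projections, multiple heads and normalization layers.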

Metadata
Title
A Complementary Fusion Strategy for RGB-D Face Recognition
Authors
Haoyuan Zheng
Weihang Wang
Fei Wen
Peilin Liu
Copyright year
2022
DOI
https://doi.org/10.1007/978-3-030-98358-1_27