Published in: International Journal of Computer Vision 4/2024

06.11.2023

Cascaded Iterative Transformer for Jointly Predicting Facial Landmark, Occlusion Probability and Head Pose

Authors: Yaokun Li, Guang Tan, Chao Gou


Abstract

Landmark detection under large pose with occlusion has long been one of the challenging problems in facial analysis. Recently, many works have predicted pose or occlusion jointly within the multi-task learning (MTL) paradigm, trying to tap into the tasks' dependencies and thus alleviate this issue. However, such implicit dependencies are weakly interpretable and inconsistent with the way humans exploit inter-task coupling relations, i.e., by accommodating the explicit effects each task induces on the others. This is one of the essential factors that limits their performance. To this end, in this paper, we propose a Cascaded Iterative Transformer (CIT) to jointly predict facial landmarks, occlusion probability, and pose. The proposed CIT, besides implicitly mining task dependencies in a shared encoder, innovatively employs a cost-effective and portability-friendly strategy that passes the decoders' predictions to one another as prior knowledge, exploiting the coupling-induced effects in a human-like manner. Moreover, to the best of our knowledge, no existing dataset contains annotations for all these tasks simultaneously, so we introduce a new dataset, termed MERL-RAV-FLOP, based on the MERL-RAV dataset. We conduct extensive experiments on several challenging datasets (300W-LP, AFLW2000-3D, BIWI, COFW, and MERL-RAV-FLOP) and achieve remarkable results. The code and dataset can be accessed at https://github.com/Iron-LYK/CIT.
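The control flow the abstract describes can be sketched in plain Python. This is a hypothetical illustration, not the authors' implementation: all function names, the dummy arithmetic, and the scalar stand-ins for occlusion and pose are assumptions made only to show the cascaded iterative pattern, in which a shared encoder produces features and each task decoder receives the other decoders' previous predictions as explicit priors on every iteration.

```python
# Hypothetical sketch of the cascaded iterative, predictions-as-priors idea.
# Real decoders would be transformer modules over feature maps; here each is
# a toy function over a small feature vector so the loop structure is clear.

def shared_encoder(image):
    # Stand-in for a shared transformer encoder: a dummy 4-d feature vector.
    return [sum(image) / len(image)] * 4

def landmark_decoder(feats, prior_occ, prior_pose):
    # Refine landmark values using occlusion/pose priors (dummy arithmetic).
    return [f + 0.1 * prior_pose - 0.05 * prior_occ for f in feats]

def occlusion_decoder(feats, prior_lmk):
    # Predict a scalar occlusion probability, conditioned on landmark prior.
    return min(1.0, max(0.0, 0.5 + 0.01 * sum(prior_lmk)))

def pose_decoder(feats, prior_lmk):
    # Predict a scalar pose value (e.g., yaw), conditioned on landmark prior.
    return 0.2 * sum(prior_lmk)

def cascaded_iterative_predict(image, num_iters=3):
    feats = shared_encoder(image)
    lmk, occ, pose = feats, 0.5, 0.0  # initial guesses
    for _ in range(num_iters):
        # Each decoder consumes the other tasks' previous outputs as priors,
        # so the explicit inter-task effects are refined over iterations.
        lmk = landmark_decoder(feats, occ, pose)
        occ = occlusion_decoder(feats, lmk)
        pose = pose_decoder(feats, lmk)
    return lmk, occ, pose

lmk, occ, pose = cascaded_iterative_predict([0.2, 0.4, 0.6, 0.8])
print(len(lmk), 0.0 <= occ <= 1.0)
```

The key design point is that the priors are passed as plain predictions between decoders rather than as shared hidden features, which is what makes the strategy cheap and portable across backbones.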


Footnotes
1
FLOP: Facial Landmark, Occlusion and Pose.
 
Metadata
Title
Cascaded Iterative Transformer for Jointly Predicting Facial Landmark, Occlusion Probability and Head Pose
Authors
Yaokun Li
Guang Tan
Chao Gou
Publication date
06.11.2023
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 4/2024
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-023-01935-2
