
2019 | OriginalPaper | Chapter

Deep Multitask Gaze Estimation with a Constrained Landmark-Gaze Model

Authors: Yu Yu, Gang Liu, Jean-Marc Odobez

Published in: Computer Vision – ECCV 2018 Workshops

Publisher: Springer International Publishing


Abstract

As an indicator of attention, gaze is an important cue for analyzing human behavior and social interaction. Recent deep learning methods for gaze estimation rely on plain regression of the gaze direction from images, without accounting for potential mismatches in eye image cropping and normalization. This can degrade the estimation of the implicit relation between visual cues and the gaze direction when dealing with low-resolution images or when training with a limited amount of data. In this paper, we propose a deep multitask framework for gaze estimation with the following contributions. (i) We propose a multitask framework that relies on both synthetic and real data for end-to-end training; during training, each dataset provides the label of only one task, but the two tasks are combined in a constrained way. (ii) We introduce a Constrained Landmark-Gaze Model (CLGM) that models the joint variation of eye landmark locations (including the iris center) and gaze directions. By explicitly relating visual information (landmarks) to the more abstract gaze values, we show that the estimator is more accurate and easier to learn. (iii) We decompose our deep network into, on one hand, a network jointly inferring the parameters of the CLGM model together with the scale and translation of the eye region and, on the other hand, a CLGM-based decoder deterministically inferring landmark positions and gaze from these parameters and the head pose. This decouples gaze estimation from irrelevant geometric variations in the eye image (scale, translation), resulting in a more robust model. Thorough experiments on public datasets demonstrate that our method achieves competitive results, improving over the state of the art on challenging free-head-pose gaze estimation and on eye landmark localization (iris localization).
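To make contribution (iii) concrete, below is a minimal sketch of what a CLGM-style decoder could look like, assuming a linear basis model in which landmarks and gaze share a single coefficient vector. All names (CLGMDecoder, B_l, B_g), the array shapes, and the additive use of head pose are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

class CLGMDecoder:
    """Deterministic decoder for a hypothetical linear landmark-gaze model."""

    def __init__(self, mean_landmarks, landmark_basis, mean_gaze, gaze_basis):
        # mean_landmarks: (K, 2) mean 2D positions of the K eye landmarks
        # landmark_basis: (M, K, 2) variation modes for M model coefficients
        # mean_gaze:      (2,) mean gaze angles (yaw, pitch)
        # gaze_basis:     (M, 2) linear map from the same coefficients to gaze
        self.mu_l, self.B_l = mean_landmarks, landmark_basis
        self.mu_g, self.B_g = mean_gaze, gaze_basis

    def decode(self, alpha, scale, translation, head_pose):
        # alpha: (M,) CLGM coefficients predicted by the network.
        # scale / translation: geometric nuisance parameters of the eye crop;
        # they act only on the landmarks, so gaze stays invariant to them.
        landmarks = self.mu_l + np.tensordot(alpha, self.B_l, axes=1)  # (K, 2)
        landmarks = scale * landmarks + translation
        # The paper conditions gaze on head pose; simply adding it to the
        # gaze angles here is a placeholder simplification for illustration.
        gaze = self.mu_g + self.B_g.T @ alpha + np.asarray(head_pose)
        return landmarks, gaze

# Illustration only: random parameters stand in for a learned CLGM.
rng = np.random.default_rng(0)
decoder = CLGMDecoder(rng.normal(size=(25, 2)), rng.normal(size=(8, 25, 2)),
                      np.zeros(2), rng.normal(size=(8, 2)))
landmarks, gaze = decoder.decode(rng.normal(size=8), scale=1.2,
                                 translation=np.array([4.0, -2.0]),
                                 head_pose=(0.1, -0.05))
```

Because scale and translation are predicted separately and applied only to the landmarks, the gaze output is by construction unaffected by how the eye crop was scaled or shifted, which is the decoupling the abstract refers to.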


Footnotes
1
Note that the corrected model relies on real data. In all experiments, the subject(s) in the test set are never used to compute the corrected CLGM model.
 
Metadata
Title: Deep Multitask Gaze Estimation with a Constrained Landmark-Gaze Model
Authors: Yu Yu, Gang Liu, Jean-Marc Odobez
Copyright Year: 2019
DOI: https://doi.org/10.1007/978-3-030-11012-3_35
