ABSTRACT
Learning-based methods for appearance-based gaze estimation achieve state-of-the-art performance in challenging real-world settings but require large amounts of labelled training data. Learning-by-synthesis was proposed as a promising solution to this problem but current methods are limited with respect to speed, appearance variability, and the head pose and gaze angle distribution they can synthesize. We present UnityEyes, a novel method to rapidly synthesize large amounts of variable eye region images as training data. Our method combines a novel generative 3D model of the human eye region with a real-time rendering framework. The model is based on high-resolution 3D face scans and uses real-time approximations for complex eyeball materials and structures as well as anatomically inspired procedural geometry methods for eyelid animation. We show that these synthesized images can be used to estimate gaze in difficult in-the-wild scenarios, even for extreme gaze angles or in cases in which the pupil is fully occluded. We also demonstrate competitive gaze estimation results on a benchmark in-the-wild dataset, despite only using a light-weight nearest-neighbor algorithm. We are making our UnityEyes synthesis framework available online for the benefit of the research community.