ABSTRACT
Appearance-based gaze estimation is promising for unconstrained real-world settings, but the large variability in head pose and user-camera distance poses significant challenges for training generic gaze estimators. Data normalization was proposed to cancel out this geometric variability by mapping input images and gaze labels to a normalized space. Although it has been used successfully in prior work, the role and importance of data normalization remain unclear. To fill this gap, we study data normalization for the first time using principled evaluations on both simulated and real data. We propose a modification to the current data normalization formulation that removes the scaling factor, and show that our new formulation performs significantly better (between 9.5% and 32.7%) across the different evaluation settings. Using images synthesized from a 3D face model, we demonstrate the benefit of data normalization for the efficiency of model training. Experiments on real-world images confirm the advantages of data normalization in terms of gaze estimation performance.
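The mapping of gaze labels to a normalized space can be illustrated with a minimal sketch. It assumes the commonly described setup in which a virtual camera is rotated to look at the eye center (rotation R) and then scaled along the viewing axis to a fixed distance (scaling S). The function names, the `d_norm` value, and the reading of "removing the scaling factor" as applying R alone to the gaze label are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def normalization_matrices(eye_center, head_rotation, d_norm=600.0):
    """Build a rotation R and scaling S for a hypothetical normalized camera.

    eye_center: 3D eye position in camera coordinates.
    head_rotation: 3x3 head rotation matrix (columns are the head axes).
    d_norm: assumed target distance of the normalized camera (illustrative).
    """
    eye_center = np.asarray(eye_center, dtype=float)
    head_rotation = np.asarray(head_rotation, dtype=float)
    distance = np.linalg.norm(eye_center)

    # The z-axis of the normalized camera points at the eye center.
    z_axis = eye_center / distance
    # The y-axis is perpendicular to both z and the head's x-axis,
    # which cancels the roll of the head in the normalized view.
    y_axis = np.cross(z_axis, head_rotation[:, 0])
    y_axis /= np.linalg.norm(y_axis)
    x_axis = np.cross(y_axis, z_axis)
    x_axis /= np.linalg.norm(x_axis)

    R = np.stack([x_axis, y_axis, z_axis])          # rows are the new axes
    S = np.diag([1.0, 1.0, d_norm / distance])      # scale along the view axis
    return R, S


def normalize_gaze(gaze, R, S, use_scaling):
    """Map a 3D gaze vector to the normalized space and re-normalize it.

    use_scaling=True applies S @ R (scaled variant, assumed here);
    use_scaling=False applies R only (the scaling factor is removed).
    """
    M = S @ R if use_scaling else R
    g = M @ np.asarray(gaze, dtype=float)
    return g / np.linalg.norm(g)


R, S = normalization_matrices([100.0, -50.0, 500.0], np.eye(3))
gaze = np.array([0.0, 0.6, -0.8])
g_rot_only = normalize_gaze(gaze, R, S, use_scaling=False)
g_scaled = normalize_gaze(gaze, R, S, use_scaling=True)
```

In this sketch the rotation-only variant preserves angles between gaze vectors because R is orthonormal, whereas the scaled variant distorts the gaze direction whenever the eye is not already at the assumed distance `d_norm`.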
Index Terms
- Revisiting data normalization for appearance-based gaze estimation