ABSTRACT
In viewing an image or real-world scene, different observers may exhibit different viewing patterns. These differences arise from a variety of factors involving both bottom-up and top-down processing. In the literature on visual saliency prediction, agreement in gaze patterns across observers is often quantified using a measure of inter-observer congruency (IOC). Intuitively, common viewing patterns may be expected to diagnose certain image qualities, including the capacity of an image to draw attention, or perceptual qualities relevant to applications in human-computer interaction, visual design and other domains. Moreover, there is value in determining the extent to which different factors contribute to inter-observer variability, and how this depends on the type of content being viewed. In this paper, we assess the extent to which different types of features contribute to variability in viewing patterns across observers. This is accomplished by examining the correlation between image-derived features and IOC values, and by assessing the capacity of more complex feature sets to predict IOC with a regression model. Experimental results demonstrate the value of different feature types for predicting IOC. These results also establish the relative importance of top-down and bottom-up information in driving gaze, and provide new insight into predictive analysis of gaze behavior associated with perceptual characteristics of images.
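To make the IOC measure concrete, the sketch below shows one common leave-one-out formulation: each observer's fixations are scored against a fixation map built from the remaining observers, and the per-observer AUC scores are averaged. This is a minimal illustration only; the Gaussian blur width, the simplified AUC variant, and all function names here are assumptions, not the exact recipe used in the paper.

```python
import numpy as np

def gaussian_fixation_map(fixations, shape, sigma=25.0):
    """Continuous fixation map: a Gaussian placed at each (row, col) fixation."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    fmap = np.zeros(shape, dtype=float)
    for fy, fx in fixations:
        fmap += np.exp(-((ys - fy) ** 2 + (xs - fx) ** 2) / (2.0 * sigma ** 2))
    if fmap.max() > 0:
        fmap /= fmap.max()
    return fmap

def roc_auc(saliency_map, fixations):
    """Simplified AUC: how well the map separates fixated pixels from all pixels."""
    pos = np.array([saliency_map[int(fy), int(fx)] for fy, fx in fixations])
    neg = saliency_map.ravel()
    thresholds = np.concatenate(([np.inf], np.sort(pos)[::-1]))
    tpr = [(pos >= t).mean() for t in thresholds]
    fpr = [(neg >= t).mean() for t in thresholds]
    return np.trapz(tpr, fpr)

def inter_observer_congruency(per_observer_fixations, shape):
    """Leave-one-out IOC: score each observer against the pooled remaining observers."""
    scores = []
    for i, fix_i in enumerate(per_observer_fixations):
        others = [f for j, f in enumerate(per_observer_fixations) if j != i]
        fmap = gaussian_fixation_map(np.vstack(others), shape)
        scores.append(roc_auc(fmap, fix_i))
    return float(np.mean(scores))
```

Under this reading, the resulting per-image IOC values would serve as regression targets, with image-derived feature vectors (low-level statistics through higher-level semantic descriptors) supplied to a regression model such as support vector regression to assess how well each feature set predicts IOC.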