RARE2012: A multi-scale rarity-based saliency detection with its comparative statistical analysis
Introduction
There is no single, common definition of human attention; it differs across domains (psychology, neuroscience or engineering) and across approaches. In a general sense, however, human attention can be defined as the natural capacity to prioritize incoming stimuli and selectively focus on part of them. The goal of the attentional process is to identify as quickly as possible those parts of our environment that are key to our survival. Humans, like all animals, use this mechanism in daily life and even during dreams, when the rapid eye movements of the REM stage correspond to saccades and fixations on the dream scene.
The value of attention prediction is increasingly recognized by the scientific community, with an exponentially growing number of papers on saliency algorithms. Attention modeling has very wide applications, such as machine vision, surveillance, data reduction and compression, human-computer interfaces, advertising assessment and robotics. In this context, efficient attention models will be of great importance for future improvements of vision and signal processing algorithms.
In computer science, attention modeling is mainly based on the concept of a “saliency map”, which provides, for each pixel, its probability of attracting human attention. The idea is that people's gaze is directed to areas which, in some way, stand out from the background. Saliency implies a competition between objective “bottom-up” attention and subjective “top-down” information. Bottom-up attention, also known as stimulus-driven or exogenous attention, is a generic approach: it relies on the information innovation that the features extracted from the image bring in a given spatial context. The top-down component of attention, also known as task-driven or endogenous attention, integrates specific knowledge that the viewer may have in specific situations (tasks, models of the kind of scene, recognized objects, etc.). Eye movements are not a direct output of the algorithms, but they can be computed from the saliency map by using winner-take-all [1] or more dynamical algorithms [2].
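As an illustration, a winner-take-all readout with inhibition of return can be sketched in a few lines. This is a minimal NumPy sketch, not the exact algorithm of [1]; the function name and the rectangular inhibition window are simplifications introduced here:

```python
import numpy as np

def winner_take_all_scanpath(saliency, n_fix=5, inhib_radius=2):
    """Read a scanpath off a saliency map: repeatedly pick the most
    salient pixel, then suppress a neighbourhood around it
    (inhibition of return) so the next fixation lands elsewhere."""
    s = saliency.astype(float).copy()
    fixations = []
    for _ in range(n_fix):
        # Winner-take-all: the global maximum wins the competition.
        y, x = np.unravel_index(np.argmax(s), s.shape)
        fixations.append((y, x))
        # Inhibition of return: mask a square window around the winner.
        y0, y1 = max(0, y - inhib_radius), y + inhib_radius + 1
        x0, x1 = max(0, x - inhib_radius), x + inhib_radius + 1
        s[y0:y1, x0:x1] = -np.inf
    return fixations
```

Applied to a saliency map, the returned list of (row, column) positions approximates the order in which an observer might fixate the most salient locations.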
In this paper we present a novel attention algorithm and focus on a fair comparison with other state-of-the-art attention models. The proposed algorithm, which we call “RARE2012”, is purely bottom-up. This is an important point for model evaluation, as top-down information can drastically increase a model's performance. Indeed, several models use additional post-processing steps that inject top-down information, such as centred Gaussians, which artificially inflates their results. Moreover, several saliency models have many parameters, which makes fair comparison very difficult. Some research, like that of Borji and Itti [3] or Judd et al. [4], attempts to benchmark bottom-up models using several similarity measures and sometimes several image datasets. We base our validation on the approach and code of Borji and Itti [3], and add a complementary statistical evaluation. The code of the model proposed in this paper is freely available online [5].
The paper is organized as follows. Section 2 contains an overview of recent saliency models and more specifically of methods used in our comparative study. In Section 3, the architecture of our method is described in detail. The results are presented in Section 4: after a qualitative evaluation on psychophysical observations and three databases, two metrics are used to quantify the prediction of the proposed method. Section 5 details an additional two-metric based statistical analysis of the results showing the overall effectiveness of RARE2012. Finally, Section 6 provides a discussion and conclusion.
Related work
It is very hard to find an optimal taxonomy that classifies all saliency approaches. The literature on still-image saliency models is very active: while a few years ago only a handful of labs worldwide worked on the topic, a hundred different models have now been published. These models have various implementations and technical approaches, even though they all derive from the same idea of information innovation in a given context.
Some attempts of taxonomies proposed an
RARE2012: our proposed saliency model
In this section, the architecture of our method (Fig. 1) is described in detail. There are three main steps: first, we extract low-level colour and medium-level orientation features; then a multi-scale rarity mechanism is applied; finally, we fuse the rarity maps into a single final saliency map. A comparison is then made with the RARE family of algorithms. In the taxonomy proposed in Section 3, RARE2012 belongs to the second category, as it considers information at several scales but globally
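To give an intuition of a rarity operator, the sketch below scores each pixel by the self-information -log p(v) of its quantized feature value, so that rare values receive high saliency. This is a single-scale toy example on one feature channel; RARE2012 itself applies rarity across several scales and on colour and orientation features, which this sketch does not reproduce:

```python
import numpy as np

def rarity_map(feature, n_bins=16):
    """Toy rarity operator: quantize a feature channel into bins,
    estimate the occurrence probability p(v) of each bin over the
    image, and assign each pixel the self-information -log p(v)."""
    f = feature.astype(float)
    lo, hi = f.min(), f.max()
    # Quantize feature values into n_bins histogram bins.
    bins = np.clip(((f - lo) / (hi - lo + 1e-9) * n_bins).astype(int),
                   0, n_bins - 1)
    counts = np.bincount(bins.ravel(), minlength=n_bins)
    p = counts / counts.sum()
    # Rare bins (low probability) yield high self-information.
    r = -np.log(p[bins] + 1e-12)
    return r / r.max()  # normalize to [0, 1]
```

On an image that is mostly uniform with a few outlier pixels, the outliers dominate the resulting map, which is the behaviour a rarity-based model exploits.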
Saliency model evaluation
In this section, we compare our method on three datasets with the 13 saliency models presented in the related work. After presenting the datasets, qualitative and quantitative results are detailed and explained.
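One fixation metric commonly used in such benchmarks is the AUC, which measures how well saliency values separate fixated from non-fixated pixels. The sketch below is a minimal, non-shuffled variant for illustration only, not the exact implementation of the benchmark code of [3]:

```python
import numpy as np

def fixation_auc(saliency, fixation_mask):
    """Simple (non-shuffled) AUC: the probability that a fixated
    pixel receives a higher saliency value than a non-fixated one.
    0.5 corresponds to chance, 1.0 to perfect separation."""
    s = saliency.ravel()
    fix = fixation_mask.ravel().astype(bool)
    pos, neg = s[fix], s[~fix]
    # Fraction of (fixated, non-fixated) pairs ranked correctly,
    # counting ties as half a win.
    wins = ((pos[:, None] > neg[None, :]).sum()
            + 0.5 * (pos[:, None] == neg[None, :]).sum())
    return wins / (len(pos) * len(neg))
```

The pairwise formulation is quadratic in the number of pixels; benchmark implementations typically threshold the saliency map instead, but both compute the same quantity.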
Additional statistical validation
In this section, the statistical validation is described. First, the statistical approach is carefully detailed and its advantages over a standard ANOVA test are explained, highlighting the importance of effect sizes. Then, the results are presented and discussed.
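As an illustration of why effect sizes complement significance tests, the sketch below computes Cohen's d, a standard standardized mean difference between two sets of scores (for example, per-image metric values of two saliency models). This is a generic sketch and does not reproduce the exact statistical procedure used in the paper:

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d: difference of means divided by the pooled standard
    deviation. Unlike a p-value, it quantifies how large the
    difference between two score distributions actually is."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) \
                 / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)
```

With large samples, even a tiny difference between models can be statistically significant, so reporting an effect size like d indicates whether the difference is also practically meaningful.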
Conclusion and future work
This paper presents a novel multi-scale rarity-based saliency model for still images called RARE2012. An extensive evaluation and statistical analysis are carried out to compare this model with other important state-of-the-art models.
RARE2012 presents major changes compared to the previously published models of the RARE family [31], [32]. First, the colour and orientation features are extracted in parallel or sequentially, depending on their complexity. Colours are based on a PCA analysis
Acknowledgements
N. Riche is supported by the “Fonds pour la formation à la Recherche dans l'Industrie et dans l'Agriculture” (FRIA). N. Riche and M. Mancas contributed equally to this work. M. Duvinage is a FNRS (Fonds National de la Recherche Scientifique) Research Fellow and the corresponding author for statistical analysis. Thierry Dutoit is a member of EURASIP. This work is also funded by the Belgian Walloon region NumediArt project.
This paper presents research results of the Belgian Network DYSCO (Dynamical
References (46)
- et al., Modeling attention to salient proto-objects, Neural Networks (2006)
- et al., A model of saliency-based visual attention for rapid scene analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence (1998)
- et al., From saliency to eye gaze: embodied visual selection for a pan-tilt-based robotic head, Advances in Visual Computing (2011)
- Ali Borji, Laurent Itti, State-of-the-art in visual attention modeling, IEEE Transactions on Pattern Analysis and...
- et al., A benchmark of computational models of saliency to predict human fixations, IEEE Transactions on Pattern Analysis and Machine Intelligence (2012)
- Matei Mancas, Nicolas Riche, Computational attention website,...
- Y.F. Ma, H.J. Zhang, Contrast-based image attention analysis by using fuzzy growing, in: International Multimedia...
- N. Bruce, J. Tsotsos, Saliency based on information maximization, in: Advances in Neural Information Processing...
- D. Walther, Interactions of Visual Attention and Object Recognition: Computational Modeling, Algorithms, and...
- J. Harel, C. Koch, P. Perona, Graph-based visual saliency, in: Proceedings of Neural Information Processing Systems...
- Static and space-time visual saliency detection by self-resemblance, Journal of Vision
- Unsupervised salient object segmentation based on kernel density estimation and two-phase graph cut, IEEE Transactions on Multimedia
- Contextual guidance of eye movements and attention in real-world scenes: the role of global features on object search, Psychological Review
Cited by (184)
- Classification of power quality disturbances using visual attention mechanism and feed-forward neural network, Measurement: Journal of the International Measurement Confederation (2022)
- Video saliency prediction via spatio-temporal reasoning, Neurocomputing (2021)
- Predicting Radiologists' Gaze With Computational Saliency Models in Mammogram Reading, IEEE Transactions on Multimedia (2024)
- Age Differences in Gaze Following: Older Adults Follow Gaze More than Younger Adults When Free-Viewing Scenes, Experimental Aging Research (2024)
- Predicting Visual Fixations, Annual Review of Vision Science (2023)