RARE2012: A multi-scale rarity-based saliency detection with its comparative statistical analysis

https://doi.org/10.1016/j.image.2013.03.009

Highlights

  • Overview of 13 recent saliency models.

  • RARE2012: our new multi-scale rarity-based saliency detection model.

  • Comparative statistical analysis to determine the robustness of ranking between models.

Abstract

Over the last decades, the number of computer-based visual attention models aiming to automatically predict human gaze on images or videos has grown exponentially. Even if several families of methods have been proposed, and many terms such as centre-surround difference, contrast, rarity, novelty, redundancy, irregularity, surprise or compressibility have been used to define those models, they are all based on the same underlying idea of information innovation in a given context.

In this paper, we propose a novel saliency prediction model, called RARE2012, which selects information worthy of attention based on multi-scale spatial rarity. RARE2012 is then evaluated against 13 recently published saliency models using two complementary metrics, the Normalized Scanpath Saliency (NSS) and the Area Under the Receiver Operating Characteristic curve (AUROC). It ranks best on the NSS metric and second best on the AUROC metric on three publicly available datasets (Toronto, Kootstra and Jian Li).

Finally, based on an additional comparative statistical analysis using the Hedges' g effect-size measure, RARE2012 outperforms, at least slightly, the other models when both metrics are considered over the three databases as a whole.

Introduction

There is no common definition of human attention, and it can differ depending on the domain (psychology, neuroscience or engineering) or the considered approach. In a general sense, however, human attention can be defined as the natural capacity to prioritize incoming stimuli and selectively focus on part of them. The goal of the attentional process is to identify as quickly as possible those parts of our environment that are key to our survival. Humans, like other animals, use this mechanism in daily life and even during dreams, when the rapid eye movements of the REM stage correspond to saccades and fixations on the dream scene.

The interest of attention prediction is increasingly recognized by the scientific community, with an exponentially growing number of papers dealing with saliency algorithms. Attention modeling has very wide applications such as machine vision, surveillance, data reduction and compression, human-computer interfaces, advertising assessment or robotics. In this context, efficient attention models are of great importance for future improvements of vision and signal processing algorithms.

In computer science, attention modeling is mainly based on the concept of the “saliency map”, which provides, for each pixel, its probability of attracting human attention. The idea is that people's gaze is drawn to areas which, in some way, stand out from the background. Saliency implies a competition between objective “bottom-up” attention and subjective “top-down” information. Bottom-up attention, also known as stimulus-driven or exogenous attention, is a generic approach: it relies on the information innovation that the features extracted from the image can bring in a given spatial context. The top-down component of attention, also known as task-driven or endogenous attention, integrates specific knowledge that the viewer may have in specific situations (tasks, models of the kind of scene, recognized objects, etc.). Eye movements are not a direct output of the algorithms, but they can be computed from the saliency map by using winner-take-all [1] or more dynamical algorithms [2].
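The winner-take-all idea mentioned above can be illustrated with a minimal sketch (not the implementation of [1]; function and parameter names are hypothetical): repeatedly pick the most salient pixel, then suppress its neighbourhood (inhibition of return) so the next fixation moves elsewhere.

```python
import numpy as np

def winner_take_all(saliency, n_fixations=3, inhibition_radius=2):
    """Derive a fixation sequence from a saliency map by repeatedly
    selecting the most salient pixel and suppressing a disc around it
    (inhibition of return). Illustrative sketch only."""
    sal = saliency.astype(float).copy()
    h, w = sal.shape
    fixations = []
    for _ in range(n_fixations):
        y, x = np.unravel_index(np.argmax(sal), sal.shape)
        fixations.append((int(y), int(x)))
        # Suppress the winner's neighbourhood so attention moves on.
        yy, xx = np.ogrid[:h, :w]
        sal[(yy - y) ** 2 + (xx - x) ** 2 <= inhibition_radius ** 2] = -np.inf
    return fixations

# Toy saliency map with two bright spots.
smap = np.zeros((5, 5))
smap[1, 1] = 1.0
smap[3, 4] = 0.8
print(winner_take_all(smap, n_fixations=2))  # [(1, 1), (3, 4)]
```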

In this paper we present a novel attention algorithm, and we focus on a fair comparison with other state-of-the-art attention models. The proposed algorithm, which we call “RARE2012”, is purely bottom-up. This is an important point for model evaluation, as top-down information can drastically increase a model's performance. Indeed, several models use additional post-processing, such as centred Gaussians, which provides top-down information and leads to an artificial increase of their results. Moreover, several saliency models have many parameters, which makes a fair comparison very difficult. Some research, like Borji and Itti [3] or Judd et al. [4], attempts to provide a benchmark of bottom-up models using several similarity measures and sometimes several image datasets. We based our validation on the approach and code of Borji and Itti [3]. A complementary statistical evaluation has also been added. The code of the model proposed in this paper is freely available online [5].

The paper is organized as follows. Section 2 contains an overview of recent saliency models and more specifically of methods used in our comparative study. In Section 3, the architecture of our method is described in detail. The results are presented in Section 4: after a qualitative evaluation on psychophysical observations and three databases, two metrics are used to quantify the prediction of the proposed method. Section 5 details an additional two-metric based statistical analysis of the results showing the overall effectiveness of RARE2012. Finally, Section 6 provides a discussion and conclusion.

Section snippets

Related work

It is very hard to find an optimal taxonomy that classifies all the saliency approaches. The literature on still-image saliency models is very active. While only a few labs in the world were working on the topic some years ago, nowadays a hundred different models have been published. Those models rely on various implementations and technical approaches, even though they all derive from the same idea of information innovation in a given context.

Some attempts of taxonomies proposed an

RARE2012: our proposed saliency model

In this section, the architecture of our method (Fig. 1) is described in detail. There are three main steps. First, we extract low-level colour and medium-level orientation features. Then, a multi-scale rarity mechanism is applied. Finally, we fuse the rarity maps into a single final saliency map. A comparison is then made with the RARE family of algorithms. In the proposed taxonomy of Section 3, RARE2012 is part of the second category as it considers information at several scales but globally
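The multi-scale rarity idea can be sketched as follows (a minimal illustration, not the authors' implementation: all names are hypothetical, and the actual feature extraction, quantization and fusion in RARE2012 differ). Rarity is taken as the self-information of a pixel's quantized feature value, averaged over several resolutions:

```python
import numpy as np

def rarity_map(feature, n_bins=8):
    """Self-information rarity: pixels whose quantized feature value is
    rare over the whole map get high scores (rarity = -log p)."""
    bins = np.linspace(feature.min(), feature.max() + 1e-9, n_bins + 1)
    idx = np.clip(np.digitize(feature, bins) - 1, 0, n_bins - 1)
    counts = np.bincount(idx.ravel(), minlength=n_bins)
    p = counts / counts.sum()
    return -np.log(p[idx] + 1e-12)

def multiscale_rarity(feature, scales=(1, 2)):
    """Average rarity over several resolutions; coarser scales are
    obtained by block subsampling and upsampled back by replication."""
    h, w = feature.shape
    acc = np.zeros_like(feature, dtype=float)
    for s in scales:
        coarse = feature[::s, ::s]
        r = rarity_map(coarse)
        acc += np.kron(r, np.ones((s, s)))[:h, :w]
    return acc / len(scales)

# Toy example: a single outlier pixel is the rarest feature value.
feat = np.zeros((4, 4))
feat[2, 3] = 5.0
```
In this sketch, a pixel with an unusual feature value receives a high rarity score because its bin probability is low; averaging over scales favours regions that are rare both locally and globally.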

Saliency model evaluation

In this section, we compare our method on three datasets with the 13 saliency models presented in the related work section. After presenting the datasets, qualitative and quantitative results are detailed and explained.
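The two evaluation metrics named in the abstract can be sketched as follows (a simplified illustration with hypothetical helper names, not the benchmark code of [3], which includes further details such as several AUROC variants):

```python
import numpy as np

def nss(saliency, fixations):
    """Normalized Scanpath Saliency: z-score the saliency map, then
    average the normalized values at the human fixation points."""
    s = (saliency - saliency.mean()) / (saliency.std() + 1e-12)
    return float(np.mean([s[y, x] for y, x in fixations]))

def auroc(saliency, fixation_mask):
    """Area under the ROC curve: saliency values at fixated pixels are
    positives, all other pixels negatives. Computed here in the
    Mann-Whitney form (probability that a positive outranks a
    negative, ties counting one half)."""
    mask = fixation_mask.astype(bool)
    pos, neg = saliency[mask], saliency[~mask]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))
```
A model that assigns its highest saliency exactly to the fixated pixels scores an AUROC of 1.0 and a strongly positive NSS; chance-level prediction gives an AUROC near 0.5 and an NSS near 0.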

Additional statistical validation

In this section, the statistical validation is studied. First, the statistical approach is carefully detailed, and its advantages over a standard ANOVA test are explained. The importance of effect sizes is highlighted. Then, the results are presented and discussed.
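For reference, the Hedges' g effect size used in the abstract can be sketched as a standardized mean difference with pooled standard deviation and the usual small-sample bias correction (function name hypothetical; this is a generic sketch, not the paper's analysis code):

```python
import numpy as np

def hedges_g(a, b):
    """Hedges' g between two score samples: Cohen's d with pooled
    (unbiased) standard deviation, times the small-sample correction
    factor J = 1 - 3 / (4 * (n1 + n2) - 9)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) \
        / (na + nb - 2)
    d = (a.mean() - b.mean()) / np.sqrt(pooled_var)
    return d * (1 - 3 / (4 * (na + nb) - 9))
```
Unlike a bare p-value, such an effect size quantifies how large the performance gap between two models is, which is what makes the ranking claims in Section 5 interpretable.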

Conclusion and future work

This paper presents a novel multi-scale rarity-based saliency model for still images called RARE2012. An extensive evaluation and statistical analysis are carried out to compare this model with other important state-of-the-art models.

RARE2012 presents major changes compared to the previously published models of the RARE family [31], [32]. First, the colour and orientation features are extracted in parallel or sequentially depending on their complexity. Colours are based on a principal component analysis (PCA)

Acknowledgements

N. Riche is supported by the “Fonds pour la formation à la Recherche dans l'Industrie et dans l'Agriculture” (FRIA). N. Riche and M. Mancas contributed equally to this work. M. Duvinage is a FNRS (Fonds National de la Recherche Scientifique) Research Fellow and the corresponding author for statistical analysis. Thierry Dutoit is a member of EURASIP. This work is also funded by the Belgian Walloon region NumediArt project.

This paper presents research results of the Belgian Network DYSCO (Dynamical

References (46)

  • D. Walther et al.

    Modeling attention to salient proto-objects

    Neural Networks

    (2006)
  • L. Itti et al.

    A model of saliency-based visual attention for rapid scene analysis

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (1998)
  • M. Mancas et al.

    From saliency to eye gaze: embodied visual selection for a pan-tilt-based robotic head

    Advances in Visual Computing

    (2011)
  • Ali Borji, Laurent Itti, State-of-the-art in visual attention modeling, IEEE Transactions on Pattern Analysis and...
  • T. Judd et al.

    A benchmark of computational models of saliency to predict human fixations

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (2012)
  • Matei Mancas, Nicolas Riche, Computational attention website,...
  • Y.F. Ma, H.J. Zhang, Contrast-based image attention analysis by using fuzzy growing, in: International Multimedia...
  • N. Bruce, J. Tsotsos, Saliency based on information maximization, in: Advances in Neural Information Processing...
  • D. Walther, Interactions of Visual Attention and Object Recognition: Computational Modeling, Algorithms, and...
  • J. Harel, C. Koch, P. Perona, Graph-based visual saliency, in: Proceedings of Neural Information Processing Systems...
  • Yin Li, Yue Zhou, Lei Xu, Xiaochao Yang, Incremental sparse saliency detection, in: IEEE International Conference on...
  • Yin Li, Yue Zhou, Junchi Yan, Visual saliency based on conditional entropy, in: The Asian Conference on Computer Vision...
  • Hae Jong Seo, Peyman Milanfar, Nonparametric bottom-up saliency detection by self-resemblance, in: IEEE Conference on...
  • Hae Jong Seo et al.

    Static and space-time visual saliency detection by self-resemblance

    Journal of Vision

    (2009)
  • M. Cheng, et al., Global contrast based salient region detection, in: IEEE Conference on Computer Vision and Pattern...
  • Z. Liu

    Unsupervised salient object segmentation based on kernel density estimation and two-phase graph cut

    IEEE Transactions on Multimedia

    (2012)
  • F. Perazzi et al., Saliency filters: contrast based filtering for salient region detection, in: IEEE Conference on...
  • Antonio Torralba et al.

    Contextual guidance of eye movements and attention in real-world scenes: the role of global features on object search

    Psychological Review

    (2006)
  • A. Garcia-Diaz et al., Saliency based on decorrelation and distinctiveness of local responses, in: Proceedings of 13th...
  • A. Garcia-Diaz, et al., Decorrelation and distinctiveness provide with human-like saliency, in: J. Blanc-Talon, et al.,...
  • Stas Goferman, Lihi Zelnik-Manor, Ayellet Tal, Context-aware saliency detection, in: IEEE Conference on Computer Vision...
  • B. Schauerte, G.A. Fink, Focusing computational visual attention in multi-modal human-robot interaction, in:...
  • X. Hou, L. Zhang, Saliency detection: a spectral residual approach, in: IEEE Conference on Computer Vision and Pattern...