Published in: Annals of Telecommunications 1-2/2014

1 February 2014

Video viewing: do auditory salient events capture visual attention?

Authors: Antoine Coutrot, Nathalie Guyader, Gelu Ionescu, Alice Caplier

Abstract

We assess whether salient auditory events contained in soundtracks modify eye movements when exploring videos. In a previous study, we found that, on average, nonspatial sound contained in video soundtracks impacts on eye movements. This result indicates that sound could play a leading part in visual attention models to predict eye movements. In this research, we go further and test whether the effect of sound on eye movements is stronger just after salient auditory events. To automatically spot salient auditory events, we used two auditory saliency models: the discrete energy separation algorithm and the energy model. Both models provide a saliency time curve, based on the fusion of several elementary audio features. The most salient auditory events were extracted by thresholding these curves. We examined some eye movement parameters just after these events rather than on all the video frames. We showed that the effect of sound on eye movements (variability between eye positions, saccade amplitude, and fixation duration) was not stronger after salient auditory events than on average over entire videos. Thus, we suggest that sound could impact on visual exploration not only after salient events but in a more global way.
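The event-extraction step described in the abstract (thresholding a saliency time curve, then examining eye movements just after each event) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the fixed threshold value, the length of the post-event window, and the assumption that the saliency curve has one sample per video frame are all hypothetical choices made for illustration.

```python
from typing import List


def salient_event_onsets(curve: List[float], threshold: float) -> List[int]:
    """Return the index of the first sample of each run above the threshold.

    `curve` is assumed (hypothetically) to hold one saliency value per frame.
    """
    onsets = []
    above = False
    for i, value in enumerate(curve):
        # A new event starts when the curve crosses the threshold upwards.
        if value > threshold and not above:
            onsets.append(i)
        above = value > threshold
    return onsets


def frames_after_events(onsets: List[int], window: int, n_frames: int) -> List[int]:
    """Return the sorted frame indices within `window` frames after any onset."""
    selected = set()
    for onset in onsets:
        # Clip the window at the end of the video.
        selected.update(range(onset, min(onset + window, n_frames)))
    return sorted(selected)


# Toy saliency curve with two bursts (made-up values, not data from the study).
curve = [0.1, 0.2, 0.9, 0.8, 0.1, 0.1, 0.7, 0.1, 0.1, 0.1]
onsets = salient_event_onsets(curve, threshold=0.5)
frames = frames_after_events(onsets, window=3, n_frames=len(curve))
print(onsets, frames)
```

Eye movement parameters such as saccade amplitude or fixation duration could then be averaged over the selected frames only and compared with the same averages over all frames, which is the kind of comparison the abstract reports.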

Metadata
Title
Video viewing: do auditory salient events capture visual attention?
Authors
Antoine Coutrot
Nathalie Guyader
Gelu Ionescu
Alice Caplier
Publication date
1 February 2014
Publisher
Springer Paris
Published in
Annals of Telecommunications / Issue 1-2/2014
Print ISSN: 0003-4347
Electronic ISSN: 1958-9395
DOI
https://doi.org/10.1007/s12243-012-0352-5
