
2019 | Original Paper | Chapter

Give Ear to My Face: Modelling Multimodal Attention to Social Interactions

Authors: Giuseppe Boccignone, Vittorio Cuculo, Alessandro D’Amelio, Giuliano Grossi, Raffaella Lanzarotti

Published in: Computer Vision – ECCV 2018 Workshops

Publisher: Springer International Publishing


Abstract

We address the deployment of perceptual attention to social interactions displayed in conversational clips, relying on multimodal information (audio and video). A probabilistic modelling framework is proposed that goes beyond the classic saliency paradigm by integrating multiple information cues. Attentional allocation is determined not just by stimulus-driven selection but, importantly, by social value, which modulates the selection history of relevant multimodal items. The construction of attentional priority is thus the result of a sampling procedure conditioned on the potential value dynamics of socially relevant objects emerging moment to moment within the scene. Preliminary experiments on a publicly available dataset are presented.
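
To make the idea concrete, below is a minimal sketch (not the authors' implementation) of how an attentional priority map could be formed by modulating a stimulus-driven saliency map with value-weighted masks of socially relevant items (e.g. speaker faces), and how the next gaze target could then be drawn by sampling that map. All names, shapes, and the additive combination rule are illustrative assumptions.

import numpy as np

def priority_map(saliency, object_masks, values):
    """Combine a bottom-up saliency map with value-weighted masks of
    socially relevant objects (e.g. the face of the current speaker).

    saliency     : (H, W) array, stimulus-driven conspicuity
    object_masks : list of (H, W) arrays, one per detected object
    values       : list of scalars, momentary social value of each object
    """
    prio = saliency.copy()
    for mask, value in zip(object_masks, values):
        prio += value * mask                  # value modulates object relevance
    prio = np.clip(prio, 0.0, None)
    return prio / prio.sum()                  # normalise to a probability map

def sample_gaze(prio, rng=None):
    """Draw the next fixation by sampling the priority map."""
    rng = rng or np.random.default_rng()
    idx = rng.choice(prio.size, p=prio.ravel())
    return np.unravel_index(idx, prio.shape)  # (row, col) of the sampled fixation

In this toy reading, re-estimating the per-object values frame by frame (e.g. raising the value of whichever face is currently speaking) is what conditions the sampling on the scene's moment-to-moment social dynamics.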


Metadata
Title: Give Ear to My Face: Modelling Multimodal Attention to Social Interactions
Authors: Giuseppe Boccignone, Vittorio Cuculo, Alessandro D’Amelio, Giuliano Grossi, Raffaella Lanzarotti
Copyright Year: 2019
DOI: https://doi.org/10.1007/978-3-030-11012-3_27
