
2018 | OriginalPaper | Chapter

Towards Crossmodal Learning for Smooth Multimodal Attention Orientation

Authors: Frederik Haarslev, David Docherty, Stefan-Daniel Suvei, William Kristian Juel, Leon Bodenhagen, Danish Shaikh, Norbert Krüger, Poramate Manoonpong

Published in: Social Robotics

Publisher: Springer International Publishing

Abstract

Orienting attention towards another person of interest is a fundamental social behaviour prevalent in human-human interaction and crucial in human-robot interaction. This orientation behaviour is often governed by received audio-visual stimuli. We present an adaptive neural circuit for multisensory attention orientation that combines auditory and visual directional cues. The circuit learns to integrate sound direction cues, extracted via a model of the peripheral auditory system of lizards, with visual directional cues obtained via deep-learning-based object detection. We implement the neural circuit on a robot and demonstrate that integrating multisensory information via the circuit generates appropriate motor velocity commands that control the robot’s orientation movements. We experimentally validate the adaptive neural circuit for a co-located human target and loudspeaker emitting a fixed tone.
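
The abstract does not specify the circuit's learning rule, so the following is only an illustrative sketch of the general idea: two directional cues are weighted and summed into an angular velocity command, and the auditory weight adapts while the visual cue is available. It uses a plain delta (LMS) rule as a stand-in, not the authors' circuit. The functions `visual_cue`, `audio_cue`, and `simulate`, the sine-shaped auditory cue, the 60° field of view, and all parameter values are assumptions made for this example.

```python
import math

def visual_cue(theta, fov=math.radians(60)):
    """Hypothetical visual cue: normalised horizontal offset of the
    detected target, or 0.0 when the target is outside the field of view."""
    half = fov / 2
    if abs(theta) > half:
        return 0.0
    return theta / half

def audio_cue(theta):
    """Hypothetical auditory cue: a signed direction signal.
    sin() is only a placeholder for the output of an ear model."""
    return math.sin(theta)

def simulate(theta0, steps=200, dt=0.05, w_v=1.0, w_a=0.0, mu=0.5):
    """Turn towards a static target at bearing theta0 (radians).

    Returns the final bearing error and the final auditory weight.
    """
    theta = theta0
    for _ in range(steps):
        v = visual_cue(theta)
        a = audio_cue(theta)
        # Fused angular velocity command (assumed linear combination).
        omega = w_v * v + w_a * a
        # Delta rule (assumption): while the target is visible, nudge the
        # auditory weight so that w_a * a approaches the visual cue.
        if v != 0.0:
            w_a += mu * (v - w_a * a) * a * dt
        # Turning towards the target reduces the bearing error.
        theta -= omega * dt
    return theta, w_a

if __name__ == "__main__":
    final_theta, learned_w_a = simulate(0.4)
    print(final_theta, learned_w_a)
```

In this toy setup a target that starts outside the field of view produces no movement while the auditory weight is zero, whereas a non-zero auditory weight lets the auditory cue bring the target into view, after which both cues contribute to the command.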

Metadata
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-030-05204-1_31