Published in: International Journal of Social Robotics 4/2015

01.08.2015

Inference of Human Beings’ Emotional States from Speech in Human–Robot Interactions

Authors: Laurence Devillers, Marie Tahon, Mohamed A. Sehili, Agnes Delaborde

Abstract

The challenge of this study is twofold: recognizing emotions from audio signals in a naturalistic Human–Robot Interaction (HRI) environment, and using cross-dataset recognition to evaluate robustness. The originality of this work lies in the use of six emotional models in parallel, generated from two training corpora and three acoustic feature sets. The models are trained on two databases collected in different tasks, and a third, independent real-life HRI corpus (collected within the ROMEO project, http://www.projetromeo.com/) is used for testing. As a primary result, for the task of four-emotion recognition, combining the probabilistic outputs of the six systems in a very simple way yields better results than the best baseline system. Moreover, to investigate the potential of fusing many systems' outputs with a "perfect" fusion method, we compute the oracle performance (the oracle counts a prediction as correct if at least one of the systems outputs the correct label). The oracle score is 73 %, while the auto-coherence score on the same corpus (i.e. the performance obtained by using the same data for training and testing) is about 57 %. We also experiment with a reliability estimation protocol that exploits the outputs of the several systems. Such a reliability measure of an emotion recognition system's decision could help construct a relevant emotional and interactional user profile, which could in turn drive the expressive behavior of the robot.
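To make the two evaluation notions above concrete, here is a minimal Python sketch. The oracle score follows the definition given in the abstract (a sample counts as correct if at least one system predicts its true label); the fusion rule is assumed, for illustration only, to be a plain average of the six systems' class-probability vectors, since the "very simplistic" combination is not detailed in the abstract. The function names and toy data are hypothetical, not from the paper.

```python
import numpy as np

def fuse_by_average(system_probs):
    """Decision-level fusion by averaging per-system class probabilities.

    system_probs: array of shape (n_systems, n_samples, n_classes),
    e.g. probabilistic outputs of the six emotion recognition systems.
    Returns fused class predictions, shape (n_samples,).
    """
    mean_probs = system_probs.mean(axis=0)       # (n_samples, n_classes)
    return mean_probs.argmax(axis=1)

def oracle_score(system_preds, labels):
    """Oracle accuracy as defined in the abstract: a sample is correct
    if at least one system predicts its true label.

    system_preds: array of shape (n_systems, n_samples)
    labels:       array of shape (n_samples,)
    """
    hits = (system_preds == labels[None, :]).any(axis=0)
    return hits.mean()

# Toy usage: 6 systems, 100 samples, 4 emotion classes.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(4), size=(6, 100))  # (6, 100, 4)
labels = rng.integers(0, 4, size=100)
preds = probs.argmax(axis=2)                      # per-system hard decisions
print(fuse_by_average(probs)[:5], oracle_score(preds, labels))
```

Averaging probabilistic outputs is one common simple decision-level fusion; majority voting over the six hard decisions would be an equally simple alternative.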

Footnotes
1
Many efforts are being made to release at least one of these corpora to the community; this will require specific data formatting and the agreement of all participants.
 
2
Vision Institute, 11 rue Moreau, 75012 Paris.
 
Metadata
Title
Inference of Human Beings’ Emotional States from Speech in Human–Robot Interactions
Authors
Laurence Devillers
Marie Tahon
Mohamed A. Sehili
Agnes Delaborde
Publication date
01.08.2015
Publisher
Springer Netherlands
Published in
International Journal of Social Robotics / Issue 4/2015
Print ISSN: 1875-4791
Electronic ISSN: 1875-4805
DOI
https://doi.org/10.1007/s12369-015-0297-8
