Published in: Journal on Multimodal User Interfaces 1/2014

01.03.2014 | Original Paper

Analysis of significant dialog events in realistic human–computer interaction

Authors: Dmytro Prylipko, Dietmar Rösner, Ingo Siegert, Stephan Günther, Rafael Friesen, Matthias Haase, Bogdan Vlasenko, Andreas Wendemuth


Abstract

This paper addresses the automatic detection of significant dialog events (SDEs) in naturalistic HCI and the deduction of trait-specific conclusions relevant for the design of spoken dialog systems. We perform our investigations on the multimodal LAST MINUTE corpus, which contains recordings of naturalistic interactions. First, we use the textual transcripts to analyse interaction styles and discourse structures; we find indications that younger subjects prefer a more technical style when communicating with dialog systems. Next, we model the subject’s internal success state with a hidden Markov model trained on the observed sequences of system feedback. This reveals that younger subjects interact significantly more successfully with technical systems. Aiming at the automatic detection of specific subjects’ reactions, we then semi-automatically annotate SDEs, i.e. phrases indicating irregular, non-task-oriented subject behavior. We use both acoustic and linguistic features to build several trait-specific classifiers for dialog phases, which show markedly different accuracies across age and gender groups. The presented investigations coherently support the age-dependence of both expressiveness and problem-solving ability, which in turn induces design rules for future designated “companion” systems.
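To make the HMM-based modeling step concrete, the following is a minimal, hypothetical sketch of training a two-state discrete HMM on symbolic system-feedback sequences and decoding the internal-state sequence. The feedback alphabet, the state semantics, and the use of the hmmlearn library are assumptions for this example, not the authors' implementation.

```python
# Minimal sketch -- NOT the authors' implementation. Assumes the hmmlearn
# library (>= 0.2.8); the feedback alphabet and state labels are illustrative.
import numpy as np
from hmmlearn import hmm

# Illustrative observation alphabet: 0 = positive system feedback,
# 1 = negative system feedback (e.g. a rejected request).
sequences = [
    [0, 0, 1, 0, 0, 0],  # a mostly successful interaction
    [1, 1, 0, 1, 1],     # a troubled interaction
]
X = np.concatenate(sequences).reshape(-1, 1)   # stacked symbol column
lengths = [len(s) for s in sequences]          # per-dialog lengths

# Two hidden states, intended to capture "success" vs. "trouble".
model = hmm.CategoricalHMM(n_components=2, n_iter=50, random_state=0)
model.fit(X, lengths)

# Viterbi decoding yields the most likely internal-state sequence for a
# dialog; such sequences can then be compared across age or gender groups.
states = model.predict(np.array(sequences[0]).reshape(-1, 1))
print(states)
```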

Footnotes
2
Here and in the following, ‘W’ stands for wizard and ‘S’ for subject. The subject’s code is given once, at his or her first phrase. Transcripts are given in GAT 2 minimal coding; English glosses are added as a convenience for the reader.
 
4
The following features are calculated as the ratio of the number of corresponding items to the number of words.
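For illustration only, a minimal sketch of how such ratio features could be computed; the item categories and word lists below are hypothetical placeholders, not taken from the paper.

```python
# Hedged sketch of the ratio features described in this footnote: counts of
# corresponding items normalized by the number of words in a subject's turn.
def ratio_features(tokens, item_sets):
    """Return {name: count of items / number of words} per category."""
    n = max(len(tokens), 1)  # guard against an empty turn
    return {name: sum(t in items for t in tokens) / n
            for name, items in item_sets.items()}

turn = "ich moechte bitte noch einmal zurueck bitte".split()
print(ratio_features(turn, {
    "politeness": {"bitte", "danke"},          # hypothetical item set
    "self_reference": {"ich", "mir", "mich"},  # hypothetical item set
}))
# -> {'politeness': 0.2857..., 'self_reference': 0.1428...}
```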
 
References
1.
Batliner A, Fischer K, Huber R, Spilker J, Nöth E (2003) How to find trouble in communication. Speech Commun 40(1–2):117–143
2.
Batliner A, Steidl S, Schuller B, Seppi D, Vogt T, Wagner J, Devillers L, Vidrascu L, Aharonson V, Kessous L (2011) Whodunnit—searching for the most important feature types signalling emotion-related user states in speech. Comput Speech Lang 25(1):4–28
3.
Boersma P (2001) Praat, a system for doing phonetics by computer. Glot Int 5(9/10):341–345
4.
Callejas Z, López-Cózar R (2008) Influence of contextual information in emotion annotation for spoken dialogue systems. Speech Commun 50:416–433
5.
Campbell N (2007) On the use of nonverbal speech sounds in human communication. In: COST 2102 workshop (Vietri), LNAI. Springer, Berlin, Heidelberg, pp 117–128
6.
Caridakis G, Karpouzis K, Wallace M, Kessous L, Amir N (2010) Multimodal user’s affective state analysis in naturalistic interaction. J Multimodal User Interfaces 3(1):49–66
7.
Cohn JF, Schmidt K (2004) The timing of facial motion in posed and spontaneous smiles. Int J Wavelets Multiresolut Inf Process 2(2):121–132
8.
Cowie R, Douglas-Cowie E, Tsapatsoulis N, Votsis G, Kollias S, Fellenz W, Taylor J (2001) Emotion recognition in human–computer interaction. IEEE Signal Process Mag 18(1):32–80
9.
Douglas-Cowie E, Devillers L, Martin JC, Cowie R, Savvidou S, Abrilian S, Cox C (2005) Multimodal databases of everyday emotion: facing up to complexity. In: Proceedings of Interspeech’05, pp 813–816
10.
Edlund J, Gustafson J, Heldner M, Hjalmarsson A (2008) Towards human-like spoken dialogue systems. Speech Commun 50(8):630–645
11.
Frommer J, Rösner D, Haase M, Lange J, Friesen R, Otto M (2012) Detection and avoidance of failures in dialogues—Wizard of Oz experiment operator’s manual. Pabst Science Publishers, Germany
12.
Fukuda S, Matsuura Y (1996) Understanding of emotional feelings in sound. Trans Jpn Soc Mech Eng Part C 62(598):2293–2298
13.
Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The WEKA data mining software: an update. ACM SIGKDD Explor Newslett 11(1):10–18
14.
Jimenez-Fernandez A, Del Pozo F, Munoz C, Zoreda JL (1987) Pattern recognition in the vocal expression of emotional categories. In: Proceedings of the 25th annual Conference of the IEEE Engineering in Medicine and Biology Society, pp 2090–2091
16.
Kapoor A, Burleson W, Picard RW (2007) Automatic prediction of frustration. Int J Hum Comput Stud 65(8):724–736
17.
Krauss RM, Chen Y, Chawla P (1996) Nonverbal behavior and nonverbal communication: what do conversational hand gestures tell us? Adv Exp Soc Psychol 28:389–450
18.
Lee CM, Narayanan S (2005) Toward detecting emotions in spoken dialogs. IEEE Trans Speech Audio Process 13(2):293–303
19.
Prylipko D, Schuller B, Wendemuth A (2012) Fine-tuning HMMs for nonverbal vocalizations in spontaneous speech: a multicorpus perspective. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP, pp 4625–4628
20.
Rösner D, Friesen R, Otto M, Lange J, Haase M, Frommer J (2011) Intentionality in interacting with companion systems: an empirical approach. In: Human–computer interaction. Towards mobile and intelligent interaction environments. Springer, Berlin, pp 593–602
21.
Rösner D, Frommer J, Andrich R, Friesen R, Haase M, Kunze M, Lange J, Otto M (2012) LAST MINUTE: a novel corpus to support emotion, sentiment and social signal processing. In: Conference on Language Resources and Evaluation, LREC’12 Abstracts
22.
Rösner D, Kunze M, Otto M, Frommer J (2012) Linguistic analyses of the LAST MINUTE corpus. In: Proceedings of KONVENS’12, ÖGAI, pp 145–154
23.
Scherer KR, Ceschi G (1997) Lost luggage: a field study of emotion-antecedent appraisal. Motiv Emot 21:211–235
24.
Scherer S, Glodek M, Layher G, Schels M, Schmidt M, Brosch T, Tschechne S, Schwenker F, Neumann H, Palm G (2012) A generic framework for the inference of user states in human computer interaction. J Multimodal User Interfaces 6(3–4):117–141
25.
Schmidt T, Schütte W (2010) Folker: an annotation tool for efficient transcription of natural, multi-party interaction. In: Proceedings of LREC’10, pp 2091–2096
26.
Schuller B, Batliner A, Steidl S, Seppi D (2011) Recognising realistic emotions and affect in speech: state of the art and lessons learnt from the first challenge. Speech Commun 53(9–10):1062–1087
27.
Selting M, et al (2009) Gesprächsanalytisches Transkriptionssystem 2 (GAT 2)
28.
Siegert I, Böck R, Philippou-Hübner D, Vlasenko B, Wendemuth A (2011) Appropriate emotional labeling of non-acted speech using basic emotions, Geneva emotion wheel and self assessment manikins. In: Proceedings of ICME’11
29.
Suwa M, Sugie N, Fujimora K (1978) A preliminary note on pattern recognition of human emotional expression. In: Proceedings of the IEEE International Conference on Pattern Recognition, pp 408–410
30.
Vlasenko B, Prylipko D, Philippou-Hübner D, Wendemuth A (2011) Vowels formants analysis allows straightforward detection of high arousal acted and spontaneous emotions. In: Proceedings of Interspeech’11, pp 1577–1580
31.
Vlasenko B, Prylipko D, Böck R, Wendemuth A (2014) Modeling phonetic pattern variability in favor of the creation of robust emotion classifiers for real-life applications. Comput Speech Lang (article in press)
32.
Walker M, Langkilde I, Wright J, Gorin A, Litman D (2000) Learning to predict problematic situations in a spoken dialogue system: experiments with how may I help you? In: Proceedings of NAACL’00, pp 210–217
33.
Wilks Y (2010) Close engagements with artificial companions: key social, psychological, ethical and design issues. John Benjamins, Amsterdam
34.
Williams CE, Stevens KN (1972) Emotions and speech: some acoustical correlates. J Acoust Soc Am 52(4B):1238–1250
35.
Wöllmer M, Eyben F, Reiter S, Schuller B, Cox C, Douglas-Cowie E, Cowie R (2008) Abandoning emotion classes—towards continuous emotion recognition with modelling of long-range dependencies. In: Proceedings of Interspeech’08, pp 597–600
36.
Wolters M, Georgila K, Moore JD, MacPherson SE (2009) Being old doesn’t mean acting old: how older users interact with spoken dialog systems. ACM Trans Access Comput 2(1):1–39
37.
Young S, Evermann G, Gales M, Hain T, Kershaw D, Liu X, Moore G, Odell J, Ollason D, Povey D, Valtchev V, Woodland P (2006) The HTK book (for HTK version 3.4). Cambridge University Press, Cambridge
38.
Zeng Z, Tu J, Liu M, Huang T, Pianfetti B, Roth D, Levinson S (2007) Audio-visual affect recognition. IEEE Trans Multimed 9(2):424–428
39.
Zeng Z, Pantic M, Roisman GI, Huang TS (2009) A survey of affect recognition methods: audio, visual, and spontaneous expressions. IEEE Trans Pattern Anal Mach Intell 31(1):39–58
Metadata
Title
Analysis of significant dialog events in realistic human–computer interaction
Authors
Dmytro Prylipko
Dietmar Rösner
Ingo Siegert
Stephan Günther
Rafael Friesen
Matthias Haase
Bogdan Vlasenko
Andreas Wendemuth
Publication date
01.03.2014
Publisher
Springer Berlin Heidelberg
Published in
Journal on Multimodal User Interfaces / Issue 1/2014
Print ISSN: 1783-7677
Electronic ISSN: 1783-8738
DOI
https://doi.org/10.1007/s12193-013-0144-x
