2018 | OriginalPaper | Book chapter

Gaze, Prosody and Semantics: Relevance of Various Multimodal Signals to Addressee Detection in Human-Human-Computer Conversations

Authors: Oleg Akhtiamov, Vasily Palkov

Published in: Speech and Computer

Publisher: Springer International Publishing

Abstract

This research focuses on multimodal addressee detection in human-human-computer conversations. A modern spoken dialogue system operating under realistic conditions may face multiparty interaction, in which several people solve a cooperative task by addressing the system while also talking to each other. Such a system is expected to distinguish machine-addressed from human-addressed utterances. Machine-addressed queries should be answered directly, while human-addressed utterances should be either ignored or processed implicitly. We propose a multimodal system that performs visual, acoustic-prosodic, and textual analysis of users’ utterances. With this system, we outperform the existing baseline for the Smart Video Corpus. We also examine how the individual models perform on speech categories that differ in spontaneity. The acoustic model has difficulty classifying constrained speech, and the textual model performs worse on spontaneous speech. The performance of the visual model drops significantly for read and for spontaneous human-addressed speech because of users’ ambiguous behaviour.

Title: Gaze, Prosody and Semantics: Relevance of Various Multimodal Signals to Addressee Detection in Human-Human-Computer Conversations
Authors: Oleg Akhtiamov, Vasily Palkov
Publisher: Springer International Publishing
Book: Speech and Computer
Print ISBN: 978-3-319-99578-6
Electronic ISBN: 978-3-319-99579-3
Copyright year: 2018
DOI: https://doi.org/10.1007/978-3-319-99579-3_1
