
2019 | Original Paper | Book Chapter

Toward RNN Based Micro Non-verbal Behavior Generation for Virtual Listener Agents

Authors: Hung-Hsuan Huang, Masato Fukuda, Toyoaki Nishida

Published in: Social Computing and Social Media. Design, Human Behavior and Analytics

Publisher: Springer International Publishing


Abstract

This work aims to develop a model that generates fine-grained, reactive non-verbal idling behaviors for a virtual listener agent while a human user speaks to it. The target micro behaviors are facial expressions, head movements, and postures. Two research questions follow: can these listener behaviors be learned from the user's corresponding behaviors, and if so, which learning model achieves high accuracy? We explored two recurrent neural network (RNN) models, the Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM), trained on a human-human corpus of active-listening conversation that we collected ourselves, containing 16 elderly-speaker/young-listener sessions. The results show that the task can be achieved to some degree even with a baseline multi-layer perceptron model, and that the GRU performed best among the three compared architectures.
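To make the modeling setup concrete, the sketch below implements a standard GRU cell forward pass in NumPy and runs it over a sequence of per-frame speaker features, as a listener-behavior model of this kind would. This is an illustrative sketch only, not the paper's implementation: the feature dimension (20), hidden size (32), and sequence length (50 frames) are hypothetical placeholders, and a real model would add an output layer and be trained on the corpus.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Minimal GRU cell forward pass (update gate z, reset gate r,
    candidate state h~). Weights are random: untrained, for shape
    and data-flow illustration only."""
    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        s = 0.1
        self.Wz = rng.normal(0, s, (hidden_dim, input_dim))
        self.Uz = rng.normal(0, s, (hidden_dim, hidden_dim))
        self.Wr = rng.normal(0, s, (hidden_dim, input_dim))
        self.Ur = rng.normal(0, s, (hidden_dim, hidden_dim))
        self.Wh = rng.normal(0, s, (hidden_dim, input_dim))
        self.Uh = rng.normal(0, s, (hidden_dim, hidden_dim))
        self.hidden_dim = hidden_dim

    def step(self, x, h):
        z = sigmoid(self.Wz @ x + self.Uz @ h)             # update gate
        r = sigmoid(self.Wr @ x + self.Ur @ h)             # reset gate
        h_cand = np.tanh(self.Wh @ x + self.Uh @ (r * h))  # candidate state
        return (1.0 - z) * h + z * h_cand                  # blended new state

def run_sequence(cell, xs):
    """Unroll the cell over a (T, input_dim) sequence of frames."""
    h = np.zeros(cell.hidden_dim)
    outs = []
    for x in xs:
        h = cell.step(x, h)
        outs.append(h)
    return np.stack(outs)

# Hypothetical sizes: 20 speaker features per frame, 32 hidden units, 50 frames.
cell = GRUCell(input_dim=20, hidden_dim=32)
frames = np.random.default_rng(1).normal(size=(50, 20))
hidden_states = run_sequence(cell, frames)  # shape (50, 32)
```

In a full system, each hidden state would feed a regression or classification head predicting the listener's facial expression, head movement, and posture labels for that frame; an LSTM variant differs only in carrying a separate cell state alongside the hidden state.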


DOI: https://doi.org/10.1007/978-3-030-21902-4_5