
2021 | Original Paper | Chapter

Automatic Head-Nod Generation Using Utterance Text Considering Personality Traits

Authors: Ryo Ishii, Taichi Katayama, Ryuichiro Higashinaka, Junji Tomita

Published in: Increasing Naturalness and Flexibility in Spoken Dialogue Interaction

Publisher: Springer Singapore


Abstract

We propose a model for generating head nods from utterance text that takes personality traits into account. We have been investigating the automatic generation of body motion, such as nodding, from utterance text in dialogue agent systems. Human body motion varies greatly with personality, so it is important to generate body motion appropriate to the personality of the dialogue agent. To construct our model, we first compiled a Japanese corpus of 24 dialogues that includes utterance text, nod information, and the personality traits (Big Five) of the participants. Our nod-generation model estimates the presence, frequency, and depth of nods during each phrase by using various types of linguistic information extracted from the utterance text together with personality traits. We evaluated how well the model generates nods that reflect individual personality traits. The results indicate that the model using both linguistic information and personality traits outperformed a model using linguistic information alone.
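The abstract names the model's inputs (per-phrase linguistic features plus Big Five scores) and outputs (nod presence, frequency, and depth per phrase) but not its internals. The following minimal Python sketch only illustrates that overall shape: the `Phrase` and `BigFive` containers, the particular features, and the choice of a decision-tree classifier are all illustrative assumptions, not the chapter's actual feature set or learner.

```python
# Sketch of a per-phrase nod-presence classifier that combines
# phrase-level language features with Big Five personality scores.
# Feature names and the learner are assumptions for illustration.
from dataclasses import dataclass
from sklearn.tree import DecisionTreeClassifier

@dataclass
class Phrase:
    pos_counts: list[float]   # hypothetical: counts of POS tags in the phrase
    position: float           # hypothetical: relative position in utterance (0..1)
    is_sentence_final: bool   # hypothetical: phrase ends the sentence

@dataclass
class BigFive:
    openness: float
    conscientiousness: float
    extraversion: float
    agreeableness: float
    neuroticism: float

def features(phrase: Phrase, traits: BigFive) -> list[float]:
    """Concatenate phrase-level language features with personality traits."""
    return [
        *phrase.pos_counts,
        phrase.position,
        float(phrase.is_sentence_final),
        traits.openness,
        traits.conscientiousness,
        traits.extraversion,
        traits.agreeableness,
        traits.neuroticism,
    ]

# X: one feature vector per phrase; y: 1 if the speaker nodded during it.
# Separate models of the same shape could estimate nod frequency and depth.
model = DecisionTreeClassifier(max_depth=5)
# model.fit(X_train, y_train)
# nod_predicted = model.predict([features(phrase, traits)])
```

A single vector per phrase keeps the sketch simple; the key point the abstract makes is that appending the speaker's trait scores to the language features lets one trained model produce personality-dependent nodding behavior.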


Metadata
Title
Automatic Head-Nod Generation Using Utterance Text Considering Personality Traits
Authors
Ryo Ishii
Taichi Katayama
Ryuichiro Higashinaka
Junji Tomita
Copyright Year
2021
Publisher
Springer Singapore
DOI
https://doi.org/10.1007/978-981-15-9323-9_26