2018 | Original Paper | Book Chapter

5. Formant-Based Lip Motion Generation and Evaluation in Humanoid Robots

Authors: Carlos T. Ishi, Chaoran Liu, Hiroshi Ishiguro, Norihiro Hagita

Published in: Geminoid Studies

Publisher: Springer Singapore


Abstract

Generating natural motion in robots is important for improving human–robot interaction. We have developed a teleoperation system in which the lip motion of a remote humanoid robot is automatically controlled by the operator's voice. In the present work, we introduce an improved version of our speech-driven lip motion generation method, in which the degrees of lip height and lip width are estimated from vowel formant information. The method requires calibration of only one parameter for speaker normalization. Lip height control is evaluated on two humanoid robots (Telenoid-R2 and Geminoid-F). Subjective evaluations indicate that the proposed audio-based method generates lip motion that is more natural than that produced by vision-based and motion-capture-based approaches. Partial lip width control is shown to further improve lip motion naturalness in Geminoid-F, which also has an actuator for stretching the lip corners. Issues regarding online real-time processing are also discussed.
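To make the mapping described in the abstract concrete, the following Python code is a minimal sketch, not the authors' implementation. It assumes LPC-based formant extraction (here via librosa.lpc), simple linear mappings from F1 to lip height and from F2 to lip width, and a single gain `alpha` standing in for the one speaker-normalization parameter mentioned above; all frequency bounds and constants are illustrative values, not the chapter's calibrated ones.

```python
import numpy as np
import librosa

def estimate_formants(frame, sr, order=12):
    """Estimate formant frequencies (Hz) of one windowed speech frame via LPC analysis."""
    a = librosa.lpc(frame.astype(float), order=order)   # prediction-polynomial coefficients
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]                   # keep one root per complex-conjugate pair
    return np.sort(np.angle(roots) * sr / (2 * np.pi))  # ascending: ~F1, F2, ...

def lip_shape(f1, f2, alpha=1.0):
    """Map (F1, F2) to normalized lip height/width commands in [0, 1].

    `alpha` plays the role of the single speaker-normalization parameter;
    the vowel-space bounds below are illustrative values for an adult
    speaker, not calibrated ones.
    """
    height = alpha * (f1 - 250.0) / (850.0 - 250.0)     # open vowels (high F1) -> wide opening
    width = alpha * (f2 - 800.0) / (2500.0 - 800.0)     # front vowels (high F2) -> spread lips
    return float(np.clip(height, 0, 1)), float(np.clip(width, 0, 1))

# Per-frame usage on the operator's voice (placeholder signal shown here):
sr, n = 16000, 512
frame = np.hamming(n) * np.random.randn(n)              # stand-in for a real 32-ms speech frame
formants = estimate_formants(frame, sr)
if len(formants) >= 2:
    h, w = lip_shape(formants[0], formants[1], alpha=1.0)
    print(f"lip height={h:.2f}, lip width={w:.2f}")
```

In a real-time teleoperation loop, such per-frame estimates would typically be smoothed before being sent as actuator commands, since raw LPC root frequencies can jitter from frame to frame.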


Metadata
Title
Formant-Based Lip Motion Generation and Evaluation in Humanoid Robots
Authors
Carlos T. Ishi
Chaoran Liu
Hiroshi Ishiguro
Norihiro Hagita
Copyright year
2018
Publisher
Springer Singapore
DOI
https://doi.org/10.1007/978-981-10-8702-8_5
