2017 | OriginalPaper | Book chapter

Speech Synchronized Tongue Animation by Combining Physiology Modeling and X-ray Image Fitting

Author: Jun Yu

Published in: MultiMedia Modeling

Publisher: Springer International Publishing

Abstract

This paper proposes a system that generates speech-synchronized tongue animation from text or speech. First, an anatomically accurate physiological tongue model is built and used to produce a large number of tongue deformation samples from randomly generated muscle activation samples. Second, these input and output samples are used to train a neural network that captures the relationship between muscle activation and tongue contour deformation. Third, after the rigid tongue movement has been removed, the neural network estimates the non-rigid tongue movement parameters, namely the tongue muscle activations, from a collected database of X-ray tongue movement images of Mandarin Chinese phonemes. The estimates are then used to construct a database of tongue physemes (sequences of tongue muscle activations and rigid movement) corresponding to the Mandarin Chinese phoneme database. Finally, the physemes corresponding to the phonemes extracted from the input text or speech are blended, according to the phoneme durations, to drive the physiological tongue model and produce the speech-synchronized tongue animation. Simulation results show that the synthesized tongue animations are visually realistic and closely approximate the tongue medical data.
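
The final blending step can be illustrated with a minimal sketch. The abstract does not say how physemes are blended, so the code below assumes the simplest option: each physeme is reduced to a single target muscle-activation vector placed at the temporal midpoint of its phoneme, and neighbouring targets are linearly cross-faded. The function name `blend_physemes`, its parameters, and the example values are hypothetical and are not taken from the paper.

```python
from bisect import bisect_right


def blend_physemes(targets, durations, fps=25):
    """Return one blended activation vector per animation frame.

    targets   -- one muscle-activation vector per phoneme (illustrative data)
    durations -- phoneme durations in seconds, same length as targets
    fps       -- frame rate of the output animation

    Assumption: linear cross-fade between phoneme midpoints. The paper's
    actual blending scheme is not described in the abstract.
    """
    if not targets or len(targets) != len(durations):
        raise ValueError("need exactly one duration per physeme target")

    # Place each target at the temporal midpoint of its phoneme.
    centers, start = [], 0.0
    for d in durations:
        centers.append(start + d / 2.0)
        start += d
    total = start

    frames = []
    for k in range(int(round(total * fps)) + 1):
        t = k / fps
        i = bisect_right(centers, t)  # number of midpoints at or before t
        if i == 0:
            # Before the first midpoint: hold the first target.
            frames.append(list(targets[0]))
        elif i == len(centers):
            # At or after the last midpoint: hold the last target.
            frames.append(list(targets[-1]))
        else:
            # Between two midpoints: interpolate linearly.
            t0, t1 = centers[i - 1], centers[i]
            w = (t - t0) / (t1 - t0)
            frames.append([(1 - w) * a + w * b
                           for a, b in zip(targets[i - 1], targets[i])])
    return frames


# Two phonemes of 0.5 s each, two hypothetical muscles, 4 frames per second.
frames = blend_physemes([[0.0, 1.0], [1.0, 0.0]], [0.5, 0.5], fps=4)
```

In this example the frame at 0.5 s lies halfway between the two midpoints, so both activations equal 0.5. In the system described above, each frame's activation vector would then drive the physiological tongue model; that step is outside the scope of this sketch.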

Metadata
Title
Speech Synchronized Tongue Animation by Combining Physiology Modeling and X-ray Image Fitting
Author
Jun Yu
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-51811-4_59
