2017 | OriginalPaper | Book Chapter

A Real-Time 3D Visual Singing Synthesis: From Appearance to Internal Articulators

Author: Jun Yu

Published in: MultiMedia Modeling

Publisher: Springer International Publishing

Abstract

A facial animation system is proposed for visual singing synthesis. On a reconstructed 3D head mesh model, both a finite element method and an anatomical model are used to simulate the articulatory deformation corresponding to each phoneme with its musical note. Based on an articulatory song corpus, articulatory movements, phonemes, and musical notes are trained simultaneously with a context-dependent Hidden Markov Model to obtain a visual co-articulation model. The articulatory animations corresponding to all phonemes are then concatenated by the visual co-articulation model to produce song-synchronized articulatory animation. Experimental results demonstrate, both objectively and subjectively, that the system can synthesize realistic song-synchronized articulatory animation and thereby increase human-computer interaction capability.
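
As a rough illustration of the pipeline the abstract describes (a sketch under stated assumptions, not the chapter's implementation), the Python fragment below trains one small Gaussian HMM per (phoneme, musical note) context over articulatory parameter trajectories, then concatenates generated segments with a linear cross-fade. The corpus layout, feature dimension, fixed segment length, and the cross-fade, which merely stands in for the learned visual co-articulation model, are all hypothetical.

import numpy as np
from hmmlearn import hmm  # third-party library: pip install hmmlearn

N_DIM = 12     # assumed articulatory parameter dimension (lips, jaw, tongue, ...)
N_STATES = 3   # HMM states per (phoneme, note) context; a common small choice

def train_context_models(corpus):
    # corpus (hypothetical format): dict mapping (phoneme, note) to a list of
    # trajectories, each a NumPy array of shape (frames, N_DIM).
    models = {}
    for context, trajectories in corpus.items():
        X = np.vstack(trajectories)                # stack all example frames
        lengths = [len(t) for t in trajectories]   # per-example frame counts
        m = hmm.GaussianHMM(n_components=N_STATES,
                            covariance_type="diag", n_iter=50)
        m.fit(X, lengths)                          # EM training per context
        models[context] = m
    return models

def synthesize(models, score, frames_per_phone=20, blend=5):
    # score: ordered list of (phoneme, note) pairs for the song.
    segments = [models[ctx].sample(frames_per_phone)[0] for ctx in score]
    # Crude co-articulation: linear cross-fade between adjacent segments,
    # standing in for the learned visual co-articulation model.
    out = segments[0]
    for seg in segments[1:]:
        w = np.linspace(0.0, 1.0, blend)[:, None]
        overlap = (1.0 - w) * out[-blend:] + w * seg[:blend]
        out = np.vstack([out[:-blend], overlap, seg[blend:]])
    return out  # (total_frames, N_DIM) trajectory driving the head mesh

In the chapter itself the transitions between phonemes come from the trained context-dependent model rather than a fixed cross-fade; the sketch only conveys the train-then-concatenate shape of the system.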

Metadata
Title
A Real-Time 3D Visual Singing Synthesis: From Appearance to Internal Articulators
Author
Jun Yu
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-51811-4_5
