2017 | OriginalPaper | Book Chapter

A Real-Time 3D Visual Singing Synthesis: From Appearance to Internal Articulators

Author: Jun Yu

Published in: MultiMedia Modeling

Publisher: Springer International Publishing

Abstract

A facial animation system is proposed for visual singing synthesis. On a reconstructed 3D head mesh model, both a finite element method and an anatomical model are used to simulate the articulatory deformation corresponding to each phoneme with its musical note. Based on an articulatory song corpus, articulatory movements, phonemes, and musical notes are trained simultaneously with a context-dependent Hidden Markov Model to obtain a visual co-articulation model. The articulatory animations corresponding to all phonemes are then concatenated by the visual co-articulation model to produce song-synchronized articulatory animation. Experimental results demonstrate, both objectively and subjectively, that the system can synthesize realistic song-synchronized articulatory animation and thereby increase human-computer interaction capability.
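
As a rough illustration of the pipeline the abstract describes (a sketch under stated assumptions, not the chapter's implementation), the Python fragment below trains one small Gaussian HMM per (phoneme, musical note) context over articulatory parameter trajectories, then concatenates generated segments with a linear cross-fade. The corpus layout, feature dimension, fixed segment length, and the cross-fade, which merely stands in for the learned visual co-articulation model, are all hypothetical.

import numpy as np
from hmmlearn import hmm  # third-party library: pip install hmmlearn

N_DIM = 12     # assumed articulatory parameter dimension (lips, jaw, tongue, ...)
N_STATES = 3   # HMM states per (phoneme, note) context; a common small choice

def train_context_models(corpus):
    # corpus (hypothetical format): dict mapping (phoneme, note) to a list of
    # trajectories, each a NumPy array of shape (frames, N_DIM).
    models = {}
    for context, trajectories in corpus.items():
        X = np.vstack(trajectories)                # stack all example frames
        lengths = [len(t) for t in trajectories]   # per-example frame counts
        m = hmm.GaussianHMM(n_components=N_STATES,
                            covariance_type="diag", n_iter=50)
        m.fit(X, lengths)                          # EM training per context
        models[context] = m
    return models

def synthesize(models, score, frames_per_phone=20, blend=5):
    # score: ordered list of (phoneme, note) pairs for the song.
    segments = [models[ctx].sample(frames_per_phone)[0] for ctx in score]
    # Crude co-articulation: linear cross-fade between adjacent segments,
    # standing in for the learned visual co-articulation model.
    out = segments[0]
    for seg in segments[1:]:
        w = np.linspace(0.0, 1.0, blend)[:, None]
        overlap = (1.0 - w) * out[-blend:] + w * seg[:blend]
        out = np.vstack([out[:-blend], overlap, seg[blend:]])
    return out  # (total_frames, N_DIM) trajectory driving the head mesh

In the chapter itself the transitions between phonemes come from the trained context-dependent model rather than a fixed cross-fade; the sketch only conveys the train-then-concatenate shape of the system.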

Metadata
Title
A Real-Time 3D Visual Singing Synthesis: From Appearance to Internal Articulators
Author
Jun Yu
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-51811-4_5
