ABSTRACT
Data-driven approaches have been successfully used for realistic visual speech synthesis. However, little effort has been devoted to real-time lip-synching for interactive applications. In particular, algorithms that are based on a graph of motions are notorious for their exponential complexity. In this paper, we present a greedy graph search algorithm that yields vastly superior performance and allows real-time motion synthesis from a large database of motions. The time complexity of the algorithm is linear with respect to the size of an input utterance. In our experiments, the synthesis time for an input sentence of average length is under a second.
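The linear-time claim can be illustrated with a minimal sketch of a greedy search over a graph of recorded motion segments. This is not the authors' implementation: the `MotionNode` type, the one-dimensional pose values, the `transition_cost` function, and the `DEMO_NODES` database are all hypothetical stand-ins chosen for illustration. The sketch assumes each recorded segment is labeled with one phoneme and that the number of candidates per phoneme is bounded, so one greedy choice per input phoneme gives a running time linear in the utterance length.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MotionNode:
    """Hypothetical recorded motion segment labeled with a single phoneme."""
    node_id: int
    phoneme: str
    start_pose: float  # assumed 1-D stand-in for the facial pose at segment start
    end_pose: float    # assumed 1-D stand-in for the facial pose at segment end


def build_index(nodes):
    """Group the motion segments by phoneme label for constant-time lookup."""
    index = {}
    for node in nodes:
        index.setdefault(node.phoneme, []).append(node)
    return index


def transition_cost(prev, cand):
    """Assumed smoothness cost: pose discontinuity between consecutive segments."""
    return abs(prev.end_pose - cand.start_pose)


def greedy_synthesize(phonemes, index):
    """Pick one segment per phoneme, keeping only the locally cheapest transition.

    Each step inspects only the candidates for the current phoneme and never
    revisits earlier choices, so the loop runs once per input phoneme.
    """
    path = []
    prev = None
    for phoneme in phonemes:
        candidates = index[phoneme]  # raises KeyError if the phoneme was never recorded
        if prev is None:
            best = candidates[0]  # no predecessor yet: take the first candidate
        else:
            best = min(candidates, key=lambda c: transition_cost(prev, c))
        path.append(best)
        prev = best
    return path


# Hypothetical toy database: two recorded segments for each of three phonemes.
DEMO_NODES = [
    MotionNode(0, "h", 0.0, 0.2),
    MotionNode(1, "h", 0.5, 0.9),
    MotionNode(2, "e", 0.25, 0.4),
    MotionNode(3, "e", 0.8, 1.0),
    MotionNode(4, "l", 0.45, 0.5),
    MotionNode(5, "l", 0.0, 0.1),
]

if __name__ == "__main__":
    result = greedy_synthesize(["h", "e", "l"], build_index(DEMO_NODES))
    print([node.node_id for node in result])
```

A greedy choice trades global optimality for speed: an exhaustive search would compare whole paths through the graph, whereas this sketch commits to one segment at each step and moves on.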
Index Terms
- Real-time speech motion synthesis from recorded motions