Article

Real-time speech motion synthesis from recorded motions

Authors:
Yong Cao (University of California at Los Angeles)
Petros Faloutsos (University of California at Los Angeles)
Eddie Kohler (University of California at Los Angeles)
Frédéric Pighin (University of Southern California)

SCA '04: Proceedings of the 2004 ACM SIGGRAPH/Eurographics symposium on Computer animation, August 2004, Pages 345–353, https://doi.org/10.1145/1028523.1028570

Published: 27 August 2004

ABSTRACT

Data-driven approaches have been successfully used for realistic visual speech synthesis. However, little effort has been devoted to real-time lip-synching for interactive applications. In particular, algorithms that are based on a graph of motions are notorious for their exponential complexity. In this paper, we present a greedy graph search algorithm that yields vastly superior performance and allows real-time motion synthesis from a large database of motions. The time complexity of the algorithm is linear with respect to the size of an input utterance. In our experiments, the synthesis time for an input sentence of average length is under a second.
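
The abstract does not spell out the algorithm, so the following is only a minimal, generic sketch of why a greedy pass over candidate motions runs in time linear in the utterance length. It is not the authors' method. The `Segment` type, the one-dimensional "pose" values, the pose-distance cost, and the toy database are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A hypothetical recorded motion clip for one phoneme."""
    name: str
    start_pose: float  # 1-D stand-in for the face pose at the clip's first frame
    end_pose: float    # 1-D stand-in for the face pose at the clip's last frame


def greedy_synthesis(phonemes, database):
    """Pick one clip per phoneme, committing to the locally best choice each step.

    Each phoneme is visited once and only its own candidates are compared,
    so the work grows linearly with the number of phonemes. An exhaustive
    search over every combination of candidates would grow exponentially.
    """
    chosen = []
    for phoneme in phonemes:
        candidates = database[phoneme]
        if not chosen:
            # Assumption: with no previous clip, take the first candidate.
            best = candidates[0]
        else:
            prev_end = chosen[-1].end_pose
            # Assumed cost: pose jump between consecutive clips.
            best = min(candidates, key=lambda s: abs(s.start_pose - prev_end))
        chosen.append(best)
    return chosen


# Toy database: two candidate clips per phoneme (made-up values).
database = {
    "h": [Segment("h_1", 0.0, 0.2), Segment("h_2", 0.5, 0.6)],
    "e": [Segment("e_1", 0.9, 0.4), Segment("e_2", 0.3, 0.7)],
    "l": [Segment("l_1", 0.1, 0.1), Segment("l_2", 0.8, 0.5)],
    "o": [Segment("o_1", 0.6, 0.9), Segment("o_2", 0.0, 0.3)],
}

path = greedy_synthesis(["h", "e", "l", "o"], database)
print([s.name for s in path])  # ['h_1', 'e_2', 'l_2', 'o_1']
```

The trade-off in this sketch is the usual one for greedy search: each step is cheap, but an early choice is never revisited, so the selected sequence is not guaranteed to minimize the total transition cost.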

Supplemental Material

p345-cao.mpeg (MPEG, 40.9 MB)

Index Terms

Real-time speech motion synthesis from recorded motions
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Image and video acquisition
        3D imaging
  2. Computer graphics
    1. Animation
2. Theory of computation
  1. Design and analysis of algorithms

Published in
SCA '04: Proceedings of the 2004 ACM SIGGRAPH/Eurographics symposium on Computer animation
August 2004
388 pages
ISBN: 3905673142
Conference Chairs: Norman Badler (University of Pennsylvania), Mathieu Desbrun (University of Southern California), Ronan Boulic, Dinesh Pai
Publisher: Eurographics Association, Goslar, Germany

Acceptance Rates
Overall Acceptance Rate: 183 of 487 submissions, 38%

Article Metrics
- Total Citations: 42
- Total Downloads: 825
- Downloads (Last 12 months): 6
- Downloads (Last 6 weeks): 0