ABSTRACT
The results reported in this article are part of a larger project aimed at perceptually realistic animation of three-dimensional human faces driven by speech, including each speaker's individual nuances. We describe the audiovisual system developed to learn the spatio-temporal relationship between speech acoustics and facial animation, covering video and speech processing, pattern analysis, and MPEG-4 compliant facial animation for a given speaker. In particular, we propose a perceptual transformation of the speech spectral envelope, which is shown to capture the dynamics of articulatory movements. An efficient nearest-neighbor algorithm then predicts novel articulatory trajectories from these speech dynamics. The results are very promising and suggest a new way to model the synthetic lip motion of a given speaker driven by his or her speech; they also provide clues toward more general cross-speaker realistic animation.
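The nearest-neighbor prediction mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature dimensions, the use of plain Euclidean distance, and the toy corpus are all assumptions; in the paper the features come from a perceptual transformation of the spectral envelope and the targets are MPEG-4 facial animation parameters.

```python
# Hypothetical sketch: map acoustic feature frames to facial animation
# parameters (FAPs) by nearest-neighbor lookup in a training corpus.
# Feature extraction and distance metric are illustrative assumptions.
import numpy as np

def nearest_neighbor_predict(train_feats, train_faps, query_feats):
    """For each query frame, return the FAP vector of the closest
    training frame (Euclidean distance in feature space)."""
    preds = []
    for q in query_feats:
        d = np.linalg.norm(train_feats - q, axis=1)  # distance to every stored frame
        preds.append(train_faps[np.argmin(d)])       # reuse its articulatory parameters
    return np.array(preds)

# Toy corpus: 100 frames of 12-dim spectral features paired with 6 FAPs.
rng = np.random.default_rng(0)
train_feats = rng.normal(size=(100, 12))
train_faps = rng.normal(size=(100, 6))

# Queries that are tiny perturbations of stored frames should recover
# the corresponding stored parameter vectors.
query = train_feats[3:5] + 0.01 * rng.normal(size=(2, 12))
pred = nearest_neighbor_predict(train_feats, train_faps, query)
```

In practice such a lookup is made efficient with a spatial index (e.g. a k-d tree) rather than a linear scan, and the predicted trajectories are typically smoothed over time before driving the face model.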
Index Terms
- Speech driven facial animation