Abstract
Humans are known to use a wide range of non-verbal behaviour while speaking. Generating naturalistic embodied speech for an artificial agent is therefore an application where techniques that draw directly on recorded human motions can be helpful. We present a system that uses corpus-based selection strategies to specify the head and eyebrow motion of an animated talking head. We first describe how a domain-specific corpus of facial displays was recorded and annotated, and outline the regularities that were found in the data. We then present two different methods of selecting motions for the talking head based on the corpus data: one that chooses the majority option in all cases, and one that makes a weighted choice among all of the options. We compare these methods to each other in two ways: through cross-validation against the corpus, and by asking human judges to rate the output. The results of the two evaluation studies differ: the cross-validation study favoured the majority strategy, while the human judges preferred schedules generated using weighted choice. The judges in the second study also showed a preference for the original corpus data over the output of either of the generation strategies.
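The two selection strategies contrasted in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the context key, the display labels, and the corpus counts are all hypothetical, standing in for the annotated frequencies of head and eyebrow displays in the recorded corpus.

```python
import random
from collections import Counter

def majority_choice(counts: Counter) -> str:
    """Majority strategy: always pick the display that occurred
    most often in this context in the corpus."""
    return counts.most_common(1)[0][0]

def weighted_choice(counts: Counter, rng: random.Random) -> str:
    """Weighted strategy: sample a display with probability
    proportional to its corpus frequency, so rarer options
    are also produced some of the time."""
    options, weights = zip(*counts.items())
    return rng.choices(options, weights=weights, k=1)[0]

# Hypothetical counts for one linguistic context: how often each
# head/eyebrow display co-occurred with that context in the data.
counts = Counter({"nod": 12, "brow-raise": 5, "none": 3})

print(majority_choice(counts))       # deterministic: always "nod"
rng = random.Random(0)
print(weighted_choice(counts, rng))  # varies across samples
```

The majority strategy is deterministic and maximally faithful to the corpus mode (which helps it under cross-validation), while weighted choice reproduces the corpus distribution and so yields more varied output, which is one plausible reading of why human judges preferred it.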
Notes
No sentence in the script had more than two clauses.
We did not select any motions on words for which the speech-synthesiser output was very short, such as "but" and "is", because the synthesiser could not make those words long enough to make any motion sensible.
A baseline system that never proposes any motion scores 0.79 on this measure.
The corpus schedules were modified to remove motions on short words such as "but" and "is", for the reasons discussed in Sect. 4.
References
Artstein, R., & Poesio, M. (2005). Kappa3 = alpha (or beta). Technical Report CSM-437, University of Essex Department of Computer Science.
Bangalore, S., Rambow, O., & Whittaker, S. (2000). Evaluation metrics for generation. In Proceedings of INLG 2000.
Belz, A., Gatt, A., Reiter, E., & Viethen, J. (2007). First NLG shared task and evaluation challenge on attribute selection for referring expression generation. http://www.csd.abdn.ac.uk/research/evaluation/
Belz, A., & Reiter, E. (2006). Comparing automatic and human evaluation of NLG systems. In Proceedings of EACL 2006 (pp. 313–320).
Belz, A., & Varges, S. (Eds.) (2005). Corpus linguistics 2005 workshop on using corpora for natural language generation.
Cassell, J., Bickmore, T., Vilhjálmsson, H., & Yan, H. (2001a). More than just a pretty face: Conversational protocols and the affordances of embodiment. Knowledge-Based Systems, 14(1–2), 55–64.
Cassell, J., Nakano, Y., Bickmore, T. W., Sidner, C. L., & Rich, C. (2001b). Non-verbal cues for discourse structure. In Proceedings of ACL 2001.
Cassell, J., Sullivan, J., Prevost, S., & Churchill, E. (2000). Embodied conversational agents. MIT Press.
Clark, R. A. J., Richmond, K., & King, S. (2004). Festival 2 – Build your own general purpose unit selection speech synthesiser. In Proceedings of the 5th ISCA Workshop on Speech Synthesis.
de Carolis, B., Carofiglio, V., & Pelachaud, C. (2002). From discourse plans to believable behavior generation. In Proceedings of INLG 2002.
DeCarlo, D., Stone, M., Revilla, C., & Venditti, J. (2004). Specifying and animating facial signals for discourse in embodied conversational agents. Computer Animation and Virtual Worlds, 15(1), 27–38.
Ekman, P. (1979). About brows: Emotional and conversational signals. In M. von Cranach, K. Foppa, W. Lepenies, & D. Ploog (Eds.), Human ethology: Claims and limits of a new discipline. Cambridge University Press.
Foster, M. E. (2007). Evaluating the impact of variation in automatically generated embodied object descriptions. Ph.D. thesis, School of Informatics, University of Edinburgh.
Foster, M. E., & Oberlander, J. (2006). Data-driven generation of emphatic facial displays. In Proceedings of EACL 2006 (pp. 353–360).
Foster, M. E., White, M., Setzer, A., & Catizone, R. (2005). Multimodal generation in the COMIC dialogue system. In Proceedings of the ACL 2005 Demo Session.
Fox, J. (2002). An R and S-Plus companion to applied regression. Sage Publications.
Graf, H., Cosatto, E., Strom, V., & Huang, F. (2002). Visual prosody: Facial movements accompanying speech. In Proceedings of FG 2002 (pp. 397–401).
Kipp, M. (2004). Gesture generation by imitation – From human behavior to computer character animation. Dissertation.com.
Krahmer, E., & Swerts, M. (2005). How children and adults produce and perceive uncertainty in audiovisual speech. Language and Speech, 48(1), 29–53.
Langkilde, I., & Knight, K. (1998). Generation that exploits corpus-based statistical knowledge. In Proceedings of COLING-ACL 1998.
Langkilde-Geary, I. (2002). An empirical verification of coverage and correctness for a general-purpose sentence generator. In Proceedings of INLG 2002.
Mana, N., & Pianesi, F. (2006). HMM-based synthesis of emotional facial expressions during speech in synthetic talking heads. In Proceedings of ICMI 2006.
Martin, J.-C., Kühnlein, P., Paggio, P., Stiefelhagen, R., & Pianesi, F. (Eds.) (2006). LREC 2006 workshop on multimodal corpora: From multimodal behaviour theories to usable models.
McNeill, D. (Ed.) (2000). Language and gesture: Window into thought and action. Cambridge University Press.
Passonneau, R. J. (2004). Computing reliability for coreference annotation. In Proceedings of LREC 2004 (Vol. 4, pp. 1503–1506).
Rehm, M., & André, E. (2005). Catch me if you can – Exploring lying agents in social settings. In Proceedings of AAMAS 2005 (pp. 937–944).
Steedman, M. (2000). Information structure and the syntax-phonology interface. Linguistic Inquiry, 31(4), 649–689.
Stone, M., DeCarlo, D., Oh, I., Rodriguez, C., Lees, A., Stere, A., & Bregler, C. (2004). Speaking with hands: Creating animated conversational characters from recordings of human performance. ACM Transactions on Graphics, 23(3), 506–513.
White, M. (2006). Efficient realization of coordinate structures in combinatory categorial grammar. Research on Language and Computation, 4(1), 39–75.
Acknowledgements
This work was supported by the EU projects COMIC (IST-2001-32311) and JAST (FP6-003747-IP). An initial version of this study was published as Foster and Oberlander (2006).
Cite this article
Foster, M.E., Oberlander, J. Corpus-based generation of head and eyebrow motion for an embodied conversational agent. Lang Resources & Evaluation 41, 305–323 (2007). https://doi.org/10.1007/s10579-007-9055-3