poster

Multimodal end-of-turn prediction in multi-party meetings

Authors:
Iwan de Kok, University of Twente, Enschede, Netherlands
Dirk Heylen, University of Twente, Enschede, Netherlands

ICMI-MLMI '09: Proceedings of the 2009 international conference on Multimodal interfaces, November 2009, Pages 91–98
https://doi.org/10.1145/1647314.1647332

Published: 2 November 2009

ABSTRACT

One of the many skills required to engage properly in a conversation is knowing how to apply the rules of engagement appropriately. A virtual human or robot should, for instance, be able to tell when it is being addressed or when the speaker is about to hand over the turn. This paper presents a multimodal approach to end-of-speaker-turn prediction that uses sequential probabilistic models (Conditional Random Fields) to learn a model from observations of real-life multi-party meetings. Although the results are not as good as expected, we provide insight, based on the literature and our own results, into which modalities are important when taking a multimodal approach to the problem.


Index Terms

Multimodal end-of-turn prediction in multi-party meetings
1. Computing methodologies
  1. Artificial intelligence
    1. Distributed artificial intelligence
      1. Intelligent agents
    2. Natural language processing
      1. Discourse, dialogue and pragmatics

Published in
ICMI-MLMI '09: Proceedings of the 2009 international conference on Multimodal interfaces
November 2009
374 pages
ISBN: 9781605587721
DOI: 10.1145/1647314
General Chairs: James L. Crowley (INRIA Grenoble Rhône-Alpes Research Centre, France), Yuri Ivanov (MERL, USA), Christopher Wren (Google, USA)
Program Chairs: Daniel Gatica-Perez (Idiap Research Institute, Switzerland), Michael Johnston (AT&T Research, USA), Rainer Stiefelhagen (University of Karlsruhe, Germany)
Copyright © 2009 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
end-of-turn prediction
multimodal
probabilistic model
Acceptance Rates
Overall Acceptance Rate: 453 of 1,080 submissions, 42%