Prediction of Who Will Be the Next Speaker and When Using Gaze Behavior in Multiparty Meetings

Abstract
In multiparty meetings, participants need to predict the end of the speaker’s utterance and who will start speaking next, and to plan when to begin speaking. Gaze behavior plays an important role in smooth turn-changing. This article proposes a prediction model with three processing steps that predict (I) whether turn-changing or turn-keeping will occur, (II) who will be the next speaker in turn-changing, and (III) the timing of the start of the next speaker’s utterance. As feature values for the model, we focus on gaze transition patterns and the timing structure of eye contact between a speaker and a listener near the end of the speaker’s utterance. Gaze transition patterns capture the order in which gaze behavior changes. The timing structure of eye contact is defined as who looks at whom and who looks away first, the speaker or the listener, when eye contact between them occurs. We collected corpus data of multiparty meetings and used the data to examine the relationships between the gaze transition patterns, the timing structure of eye contact, and situations (I), (II), and (III). The results of our analyses indicate that the gaze transition patterns of the speaker and listeners and the timing structure of eye contact are strongly associated with turn-changing, the next speaker in turn-changing, and the start time of the next utterance. On the basis of these results, we constructed prediction models using the gaze transition patterns and the timing structure. The gaze transition patterns were found to be useful for predicting turn-changing, the next speaker in turn-changing, and the start time of the next utterance. Contrary to expectations, we did not find the timing structure useful for predicting the next speaker or the start time. This study opens up new possibilities for predicting the next speaker and the timing of the next utterance using gaze transition patterns in multiparty meetings.
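To make the three-step structure of the proposed model concrete, the following is a minimal, purely illustrative sketch in Python. The gaze-pattern encoding, the decision rules, and the timing values are hypothetical stand-ins for the trained classifiers and regressor described in the article, not the authors' actual models.

```python
# Hypothetical sketch of a three-step next-speaker prediction pipeline.
# A gaze transition pattern is encoded as a string over gaze targets,
# e.g. "X->L" = the speaker looks away (X), then at a listener (L);
# "X->S" = a listener looks away, then at the speaker (S).

def predict_turn_change(speaker_pattern: str) -> bool:
    """Step I: turn-changing vs. turn-keeping.
    Toy rule: a speaker whose pattern ends gazing at a listener yields the turn."""
    return speaker_pattern.endswith("L")

def predict_next_speaker(listener_patterns: dict) -> str:
    """Step II: which listener takes the turn.
    Toy rule: the listener who ends gazing back at the speaker is chosen;
    otherwise fall back to an arbitrary listener."""
    for name, pattern in listener_patterns.items():
        if pattern.endswith("S"):
            return name
    return next(iter(listener_patterns))

def predict_start_time(speaker_pattern: str, next_speaker_pattern: str) -> float:
    """Step III: gap (seconds) until the next utterance starts.
    Toy regression: mutual gaze at the utterance end shortens the gap."""
    mutual = speaker_pattern.endswith("L") and next_speaker_pattern.endswith("S")
    return 0.4 if mutual else 0.9

# Example near the end of an utterance:
speaker = "X->L"                       # speaker looks away, then at a listener
listeners = {"A": "X->S", "B": "S->X"}  # listener A returns the speaker's gaze

if predict_turn_change(speaker):
    nxt = predict_next_speaker(listeners)
    gap = predict_start_time(speaker, listeners[nxt])
    print(nxt, gap)  # -> A 0.4
```

In the article itself, each step would be a model trained on the corpus (the reported experiments use feature values derived from annotated gaze behavior); this sketch only shows how the three predictions chain together.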