Prediction of Who Will Be the Next Speaker and When Using Gaze Behavior in Multiparty Meetings

Abstract
In multiparty meetings, participants need to predict the end of the speaker’s utterance and who will start speaking next, and to plan when to begin speaking. Gaze behavior plays an important role in smooth turn-changing. This article proposes a prediction model with three processing steps that predict (I) whether turn-changing or turn-keeping will occur, (II) who will be the next speaker in turn-changing, and (III) the timing of the start of the next speaker’s utterance. As feature values for the model, we focus on gaze transition patterns and the timing structure of eye contact between a speaker and a listener near the end of the speaker’s utterance. Gaze transition patterns capture the order in which gaze behavior changes. The timing structure of eye contact is defined as who looks at whom and who looks away first, the speaker or the listener, when eye contact between them occurs. We collected corpus data of multiparty meetings and used the data to examine the relationships between the gaze transition patterns, the timing structure of eye contact, and situations (I), (II), and (III). The results of our analyses indicate that the gaze transition patterns of the speaker and listeners and the timing structure of eye contact are strongly associated with turn-changing, the next speaker in turn-changing, and the start time of the next utterance. On the basis of these results, we constructed prediction models using the gaze transition patterns and the timing structure. The gaze transition patterns were found to be useful for predicting turn-changing, the next speaker in turn-changing, and the start time of the next utterance. Contrary to expectations, we did not find the timing structure useful for predicting the next speaker or the start time. This study opens up new possibilities for predicting the next speaker and the timing of the next utterance using gaze transition patterns in multiparty meetings.
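To make the three-step structure of the proposed model concrete, the following is a minimal, purely illustrative sketch in Python. The gaze-pattern encoding, the decision rules, and the timing values are hypothetical stand-ins for the trained classifiers and regressor described in the article, not the authors' actual models.

```python
# Hypothetical sketch of a three-step next-speaker prediction pipeline.
# A gaze transition pattern is encoded as a string over gaze targets,
# e.g. "X->L" = the speaker looks away (X), then at a listener (L);
# "X->S" = a listener looks away, then at the speaker (S).

def predict_turn_change(speaker_pattern: str) -> bool:
    """Step I: turn-changing vs. turn-keeping.
    Toy rule: a speaker whose pattern ends gazing at a listener yields the turn."""
    return speaker_pattern.endswith("L")

def predict_next_speaker(listener_patterns: dict) -> str:
    """Step II: which listener takes the turn.
    Toy rule: the listener who ends gazing back at the speaker is chosen;
    otherwise fall back to an arbitrary listener."""
    for name, pattern in listener_patterns.items():
        if pattern.endswith("S"):
            return name
    return next(iter(listener_patterns))

def predict_start_time(speaker_pattern: str, next_speaker_pattern: str) -> float:
    """Step III: gap (seconds) until the next utterance starts.
    Toy regression: mutual gaze at the utterance end shortens the gap."""
    mutual = speaker_pattern.endswith("L") and next_speaker_pattern.endswith("S")
    return 0.4 if mutual else 0.9

# Example near the end of an utterance:
speaker = "X->L"                       # speaker looks away, then at a listener
listeners = {"A": "X->S", "B": "S->X"}  # listener A returns the speaker's gaze

if predict_turn_change(speaker):
    nxt = predict_next_speaker(listeners)
    gap = predict_start_time(speaker, listeners[nxt])
    print(nxt, gap)  # -> A 0.4
```

In the article itself, each step would be a model trained on the corpus (the reported experiments use feature values derived from annotated gaze behavior); this sketch only shows how the three predictions chain together.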