Abstract
Eye gaze is an important means of controlling interaction and coordinating the participants' turns smoothly. We have studied how eye gaze correlates with spoken interaction, focusing especially on the combined effect of the speech signal and gazing in predicting turn-taking opportunities. It is well known that mutual gaze is important in the coordination of turn taking in two-party dialogs, and in this article we investigate whether this also holds for three-party conversations: in group interactions, turn taking may be managed by different features than in two-party dialogs. We collected casual conversational data and used an eye tracker to systematically observe the participants' gaze during the interactions. By studying the combined effect of speech and gaze on turn taking, we aimed to answer our main questions: How well can eye gaze help in predicting turn taking? What is the role of eye gaze when the speaker holds the turn? Is the role of eye gaze as important in three-party dialogs as in two-party dialogs? We used Support Vector Machines (SVMs) to classify turn-taking events with respect to speech and gaze features, estimating how well these features signal a change of speaker or a continuation by the same speaker. The results confirm the earlier hypothesis that eye gaze significantly helps in predicting the partner's turn-taking activity, and we also find supporting evidence for our hypothesis that the speaker is a prominent coordinator of the interaction space. Such a turn-taking model could be used in interactive applications to improve a system's conversational performance.
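The classification setup described above can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the feature names (pause length, final pitch slope, proportion of gaze at a partner), the synthetic data, and the labeling rule are all invented stand-ins for the study's annotated speech and gaze features and HOLD/CHANGE labels.

```python
# Hypothetical sketch: an SVM labels segment boundaries as "turn change" (1)
# vs. "same speaker continues" (0) from combined speech and gaze features.
# All features and labels below are synthetic, for illustration only.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 400

# Assumed features per speech-segment boundary:
#   pause_len    - silence after the segment (s); long pauses favor a change
#   f0_slope     - final pitch slope; falling contours often precede turn ends
#   gaze_partner - fraction of the segment the speaker gazed at a partner
pause_len = rng.uniform(0.0, 1.5, n)
f0_slope = rng.normal(0.0, 1.0, n)
gaze_partner = rng.uniform(0.0, 1.0, n)

# Toy labeling rule: a turn change when the pause is long and the speaker
# gazes at a partner -- a stand-in for human-annotated turn-taking events.
y = ((pause_len > 0.6) & (gaze_partner > 0.5)).astype(int)

X = np.column_stack([pause_len, f0_slope, gaze_partner])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_tr, y_tr)
acc = accuracy_score(y_te, clf.predict(X_te))
print(f"hold/change accuracy: {acc:.2f}")
```

Comparing such a classifier trained on speech features alone against one trained on speech plus gaze features is one way to quantify how much gaze contributes to predicting turn taking.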
Supplemental Material: movie, appendix, image, and software files are available for download for "Gaze and turn-taking behavior in casual conversational interactions."