Abstract
Eye gaze is an important means of controlling interaction and coordinating the participants' turns smoothly. We have studied how eye gaze correlates with spoken interaction, focusing especially on the combined effect of the speech signal and gazing in predicting turn-taking opportunities. It is well known that mutual gaze is important in the coordination of turn taking in two-party dialogs, and in this article we investigate whether this also holds for three-party conversations: in group interactions, turn taking may be managed by different features than in two-party dialogs. We collected casual conversational data and used an eye tracker to systematically observe the participants' gaze during the interactions. By studying the combined effect of speech and gaze on turn taking, we aimed to answer our main questions: How well can eye gaze help in predicting turn taking? What is the role of eye gaze when the speaker holds the turn? Is the role of eye gaze as important in three-party dialogs as in two-party dialogs? We used Support Vector Machines (SVMs) to classify turn-taking events with respect to speech and gaze features, estimating how well these features signal a change of speaker or a continuation by the same speaker. The results confirm the earlier hypothesis that eye gaze significantly helps in predicting the partner's turn-taking activity, and we also find supporting evidence for our hypothesis that the speaker is a prominent coordinator of the interaction space. Such a turn-taking model could be used in interactive applications to improve a system's conversational performance.
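The classification setup described above can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the feature names (pause length, final pitch slope, proportion of gaze at a partner), the synthetic data, and the labeling rule are all invented stand-ins for the study's annotated speech and gaze features and HOLD/CHANGE labels.

```python
# Hypothetical sketch: an SVM labels segment boundaries as "turn change" (1)
# vs. "same speaker continues" (0) from combined speech and gaze features.
# All features and labels below are synthetic, for illustration only.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 400

# Assumed features per speech-segment boundary:
#   pause_len    - silence after the segment (s); long pauses favor a change
#   f0_slope     - final pitch slope; falling contours often precede turn ends
#   gaze_partner - fraction of the segment the speaker gazed at a partner
pause_len = rng.uniform(0.0, 1.5, n)
f0_slope = rng.normal(0.0, 1.0, n)
gaze_partner = rng.uniform(0.0, 1.0, n)

# Toy labeling rule: a turn change when the pause is long and the speaker
# gazes at a partner -- a stand-in for human-annotated turn-taking events.
y = ((pause_len > 0.6) & (gaze_partner > 0.5)).astype(int)

X = np.column_stack([pause_len, f0_slope, gaze_partner])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_tr, y_tr)
acc = accuracy_score(y_te, clf.predict(X_te))
print(f"hold/change accuracy: {acc:.2f}")
```

Comparing such a classifier trained on speech features alone against one trained on speech plus gaze features is one way to quantify how much gaze contributes to predicting turn taking.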
Supplemental Material: movie, appendix, image, and software files are available for download for "Gaze and turn-taking behavior in casual conversational interactions."