research-article

Gaze and turn-taking behavior in casual conversational interactions

Published: 05 August 2013

Abstract

Eye gaze is an important means of controlling interaction and coordinating the participants' turns smoothly. We studied how eye gaze correlates with spoken interaction, focusing in particular on the combined effect of the speech signal and gazing in predicting turn-taking opportunities. Mutual gaze is known to be important for coordinating turn taking in two-party dialogs, and in this article we investigate whether this also holds for three-party conversations; in group interactions, different features may be used for managing turn taking than in two-party dialogs. We collected casual conversational data and used an eye tracker to systematically observe the participants' gaze in the interactions. By studying the combined effect of speech and gaze on turn taking, we aimed to answer our main questions: How well can eye gaze help in predicting turn taking? What is the role of eye gaze when the speaker holds the turn? Is the role of eye gaze as important in three-party dialogs as in two-party dialogs? We used Support Vector Machines (SVMs) to classify turn-taking events with respect to speech and gaze features, so as to estimate how well these features signal a change of speaker or a continuation by the same speaker. The results confirm the earlier hypothesis that eye gaze significantly helps in predicting the partner's turn-taking activity, and they also support our hypothesis that the speaker is a prominent coordinator of the interaction space. Such a turn-taking model could be used in interactive applications to improve the system's conversational performance.
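The classification setup described above can be sketched as follows. This is a hypothetical illustration only, not the authors' implementation: the feature names (pause duration, F0 slope, partner-directed gaze, mutual-gaze duration), the toy labeling rule, and the data are all invented for the sake of a runnable example of an SVM over combined speech and gaze features.

```python
# Hypothetical sketch: classifying turn-taking events ("hold" vs. "change")
# from combined speech and gaze features with an SVM. All features,
# data, and the labeling rule are invented for illustration.
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n = 200
# Invented features: pause duration (s), utterance-final F0 slope,
# whether the speaker gazes at a partner (0/1), mutual-gaze duration (s).
pause = rng.uniform(0.0, 1.0, n)
f0_slope = rng.normal(0.0, 1.0, n)
gaze_at_partner = rng.integers(0, 2, n)
mutual_gaze = rng.uniform(0.0, 0.5, n)
X = np.column_stack([pause, f0_slope, gaze_at_partner, mutual_gaze])

# Toy labeling rule: a long pause combined with partner-directed gaze
# precedes a speaker change (1); otherwise the speaker holds (0).
y = ((pause > 0.5) & (gaze_at_partner == 1)).astype(int)

# Standardize the features, then fit an RBF-kernel SVM.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X, y)
train_acc = clf.score(X, y)
print(f"training accuracy: {train_acc:.2f}")
```

In a real experiment the features would come from annotated conversational data and the model would be evaluated on held-out turns rather than on the training set, but the pipeline shape (feature extraction, scaling, SVM classification of hold vs. change) matches the approach the abstract outlines.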



• Published in: ACM Transactions on Interactive Intelligent Systems, Volume 3, Issue 2 (July 2013). Special issue on interaction with smart objects; special section on eye gaze and conversation. 150 pages. ISSN: 2160-6455. EISSN: 2160-6463. DOI: 10.1145/2499474.

      Copyright © 2013 ACM


Publisher: Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 5 August 2013
• Accepted: 1 August 2012
• Revised: 1 June 2012
• Received: 1 December 2010
