
Prediction of Who Will Be the Next Speaker and When Using Gaze Behavior in Multiparty Meetings

Published: 05 May 2016

Abstract

In multiparty meetings, participants need to predict the end of the speaker’s utterance and who will start speaking next, as well as plan the timing of their own next utterance. Gaze behavior plays an important role in smooth turn-changing. This article proposes a prediction model with three processing steps that predicts (I) whether turn-changing or turn-keeping will occur, (II) who will be the next speaker in turn-changing, and (III) the timing of the start of the next speaker’s utterance. For the feature values of the model, we focused on gaze transition patterns and the timing structure of eye contact between a speaker and a listener near the end of the speaker’s utterance. Gaze transition patterns provide information about the order in which gaze behavior changes. The timing structure of eye contact is defined as who looks at whom and who looks away first, the speaker or listener, when eye contact between the speaker and a listener occurs. We collected a corpus of multiparty meetings and used it to demonstrate the relationships of gaze transition patterns and the timing structure of eye contact with situations (I), (II), and (III). The results of our analyses indicate that the gaze transition patterns of the speaker and listener and the timing structure of eye contact have a strong association with turn-changing, the next speaker in turn-changing, and the start time of the next utterance. On the basis of these results, we constructed prediction models using the gaze transition patterns and timing structure. The gaze transition patterns were found to be useful in predicting turn-changing, the next speaker in turn-changing, and the start time of the next utterance. Contrary to expectations, we did not find the timing structure useful for predicting the next speaker or the start time. This study opens up new possibilities for predicting the next speaker and the timing of the next utterance using gaze transition patterns in multiparty meetings.
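The abstract describes a three-step pipeline: (I) classify turn-changing vs. turn-keeping, (II) predict the next speaker, and (III) estimate the start time of the next utterance. As a rough illustration only, the following Python sketch wires those steps together with hand-written rules; every feature name, rule, and numeric value here is an assumption for illustration, not the article's method (the article builds trained prediction models from gaze transition patterns and eye-contact timing structure).

```python
# Illustrative, rule-based sketch of the three-step prediction pipeline.
# All feature names, rules, and numeric values are hypothetical assumptions;
# the article itself trains statistical models on gaze features.

from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class GazeFeatures:
    """Gaze behavior observed near the end of a speaker's utterance."""
    speaker_gazes_at: Optional[str]  # listener the speaker looks at, if any
    mutual_gaze: bool                # eye contact occurred near utterance end
    listener_gazing_back: Dict[str, bool] = field(default_factory=dict)

def predict_turn_change(f: GazeFeatures) -> bool:
    # Step (I): treat "speaker looks at a listener and eye contact occurs"
    # as a cue for turn-changing (a simplified stand-in rule).
    return f.speaker_gazes_at is not None and f.mutual_gaze

def predict_next_speaker(f: GazeFeatures) -> Optional[str]:
    # Step (II): the listener the speaker gazes at is the likely next speaker.
    return f.speaker_gazes_at if predict_turn_change(f) else None

def predict_start_time(f: GazeFeatures, base_gap: float = 0.5) -> float:
    # Step (III): assume eye contact shortens the silence before the next
    # utterance by a fixed offset (hypothetical values, in seconds).
    return base_gap - (0.2 if f.mutual_gaze else 0.0)
```

For example, a window where the speaker gazes at listener "B" with mutual gaze (`GazeFeatures("B", True, {"B": True})`) would be classified as turn-changing, with "B" as the predicted next speaker and a shortened predicted gap before the next utterance.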



• Published in

  ACM Transactions on Interactive Intelligent Systems, Volume 6, Issue 1
  Special Issue on New Directions in Eye Gaze for Interactive Intelligent Systems (Part 2 of 2), Regular Articles and Special Issue on Highlights of IUI 2015 (Part 1 of 2)
  May 2016, 219 pages
  ISSN: 2160-6455
  EISSN: 2160-6463
  DOI: 10.1145/2896319

        Copyright © 2016 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 5 May 2016
        • Accepted: 1 February 2016
        • Revised: 1 January 2016
        • Received: 1 December 2014


        Qualifiers

        • research-article
        • Research
        • Refereed
