DOI: 10.1145/1647314.1647332 · ICMI-MLMI conference proceedings · poster

Multimodal end-of-turn prediction in multi-party meetings

Published: 02 November 2009

ABSTRACT

One of the many skills required to engage properly in a conversation is knowing the rules of engagement and when to apply them. To take part in a conversation, a virtual human or robot should, for instance, be able to recognize when it is being addressed or when the speaker is about to hand over the turn. This paper presents a multimodal approach to end-of-speaker-turn prediction that uses sequential probabilistic models (Conditional Random Fields) to learn a model from observations of real-life multi-party meetings. Although the results are not as good as expected, we provide insight, based on the literature and our own results, into which modalities are important when taking a multimodal approach to this problem.
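The abstract frames end-of-turn prediction as sequence labelling over multimodal observations. A minimal sketch of that framing is shown below; the feature names, thresholds, and the prediction horizon are illustrative assumptions, not the paper's actual feature set or CRF configuration.

```python
# Hypothetical sketch: per-frame multimodal observations encoded as
# feature dicts (the input format expected by common CRF toolkits),
# with gold labels marking frames near the end of a speaker turn.
# All features and thresholds here are illustrative assumptions.

def encode_frame(pitch_slope, energy, gazes_at_listener):
    """Turn one time frame of multimodal observations into a feature dict."""
    return {
        "falling_pitch": pitch_slope < 0,       # prosodic turn-yielding cue
        "low_energy": energy < 0.2,             # speaker trailing off
        "gaze_at_listener": gazes_at_listener,  # visual turn-yielding cue
    }

def label_sequence(frames, horizon=2):
    """Gold labels: 1 if the turn ends within `horizon` frames, else 0."""
    n = len(frames)
    return [1 if i >= n - horizon else 0 for i in range(n)]

# A toy speaker turn: five frames of (pitch_slope, energy, gaze) triples.
turn = [(0.3, 0.9, False), (0.1, 0.8, False), (-0.2, 0.5, False),
        (-0.6, 0.15, True), (-0.8, 0.1, True)]
X = [encode_frame(*frame) for frame in turn]
y = label_sequence(turn)
```

A sequential model such as a linear-chain CRF would then be trained on many such (X, y) pairs, one pair per speaker turn, so that label transitions (turn continuing versus turn ending) are modelled jointly with the per-frame cues.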


Published in

ICMI-MLMI '09: Proceedings of the 2009 international conference on Multimodal interfaces
November 2009, 374 pages
ISBN: 9781605587721
DOI: 10.1145/1647314
Copyright © 2009 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 453 of 1,080 submissions (42%)
