ABSTRACT
Techniques that use nonverbal behaviors to predict turn-taking situations in multi-party meetings, such as who the next speaker will be and when the next utterance will start, have recently received much attention. It has long been known that gaze is a physical behavior that plays an important role in transferring the speaking turn between humans. More recently, a line of research has focused on the relationship between turn-taking and respiration, a biological signal that conveys information about the intention, or preliminary action, to start speaking. Respiration and gaze behavior have each been shown, separately, to be useful for predicting the next speaker and the next utterance timing in multi-party meetings. As a multimodal fusion for building next-speaker prediction models, we integrated respiration and gaze behavior, features extracted from different modalities and completely different in quality, and implemented a model that uses both to predict the next speaker at the end of an utterance. The model performs two-step processing: the first step predicts whether turn-keeping or turn-taking will occur; the second predicts, in the turn-taking case, who the next speaker will be. We constructed prediction models using respiration alone, gaze behavior alone, and both combined as features, and compared their performance. The results suggest that the model using both respiration and gaze behavior outperforms the models using either one alone, indicating that multimodal fusion of respiration and gaze behavior is effective for predicting the next speaker in multi-party meetings. We also found that gaze behavior is more useful than respiration for predicting turn-keeping vs. turn-taking, whereas respiration is more useful for predicting the next speaker in turn-taking.
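The two-step pipeline described above can be sketched in code. This is a minimal illustrative stand-in, not the paper's actual model: the feature names (`inhalation_depth`, `gaze_at_speaker`), the equal-weight fused score, and the decision threshold are all assumptions made for the example; the paper trains statistical classifiers on its own respiration and gaze features.

```python
# Hypothetical two-step next-speaker prediction at the end of an utterance.
# Step 1: classify turn-keeping vs. turn-taking from fused features.
# Step 2: if turn-taking, pick the most likely next speaker among listeners.
from dataclasses import dataclass
from typing import Dict


@dataclass
class UtteranceEnd:
    speaker: str
    # Per-participant features; names are illustrative stand-ins,
    # not the paper's actual feature set.
    inhalation_depth: Dict[str, float]   # respiration cue before speaking
    gaze_at_speaker: Dict[str, float]    # fraction of time gazing at speaker


def predict_turn_taking(sample: UtteranceEnd, threshold: float = 0.5) -> bool:
    """Step 1: return True for turn-taking, False for turn-keeping.

    Stand-in rule: a listener who gazes at the speaker and inhales
    deeply signals an intention to take the turn.
    """
    listeners = [p for p in sample.gaze_at_speaker if p != sample.speaker]
    best_score = max(
        0.5 * sample.gaze_at_speaker[p] + 0.5 * sample.inhalation_depth[p]
        for p in listeners
    )
    return best_score >= threshold


def predict_next_speaker(sample: UtteranceEnd) -> str:
    """Step 2: among listeners, pick the strongest respiration cue
    (the paper found respiration more useful at this step)."""
    listeners = [p for p in sample.inhalation_depth if p != sample.speaker]
    return max(listeners, key=lambda p: sample.inhalation_depth[p])


def predict(sample: UtteranceEnd) -> str:
    """Full pipeline: the predicted next speaker; in the
    turn-keeping case this is the current speaker."""
    if not predict_turn_taking(sample):
        return sample.speaker
    return predict_next_speaker(sample)
```

For example, if listener B both gazes at the current speaker A and takes a deep pre-utterance breath, step 1 predicts turn-taking and step 2 selects B; if all listeners' cues are weak, the pipeline predicts turn-keeping and returns A.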