ABSTRACT
In human conversational interactions, turn-taking exchanges can be coordinated using cues from multiple modalities. To design spoken dialog systems that can conduct fluid interactions, it is desirable to incorporate cues from separate modalities into turn-taking models. We propose that there is an appropriate temporal granularity at which each modality should be modeled. We design a multiscale RNN architecture that models modalities at separate timescales in a continuous manner. Our results show that modeling linguistic and acoustic features at separate temporal rates is beneficial for turn-taking modeling. We also show that our approach can be used to incorporate gaze features into turn-taking models.
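The core idea of the abstract — running different modalities through recurrent networks ticking at different rates and fusing their states for a continuous prediction — can be illustrated with a minimal sketch. This is not the authors' architecture: the dimensions, update rate, simple Elman cells, and random weights below are all illustrative assumptions standing in for a trained multiscale RNN.

```python
import numpy as np

rng = np.random.default_rng(0)

def rnn_step(x, h, Wx, Wh, b):
    # Simple Elman RNN cell: h' = tanh(Wx x + Wh h + b)
    return np.tanh(Wx @ x + Wh @ h + b)

# Hypothetical setup: 40-d acoustic frames every 10 ms, 20-d linguistic
# features updated 10x more slowly (roughly word-rate).
T_fast, rate = 50, 10
d_ac, d_ling, h_fast, h_slow = 40, 20, 16, 8

acoustic = rng.standard_normal((T_fast, d_ac))
linguistic = rng.standard_normal((T_fast // rate, d_ling))

# Random weights stand in for trained parameters.
Wx_f = 0.1 * rng.standard_normal((h_fast, d_ac))
Wh_f = 0.1 * rng.standard_normal((h_fast, h_fast))
b_f = np.zeros(h_fast)
Wx_s = 0.1 * rng.standard_normal((h_slow, d_ling))
Wh_s = 0.1 * rng.standard_normal((h_slow, h_slow))
b_s = np.zeros(h_slow)
W_out = 0.1 * rng.standard_normal((1, h_fast + h_slow))

hf, hs = np.zeros(h_fast), np.zeros(h_slow)
preds = []
for t in range(T_fast):
    if t % rate == 0:
        # Slow (linguistic) RNN ticks once per `rate` acoustic frames.
        hs = rnn_step(linguistic[t // rate], hs, Wx_s, Wh_s, b_s)
    # Fast (acoustic) RNN ticks every frame; the slow state is held
    # constant between its updates and fused into the prediction.
    hf = rnn_step(acoustic[t], hf, Wx_f, Wh_f, b_f)
    logit = (W_out @ np.concatenate([hf, hs]))[0]
    preds.append(1.0 / (1.0 + np.exp(-logit)))  # speech-activity probability
```

The fusion point is the main design choice: holding the slow state between updates lets the model emit one continuous prediction per acoustic frame while still treating linguistic input at its natural, coarser timescale.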