ABSTRACT
During face-to-face conversation, people use visual feedback such as head nods to communicate relevant information and to synchronize rhythm between participants. In this paper, we describe how contextual information from other participants can be used to predict visual feedback and improve recognition of head gestures in human-human interactions. For example, in a dyadic interaction, the speaker's contextual cues, such as gaze shifts or changes in prosody, influence the listener's backchannel feedback (e.g., head nods). To automatically learn how to integrate this contextual information into the listener gesture recognition framework, this paper addresses two main challenges: finding an optimal feature representation using an encoding dictionary, and automatically selecting the optimal feature-encoding pairs. Multimodal integration between context and visual observations is performed using a discriminative sequential model (Latent-Dynamic Conditional Random Fields) trained on previous interactions. In our experiments involving 38 storytelling dyads, our context-based recognizer significantly improved head gesture recognition performance over a vision-only recognizer.
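To make the encoding-dictionary idea concrete, below is a minimal Python sketch of how speaker contextual events (e.g., pauses or gaze shifts) could be turned into candidate feature-encoding pairs. The template names, parameters, and helper functions are illustrative assumptions, not the paper's actual dictionary or selection procedure.

import numpy as np

# Hypothetical encoding dictionary: each template maps a binary contextual
# event stream (e.g., speaker pauses, gaze shifts) to a real-valued feature
# stream aligned frame-by-frame with the visual observations.

def binary_encoding(events):
    # 1 while the contextual event is active, 0 otherwise
    return events.astype(float)

def step_encoding(events, width=15, delay=0):
    # hold a constant value for `width` frames after each event onset,
    # optionally shifted by `delay` frames
    encoded = np.zeros(len(events))
    onsets = np.flatnonzero(np.diff(np.concatenate(([0], events))) == 1)
    for t in onsets:
        start = min(len(events), t + delay)
        encoded[start:start + width] = 1.0
    return encoded

def ramp_encoding(events, width=30, delay=0):
    # linearly decaying influence after each event onset
    encoded = np.zeros(len(events))
    onsets = np.flatnonzero(np.diff(np.concatenate(([0], events))) == 1)
    for t in onsets:
        start = min(len(events), t + delay)
        for i in range(width):
            if start + i < len(events):
                encoded[start + i] = max(encoded[start + i], 1.0 - i / width)
    return encoded

ENCODING_DICTIONARY = {
    "binary": binary_encoding,
    "step": step_encoding,
    "ramp": ramp_encoding,
}

def encode_context(contextual_events):
    # Apply every encoding template to every contextual event stream,
    # producing the candidate feature-encoding pairs to select from.
    features = {}
    for name, events in contextual_events.items():
        for enc_name, enc_fn in ENCODING_DICTIONARY.items():
            features[(name, enc_name)] = enc_fn(np.asarray(events))
    return features

# Example: a 100-frame sequence where the speaker pauses around frame 40.
context = {"speaker_pause": np.zeros(100, dtype=int)}
context["speaker_pause"][40:45] = 1
candidate_features = encode_context(context)  # three feature-encoding pairs

In this sketch, the selected feature streams would then be concatenated frame-by-frame with the visual head-tracking observations and passed to a sequential model such as an LDCRF; the paper's actual selection criterion and model training are not reproduced here.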