ABSTRACT
Automatic emotion recognition has recently been established as a major research topic in the area of human-computer interaction (HCI). Since humans express emotions through various channels, a user's emotional state is most naturally perceived by combining emotional cues derived from all available modalities. Yet most effort has gone into single-channel emotion recognition, and only a few studies focusing on the fusion of multiple channels have been published. Even though most of these studies apply rather simple fusion strategies -- such as the sum or product rule -- some of the reported results show promising improvements compared to the single channels. Such results encourage investigation into whether there is further potential for enhancement when more sophisticated methods are incorporated. We therefore apply a wide variety of fusion techniques, including feature-level fusion, decision-level combination rules, meta-classification, and hybrid fusion. We carry out a systematic comparison of a total of 16 fusion methods on different corpora and compare results using a novel visualization technique. We find that multi-modal fusion is in almost every case at least on par with single-channel classification, though homogeneous results within corpora point to interchangeability between concrete fusion schemes.
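The decision-level combination rules mentioned above can be illustrated with a minimal sketch. The snippet below shows the sum and product rules applied to per-modality class posteriors; the two-channel setup, class layout, and probability values are hypothetical illustrations, not taken from the paper:

```python
import numpy as np

def sum_rule(probabilities):
    """Average per-modality class posteriors, then pick the arg max."""
    fused = np.mean(probabilities, axis=0)
    return int(np.argmax(fused)), fused

def product_rule(probabilities):
    """Multiply per-modality posteriors element-wise and renormalize."""
    fused = np.prod(probabilities, axis=0)
    fused = fused / fused.sum()
    return int(np.argmax(fused)), fused

# Hypothetical posteriors from two channels (audio, video) over
# three emotion classes (e.g. neutral, happy, angry).
audio = np.array([0.2, 0.5, 0.3])
video = np.array([0.1, 0.3, 0.6])
probs = np.stack([audio, video])

sum_label, _ = sum_rule(probs)
prod_label, _ = product_rule(probs)
```

The sum rule tends to be robust to single-channel estimation errors, while the product rule can veto a class that any one channel considers unlikely; which behaves better in practice depends on the corpus and classifiers used.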
- A systematic discussion of fusion techniques for multi-modal affect recognition tasks