ABSTRACT
Automatic emotion recognition has recently been established as a major research topic in the area of human-computer interaction (HCI). Since humans express emotions through various channels, a user's emotional state is most naturally perceived by combining emotional cues derived from all available modalities. Yet most effort has gone into single-channel emotion recognition, and only a few studies focusing on the fusion of multiple channels have been published. Even though most of these studies apply rather simple fusion strategies -- such as the sum or product rule -- some of the reported results show promising improvements compared to the single channels. Such results encourage investigation into whether there is further potential for enhancement when more sophisticated methods are incorporated. We therefore apply a wide variety of fusion techniques, including feature-level fusion, decision-level combination rules, meta-classification, and hybrid fusion. We carry out a systematic comparison of a total of 16 fusion methods on different corpora and compare results using a novel visualization technique. We find that multi-modal fusion is in almost every case at least on par with single-channel classification, though homogeneous results within corpora point to interchangeability between concrete fusion schemes.
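The decision-level combination rules mentioned above can be illustrated with a minimal sketch. The snippet below shows the sum and product rules applied to per-modality class posteriors; the two-channel setup, class layout, and probability values are hypothetical illustrations, not taken from the paper:

```python
import numpy as np

def sum_rule(probabilities):
    """Average per-modality class posteriors, then pick the arg max."""
    fused = np.mean(probabilities, axis=0)
    return int(np.argmax(fused)), fused

def product_rule(probabilities):
    """Multiply per-modality posteriors element-wise and renormalize."""
    fused = np.prod(probabilities, axis=0)
    fused = fused / fused.sum()
    return int(np.argmax(fused)), fused

# Hypothetical posteriors from two channels (audio, video) over
# three emotion classes (e.g. neutral, happy, angry).
audio = np.array([0.2, 0.5, 0.3])
video = np.array([0.1, 0.3, 0.6])
probs = np.stack([audio, video])

sum_label, _ = sum_rule(probs)
prod_label, _ = product_rule(probs)
```

The sum rule tends to be robust to single-channel estimation errors, while the product rule can veto a class that any one channel considers unlikely; which behaves better in practice depends on the corpus and classifiers used.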
- A systematic discussion of fusion techniques for multi-modal affect recognition tasks