DOI: 10.1145/2070481.2070487

A systematic discussion of fusion techniques for multi-modal affect recognition tasks

Published: 14 November 2011

ABSTRACT

Recently, automatic emotion recognition has been established as a major research topic in the area of human-computer interaction (HCI). Since humans express emotions through various channels, a user's emotional state can naturally be perceived by combining emotional cues derived from all available modalities. Yet most effort has been put into single-channel emotion recognition, while only a few studies focusing on the fusion of multiple channels have been published. Even though most of these studies apply rather simple fusion strategies -- such as the sum or product rule -- some of the reported results show promising improvements over the single channels. Such results encourage investigations into whether there is further potential for enhancement if more sophisticated methods are incorporated. We therefore apply a wide variety of possible fusion techniques, such as feature fusion, decision-level combination rules, meta-classification, and hybrid fusion. We carry out a systematic comparison of a total of 16 fusion methods on different corpora and compare results using a novel visualization technique. We find that multi-modal fusion is in almost every case at least on par with single-channel classification, though homogeneous results within corpora point to interchangeability between concrete fusion schemes.
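The sum and product rules mentioned in the abstract are simple decision-level combiners: each modality's classifier outputs a posterior distribution over emotion classes, and the fused decision is the argmax of the averaged posteriors (sum rule) or of their normalized elementwise product (product rule). A minimal sketch of these two rules, with hypothetical posteriors standing in for real classifier outputs (the numbers and modality names are illustrative, not taken from the paper):

```python
from math import prod

# Illustrative sketch, not the paper's implementation: decision-level
# fusion of per-modality class posteriors with the sum and product
# rules, two of the simple combination schemes the abstract refers to.

def sum_rule(posteriors):
    """Average posteriors across modalities; return (argmax, fused)."""
    n = len(posteriors[0])
    fused = [sum(p[c] for p in posteriors) / len(posteriors) for c in range(n)]
    return max(range(n), key=fused.__getitem__), fused

def product_rule(posteriors):
    """Multiply posteriors across modalities, renormalize; return (argmax, fused)."""
    n = len(posteriors[0])
    raw = [prod(p[c] for p in posteriors) for c in range(n)]
    total = sum(raw)
    fused = [r / total for r in raw]
    return max(range(n), key=fused.__getitem__), fused

# Hypothetical posteriors over three emotion classes from two channels,
# e.g. a speech classifier and a facial-expression classifier:
audio = [0.6, 0.3, 0.2]
audio = [0.6, 0.3, 0.1]
video = [0.2, 0.6, 0.2]

label_sum, _ = sum_rule([audio, video])     # -> 1 (averaged: [0.4, 0.45, 0.15])
label_prod, _ = product_rule([audio, video])  # -> 1
```

Both rules agree here, but they can diverge when one modality is very uncertain: the product rule lets a near-zero posterior veto a class, while the sum rule is more forgiving of a single unreliable channel.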

Published in

ICMI '11: Proceedings of the 13th International Conference on Multimodal Interfaces
November 2011
432 pages
ISBN: 9781450306416
DOI: 10.1145/2070481

Copyright © 2011 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher: Association for Computing Machinery, New York, NY, United States


Qualifiers

• research-article

Acceptance Rates

Overall Acceptance Rate: 453 of 1,080 submissions, 42%
