DOI: 10.1145/1178745.1178762
Article

Toward multimodal fusion of affective cues

Published: 27 October 2006

ABSTRACT

During face-to-face communication, it has been suggested that as much as 70% of what people communicate when talking directly with others is conveyed through paralanguage involving multiple modalities combined together (e.g. voice tone and volume, body language). In an attempt to make human-computer interaction more similar to human-human communication and enhance its naturalness, research on sensory acquisition and interpretation of single modalities of human expression has made steady progress over the last decade. This progress makes artificial sensor fusion of multiple modalities an increasingly important research domain, both to reach better accuracy for congruent messages and to detect incongruent messages across modalities (incongruence being itself a message about the nature of the information conveyed). Accurate interpretation of emotional signals - quintessentially multimodal - would therefore particularly benefit from multimodal sensor fusion and interpretation algorithms. In this paper we survey the state of the art in multimodal fusion and describe one way to implement a generic framework for multimodal emotion recognition. The system is developed within the MAUI framework [31] and Scherer's Component Process Theory (CPT) [49, 50, 51, 24, 52], with the goal of being modular and adaptive: the framework should accept different single- and multi-modality recognition systems, automatically adapt the fusion algorithm to find optimal solutions, and adapt to channel (and system) reliability.
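
The abstract describes decision-level combination of single-modality recognizers, automatic adaptation of the fusion algorithm, and weighting by channel reliability. As a rough illustration of that idea (not the authors' implementation, which builds on the MAUI framework [31] and Scherer's CPT), the Python sketch below fuses per-modality emotion estimates with reliability weights and flags possible cross-modal incongruence; the names, the emotion label set, and the weighting scheme are assumptions made for illustration.

# Illustrative sketch only (hypothetical names; not the paper's code):
# reliability-weighted decision-level fusion of per-modality emotion
# estimates, plus a simple cross-modal incongruence flag.

from dataclasses import dataclass
from typing import Dict, List

EMOTIONS = ["anger", "fear", "joy", "sadness", "surprise", "neutral"]  # assumed label set

@dataclass
class ModalityEstimate:
    """Output of one single-modality recognizer (face, voice, physiology, ...)."""
    name: str
    probabilities: Dict[str, float]  # distribution over EMOTIONS
    reliability: float               # 0..1 channel/system reliability (assumed available)

def fuse(estimates: List[ModalityEstimate]) -> Dict[str, float]:
    """Reliability-weighted average of the per-modality probability distributions."""
    total = sum(e.reliability for e in estimates) or 1.0
    return {
        emo: sum(e.reliability * e.probabilities.get(emo, 0.0) for e in estimates) / total
        for emo in EMOTIONS
    }

def incongruent(estimates: List[ModalityEstimate]) -> bool:
    """True when modalities disagree on the most likely emotion (a possible mixed message)."""
    top = {max(e.probabilities, key=e.probabilities.get) for e in estimates}
    return len(top) > 1

if __name__ == "__main__":
    face = ModalityEstimate("face", {"joy": 0.7, "neutral": 0.2, "sadness": 0.1}, reliability=0.9)
    voice = ModalityEstimate("voice", {"sadness": 0.6, "neutral": 0.3, "joy": 0.1}, reliability=0.5)
    print(fuse([face, voice]))         # fused distribution over the assumed emotion set
    print(incongruent([face, voice]))  # True: face and voice disagree across modalities

A weighted sum is only one of many possible fusion rules; the framework sketched in the abstract is intended to select and adapt the fusion algorithm automatically rather than fix a single rule.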

References

  1. P. Aleksic and A. Katsaggelos. Product HMMs for audio-visual continuous speech recognition using facial animation parameters, 2003.
  2. C. Bartneck, J. Reichenbach, and A. van Breemen. In your face, robot! The influence of a character's embodiment on how users perceive its emotional expressions. In Proceedings of the Fourth International Conference on Design & Emotion, Ankara, Turkey, July 2004.
  3. C. Besson, D. Graf, I. Hartung, B. Kropfhusser, and S. Voisard. The importance of non-verbal communication in professional interpretation, 2004.
  4. R. A. Bolt. Put-that-there: Voice and gesture at the graphics interface. In SIGGRAPH '80: Proceedings of the 7th Annual Conference on Computer Graphics and Interactive Techniques, pages 262--270, New York, NY, USA, 1980. ACM Press.
  5. C. Busso, Z. Deng, S. Yildirim, M. Bulut, C. Lee, A. Kazemzadeh, S. Lee, U. Neumann, and S. Narayanan. Analysis of emotion recognition using facial expressions, speech and multimodal information. In Proceedings of the 6th International Conference on Multimodal Interfaces (ICMI '04), pages 205--211, State College, PA, USA, 2004. ACM Press, New York, NY, USA.
  6. L. Chen, H. Tao, T. Huang, T. Miyasato, and R. Nakatsu. Emotion recognition from audiovisual information. In Proceedings of the IEEE Workshop on Multimedia Signal Processing, pages 83--88, Los Angeles, CA, USA, 1998.
  7. A. Colmenarez, B. Frey, and T. Huang. Embedded face and facial expression recognition. In Proceedings of ICIP 1999, volume 1, pages 633--637, 1999.
  8. A. Corradini, M. Mehta, N. Bernsen, and J.-C. Martin. Multimodal input fusion in human-computer interaction on the example of the on-going NICE project. In Proceedings of the NATO-ASI Conference on Data Fusion for Situation Monitoring, Incident Detection, Alert and Response Management, Yerevan, Armenia, August 2003.
  9. A. Duminuco, C. Liu, D. Kryze, and L. Rigazio. Flexible feature spaces based on generalized heteroscedastic linear discriminant analysis. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2006.
  10. P. Ekman. Universals and cultural differences in facial expressions of emotion. In J. K. Cole, editor, Proceedings of the Nebraska Symposium on Motivation, volume 19, pages 207--283, Lincoln, NE, 1971. University of Nebraska Press.
  11. P. Ekman, W. V. Friesen, and J. C. Hager. Facial Action Coding System Investigator's Guide. A Human Face, 2002.
  12. P. Ekman and W. V. Friesen. Facial Action Coding System. Palo Alto, CA, 1978.
  13. T. Fong, I. Nourbakhsh, and K. Dautenhahn. A survey of socially interactive robots. Robotics and Autonomous Systems, 42, 2002.
  14. A. Grizard, M. Paleari, and C. Lisetti. Adapting psychologically grounded facial emotional expressions to different platforms. In Proceedings of KI06, 26th German Annual Conference on Artificial Intelligence, Bremen, Germany, 2006.
  15. A. Haag, S. Goronzy, P. Schaich, and J. Williams. Emotion recognition using biosensors: First steps towards an automatic system. In Lecture Notes in Computer Science, pages 36--48, 2004.
  16. R. Huber, A. Batliner, J. Buckow, E. Nöth, V. Warnke, and H. Niemann. Recognition of emotions in a realistic dialog scenario. In Proceedings of ICSLP 2000, pages 665--668, 2000.
  17. I. Poggi, C. Pelachaud, F. de Rosis, V. Carofiglio, and B. De Carolis. Multimodal Intelligent Information Presentation, chapter GRETA: A Believable Embodied Conversational Agent. Kluwer, 2005.
  18. H. Ishiguro. 2006-2056 projects and vision in robotics. In Proceedings of the 50 Years AI Symposium at KI06, 26th German Annual Conference on Artificial Intelligence, Bremen, Germany, 2006.
  19. T. Kang, C. Han, S. Lee, D. Youn, and C. Lee. Speaker dependent emotion recognition using speech signals. In Proceedings of ICSLP 2000, pages 383--386, 2000.
  20. S. Kettebekov and R. Sharma. Toward multimodal interpretation in a natural speech/gesture interface. In Proceedings of the IEEE Symposium on Image, Speech, and Natural Language Systems, pages 328--335. IEEE, November 1999.
  21. K. Kim, S. Bang, and S. Kim. Emotion recognition system using short-term monitoring of physiological signals. Medical and Biological Engineering and Computing, 42, 2004.
  22. B. J. A. Kröse, J. M. Porta, A. J. N. van Breemen, K. Crucq, M. Nuttin, and E. Demeester. Lino, the user-interface robot. In EUSAI, pages 264--274, 2003.
  23. H. Leventhal. A perceptual-motor theory of emotion. Advances in Experimental Social Psychology, 17:117--182, 1984.
  24. H. Leventhal and K. R. Scherer. The relationship of emotion to cognition: A functional approach to a semantic controversy. Cognition and Emotion, 1:3--28, 1987.
  25. X. Li and Q. Ji. Active affective state detection and user assistance with dynamic Bayesian networks. IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, 35:93--105, January 2005.
  26. Y. Li and Y. Zhao. Recognition of emotions in speech using short term and long term features. In Proceedings of ICSLP 1998, pages 2255--2258, 1998.
  27. H. Liao. Multimodal Fusion. Master's thesis, University of Cambridge, July 2002.
  28. C. Lisetti and F. Nasoz. Using noninvasive wearable computers to recognize human emotions from physiological signals. EURASIP Journal on Applied Signal Processing, 11:1672--1687, 2004.
  29. C. L. Lisetti and P. J. Gmytrasiewicz. Emotions and personality in agent design. In Proceedings of AAMAS 2002, 2002.
  30. C. L. Lisetti and A. Marpaung. BDI+E framework: An affective cognitive modeling for autonomous agents based on Scherer's emotion theory. In Proceedings of KI06, 26th German Annual Conference on Artificial Intelligence, Bremen, Germany, 2006.
  31. C. L. Lisetti and F. Nasoz. MAUI: A multimodal affective user interface. In Proceedings of the ACM Multimedia International Conference 2002, Juan-les-Pins, December 2002.
  32. C. Mallauran, J.-L. Dugelay, F. Perronnin, and C. Garcia. Online face detection and user authentication. In Proceedings of the ACM Multimedia Conference 2005, Singapore, November 2005.
  33. K. Mase. Recognition of facial expression from optical flow. IEICE Transactions, volume E74, pages 3474--3483, 1991.
  34. F. Matta and J. Dugelay. Towards person recognition using head dynamics. In Proceedings of ISPA 2005, 4th International Symposium on Image and Signal Processing and Analysis, Zagreb, Croatia, September 2005.
  35. A. Mehrabian. Silent Messages. Wadsworth Publishing Company, Inc., Belmont, CA, 1971.
  36. A. Mehrabian. Nonverbal Communication. Aldine-Atherton, Chicago, 1972.
  37. G. Merola and I. Poggi. Multimodality and gestures in the teacher's communication. In Lecture Notes in Computer Science, volume 2915, pages 101--111, February 2004.
  38. R. Murphy, C. Lisetti, L. Irish, R. Tardif, and A. Gage. Emotion-based control of cooperating heterogeneous mobile robots. IEEE Transactions on Robotics and Automation, Special Issue on Multi-Robot Systems, 2001.
  39. M. Paleari and C. Lisetti. Psychologically grounded avatar expressions. In Proceedings of KI06, 26th German Annual Conference on Artificial Intelligence, Bremen, Germany, 2006.
  40. M. Pantic and L. Rothkrantz. Automatic analysis of facial expressions: The state of the art. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22:1424--1445, 2000.
  41. M. Pantic and L. Rothkrantz. Expert system for automatic analysis of facial expressions. Image and Vision Computing, 18:881--905, 2000.
  42. M. Pantic and L. Rothkrantz. Toward an affect-sensitive multimodal human-computer interaction. Proceedings of the IEEE, 91:1370--1390, September 2003.
  43. C. Pelachaud, V. Carofiglio, and I. Poggi. Embodied contextual agent in information delivering application. In Proceedings of the First International Joint Conference on Autonomous Agents & Multi-Agent Systems, Bologna, Italy, 2002.
  44. R. Picard. Affective Computing. MIT Press, Cambridge, MA, 1997.
  45. L. Rabiner and B.-H. Juang. Fundamentals of Speech Recognition. Prentice Hall, Englewood Cliffs, NJ, 1993.
  46. A. S. Rao and M. P. Georgeff. Modeling rational agents within a BDI-architecture. In J. Allen, R. Fikes, and E. Sandewall, editors, Proceedings of the Second International Conference on Principles of Knowledge Representation and Reasoning (KR'91), pages 473--484, 1991.
  47. A. S. Rao and M. P. Georgeff. BDI agents: From theory to practice. In Proceedings of the 1st International Conference on Multi-Agent Systems (ICMAS-95), pages 312--319, San Francisco, CA, 1995.
  48. H. Sato, Y. Mitsukura, M. Fukumi, and N. Akamatsu. Emotional speech classification with prosodic parameters using neural networks. In Proceedings of the Australian and New Zealand Intelligent Information Systems Conference, pages 395--398, 2001.
  49. K. R. Scherer. Emotion as a process: Function, origin and regulation. Social Science Information, 21:555--570, 1982.
  50. K. R. Scherer. Emotions can be rational. Social Science Information, 24(2):331--335, 1985.
  51. K. R. Scherer. Toward a dynamic theory of emotion: The component process model of affective states. Geneva Studies in Emotion and Communication, 1(1):1--98, 1987.
  52. K. R. Scherer. Appraisal processes in emotion: Theory, methods, research, chapter Appraisal Considered as a Process of Multilevel Sequential Checking, pages 92--120. Oxford University Press, New York, NY, USA, 2001.
  53. N. Sebe, I. Cohen, and T. Huang. Multimodal emotion recognition. World Scientific, 2005.
  54. N. Sebe, M. Lew, I. Cohen, A. Garg, and T. Huang. Emotion recognition using a Cauchy naive Bayes classifier. In Proceedings of ICPR 2002, volume 1, pages 17--20, 2002.
  55. R. Sharma, V. Pavlovic, and T. Huang. Toward multimodal human-computer interface. Proceedings of the IEEE, 1998.
  56. V. Tyagi and C. Wellekens. Adaptive enhancement of speech signals for robust ASR. In ASIDE 2005, COST278 Final Workshop and ISCA Tutorial and Research Workshop, Aalborg, Denmark, November 2005.
  57. Haptek website: www.haptek.com, 2006.
  58. iCat website at Philips: www.research.philips.com/robotics, 2006.
  59. A. van Breemen. Animation engine for believable interactive user-interface robots. In Proceedings of IROS 2004, IEEE/RSJ International Conference on Intelligent Robots and Systems, Sendai, Japan, September 2004.
  60. A. van Breemen. Bringing robots to life: Applying principles of animation to robots. In Proceedings of the Shaping Human-Robot Interaction Workshop held at CHI 2004, Vienna, Austria, 2004.
  61. A. van Breemen. iCat: Experimenting with animabotics. In Proceedings of AISB, pages 27--32, 2005.
  62. O. Villon and C. L. Lisetti. Toward building adaptive users' psycho-physiological maps of emotions using bio-sensors. In Proceedings of KI06, 26th German Annual Conference on Artificial Intelligence, Bremen, Germany, 2006.
  63. Y. Wu and T. Huang. Vision-based gesture recognition: A review. Lecture Notes in Computer Science, 1739:103+, 1999.

Published in

HCM '06: Proceedings of the 1st ACM International Workshop on Human-Centered Multimedia
October 2006, 138 pages
ISBN: 1595935002
DOI: 10.1145/1178745
Copyright © 2006 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher: Association for Computing Machinery, New York, NY, United States

Published: 27 October 2006
