DOI: 10.1145/3242969.3243019
Short paper

Multimodal Local-Global Ranking Fusion for Emotion Recognition

Published: 02 October 2018

ABSTRACT

Emotion recognition is a core research area at the intersection of artificial intelligence and human communication analysis. It is a significant technical challenge because humans display their emotions through complex, idiosyncratic combinations of the language, visual, and acoustic modalities. In contrast to traditional multimodal fusion techniques, we approach emotion recognition from both a direct person-independent perspective and a relative person-dependent perspective. The direct person-independent perspective follows the conventional approach of inferring absolute emotion labels directly from observed multimodal features. The relative person-dependent perspective treats emotion recognition as a relative task, comparing partial video segments to determine whether emotional intensity increased or decreased. Our proposed model integrates these direct and relative prediction perspectives by dividing the emotion recognition task into three easier subtasks. The first subtask performs a multimodal local ranking of relative emotion intensities between two short segments of a video. The second subtask infers global relative emotion ranks from these local rankings with a Bayesian ranking algorithm. The third subtask combines the direct predictions from observed multimodal behaviors with the relative emotion ranks from local-global ranking to produce the final emotion prediction. Our approach achieves excellent performance on an audio-visual emotion recognition benchmark and improves over other multimodal fusion algorithms.
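
To make the three subtasks concrete, the following is a minimal sketch of the local-global ranking fusion flow in plain Python/NumPy. It is not the authors' implementation: the feature dimension, the linear pairwise comparator, the Elo-style rating update (used here only as a simple stand-in for the paper's Bayesian ranking algorithm), and the fixed fusion weight alpha are all hypothetical, illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Subtask 1: local ranking of two short video segments --------------------
# Stand-in pairwise comparator: score which of two segments has the higher
# emotion intensity.  A random linear projection plays the role of the learned
# multimodal comparator (hypothetical; the real comparator is trained).
FEAT_DIM = 16                        # hypothetical fused multimodal feature size
w = rng.normal(size=FEAT_DIM)        # would be learned from data in practice

def local_rank(feat_a, feat_b):
    """Return 1 if segment A is judged more emotionally intense than B, else 0."""
    return int(w @ feat_a > w @ feat_b)

# --- Subtask 2: global ranking from local comparisons ------------------------
# Aggregate pairwise outcomes into global relative ranks.  An Elo-style update
# is used here purely for illustration in place of the paper's Bayesian
# ranking algorithm.
def elo_update(ratings, i, j, outcome, k=16.0):
    """Update ratings of segments i and j after one comparison (outcome=1 if i wins)."""
    expected_i = 1.0 / (1.0 + 10.0 ** ((ratings[j] - ratings[i]) / 400.0))
    ratings[i] += k * (outcome - expected_i)
    ratings[j] -= k * (outcome - expected_i)

# --- Subtask 3: fuse direct and relative predictions --------------------------
def fuse(direct_pred, rank_score, alpha=0.5):
    """Blend the absolute (direct) prediction with the rank-derived estimate."""
    return alpha * direct_pred + (1.0 - alpha) * rank_score

# Toy run: one video split into 8 short segments with fused multimodal features.
segments = rng.normal(size=(8, FEAT_DIM))
ratings = np.full(len(segments), 1500.0)   # neutral starting rating per segment

for i in range(len(segments)):
    for j in range(i + 1, len(segments)):
        elo_update(ratings, i, j, local_rank(segments[i], segments[j]))

# Normalise ratings to [0, 1] so they are comparable with direct predictions.
rank_scores = (ratings - ratings.min()) / (np.ptp(ratings) + 1e-8)
direct_preds = 1.0 / (1.0 + np.exp(-(segments @ w)))   # stand-in direct predictor

print(np.round(fuse(direct_preds, rank_scores), 3))
```

In the paper, the segment features, the pairwise comparator, the direct predictor, and the way absolute and relative estimates are combined are presumably all learned from data; the sketch only shows how local pairwise comparisons can be aggregated into global relative ranks and then fused with direct predictions.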


Published in

ICMI '18: Proceedings of the 20th ACM International Conference on Multimodal Interaction
October 2018, 687 pages
ISBN: 9781450356923
DOI: 10.1145/3242969

        Copyright © 2018 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


        Acceptance Rates

ICMI '18 paper acceptance rate: 63 of 149 submissions (42%). Overall acceptance rate: 453 of 1,080 submissions (42%).
