ABSTRACT
Emotion recognition is a core research area at the intersection of artificial intelligence and human communication analysis. It poses a significant technical challenge, since humans display their emotions through complex, idiosyncratic combinations of the language, visual, and acoustic modalities. In contrast to traditional multimodal fusion techniques, we approach emotion recognition from both a direct, person-independent perspective and a relative, person-dependent perspective. The direct person-independent perspective follows the conventional emotion recognition approach, which infers absolute emotion labels directly from observed multimodal features. The relative person-dependent perspective approaches emotion recognition in a relative manner, comparing partial video segments to determine whether emotional intensity increased or decreased. Our proposed model integrates these direct and relative prediction perspectives by dividing the emotion recognition task into three easier subtasks. The first subtask performs a multimodal local ranking of relative emotion intensities between two short segments of a video. The second subtask uses these local rankings to infer global relative emotion ranks with a Bayesian ranking algorithm. The third subtask combines the direct predictions from observed multimodal behaviors with the relative emotion ranks from the local-global rankings for the final emotion prediction. Our approach displays excellent performance on an audio-visual emotion recognition benchmark and improves over other algorithms for multimodal fusion.
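The local-to-global ranking step described above can be illustrated with a minimal sketch. Here we assume an Elo-style rating update as a simpler stand-in for the Bayesian ranking algorithm the abstract refers to; all function names and parameter values below are hypothetical and chosen only to show how pairwise "more/less intense" judgments between segments can be aggregated into global relative ranks.

```python
def elo_update(r_winner, r_loser, k=32.0):
    """Elo-style update: the segment judged more emotionally intense 'wins'.
    (A stand-in for the paper's Bayesian ranking step; not the actual method.)"""
    expected = 1.0 / (1.0 + 10 ** ((r_loser - r_winner) / 400.0))
    delta = k * (1.0 - expected)
    return r_winner + delta, r_loser - delta

def global_ranks(num_segments, comparisons):
    """Aggregate local pairwise judgments into global relative ranks.

    comparisons: list of (i, j) pairs, each meaning segment i was judged
    more emotionally intense than segment j by the local ranker.
    """
    ratings = [1000.0] * num_segments
    for i, j in comparisons:
        ratings[i], ratings[j] = elo_update(ratings[i], ratings[j])
    # Rank each segment by its final rating: 0 = least intense.
    order = sorted(range(num_segments), key=lambda s: ratings[s])
    rank = {seg: pos for pos, seg in enumerate(order)}
    return ratings, rank

# Three segments; the local ranker judged 1 > 0, 2 > 1, and 2 > 0.
ratings, rank = global_ranks(3, [(1, 0), (2, 1), (2, 0)])
```

With these comparisons, segment 2 receives the highest global rank and segment 0 the lowest, matching the transitive order implied by the local judgments.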
Multimodal Local-Global Ranking Fusion for Emotion Recognition