
2022 | OriginalPaper | Chapter

Non-Uniform Attention Network for Multi-modal Sentiment Analysis

Authors: Binqiang Wang, Gang Dong, Yaqian Zhao, Rengang Li, Qichun Cao, Yinyin Chao

Published in: MultiMedia Modeling

Publisher: Springer International Publishing


Abstract

Remarkable success has been achieved in the multi-modal sentiment analysis community thanks to the availability of annotated multi-modal data sets. However, the fact that the data come from three different modalities, text, sound, and vision, creates significant barriers to effective feature fusion. In this paper, we introduce "NUAN", a non-uniform attention network for multi-modal feature fusion. NUAN is designed around an attention mechanism that considers the three modalities simultaneously, but not uniformly: the text is treated as a determinate representation, and the acoustic and visual representations are leveraged to inject effective information into a solid representation, termed the tripartite interaction representation. A novel non-uniform attention module (NUAM) is inserted between adjacent time steps of an LSTM (Long Short-Term Memory) network and processes information recurrently. The final outputs of the LSTM and the NUAM are concatenated into a vector, which is fed into a linear embedding layer to produce the sentiment analysis result. Experimental analysis on two datasets demonstrates the effectiveness of the proposed method.
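The abstract only outlines the architecture, so the sketch below is a rough, assumption-laden illustration of the fusion idea rather than the paper's actual NUAM: the text representation drives the query, acoustic and visual features are attended to through separate (hence non-uniform) attention layers, and the fused vector is passed to a linear head. All layer names, feature dimensions (CMU-MOSI-like), and the use of PyTorch's MultiheadAttention are assumptions introduced here; in the paper the module is inserted between adjacent LSTM time steps and applied recurrently, whereas this sketch simplifies to a single attention step over the final LSTM state.

```python
# Minimal PyTorch sketch of the non-uniform fusion idea described in the abstract.
# All names, dimensions, and the exact attention formulation are assumptions for
# illustration only; the paper's recurrent NUAM is not reproduced here.
import torch
import torch.nn as nn


class NonUniformFusion(nn.Module):
    """Text is treated as the determinate (query) representation; acoustic and
    visual features are attended to separately and injected into it."""

    def __init__(self, d_text=300, d_audio=74, d_visual=35, d_hidden=128):
        super().__init__()
        self.text_lstm = nn.LSTM(d_text, d_hidden, batch_first=True)
        # Project each auxiliary modality into the shared hidden space (assumed).
        self.audio_proj = nn.Linear(d_audio, d_hidden)
        self.visual_proj = nn.Linear(d_visual, d_hidden)
        # Separate attention weights per modality -> "non-uniform" treatment.
        self.audio_attn = nn.MultiheadAttention(d_hidden, num_heads=4, batch_first=True)
        self.visual_attn = nn.MultiheadAttention(d_hidden, num_heads=4, batch_first=True)
        # Regression head on [text state ; acoustic context ; visual context].
        self.out = nn.Linear(3 * d_hidden, 1)

    def forward(self, text, audio, visual):
        # text: (B, T, d_text), audio: (B, Ta, d_audio), visual: (B, Tv, d_visual)
        h_text, _ = self.text_lstm(text)              # (B, T, d_hidden)
        q = h_text[:, -1:, :]                         # last text state as query
        a = self.audio_proj(audio)
        v = self.visual_proj(visual)
        a_ctx, _ = self.audio_attn(q, a, a)           # inject acoustic information
        v_ctx, _ = self.visual_attn(q, v, v)          # inject visual information
        fused = torch.cat([q, a_ctx, v_ctx], dim=-1).squeeze(1)
        return self.out(fused)                        # sentiment score


# Example usage with random features (CMU-MOSI-like dimensions assumed).
model = NonUniformFusion()
score = model(torch.randn(2, 20, 300), torch.randn(2, 20, 74), torch.randn(2, 20, 35))
print(score.shape)  # torch.Size([2, 1])
```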


Metadata
Title: Non-Uniform Attention Network for Multi-modal Sentiment Analysis
Authors: Binqiang Wang, Gang Dong, Yaqian Zhao, Rengang Li, Qichun Cao, Yinyin Chao
Copyright Year: 2022
DOI: https://doi.org/10.1007/978-3-030-98358-1_48
