2019 | OriginalPaper | Chapter

Joint Visual-Textual Sentiment Analysis Based on Cross-Modality Attention Mechanism

Authors : Xuelin Zhu, Biwei Cao, Shuai Xu, Bo Liu, Jiuxin Cao

Published in: MultiMedia Modeling

Publisher: Springer International Publishing


Abstract

Recently, many researchers have focused on joint visual-textual sentiment analysis, since combining both modalities better captures user sentiment toward events or topics. In this paper, we propose that visual and textual information should differ in their contribution to sentiment analysis. Our model learns a robust joint visual-textual representation by incorporating a cross-modality attention mechanism and semantic embedding learning based on a bidirectional recurrent neural network. Experimental results show that our model outperforms existing state-of-the-art models in sentiment analysis on real-world datasets. In addition, we investigate several variants of the proposed model and analyze the effects of semantic embedding learning and the cross-modality attention mechanism, in order to provide deeper insight into how these two techniques help the learning of a joint visual-textual sentiment classifier.
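The abstract does not specify the exact form of the attention mechanism; as a rough, hypothetical illustration of the general idea, a bilinear cross-modality attention that weights image-region features by their relevance to a sentence embedding might be sketched as follows (all dimensions and the weight matrix `W` are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def cross_modality_attention(region_feats, text_vec, W):
    """Attend over image regions conditioned on a text embedding.

    region_feats: (R, d_v) visual features for R image regions
    text_vec:     (d_t,)   sentence embedding (e.g. from a BiRNN)
    W:            (d_v, d_t) bilinear compatibility matrix (learned)
    Returns the attended visual vector and the attention weights.
    """
    scores = region_feats @ W @ text_vec   # (R,) region-text compatibility
    alpha = softmax(scores)                # normalized attention weights
    attended = alpha @ region_feats        # (d_v,) weighted region pooling
    return attended, alpha

# Toy example with random features
rng = np.random.default_rng(0)
regions = rng.normal(size=(4, 8))   # 4 regions, 8-dim visual features
text = rng.normal(size=(6,))        # 6-dim sentence embedding
W = rng.normal(size=(8, 6))
attended, alpha = cross_modality_attention(regions, text, W)
```

The attended visual vector could then be fused with the text embedding (e.g. by concatenation) before the final sentiment classifier, which is one common way such joint representations are built.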


Metadata
Title
Joint Visual-Textual Sentiment Analysis Based on Cross-Modality Attention Mechanism
Authors
Xuelin Zhu
Biwei Cao
Shuai Xu
Bo Liu
Jiuxin Cao
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-05710-7_22