Published in: Neural Processing Letters, Issue 3/2020

14.02.2020

Visual Sentiment Prediction with Attribute Augmentation and Multi-attention Mechanism

Authors: Zhuanghui Wu, Min Meng, Jigang Wu

Abstract

Recently, many methods that exploit attention mechanisms to discover relevant local regions via visual attributes have demonstrated promising performance in visual sentiment prediction. In these methods, accurate detection of visual attributes is vital for identifying sentiment-relevant regions, which in turn is crucial for successfully assessing visual sentiment. However, existing work merely applies basic convolutional neural network strategies to visual attribute detection and fails to obtain satisfactory results because of the semantic gap between visual features and subjective attributes. Moreover, existing attention models struggle to localize subtle sentiment-relevant regions, especially when attribute detection performs relatively poorly. To address these problems, we first design a multi-task learning approach for visual attribute detection. By augmenting the attributes with sentiment supervision, the semantic gap can be effectively reduced. We then develop a multi-attention model that jointly discovers and localizes multiple relevant local regions given the predicted attributes. The classifier built on top of these regions achieves a significant improvement in visual sentiment prediction. Experimental results demonstrate the superiority of our method over previous approaches.
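
The abstract names two components without giving their exact form. The following is a minimal, hypothetical Python sketch of the general ideas, not the authors' implementation. It assumes plain dot-product soft attention with one attention "glimpse" per predicted attribute embedding, and a joint objective that adds a sentiment cross-entropy term to a multi-label attribute loss. The function names, the weight `lam`, and the toy data are illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(regions, query):
    """One soft-attention glimpse: score each region feature against a query
    vector (assumed here to be an attribute embedding), normalize the scores
    with softmax, and return the weights plus the weighted sum of regions."""
    weights = softmax([dot(r, query) for r in regions])
    dim = len(regions[0])
    pooled = [sum(w * r[d] for w, r in zip(weights, regions)) for d in range(dim)]
    return weights, pooled


def multi_attention(regions, attribute_queries):
    """Hypothetical multi-attention: one glimpse per predicted attribute.
    The pooled vectors are concatenated into a single descriptor that a
    sentiment classifier could consume."""
    all_weights, descriptor = [], []
    for query in attribute_queries:
        weights, pooled = attend(regions, query)
        all_weights.append(weights)
        descriptor.extend(pooled)
    return all_weights, descriptor


def multitask_loss(attr_probs, attr_labels, sent_probs, sent_label, lam=1.0):
    """Hypothetical joint objective: mean binary cross-entropy over the
    multi-label attribute predictions plus lam times the cross-entropy of
    the sentiment prediction (the sentiment-supervision term)."""
    eps = 1e-12  # guards against log(0)
    bce = -sum(
        y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
        for p, y in zip(attr_probs, attr_labels)
    ) / len(attr_labels)
    ce = -math.log(sent_probs[sent_label] + eps)
    return bce + lam * ce


# Toy usage: three 2-D region features and two attribute embeddings.
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
queries = [[2.0, 0.0], [0.0, 2.0]]
weights, descriptor = multi_attention(regions, queries)
```

Each attribute query concentrates its weight on the region most aligned with it, so different attributes can highlight different regions. Dot-product scoring is only one possible choice; a learned scoring function could replace `dot` inside `attend`.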

Metadata
Title
Visual Sentiment Prediction with Attribute Augmentation and Multi-attention Mechanism
Authors
Zhuanghui Wu
Min Meng
Jigang Wu
Publication date
14.02.2020
Publisher
Springer US
Published in
Neural Processing Letters / Issue 3/2020
Print ISSN: 1370-4621
Electronic ISSN: 1573-773X
DOI
https://doi.org/10.1007/s11063-020-10201-2