Published in: Multimedia Systems 4/2020

25-05-2020 | Regular Paper

A deep learning architecture of RA-DLNet for visual sentiment analysis

Authors: Ashima Yadav, Dinesh Kumar Vishwakarma

Abstract

Visual media has become one of the most potent means of conveying opinions and sentiments on the web, with millions of photos uploaded to popular social networking sites by people expressing themselves. Visual sentiment analysis is an inherently subjective problem because of the strong bias in the human recognition process. This work proposes a residual attention-based deep learning network (RA-DLNet) for visual sentiment analysis. We learn the spatial hierarchies of image features using a CNN and, since local regions of an image convey significant sentiment, we apply a residual attention model that focuses on the crucial, sentiment-rich local regions. A further significant contribution of this work is an exhaustive analysis of seven popular CNN-based architectures (VGG-16, VGG-19, Inception-ResNet-V2, Inception-V3, ResNet-50, Xception, and NASNet) and a demonstration of the impact of fine-tuning these variants in the visual sentiment analysis domain. Extensive experiments are conducted on eight popular benchmark data sets, with performance measured in terms of accuracy. Comparison with similar state-of-the-art methods demonstrates the superiority of the proposed work.
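
As a rough illustration of the mechanism the abstract describes, the sketch below implements a generic residual attention unit in Keras, in the spirit of residual attention networks: a trunk branch extracts features while a soft mask branch learns where the sentiment-rich regions lie, combined in the residual form (1 + M(x)) * T(x). The branch depths, filter counts, and the assumption of even spatial dimensions are illustrative choices, not details taken from RA-DLNet.

```python
import tensorflow as tf
from tensorflow.keras import layers


def residual_attention_block(x, filters):
    """Minimal residual-attention sketch: a trunk branch extracts features,
    a mask branch produces a soft spatial attention map M in [0, 1], and the
    output is (1 + M) * T, so attention modulates rather than gates features.
    Assumes the input height and width are even (for the pool/upsample pair)."""
    # Trunk branch: ordinary convolutional feature extraction.
    trunk = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    trunk = layers.Conv2D(filters, 3, padding="same", activation="relu")(trunk)

    # Mask branch: downsample/upsample to gather context, then squash to [0, 1].
    mask = layers.MaxPooling2D(2)(x)
    mask = layers.Conv2D(filters, 3, padding="same", activation="relu")(mask)
    mask = layers.UpSampling2D(2)(mask)
    mask = layers.Conv2D(filters, 1, padding="same", activation="sigmoid")(mask)

    # (1 + M) * T == T + M * T: trunk features plus attention-weighted trunk.
    return layers.Add()([trunk, layers.Multiply()([mask, trunk])])
```

The fine-tuning study of the seven backbones can likewise be pictured as a standard two-stage transfer-learning recipe, sketched below with tf.keras.applications. The learning rates, dropout rate, input size, and binary positive/negative head are assumptions for illustration, not the settings reported in the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Any of the seven studied backbones available in tf.keras.applications can
# be swapped in (VGG16, VGG19, InceptionResNetV2, InceptionV3, ResNet50,
# Xception, NASNetLarge); ResNet-50 is shown here.
base = tf.keras.applications.ResNet50(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # stage 1: train only the new sentiment head

inputs = layers.Input(shape=(224, 224, 3))
x = base(inputs, training=False)          # keep batch-norm statistics frozen
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dropout(0.5)(x)                # assumed regularisation, not from the paper
outputs = layers.Dense(2, activation="softmax")(x)  # positive / negative sentiment
model = models.Model(inputs, outputs)
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Stage 2: unfreeze the backbone and fine-tune end to end at a lower rate.
base.trainable = True
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

Freezing the backbone first lets the randomly initialized head settle before the pretrained ImageNet weights are perturbed, which is the usual motivation for fine-tuning at a reduced learning rate in the second stage.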


Metadata

Title: A deep learning architecture of RA-DLNet for visual sentiment analysis
Authors: Ashima Yadav, Dinesh Kumar Vishwakarma
Publication date: 25-05-2020
Publisher: Springer Berlin Heidelberg
Published in: Multimedia Systems, Issue 4/2020
Print ISSN: 0942-4962
Electronic ISSN: 1432-1882
DOI: https://doi.org/10.1007/s00530-020-00656-7
