2017 | OriginalPaper | Chapter

Automatic Image Description Generation with Emotional Classifiers

Authors: Yan Sun, Bo Ren

Published in: Computer Vision

Publisher: Springer Singapore


Abstract

Automatically generating a natural sentence that describes the content of an image is an active research problem in artificial intelligence, linking the domains of computer vision and natural language processing. Most existing works leverage large object recognition datasets and external text corpora to integrate knowledge between similar concepts. While current works aim to answer 'what is it', 'where is it' and 'what is it doing' in images, we focus on a less considered but critical question: 'how it feels'. We propose to express the feelings contained in images in a more direct and vivid way. To achieve this goal, we extend a pre-trained caption model with an emotion classifier that adds abstract knowledge to the original caption. We evaluate our method on datasets originating from various domains; in particular, we test it on the newly constructed SentiCap dataset with multiple evaluation metrics. The results show that the generated descriptions summarize the images vividly.
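
The abstract describes a two-part pipeline: a pre-trained captioner produces a factual sentence, and a separate emotion classifier predicts the image's affect, which is then injected into the caption. The sketch below (Python/PyTorch) is a minimal illustration of that idea, not the authors' code: the EmotionClassifier backbone, the emotionalize helper, and the tiny emotion lexicon are all assumptions introduced for clarity; a real system would use a trained captioning model and emotion categories learned from affective image data.

    # Minimal sketch of "pre-trained captioner + emotion classifier".
    # All names and the combination step are illustrative assumptions.
    import torch
    import torch.nn as nn
    import torchvision.models as models

    class EmotionClassifier(nn.Module):
        """CNN backbone with a small head over emotion categories."""
        def __init__(self, num_emotions):
            super().__init__()
            backbone = models.resnet18(weights=None)  # load pretrained weights in practice
            backbone.fc = nn.Linear(backbone.fc.in_features, num_emotions)
            self.net = backbone

        def forward(self, images):
            return self.net(images)  # logits over emotion classes

    def emotionalize(caption, emotion, lexicon):
        """Naive combination step: insert a sentiment adjective chosen for
        the predicted emotion (a stand-in for the paper's actual method)."""
        adjective = lexicon[emotion]
        return caption.replace("a ", f"a {adjective} ", 1)

    # Usage with placeholder components:
    lexicon = {"amusement": "delightful", "sadness": "lonely"}
    classifier = EmotionClassifier(num_emotions=len(lexicon))
    image = torch.randn(1, 3, 224, 224)            # stand-in image tensor
    with torch.no_grad():
        emotion = list(lexicon)[classifier(image).argmax(dim=1).item()]
    factual = "a dog running on the beach"         # output of a pre-trained captioner
    print(emotionalize(factual, emotion, lexicon))

The design keeps the captioner untouched and adds sentiment as a separate, swappable component, which is why the abstract can describe extending an existing pre-trained model rather than retraining one end to end.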


Metadata
Title
Automatic Image Description Generation with Emotional Classifiers
Authors
Yan Sun
Bo Ren
Copyright Year
2017
Publisher
Springer Singapore
DOI
https://doi.org/10.1007/978-981-10-7299-4_63
