2017 | OriginalPaper | Chapter

Automatic Image Description Generation with Emotional Classifiers

Authors: Yan Sun, Bo Ren

Published in: Computer Vision

Publisher: Springer Singapore


Abstract

Automatically generating a natural sentence that describes the content of an image is an active research problem in artificial intelligence, linking the domains of computer vision and natural language processing. Most existing works leverage large object recognition datasets and external text corpora to integrate knowledge between similar concepts. While current works aim to answer 'what is it', 'where is it' and 'what is it doing' in images, we focus on a less considered but critical question: 'how it feels'. We propose to express the feelings contained in images in a more direct and vivid way. To achieve this goal, we extend a pre-trained caption model with an emotion classifier that adds abstract knowledge to the original caption. We evaluate our method on datasets originating from various domains; in particular, we test it on the newly constructed SentiCap dataset with multiple evaluation metrics. The results show that the generated descriptions summarize the images vividly.
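
The abstract describes a two-part pipeline: a pre-trained captioner produces a factual sentence, and a separate emotion classifier predicts the image's affect, which is then injected into the caption. The sketch below (Python/PyTorch) is a minimal illustration of that idea, not the authors' code: the EmotionClassifier backbone, the emotionalize helper, and the tiny emotion lexicon are all assumptions introduced for clarity; a real system would use a trained captioning model and emotion categories learned from affective image data.

    # Minimal sketch of "pre-trained captioner + emotion classifier".
    # All names and the combination step are illustrative assumptions.
    import torch
    import torch.nn as nn
    import torchvision.models as models

    class EmotionClassifier(nn.Module):
        """CNN backbone with a small head over emotion categories."""
        def __init__(self, num_emotions):
            super().__init__()
            backbone = models.resnet18(weights=None)  # load pretrained weights in practice
            backbone.fc = nn.Linear(backbone.fc.in_features, num_emotions)
            self.net = backbone

        def forward(self, images):
            return self.net(images)  # logits over emotion classes

    def emotionalize(caption, emotion, lexicon):
        """Naive combination step: insert a sentiment adjective chosen for
        the predicted emotion (a stand-in for the paper's actual method)."""
        adjective = lexicon[emotion]
        return caption.replace("a ", f"a {adjective} ", 1)

    # Usage with placeholder components:
    lexicon = {"amusement": "delightful", "sadness": "lonely"}
    classifier = EmotionClassifier(num_emotions=len(lexicon))
    image = torch.randn(1, 3, 224, 224)            # stand-in image tensor
    with torch.no_grad():
        emotion = list(lexicon)[classifier(image).argmax(dim=1).item()]
    factual = "a dog running on the beach"         # output of a pre-trained captioner
    print(emotionalize(factual, emotion, lexicon))

The design keeps the captioner untouched and adds sentiment as a separate, swappable component, which is why the abstract can describe extending an existing pre-trained model rather than retraining one end to end.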


Metadata
Title
Automatic Image Description Generation with Emotional Classifiers
Authors
Yan Sun
Bo Ren
Copyright Year
2017
Publisher
Springer Singapore
DOI
https://doi.org/10.1007/978-981-10-7299-4_63
