2018 | Original Paper | Book Chapter

ADVISE: Symbolism and External Knowledge for Decoding Advertisements

Authors: Keren Ye, Adriana Kovashka

Published in: Computer Vision – ECCV 2018

Publisher: Springer International Publishing


Abstract

In order to convey the most content in their limited space, advertisements embed references to outside knowledge via symbolism. For example, a motorcycle stands for adventure (a positive property the ad wants associated with the product being sold), and a gun stands for danger (a negative property to dissuade viewers from undesirable behaviors). We show how to use symbolic references to better understand the meaning of an ad. We further show how anchoring ad understanding in general-purpose object recognition and image captioning improves results. We formulate the ad understanding task as matching the ad image to human-generated statements that describe the action that the ad prompts, and the rationale it provides for taking this action. Our proposed method outperforms the state of the art on this task, and on an alternative formulation of question-answering on ads. We show additional applications of our learned representations for matching ads to slogans, and clustering ads according to their topic, without extra training.
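
The task formulation in the abstract (matching an ad image against candidate statements about the action the ad prompts and the rationale it offers) amounts to ranking statements in a joint image-text embedding space. The following sketch is illustrative only: the encoder outputs, the cosine-similarity scoring, and the margin value are assumptions, not the authors' architecture.

```python
# Minimal sketch of image-statement matching as ranking in a shared embedding
# space (illustrative only; encoders, similarity measure, and margin are
# assumptions, not the ADVISE model).
import torch
import torch.nn.functional as F

def triplet_ranking_loss(img_emb, pos_txt_emb, neg_txt_emb, margin=0.2):
    """Push the matching statement closer to the ad image than a
    non-matching one by at least `margin` in cosine similarity."""
    pos_sim = F.cosine_similarity(img_emb, pos_txt_emb, dim=-1)
    neg_sim = F.cosine_similarity(img_emb, neg_txt_emb, dim=-1)
    return F.relu(margin - pos_sim + neg_sim).mean()

def rank_statements(img_emb, statement_embs):
    """Rank candidate statements for one image, best match first."""
    sims = F.cosine_similarity(img_emb.unsqueeze(0), statement_embs, dim=-1)
    return torch.argsort(sims, descending=True)
```

At test time, ranking all candidate statements by similarity and reporting the position of the matching human-written statement yields rank numbers of the kind quoted in footnote 1.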


Appendices
Accessible only with authorization
Footnotes
1
Non-weighted/weighted mean-pooling of word embeddings achieved a rank of 2.45/2.47. The last hidden state of an LSTM achieved a rank of 2.74, while non-weighted/weighted averaging of the LSTM hidden states achieved 2.43/2.46, respectively. Lower is better.
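
For concreteness, the sentence-encoding variants compared in this footnote can be sketched as below; tensor shapes, the per-word weights, and the LSTM configuration are assumptions for illustration, not the authors' implementation.

```python
# Sketch of the compared sentence encodings (illustrative; shapes and the
# source of the per-word weights are assumptions).
import torch

def mean_pool(word_embs):                      # word_embs: (num_words, dim)
    """Non-weighted average of the word embeddings."""
    return word_embs.mean(dim=0)

def weighted_mean_pool(word_embs, weights):    # weights: (num_words,)
    """Weighted average, normalised so the weights sum to one."""
    w = weights / weights.sum()
    return (w.unsqueeze(1) * word_embs).sum(dim=0)

def lstm_last_hidden(lstm, word_embs):
    """Final hidden state of an LSTM run over the word embeddings,
    where lstm = torch.nn.LSTM(dim, hidden_dim, batch_first=True)."""
    _, (h_n, _) = lstm(word_embs.unsqueeze(0))  # add a batch dimension
    return h_n[-1, 0]                           # last layer, single sequence

def lstm_mean_hidden(lstm, word_embs):
    """Non-weighted average of the per-step LSTM hidden states; a weighted
    version would reuse weighted_mean_pool on out[0]."""
    out, _ = lstm(word_embs.unsqueeze(0))       # out: (1, num_words, hidden_dim)
    return out.mean(dim=1)[0]
```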
 
Metadata
Title
ADVISE: Symbolism and External Knowledge for Decoding Advertisements
Authors
Keren Ye
Adriana Kovashka
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-030-01267-0_51
