2018 | Original Paper | Book Chapter

ADVISE: Symbolism and External Knowledge for Decoding Advertisements

Authors: Keren Ye, Adriana Kovashka

Published in: Computer Vision – ECCV 2018

Publisher: Springer International Publishing


Abstract

In order to convey the most content in their limited space, advertisements embed references to outside knowledge via symbolism. For example, a motorcycle stands for adventure (a positive property the ad wants associated with the product being sold), and a gun stands for danger (a negative property to dissuade viewers from undesirable behaviors). We show how to use symbolic references to better understand the meaning of an ad. We further show how anchoring ad understanding in general-purpose object recognition and image captioning improves results. We formulate the ad understanding task as matching the ad image to human-generated statements that describe the action that the ad prompts, and the rationale it provides for taking this action. Our proposed method outperforms the state of the art on this task, and on an alternative formulation of question-answering on ads. We show additional applications of our learned representations for matching ads to slogans, and clustering ads according to their topic, without extra training.
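
The task formulation in the abstract (matching an ad image against candidate statements about the action the ad prompts and the rationale it offers) amounts to ranking statements in a joint image-text embedding space. The following sketch is illustrative only: the encoder outputs, the cosine-similarity scoring, and the margin value are assumptions, not the authors' architecture.

```python
# Minimal sketch of image-statement matching as ranking in a shared embedding
# space (illustrative only; encoders, similarity measure, and margin are
# assumptions, not the ADVISE model).
import torch
import torch.nn.functional as F

def triplet_ranking_loss(img_emb, pos_txt_emb, neg_txt_emb, margin=0.2):
    """Push the matching statement closer to the ad image than a
    non-matching one by at least `margin` in cosine similarity."""
    pos_sim = F.cosine_similarity(img_emb, pos_txt_emb, dim=-1)
    neg_sim = F.cosine_similarity(img_emb, neg_txt_emb, dim=-1)
    return F.relu(margin - pos_sim + neg_sim).mean()

def rank_statements(img_emb, statement_embs):
    """Rank candidate statements for one image, best match first."""
    sims = F.cosine_similarity(img_emb.unsqueeze(0), statement_embs, dim=-1)
    return torch.argsort(sims, descending=True)
```

At test time, ranking all candidate statements by similarity and reporting the position of the matching human-written statement yields rank numbers of the kind quoted in footnote 1.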


Appendices
Accessible only with authorization
Footnotes
1
Non-weighted/weighted mean-pooling of word embeddings achieved a rank of 2.45/2.47. The last hidden state of an LSTM achieved a rank of 2.74, while non-weighted/weighted averaging of the LSTM hidden states achieved 2.43/2.46, respectively. Lower is better.
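
For concreteness, the sentence-encoding variants compared in this footnote can be sketched as below; tensor shapes, the per-word weights, and the LSTM configuration are assumptions for illustration, not the authors' implementation.

```python
# Sketch of the compared sentence encodings (illustrative; shapes and the
# source of the per-word weights are assumptions).
import torch

def mean_pool(word_embs):                      # word_embs: (num_words, dim)
    """Non-weighted average of the word embeddings."""
    return word_embs.mean(dim=0)

def weighted_mean_pool(word_embs, weights):    # weights: (num_words,)
    """Weighted average, normalised so the weights sum to one."""
    w = weights / weights.sum()
    return (w.unsqueeze(1) * word_embs).sum(dim=0)

def lstm_last_hidden(lstm, word_embs):
    """Final hidden state of an LSTM run over the word embeddings,
    where lstm = torch.nn.LSTM(dim, hidden_dim, batch_first=True)."""
    _, (h_n, _) = lstm(word_embs.unsqueeze(0))  # add a batch dimension
    return h_n[-1, 0]                           # last layer, single sequence

def lstm_mean_hidden(lstm, word_embs):
    """Non-weighted average of the per-step LSTM hidden states; a weighted
    version would reuse weighted_mean_pool on out[0]."""
    out, _ = lstm(word_embs.unsqueeze(0))       # out: (1, num_words, hidden_dim)
    return out.mean(dim=1)[0]
```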
 
Metadata
Title
ADVISE: Symbolism and External Knowledge for Decoding Advertisements
Authors
Keren Ye
Adriana Kovashka
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-030-01267-0_51
