
2019 | Original Paper | Book Chapter

Captioning the Images: A Deep Analysis

Authors: Chaitrali P. Chaudhari, Satish Devane

Published in: Computing, Communication and Signal Processing

Publisher: Springer Singapore


Abstract

Image captioning is one of the fundamental tasks in machine learning, since the ability to generate text captions for an image can have a great impact by assisting us in day-to-day life. However, it is not merely an object classification or recognition task: the model must understand the dependencies among the recognized objects and their attributes, and encode that knowledge correctly in a caption written in a natural language such as English. The internet is now overwhelmed with a huge amount of textual and visual data, consisting of billions of unstructured images and videos. Meaningful captions can serve as useful keys for retrieval, creative searching, and powerful browsing of these images. In this paper, we analyze and classify the recent state of the art in image captioning and discuss significant differences among the approaches. We provide a comparative review of existing models and techniques, with their advantages and disadvantages, and explore future directions in the field of automatic image caption generation.
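The retrieval use case described in the abstract, captions acting as keys for searching image collections, can be sketched with a toy inverted index over caption tokens. This is a minimal illustration only, not a method from the paper; all image ids, captions, and function names (`build_caption_index`, `search`) are hypothetical.

```python
# Toy sketch: index images by the tokens of their (generated) captions,
# then answer free-text queries by ranking images on token overlap.
from collections import defaultdict

def build_caption_index(captions):
    """Map each lowercase caption token to the set of image ids using it."""
    index = defaultdict(set)
    for image_id, caption in captions.items():
        for token in caption.lower().split():
            index[token].add(image_id)
    return index

def search(index, query):
    """Rank image ids by how many query tokens their captions share."""
    scores = defaultdict(int)
    for token in query.lower().split():
        for image_id in index.get(token, ()):
            scores[image_id] += 1
    # Highest overlap first; break ties by image id for determinism.
    return sorted(scores, key=lambda i: (-scores[i], i))

captions = {  # hypothetical generated captions
    "img1": "a dog catching a frisbee in the park",
    "img2": "two children playing soccer on a field",
    "img3": "a brown dog running on a grassy field",
}
index = build_caption_index(captions)
print(search(index, "dog on a field"))  # img3 shares the most tokens
```

Real systems would use learned embeddings rather than raw token overlap, but the sketch shows why caption quality directly bounds retrieval quality: an image is only findable through the words its caption contains.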


Metadata
Title
Captioning the Images: A Deep Analysis
Authors
Chaitrali P. Chaudhari
Satish Devane
Copyright year
2019
Publisher
Springer Singapore
DOI
https://doi.org/10.1007/978-981-13-1513-8_100
