2018 | OriginalPaper | Chapter

Deep Neural Network Based Image Captioning

Authors: Anurag Tripathi, Siddharth Srivastava, Ravi Kothari

Published in: Big Data Analytics

Publisher: Springer International Publishing

Abstract

Generating a concise natural-language description of an image enables a number of applications, including fast keyword-based search of large image collections. Driven primarily by advances in deep learning, recent years have seen a substantially increased focus on machine-based image caption generation. In this paper, we provide a brief review of deep-learning-based image caption generation, along with an overview of the datasets and metrics used to evaluate captioning algorithms. We conclude with a discussion of promising directions for future research.
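
To make the surveyed paradigm concrete, below is a minimal PyTorch sketch of the CNN-encoder/RNN-decoder design that dominates this literature (Show-and-Tell-style models): a convolutional network encodes the image into a feature vector, and an LSTM decodes it into a word sequence. The module names, dimensions, and the ResNet-18 backbone are illustrative assumptions, not the specific architecture of any one paper covered by the review.

```python
# Minimal encoder-decoder captioning sketch. All names and sizes are
# illustrative; a real model would use a pretrained backbone, attention,
# and beam-search decoding at inference time.
import torch
import torch.nn as nn
import torchvision.models as models

class CaptionEncoder(nn.Module):
    def __init__(self, embed_dim=256):
        super().__init__()
        # weights=None for a self-contained example; in practice one
        # would load ImageNet-pretrained weights.
        resnet = models.resnet18(weights=None)
        # Drop the classification head; keep the convolutional trunk.
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])
        self.fc = nn.Linear(resnet.fc.in_features, embed_dim)

    def forward(self, images):                    # images: (B, 3, H, W)
        feats = self.backbone(images).flatten(1)  # (B, 512)
        return self.fc(feats)                     # (B, embed_dim)

class CaptionDecoder(nn.Module):
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, image_feats, captions):
        # Prepend the image feature as the first "token", then predict
        # each next word from the preceding ones (teacher forcing).
        word_embeds = self.embed(captions)                        # (B, T, E)
        inputs = torch.cat([image_feats.unsqueeze(1), word_embeds], dim=1)
        hidden, _ = self.lstm(inputs)                             # (B, T+1, H)
        return self.out(hidden)                                   # (B, T+1, V)

# Toy forward pass on random data.
enc, dec = CaptionEncoder(), CaptionDecoder(vocab_size=1000)
images = torch.randn(2, 3, 224, 224)
captions = torch.randint(0, 1000, (2, 12))
logits = dec(enc(images), captions)
print(logits.shape)  # torch.Size([2, 13, 1000])
```

At inference time such a decoder is typically unrolled greedily or with beam search, and the generated caption is scored against human references with metrics such as BLEU, METEOR, ROUGE, or CIDEr.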


Metadata
Title
Deep Neural Network Based Image Captioning
Authors
Anurag Tripathi
Siddharth Srivastava
Ravi Kothari
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-030-04780-1_23