Published in: Pattern Recognition and Image Analysis 3/2020

01-07-2020 | SPECIAL ISSUE

Ontological Approach to Image Captioning Evaluation

Authors: D. Shunkevich, N. Iskra

Abstract

The paper considers an ontology of the existing metrics widely used to evaluate image captioning. It shows how the ontological approach provides a more natural and resilient way to assure image captioning quality than variations of machine translation metrics. The paper also discusses another important problem: information support for researchers in the field of image captioning.

Metadata
Title
Ontological Approach to Image Captioning Evaluation
Authors
D. Shunkevich
N. Iskra
Publication date
01-07-2020
Publisher
Pleiades Publishing
Published in
Pattern Recognition and Image Analysis / Issue 3/2020
Print ISSN: 1054-6618
Electronic ISSN: 1555-6212
DOI
https://doi.org/10.1134/S1054661820030256
