Published in: International Journal of Computer Vision 10/2019

01.10.2019

LCEval: Learned Composite Metric for Caption Evaluation

Authors: Naeha Sharif, Lyndon White, Mohammed Bennamoun, Wei Liu, Syed Afaq Ali Shah


Abstract

Automatic evaluation metrics are of fundamental importance to the development and fine-grained analysis of captioning systems. While current evaluation metrics tend to achieve an acceptable correlation with human judgements at the system level, they fail to do so at the caption level. In this work, we propose a neural-network-based learned metric to improve caption-level evaluation. To gain deeper insight into the parameters that affect a learned metric’s performance, this paper investigates the relationship between different linguistic features and the caption-level correlation of learned metrics. We also compare metrics trained with different training examples to measure the variations in their evaluations. Moreover, we perform a robustness analysis, which highlights the sensitivity of learned and handcrafted metrics to various sentence perturbations. Our empirical analysis shows that our proposed metric not only outperforms existing metrics in terms of caption-level correlation but also shows a strong system-level correlation with human assessments.
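The core idea of a learned composite metric can be illustrated with a minimal sketch: per-caption scores from existing handcrafted metrics are treated as an input feature vector and combined by a trained model into a single quality score. This is not the paper's actual architecture; the feature set, weights, and the single logistic unit below are purely illustrative stand-ins for the learned neural combination LCEval trains on human-annotated data.

```python
import math

def composite_score(features, weights, bias=0.0):
    """Combine individual metric scores into one quality score in (0, 1)
    via a logistic unit -- a one-layer stand-in for a learned composite
    metric. In practice the weights are learned from human judgements."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical per-caption feature vector: scores from existing metrics,
# e.g. [BLEU, METEOR, rescaled CIDEr, SPICE], all in [0, 1].
good_caption = [0.8, 0.7, 0.9, 0.6]
poor_caption = [0.2, 0.1, 0.3, 0.1]

# Illustrative equal weights only; a learned metric fits these to data.
w = [1.0, 1.0, 1.0, 1.0]

# A caption scoring higher on the component metrics should receive a
# higher composite score under any sensible (positive) weighting.
assert composite_score(good_caption, w) > composite_score(poor_caption, w)
```

The benefit of learning the combination, rather than averaging, is that the model can weight each component metric by how well it predicts human judgement at the caption level.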

Metadata
Title
LCEval: Learned Composite Metric for Caption Evaluation
Authors
Naeha Sharif
Lyndon White
Mohammed Bennamoun
Wei Liu
Syed Afaq Ali Shah
Publication date
01.10.2019
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 10/2019
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-019-01206-z
