2020 | OriginalPaper | Book chapter

Dense Captioning Using Abstract Meaning Representation

Written by: Antonio M. S. Almeida Neto, Helena M. Caseli, Tiago A. Almeida

Published in: Intelligent Systems

Publisher: Springer International Publishing

Abstract

The world around us is composed of images that often need to be translated into words. This translation can take place in parts, converting regions of the image into textual descriptions, a task known as dense captioning. In doing so, the information present in each region is converted into words that express how the objects relate to each other. Computational models, mainly based on deep neural networks, have been proposed to perform this task in a way similar to human beings. Because the same region of an image can be described in several different ways, this study proposes using Abstract Meaning Representation (AMR) in the generation of descriptions for a given region. We hypothesize that AMR makes it possible to extract the meaning of the text and, as a consequence, to improve the quality of the sentences produced by the models. AMR was investigated as a semantic representation formalism that extends previously proposed models, which rely purely on natural language. The results show that models trained with sentences in AMR form led to better descriptions, and their performance was superior in almost all evaluations.
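
To make the idea of an AMR-encoded sentence concrete, the following minimal Python sketch stores a small AMR graph as triples and serializes it in PENMAN notation. It is illustrative only: the sentence is the textbook example "The boy wants to go", not a region description from the chapter, and the names `instances`, `relations` and `to_penman` are hypothetical helpers rather than the authors' code or any library's API.

```python
# Illustrative only: a tiny AMR graph held as plain Python data.
# The graph encodes the textbook sentence "The boy wants to go";
# it is NOT taken from the chapter's dataset or models.

instances = {"w": "want-01", "b": "boy", "g": "go-01"}  # variable -> concept
relations = [
    ("w", ":ARG0", "b"),  # the boy is the one who wants
    ("w", ":ARG1", "g"),  # going is the thing wanted
    ("g", ":ARG0", "b"),  # the boy is also the one who goes (re-entrancy)
]


def to_penman(root, instances, relations):
    """Serialize the graph in PENMAN notation, expanding each variable once."""
    seen = set()

    def render(var):
        if var in seen:
            # A variable already expanded is referenced by name only.
            return var
        seen.add(var)
        parts = [f"{var} / {instances[var]}"]
        for src, role, tgt in relations:
            if src == var:
                parts.append(f"{role} {render(tgt)}")
        return "(" + " ".join(parts) + ")"

    return render(root)


amr_text = to_penman("w", instances, relations)
print(amr_text)
```

Because the graph abstracts away from word order and function words, differently worded sentences with the same meaning can share one such representation, which is the property the abstract appeals to.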

Footnotes
1
Image from Visual Genome Dataset [10].
 
2
The success of syntactic banks is due to the fact that unifying the various tasks in a single process allowed the use of a single tool. An example of a classic syntactic bank is the Penn Treebank.
 
3
The value before the slash indicates the total of unique tokens (types) and the value after the slash, the total number of occurrences of those tokens.
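
A minimal sketch of how such a "types/tokens" pair can be computed, assuming lowercasing and whitespace tokenization (the chapter's actual tokenization is not stated here; the function name `types_and_tokens` and the sample sentences are hypothetical):

```python
from collections import Counter


def types_and_tokens(sentences):
    """Return (number of unique tokens, total number of token occurrences)."""
    # Assumption: lowercase + whitespace split stands in for real tokenization.
    counts = Counter(tok for s in sentences for tok in s.lower().split())
    return len(counts), sum(counts.values())


# Hypothetical sample descriptions, not data from the chapter.
sample = ["a man riding a horse", "a horse on the grass"]
n_types, n_tokens = types_and_tokens(sample)
print(f"{n_types}/{n_tokens}")  # value before the slash: types; after: tokens
```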
 
References
1.
Abend, O., Rappoport, A.: The state of the art in semantic representation. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), vol. 1, pp. 77–89 (2017)
2.
Banarescu, L., et al.: Abstract meaning representation for sembanking. In: Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, pp. 178–186 (2013)
3.
Banerjee, S., Lavie, A.: METEOR: an automatic metric for MT evaluation with improved correlation with human judgments. In: Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, pp. 65–72. Association for Computational Linguistics, Ann Arbor, June 2005. https://www.aclweb.org/anthology/W05-0909
4.
Cai, S., Knight, K.: Smatch: an evaluation metric for semantic feature structures. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp. 748–752. Association for Computational Linguistics, Sofia, August 2013. https://www.aclweb.org/anthology/P13-2131
5.
Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2014, pp. 580–587. IEEE Computer Society, Washington, DC (2014). https://doi.org/10.1109/CVPR.2014.81
6.
Johnson, J., Karpathy, A., Fei-Fei, L.: DenseCap: fully convolutional localization networks for dense captioning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016)
7.
Karpathy, A., Fei-Fei, L.: Deep visual-semantic alignments for generating image descriptions. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015
11.
Liao, K., Lebanoff, L., Liu, F.: Abstract Meaning Representation for multi-document summarization. In: Proceedings of the 27th International Conference on Computational Linguistics, pp. 1178–1190. Association for Computational Linguistics, Santa Fe, August 2018. https://www.aclweb.org/anthology/C18-1101
14.
Lyu, C., Titov, I.: AMR parsing as graph prediction with latent alignment. In: Proceedings of the Annual Meeting of the Association for Computational Linguistics (2018)
15.
Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics. ACL 2002, pp. 311–318. Association for Computational Linguistics, Stroudsburg (2002). https://doi.org/10.3115/1073083.1073135
16.
Reiter, E., Dale, R.: Building applied natural language generation systems. Nat. Lang. Eng. 3(1), 57–87 (1997)
18.
Song, L., Zhang, Y., Wang, Z., Gildea, D.: A graph-to-sequence model for AMR-to-text generation. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL-18), Melbourne, Australia (2018)
21.
Yang, L., Tang, K., Yang, J., Li, L.J.: Dense captioning with joint inference and visual context. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017
22.
Yin, G., Sheng, L., Liu, B., Yu, N., Wang, X., Shao, J.: Context and attribute grounded dense captioning. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019
Metadata
Title
Dense Captioning Using Abstract Meaning Representation
Written by
Antonio M. S. Almeida Neto
Helena M. Caseli
Tiago A. Almeida
Copyright year
2020
DOI
https://doi.org/10.1007/978-3-030-61377-8_31