
2023 | Original Paper | Book Chapter

Image Caption with Prior Knowledge Graph and Heterogeneous Attention

Authors: Junjie Wang, Wenfeng Huang

Published in: Artificial Neural Networks and Machine Learning – ICANN 2023

Publisher: Springer Nature Switzerland


Abstract

Most current image description models are limited in their ability to generate descriptions that reflect personal experience and subjective perspective, which makes it difficult to produce relevant, engaging descriptions that truly capture the essence of an image. To address this issue, we propose a novel approach called Subject-awareness-driven Heterogeneous Attention (SCHA). SCHA leverages users' knowledge and expertise to generate content-adaptive image descriptions that are more human-like and reflective of personal experience. Our approach combines a carefully designed heterogeneous cascade attention model, which captures scene information from multiple perspectives, with a prior knowledge graph that supplies textual information to enhance the richness and relevance of the generated descriptions. The method also has great potential for industrial production detection, where it can open up new possibilities for increasing the flexibility and variety of detection steps. On the MSCOCO and Visual Genome datasets, our approach produces richer and more adaptive descriptions than widely used baseline models.
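The chapter itself is not reproduced on this page, so the architecture behind SCHA is only sketched by the abstract: a cascade of heterogeneous attention stages over scene features, enriched by textual embeddings from a prior knowledge graph. The PyTorch snippet below is a minimal, speculative sketch of one plausible reading of that design, in which a visual attention stage feeds a knowledge-graph attention stage. All module names, tensor shapes, and the fusion scheme are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class HeterogeneousCascadeAttention(nn.Module):
    # Hypothetical reading of the abstract's "heterogeneous cascade":
    # attend over image regions first, then let that visual context
    # query prior knowledge-graph node embeddings, and fuse both.
    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.visual_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.graph_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.fuse = nn.Linear(2 * d_model, d_model)

    def forward(self, query, regions, kg_nodes):
        # query:    (B, T, d) decoder hidden states
        # regions:  (B, R, d) image region features (e.g. detector outputs)
        # kg_nodes: (B, K, d) embeddings of retrieved knowledge-graph nodes
        vis_ctx, _ = self.visual_attn(query, regions, regions)
        # Cascade: the graph attention is conditioned on the visual context.
        kg_ctx, _ = self.graph_attn(vis_ctx, kg_nodes, kg_nodes)
        return self.fuse(torch.cat([vis_ctx, kg_ctx], dim=-1))

# Toy usage with random tensors standing in for real features.
attn = HeterogeneousCascadeAttention()
q = torch.randn(2, 10, 512)   # 2 captions, 10 decoding steps
r = torch.randn(2, 36, 512)   # 36 detected regions per image
k = torch.randn(2, 20, 512)   # 20 retrieved knowledge-graph nodes
out = attn(q, r, k)           # -> (2, 10, 512) fused context

Cascading the two stages, rather than running them in parallel, lets the knowledge-graph lookup depend on what the model is currently attending to visually, which would match the abstract's claim of content-adaptive descriptions.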


Metadata
Title
Image Caption with Prior Knowledge Graph and Heterogeneous Attention
Authors
Junjie Wang
Wenfeng Huang
Copyright Year
2023
DOI
https://doi.org/10.1007/978-3-031-44210-0_28
