
11-05-2025 | Original Article

Enhancing visual contextual semantic information for image captioning

Authors: Ronggui Wang, Shuo Li, Lixia Xue, Juan Yang

Published in: International Journal of Machine Learning and Cybernetics


Abstract

The article explores how feature extraction methods from computer vision can be combined with machine translation techniques to enhance the visual contextual semantic information available for image captioning. It begins by reviewing the evolution of image captioning, from early retrieval-based and template-based methods to deep learning approaches, particularly the encoder-decoder structure. It then introduces a novel dual-feature multi-fusion module based on cross-attention, which integrates grid features with segmentation features so that the segmentation features provide spatial semantic guidance for the grid features. It also presents a multi-scale dilated multi-head sparse attention mechanism that lets grid features capture contextual information and semantic dependencies at different scales, yielding more informative visual representations. The article further describes the training details and experimental setup, including the MS-COCO dataset and the evaluation metrics used. Extensive experiments and comparisons with other models show that the proposed approach outperforms them, highlighting its effectiveness in generating high-quality image captions. The article concludes with a discussion of future research directions and possible further improvements in image captioning technology.
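The abstract names two mechanisms but does not give their equations. The sketch below is a generic NumPy illustration, under our own assumptions, of the two underlying ideas: cross-attention in which grid features act as queries over segmentation features, and multi-head attention where each head uses a different dilation rate to sparsify which grid cells it can see. All function names, the residual connection, the even channel split across heads, the dilation rates and the toy tensor sizes are hypothetical choices made for illustration; this is not the authors' module.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v, mask=None):
    # Scaled dot-product attention; mask[i, j] == True lets query i see key j.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if mask is not None:
        scores = np.where(mask, scores, -1e9)
    return softmax(scores) @ v


def cross_attention_fusion(grid, seg, w_q, w_k, w_v):
    # Hypothetical fusion step: grid features (queries) attend over
    # segmentation features (keys/values); the result is added residually.
    q, k, v = grid @ w_q, seg @ w_k, seg @ w_v
    return grid + attention(q, k, v)


def dilated_mask(h, w, dilation):
    # Cell i may attend to cell j only if their row and column offsets are
    # both multiples of `dilation`; dilation=1 gives dense attention.
    rows, cols = np.divmod(np.arange(h * w), w)
    dr = rows[:, None] - rows[None, :]
    dc = cols[:, None] - cols[None, :]
    return (dr % dilation == 0) & (dc % dilation == 0)


def multi_scale_dilated_attention(x, h, w, dilations, w_q, w_k, w_v):
    # One head per dilation rate: channels are split evenly across heads,
    # each head uses its own dilated mask, and outputs are concatenated.
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    n = len(dilations)
    qs, ks, vs = (np.split(t, n, axis=-1) for t in (q, k, v))
    heads = [attention(qh, kh, vh, dilated_mask(h, w, d))
             for qh, kh, vh, d in zip(qs, ks, vs, dilations)]
    return np.concatenate(heads, axis=-1)


# Toy usage with random data (sizes are arbitrary; projection weights are
# shared between the two modules only to keep the example short).
rng = np.random.default_rng(0)
h, w, d = 4, 4, 8
grid = rng.standard_normal((h * w, d))  # 4x4 grid of 8-dim features
seg = rng.standard_normal((10, d))      # 10 segmentation features
w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
fused = cross_attention_fusion(grid, seg, w_q, w_k, w_v)
out = multi_scale_dilated_attention(fused, h, w, (1, 2), w_q, w_k, w_v)
print(fused.shape, out.shape)  # (16, 8) (16, 8)
```

With dilation rates (1, 2), one head attends densely over the 4x4 grid while the other sees only cells at even row and column offsets, which is one simple way to mix local and wider-stride context at a lower cost per head; the paper's actual sparsity pattern and scales may differ.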

Metadata
Title
Enhancing visual contextual semantic information for image captioning
Authors
Ronggui Wang
Shuo Li
Lixia Xue
Juan Yang
Publication date
11-05-2025
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Machine Learning and Cybernetics
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-025-02634-9