04-09-2024 | Original Article

GRPIC: an end-to-end image captioning model using three visual features

Authors: Shixin Peng, Can Xiong, Leyuan Liu, Laurence T. Yang, Jingying Chen

Published in: International Journal of Machine Learning and Cybernetics | Issue 3/2025

Abstract

The article presents GRPIC, an end-to-end image captioning model that integrates three kinds of visual features: grid, region, and pixel. It uses the Swin Transformer to extract grid features, DETR for region features, and DeepLab for pixel features. The model incorporates both relative and absolute positional information and employs an Align Graph to align the features. Experiments on the MS COCO dataset are reported to achieve state-of-the-art image captioning results. The article reports that adding pixel-level features improves caption accuracy by capturing fine-grained details and contextual information. The end-to-end architecture and the fusion of three feature types distinguish the model from existing methods.
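To make the idea of combining three feature types concrete, here is a minimal sketch in plain Python. It is not the GRPIC architecture: the abstract does not specify how the features are fused, and the Align Graph and positional encodings are omitted. The sketch assumes only that grid, region, and pixel features arrive as lists of vectors with different widths, projects each set to a shared width with random (untrained) weights, concatenates the tokens, and mixes them with one pass of scaled dot-product self-attention. All function names and parameters (`fuse_features`, `d_model`, `seed`) are hypothetical.

```python
import math
import random


def random_matrix(rows, cols, rng):
    """Small random weight matrix, standing in for learned parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def project(features, weights):
    """Linear projection: (n x d_in) times (d_in x d_model) gives (n x d_model)."""
    columns = list(zip(*weights))
    return [[sum(x * w for x, w in zip(row, col)) for col in columns]
            for row in features]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention with identity Q/K/V maps."""
    d = len(tokens[0])
    mixed = []
    for query in tokens:
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(d)
                  for key in tokens]
        weights = softmax(scores)
        mixed.append([sum(w * value[j] for w, value in zip(weights, tokens))
                      for j in range(d)])
    return mixed


def fuse_features(grid, region, pixel, d_model=8, seed=0):
    """Project three feature sets to a shared width, concatenate, and mix them.

    Assumption: this generic attention-based fusion is an illustration only,
    not the fusion method described in the article.
    """
    rng = random.Random(seed)
    tokens = []
    for feats in (grid, region, pixel):
        weights = random_matrix(len(feats[0]), d_model, rng)
        tokens.extend(project(feats, weights))
    return self_attention(tokens)


# Toy inputs: 4 grid tokens (width 6), 3 region tokens (width 5),
# 2 pixel tokens (width 7). The values are arbitrary.
grid = [[0.1 * (i + j) for j in range(6)] for i in range(4)]
region = [[0.2 * (i - j) for j in range(5)] for i in range(3)]
pixel = [[0.05 * (i * j) for j in range(7)] for i in range(2)]

fused = fuse_features(grid, region, pixel)
print(len(fused), len(fused[0]))  # 9 tokens, each of width 8
```

In a real captioning model the projections would be learned, and the fused tokens would feed a caption decoder; both steps are outside the scope of this sketch.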

Metadata

Title: GRPIC: an end-to-end image captioning model using three visual features
Authors: Shixin Peng, Can Xiong, Leyuan Liu, Laurence T. Yang, Jingying Chen
Publication date: 04-09-2024
Publisher: Springer Berlin Heidelberg
Published in: International Journal of Machine Learning and Cybernetics | Issue 3/2025
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI: https://doi.org/10.1007/s13042-024-02352-8