EMAFN: Enhanced Multimodal Alignment and Fusion for Visual Question Answering Networks

  • 2025
  • OriginalPaper
  • Chapter
Abstract

This chapter presents EMAFN, a visual question answering (VQA) model that improves multimodal alignment and fusion. It examines how visual and textual features are integrated and addresses a limitation of current models, which often overlook location information. Two modules are introduced, the Location Attention Module (LAM) and the Attention-based Multimodal Fusion Module (AMFM), which together improve the model's ability to align and fuse multimodal features. The chapter also discusses a contrast loss function used to optimize feature alignment and reports detailed experiments on datasets such as VQA-v2 and GQA. Comparisons with state-of-the-art models indicate that EMAFN performs better, particularly in accuracy and reliability. The conclusion summarizes the model's strengths and acknowledges areas for future improvement, such as numerical and knowledge-based VQA tasks. Readers gain insight into techniques for improving VQA systems and the practical implications of EMAFN's modules.
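
The abstract mentions a contrast loss for aligning visual and textual features but does not give its form. As a rough illustration only, the sketch below uses a symmetric InfoNCE-style loss, a common choice for this kind of alignment, and it may differ from the loss actually used in the chapter. The function name `contrastive_alignment_loss`, the temperature value, and the feature shapes are all assumptions made for the example.

```python
import numpy as np

def contrastive_alignment_loss(img, txt, temperature=0.07):
    """Symmetric InfoNCE-style loss for a batch of paired features.

    img, txt: arrays of shape (batch, dim); row i of img is assumed to be
    paired with row i of txt. The temperature is an illustrative default,
    not a value taken from the chapter.
    """
    # L2-normalise so the dot product below is a cosine similarity.
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)

    # Pairwise similarities; matching pairs sit on the diagonal.
    logits = img @ txt.T / temperature

    def diag_cross_entropy(l):
        # Numerically stable log-softmax over each row.
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        # Negative log-probability assigned to the true (diagonal) pair.
        return -np.mean(np.diag(log_probs))

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (diag_cross_entropy(logits) + diag_cross_entropy(logits.T))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image_feats = rng.normal(size=(8, 16))
    text_feats = image_feats + 0.01 * rng.normal(size=(8, 16))
    print("aligned pairs:   ", contrastive_alignment_loss(image_feats, text_feats))
    print("mismatched pairs:", contrastive_alignment_loss(image_feats, text_feats[::-1]))
```

Under these assumptions the loss is small when each image feature is closest to its own text feature and grows when the pairing is scrambled, which is the behaviour an alignment objective is meant to encourage.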

Title
EMAFN: Enhanced Multimodal Alignment and Fusion for Visual Question Answering Networks
Authors
Ke Xu
Yuchen Liu
Chen Liang
Shengrong Zhao
Copyright Year
2025
Publisher
Springer Nature Singapore
DOI
https://doi.org/10.1007/978-981-96-9911-7_1