Skip to main content
Top

Advancing Vision-Language Models with Generative AI

  • 2026
  • OriginalPaper
  • Chapter
Published in:

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

The chapter delves into the recent advancements in vision-language models (VLMs), highlighting the integration of large language models (LLMs) with image encoders to process multimodal data. It explores various models like CLIP, Flamingo, and BLIP, each with unique approaches to multimodal learning. The text discusses the challenges in training these models, including data limitations, computational complexity, and biases. It also covers future directions, such as improving video-based models and developing more robust evaluation methods. The chapter concludes by emphasizing the need for high-quality datasets and better grounding to enhance model performance and reliability.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Business + Economics & Engineering + Technology"

Online-Abonnement

Springer Professional "Business + Economics & Engineering + Technology" gives you access to:

  • more than 102.000 books
  • more than 537 journals

from the following subject areas:

  • Automotive
  • Construction + Real Estate
  • Business IT + Informatics
  • Electrical Engineering + Electronics
  • Energy + Sustainability
  • Finance + Banking
  • Management + Leadership
  • Marketing + Sales
  • Mechanical Engineering + Materials
  • Insurance + Risk


Secure your knowledge advantage now!

Springer Professional "Engineering + Technology"

Online-Abonnement

Springer Professional "Engineering + Technology" gives you access to:

  • more than 67.000 books
  • more than 390 journals

from the following specialised fileds:

  • Automotive
  • Business IT + Informatics
  • Construction + Real Estate
  • Electrical Engineering + Electronics
  • Energy + Sustainability
  • Mechanical Engineering + Materials





 

Secure your knowledge advantage now!

Springer Professional "Business + Economics"

Online-Abonnement

Springer Professional "Business + Economics" gives you access to:

  • more than 67.000 books
  • more than 340 journals

from the following specialised fileds:

  • Construction + Real Estate
  • Business IT + Informatics
  • Finance + Banking
  • Management + Leadership
  • Marketing + Sales
  • Insurance + Risk



Secure your knowledge advantage now!

Title
Advancing Vision-Language Models with Generative AI
Authors
Rahul Raja
Arpita Vats
Copyright Year
2026
DOI
https://doi.org/10.1007/978-3-032-02853-2_1
This content is only visible if you are logged in and have the appropriate permissions.

Premium Partner

    Image Credits
    Neuer Inhalt/© ITandMEDIA, Nagarro GmbH/© Nagarro GmbH, AvePoint Deutschland GmbH/© AvePoint Deutschland GmbH, AFB Gemeinnützige GmbH/© AFB Gemeinnützige GmbH, USU GmbH/© USU GmbH, Ferrari electronic AG/© Ferrari electronic AG