2025 | OriginalPaper | Chapter

Diffusion-Based Multimodal Video Captioning

Authors: Jaakko Kainulainen, Zixin Guo, Jorma Laaksonen

Published in: Computer Vision – ACCV 2024

Publisher: Springer Nature Singapore

Abstract

The chapter addresses video captioning, the task of summarizing video content in natural-language sentences. It discusses the challenges of integrating multimodal information and the benefits of diffusion models for generative tasks. It then introduces the Multimodal-Diffusion-Network (MM-Diff-Net), which uses a diffusion process to generate captions by fusing video, audio, speech transcripts, and generated descriptions. The chapter presents the model's architecture and experimental results on several datasets, reporting competitive performance and the advantages of incorporating generated descriptions for caption quality.

Metadata

Title: Diffusion-Based Multimodal Video Captioning
Authors: Jaakko Kainulainen, Zixin Guo, Jorma Laaksonen
Copyright Year: 2025
Publisher: Springer Nature Singapore
DOI: https://doi.org/10.1007/978-981-96-0908-6_9
