Early versus late fusion in semantic video analysis

Authors:
Cees G. M. Snoek, Marcel Worring, Arnold W. M. Smeulders

University of Amsterdam, Amsterdam, The Netherlands

MULTIMEDIA '05: Proceedings of the 13th annual ACM international conference on Multimedia, November 2005, Pages 399–402
https://doi.org/10.1145/1101149.1101236

Published: 06 November 2005

ABSTRACT

Semantic analysis of multimodal video aims to index segments of interest at a conceptual level. Reaching this goal requires analyzing several information streams, which need to be fused at some point in the analysis. In this paper, we consider two classes of fusion schemes: early fusion and late fusion. The former fuses modalities in feature space; the latter fuses modalities in semantic space. We show by experiment on 184 hours of broadcast video data and for 20 semantic concepts that late fusion tends to give slightly better performance for most concepts. However, for those concepts where early fusion performs better, the difference is more significant.
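
The distinction between fusing in feature space and fusing in semantic space can be illustrated with a small sketch. Everything below is an assumption made for illustration: the two modalities ("visual" and "text"), the toy data, the function names, and the nearest-centroid scorer, which is chosen only for brevity and is not claimed to be the learner used in the paper. Early fusion concatenates the per-modality feature vectors and learns one model on the result; late fusion first learns one model per modality, then learns a second-stage model on the per-modality concept scores.

```python
import math


def train_centroids(vectors, labels):
    """Toy learner: return (positive_centroid, negative_centroid)."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    pos = [v for v, y in zip(vectors, labels) if y == 1]
    neg = [v for v, y in zip(vectors, labels) if y == 0]
    return mean(pos), mean(neg)


def score(model, vector):
    """Concept score in (0, 1); above 0.5 means closer to the positive centroid."""
    pos, neg = model
    return 1.0 / (1.0 + math.exp(math.dist(vector, pos) - math.dist(vector, neg)))


def early_fusion_fit(visual, text, labels):
    # Fuse in feature space: concatenate modality features, learn one model.
    fused = [v + t for v, t in zip(visual, text)]
    return train_centroids(fused, labels)


def early_fusion_score(model, v, t):
    return score(model, v + t)


def late_fusion_fit(visual, text, labels):
    # Step 1: learn one model per modality.
    vis_model = train_centroids(visual, labels)
    txt_model = train_centroids(text, labels)
    # Step 2: fuse in semantic space: the per-modality concept scores
    # become the input vector of a second-stage model.
    semantic = [[score(vis_model, v), score(txt_model, t)]
                for v, t in zip(visual, text)]
    combiner = train_centroids(semantic, labels)
    return vis_model, txt_model, combiner


def late_fusion_score(models, v, t):
    vis_model, txt_model, combiner = models
    return score(combiner, [score(vis_model, v), score(txt_model, t)])


# Hypothetical toy data: four shots, two modalities, one binary concept.
visual = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]]
text = [[1.0], [0.9], [0.0], [0.1]]
labels = [1, 1, 0, 0]

early = early_fusion_fit(visual, text, labels)
late = late_fusion_fit(visual, text, labels)

early_pos = early_fusion_score(early, [0.85, 0.85], [0.95])
early_neg = early_fusion_score(early, [0.15, 0.15], [0.05])
late_pos = late_fusion_score(late, [0.85, 0.85], [0.95])
late_neg = late_fusion_score(late, [0.15, 0.15], [0.05])
print(early_pos, early_neg, late_pos, late_neg)
```

Note the structural difference: the early scheme has a single learning step over one larger vector, while the late scheme has an extra learning step but each first-stage model only sees its own modality. On this toy data both schemes separate the two queries; the sketch says nothing about which scheme performs better in practice.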

Index Terms

Early versus late fusion in semantic video analysis
1. Information systems
  1. Information retrieval
    1. Document representation
    2. Search engine architectures and scalability
      1. Search engine indexing

Published in
MULTIMEDIA '05: Proceedings of the 13th annual ACM international conference on Multimedia
November 2005
1110 pages
ISBN: 1595930442
DOI: 10.1145/1101149
General Chairs: Hongjiang Zhang (Microsoft Research Asia, China), Tat-Seng Chua (National University of Singapore, Singapore)
Program Chairs: Ralf Steinmetz (Technische Universität Darmstadt, Germany), Mohan Kankanhalli (National University of Singapore, Singapore), Lynn Wilcox (FXPAL)
Copyright © 2005 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
early fusion
late fusion
multimedia understanding
semantic concept detection
Acceptance Rates
MULTIMEDIA '05 Paper Acceptance Rate: 49 of 312 submissions, 16%
Overall Acceptance Rate: 995 of 4,171 submissions, 24%