ABSTRACT
Semantic analysis of multimodal video aims to index segments of interest at a conceptual level. Reaching this goal requires analyzing several information streams, which must be fused at some point in the analysis. In this paper, we consider two classes of fusion schemes, namely early fusion and late fusion. The former fuses modalities in feature space; the latter fuses modalities in semantic space. Experiments on 184 hours of broadcast video data and 20 semantic concepts show that late fusion tends to give slightly better performance for most concepts. However, for the concepts where early fusion performs better, the difference is more pronounced.
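To make the distinction between the two schemes concrete, here is a minimal sketch under stated assumptions, not the paper's implementation. A hand-rolled logistic regression stands in for any concept detector that outputs a probability. The toy data is hypothetical: each shot has two "visual" features and one "text" feature. Early fusion concatenates the per-modality feature vectors and learns a single concept model. Late fusion learns one concept model per modality and then fuses their scores with a second-stage model, which is one common late-fusion variant among several.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LogisticRegression:
    """Tiny batch-gradient-descent logistic regression; a stand-in for any
    classifier that outputs a concept probability (assumption, not the paper's)."""

    def __init__(self, lr: float = 1.0, epochs: int = 2000) -> None:
        self.lr, self.epochs = lr, epochs
        self.w: Vector = []
        self.b = 0.0

    def fit(self, X: Sequence[Vector], y: Sequence[int]) -> "LogisticRegression":
        n, d = len(X), len(X[0])
        self.w, self.b = [0.0] * d, 0.0
        for _ in range(self.epochs):
            grad_w, grad_b = [0.0] * d, 0.0
            for x, target in zip(X, y):
                err = self.predict_proba(x) - target  # d(log-loss)/d(logit)
                for j in range(d):
                    grad_w[j] += err * x[j]
                grad_b += err
            self.w = [w - self.lr * g / n for w, g in zip(self.w, grad_w)]
            self.b -= self.lr * grad_b / n
        return self

    def predict_proba(self, x: Vector) -> float:
        return sigmoid(sum(w * v for w, v in zip(self.w, x)) + self.b)


# Early fusion: combine modalities in feature space, then learn ONE concept model.
def fit_early(visual: Sequence[Vector], text: Sequence[Vector], y: Sequence[int]) -> LogisticRegression:
    fused = [v + t for v, t in zip(visual, text)]  # concatenate feature vectors per shot
    return LogisticRegression().fit(fused, y)


def score_early(model: LogisticRegression, v: Vector, t: Vector) -> float:
    return model.predict_proba(v + t)


# Late fusion: one concept model per modality, then a second-stage model
# combines their concept scores (i.e. fusion in semantic space).
LateModels = Tuple[LogisticRegression, LogisticRegression, LogisticRegression]


def fit_late(visual: Sequence[Vector], text: Sequence[Vector], y: Sequence[int]) -> LateModels:
    vis_model = LogisticRegression().fit(visual, y)
    txt_model = LogisticRegression().fit(text, y)
    scores = [[vis_model.predict_proba(v), txt_model.predict_proba(t)]
              for v, t in zip(visual, text)]
    return vis_model, txt_model, LogisticRegression().fit(scores, y)


def score_late(models: LateModels, v: Vector, t: Vector) -> float:
    vis_model, txt_model, fusion_model = models
    return fusion_model.predict_proba([vis_model.predict_proba(v), txt_model.predict_proba(t)])


# Hypothetical toy data: 4 shots, 2 visual features and 1 text feature each.
visual = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.7]]
text = [[1.0], [0.7], [0.0], [0.2]]
labels = [1, 1, 0, 0]  # 1 = concept present in the shot

early = fit_early(visual, text, labels)
late = fit_late(visual, text, labels)
for v, t in zip(visual, text):
    print(f"early={score_early(early, v, t):.3f}  late={score_late(late, v, t):.3f}")
```

The sketch trains the second-stage model on scores from the same shots the unimodal models saw, only to stay short. In practice those scores would normally come from held-out data, so that the fusion model does not learn from overconfident training-set outputs.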
Index Terms
- Early versus late fusion in semantic video analysis
Recommendations
On Comparing Early and Late Fusion Methods
Advances in Computational Intelligence
This paper presents a theoretical comparison of early and late fusion methods. An initial discussion on the conditions to apply early or late (soft or hard) fusion is introduced. The analysis shows that, if large training sets are available, early ...
Two-layer similarity fusion model for cover song identification
Various musical descriptors have been developed for Cover Song Identification (CSI). However, different descriptors are based on various assumptions, designed for representing distinct characteristics of music, and often differ in scale and noise level. ...
Early and Late Fusion Methods for the Automatic Creation of Twitter Lists
ASONAM '12: Proceedings of the 2012 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2012)
Twitter's list feature allows users to organize their followees into groups for easier information access and filtering. However, the percentage of users using lists is very small and most existing lists have only a few members. One reason for this may ...