DOI: 10.1145/1101149.1101236
Article

Early versus late fusion in semantic video analysis

Published: 06 November 2005

ABSTRACT

Semantic analysis of multimodal video aims to index segments of interest at a conceptual level. Reaching this goal requires analyzing several information streams, which must be fused at some point in the analysis. In this paper, we consider two classes of fusion schemes: early fusion and late fusion. The former fuses modalities in feature space; the latter fuses modalities in semantic space. We show by experiment, on 184 hours of broadcast video data and for 20 semantic concepts, that late fusion tends to give slightly better performance for most concepts. However, for those concepts where early fusion performs better, the difference is more significant.
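
The difference between fusing in feature space and fusing in semantic space can be illustrated with a small sketch. It is not the paper's implementation: the nearest-centroid learner, the score averaging used as the late-fusion combiner, the toy data, and all function and variable names are assumptions, chosen only to keep the example self-contained.

```python
import math


def train_centroids(X, y):
    """Toy stand-in learner (assumption): mean feature vector per class 0/1."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def score(centroids, x):
    """Pseudo-probability of the positive class from centroid distances."""
    d_neg = math.dist(x, centroids[0])
    d_pos = math.dist(x, centroids[1])
    # Closer to the positive centroid gives a score above 0.5.
    return 1.0 / (1.0 + math.exp(d_pos - d_neg))


def early_fusion(train_vis, train_txt, y, vis, txt):
    """Fuse in feature space: concatenate modalities, train one model."""
    X = [v + t for v, t in zip(train_vis, train_txt)]
    model = train_centroids(X, y)
    return score(model, vis + txt)


def late_fusion(train_vis, train_txt, y, vis, txt):
    """Fuse in semantic space: one model per modality, combine the scores."""
    model_vis = train_centroids(train_vis, y)
    model_txt = train_centroids(train_txt, y)
    # Averaging is an assumed combiner; a learned combiner could be used instead.
    return (score(model_vis, vis) + score(model_txt, txt)) / 2.0


# Hypothetical toy data: two visual features and one textual feature per shot.
train_vis = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]]
train_txt = [[1.0], [0.9], [0.0], [0.1]]
y = [1, 1, 0, 0]

if __name__ == "__main__":
    shot_vis, shot_txt = [0.85, 0.85], [0.95]
    print("early:", early_fusion(train_vis, train_txt, y, shot_vis, shot_txt))
    print("late: ", late_fusion(train_vis, train_txt, y, shot_vis, shot_txt))
```

In the early scheme a single model sees the concatenated feature vector, so it can in principle exploit correlations across modalities. In the late scheme each modality is first reduced to a concept score, and only those scores are combined.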


      • Published in

        MULTIMEDIA '05: Proceedings of the 13th annual ACM international conference on Multimedia
        November 2005
        1110 pages
        ISBN:1595930442
        DOI:10.1145/1101149

        Copyright © 2005 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        MULTIMEDIA '05 Paper Acceptance Rate: 49 of 312 submissions, 16%
        Overall Acceptance Rate: 995 of 4,171 submissions, 24%
