ABSTRACT
In this paper we describe a general information fusion algorithm for incorporating multimodal cues when building user-defined semantic concept models. We compare this technique against a Bayesian network-based approach on a semantic concept detection task, and results indicate that it yields superior performance. We further demonstrate the approach by building classifiers for arbitrary concepts in a score space defined by a pre-deployed set of multimodal concept detectors. Results show that annotation of user-defined concepts, both inside and outside the pre-deployed set, is competitive with our best video-only models on the TREC Video 2002 corpus.
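The score-space idea described above can be sketched as follows. This is a minimal, hypothetical illustration with synthetic data: each video shot is represented by the vector of confidence scores emitted by a pre-deployed bank of base concept detectors, and a discriminative classifier for a new user-defined concept is trained directly in that score space. The paper's fusion models are SVM-based; a dependency-free logistic-regression stand-in is used here, and all detector names, dimensions, and data are invented for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n_base, n_shots = 5, 400  # size of the (hypothetical) detector bank; labeled shots

# Synthetic score vectors in [0, 1]: positive shots for the new concept
# skew high on the first two base detectors.
y = rng.integers(0, 2, n_shots)
X = rng.random((n_shots, n_base))
X[y == 1, :2] += 0.5

# Train a linear discriminative model in score space via logistic regression.
w, b = np.zeros(n_base), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(concept | scores)
    w -= 1.0 * (X.T @ (p - y)) / n_shots    # gradient step on weights
    b -= 1.0 * np.mean(p - y)               # gradient step on bias

# The learned model annotates a new shot from its base-detector outputs alone.
train_acc = np.mean(((X @ w + b) > 0) == y)
```

The key design point is that the new concept's model never touches raw audio, video, or text features: it operates only on the fixed-length score vector, so arbitrary user-defined concepts can be trained cheaply on top of the deployed detector bank.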
Index Terms: Discriminative model fusion for semantic concept detection and annotation in video