ABSTRACT
In this paper we describe a general information fusion algorithm for incorporating multimodal cues when building user-defined semantic concept models. We compare this technique against a Bayesian network-based approach on a semantic concept detection task, and results indicate that it yields superior performance. We further demonstrate the approach by building classifiers for arbitrary concepts in a score space defined by a pre-deployed set of multimodal concept detectors. Results show that annotation of user-defined concepts, both inside and outside the pre-deployed set, is competitive with our best video-only models on the TREC Video 2002 corpus.
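The score-space idea described above can be sketched as follows. This is a minimal, hypothetical illustration with synthetic data: each video shot is represented by the vector of confidence scores emitted by a pre-deployed bank of base concept detectors, and a discriminative classifier for a new user-defined concept is trained directly in that score space. The paper's fusion models are SVM-based; a dependency-free logistic-regression stand-in is used here, and all detector names, dimensions, and data are invented for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n_base, n_shots = 5, 400  # size of the (hypothetical) detector bank; labeled shots

# Synthetic score vectors in [0, 1]: positive shots for the new concept
# skew high on the first two base detectors.
y = rng.integers(0, 2, n_shots)
X = rng.random((n_shots, n_base))
X[y == 1, :2] += 0.5

# Train a linear discriminative model in score space via logistic regression.
w, b = np.zeros(n_base), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(concept | scores)
    w -= 1.0 * (X.T @ (p - y)) / n_shots    # gradient step on weights
    b -= 1.0 * np.mean(p - y)               # gradient step on bias

# The learned model annotates a new shot from its base-detector outputs alone.
train_acc = np.mean(((X @ w + b) > 0) == y)
```

The key design point is that the new concept's model never touches raw audio, video, or text features: it operates only on the fixed-length score vector, so arbitrary user-defined concepts can be trained cheaply on top of the deployed detector bank.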
Index Terms: Discriminative model fusion for semantic concept detection and annotation in video