The challenge problem for automated detection of 101 semantic concepts in multimedia

Authors:
Cees G. M. Snoek, University of Amsterdam, Amsterdam, The Netherlands
Marcel Worring, University of Amsterdam, Amsterdam, The Netherlands
Jan C. van Gemert, University of Amsterdam, Amsterdam, The Netherlands
Jan-Mark Geusebroek, University of Amsterdam, Amsterdam, The Netherlands
Arnold W. M. Smeulders, University of Amsterdam, Amsterdam, The Netherlands

MM '06: Proceedings of the 14th ACM international conference on Multimedia, October 2006, Pages 421–430. https://doi.org/10.1145/1180639.1180727

Published: 23 October 2006

ABSTRACT

We introduce the challenge problem for generic video indexing to gain insight into the intermediate steps that affect the performance of multimedia analysis methods, while at the same time fostering repeatability of experiments. To arrive at a challenge problem, we provide a general scheme for the systematic examination of automated concept detection methods, decomposing the generic video indexing problem into two unimodal analysis experiments, two multimodal analysis experiments, and one combined analysis experiment. For each experiment, we evaluate generic video indexing performance on 85 hours of international broadcast news data from the TRECVID 2005/2006 benchmark, using a lexicon of 101 semantic concepts. By establishing a minimum performance on each experiment, the challenge problem allows for component-based optimization of the generic indexing issue, while simultaneously offering other researchers a reference for comparison during indexing methodology development. To stimulate further investigations into the intermediate analysis steps that influence video indexing performance, the challenge offers the research community a manually annotated concept lexicon, pre-computed low-level multimedia features, trained classifier models, and five experiments together with baseline performance, all available at http://www.mediamill.nl/challenge/.

Index Terms

1. Computing methodologies
  1. Machine learning
2. Information systems
  1. Information retrieval
    1. Document representation
    2. Search engine architectures and scalability
      1. Search engine indexing

Published in
MM '06: Proceedings of the 14th ACM international conference on Multimedia
October 2006
1072 pages
ISBN: 1595934472
DOI: 10.1145/1180639
General Chairs: Klara Nahrstedt (UIUC), Matthew Turk (UCSB)
Program Chairs: Yong Rui (Microsoft Research), Wolfgang Klas (Universität Wien), Ketan Mayer-Patel (UNC)
Copyright © 2006 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
baseline
generic concept detection
video analysis
Qualifiers
- Article
Acceptance Rates
Overall Acceptance Rate: 995 of 4,171 submissions, 24%
Article Metrics
- Total Citations: 364
- Total Downloads: 1,800
- Downloads (Last 12 months): 25
- Downloads (Last 6 weeks): 5