ABSTRACT
We introduce the challenge problem for generic video indexing to gain insight into the intermediate steps that affect the performance of multimedia analysis methods, while at the same time fostering repeatability of experiments. To arrive at a challenge problem, we provide a general scheme for the systematic examination of automated concept detection methods by decomposing the generic video indexing problem into two unimodal analysis experiments, two multimodal analysis experiments, and one combined analysis experiment. For each experiment, we evaluate generic video indexing performance on 85 hours of international broadcast news data from the TRECVID 2005/2006 benchmark, using a lexicon of 101 semantic concepts. By establishing a minimum performance on each experiment, the challenge problem allows for component-based optimization of the generic indexing issue, while simultaneously offering other researchers a reference for comparison during indexing methodology development. To stimulate further investigation into the intermediate analysis steps that influence video indexing performance, the challenge offers the research community a manually annotated concept lexicon, pre-computed low-level multimedia features, trained classifier models, and five experiments together with baseline performance, all of which are available at http://www.mediamill.nl/challenge/.
Index Terms
- The challenge problem for automated detection of 101 semantic concepts in multimedia