DOI: 10.1145/1180639.1180727
Article

The challenge problem for automated detection of 101 semantic concepts in multimedia

Published: 23 October 2006

ABSTRACT

We introduce the challenge problem for generic video indexing to gain insight into intermediate steps that affect the performance of multimedia analysis methods, while at the same time fostering repeatability of experiments. To arrive at a challenge problem, we provide a general scheme for the systematic examination of automated concept detection methods, by decomposing the generic video indexing problem into 2 unimodal analysis experiments, 2 multimodal analysis experiments, and 1 combined analysis experiment. For each experiment, we evaluate generic video indexing performance on 85 hours of international broadcast news data, from the TRECVID 2005/2006 benchmark, using a lexicon of 101 semantic concepts. By establishing a minimum performance on each experiment, the challenge problem allows for component-based optimization of the generic indexing issue, while simultaneously offering other researchers a reference for comparison during indexing methodology development. To stimulate further investigation into intermediate analysis steps that influence video indexing performance, the challenge offers the research community a manually annotated concept lexicon, pre-computed low-level multimedia features, trained classifier models, and five experiments together with baseline performance, all of which are available at http://www.mediamill.nl/challenge/.
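As an illustration of how a method might be compared against a per-experiment baseline, the following is a minimal sketch, not the authors' evaluation code. It assumes average precision over ranked shot lists as the performance measure (the abstract does not name the metric), and the experiment labels, shot identifiers, and function names are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical labels for the five experiments named in the abstract:
# two unimodal, two multimodal, and one combined analysis experiment.
EXPERIMENTS = ["unimodal_1", "unimodal_2", "multimodal_1", "multimodal_2", "combined"]


def average_precision(ranked_shots: List[str], relevant: Set[str]) -> float:
    """Non-interpolated average precision of one ranked shot list (assumed metric)."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, shot in enumerate(ranked_shots, start=1):
        if shot in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant shot
    return precision_sum / len(relevant)


def mean_average_precision(
    rankings: Dict[str, List[str]], truth: Dict[str, Set[str]]
) -> float:
    """Mean of the per-concept average precision over every concept in `truth`."""
    scores = [average_precision(rankings.get(c, []), truth[c]) for c in truth]
    return sum(scores) / len(scores) if scores else 0.0


def beats_baseline(
    method_scores: Dict[str, float], baseline_scores: Dict[str, float]
) -> Dict[str, bool]:
    """For each experiment, report whether the method exceeds the baseline score."""
    return {
        exp: method_scores.get(exp, 0.0) > baseline
        for exp, baseline in baseline_scores.items()
    }
```

Scoring each experiment separately keeps the comparison component-based: a method can be checked against the minimum performance of one experiment without rerunning the others.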


        Published in

          MM '06: Proceedings of the 14th ACM international conference on Multimedia
          October 2006
          1072 pages
          ISBN: 1595934472
          DOI: 10.1145/1180639

          Copyright © 2006 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Acceptance Rates

          Overall acceptance rate: 995 of 4,171 submissions (24%)
