research-article
DOI: 10.1145/2502081.2502119

We are not equally negative: fine-grained labeling for multimedia event detection

Published: 21 October 2013

ABSTRACT

Multimedia event detection (MED) is an effective technique for video indexing and retrieval. Current classifier training for MED treats all negative videos equally. However, many negative videos resemble the positive videos to different degrees. Intuitively, if we assign fine-grained labels to the negative videos, we can capture more informative cues from them and thereby benefit classifier learning. To this end, we apply a statistical method to both the positive and negative examples to obtain the decisive attributes of a specific event. Based on these decisive attributes, we assign fine-grained labels to the negative examples so that they are treated differently and exploited more effectively. The resulting fine-grained labels may not be accurate enough to characterize the negative videos. Hence, we propose to jointly optimize the fine-grained labels with the knowledge from the visual features and the attribute representations, so that the two reinforce each other. Our model yields two classifiers, one from the attributes and one from the features, both of which incorporate the informative cues from the fine-grained labels. The outputs of the two classifiers on the testing videos are fused for detection. Extensive experiments on the challenging TRECVID MED 2012 development set validate the efficacy of the proposed approach.
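The abstract describes a concrete pipeline: statistically select the decisive attributes of an event, grade the negative videos by how closely they match those attributes, train one classifier from the attribute representation and one from the visual features, and fuse their scores at test time. The Python sketch below is only a minimal illustration of that pipeline under stated assumptions, not the authors' method: the attribute matrix A, feature matrix X, the mean-activation-gap statistic, the distance-based soft labels, and the ridge-regression classifiers are all hypothetical stand-ins, and the paper's joint optimization of labels and classifiers is considerably more involved.

    import numpy as np
    from sklearn.linear_model import Ridge

    # Hypothetical stand-ins (not from the paper): A holds attribute-detector
    # scores (n_videos x n_attributes), X holds low-level visual features
    # (n_videos x n_features), y holds binary event labels (1 = positive).
    rng = np.random.default_rng(0)
    A = rng.random((200, 50))
    X = rng.random((200, 300))
    y = np.concatenate([np.ones(20), np.zeros(180)])

    # 1) Decisive attributes via a simple statistic: the largest gap in mean
    #    activation between positive and negative videos (top-10 is arbitrary).
    pos = A[y == 1]
    gap = np.abs(pos.mean(axis=0) - A[y == 0].mean(axis=0))
    decisive = np.argsort(gap)[-10:]

    # 2) Fine-grained labels for negatives: the closer a negative's decisive-
    #    attribute profile lies to the positive centroid, the less negative
    #    its label (positives keep +1, negatives fall in [-1, 0]).
    centroid = pos[:, decisive].mean(axis=0)
    dist = np.linalg.norm(A[:, decisive] - centroid, axis=1)
    labels = np.where(y == 1, 1.0, -dist / dist.max())

    # 3) Two classifiers trained on the fine-grained labels: one from the
    #    attribute representation, one from the visual features.
    clf_attr = Ridge().fit(A, labels)
    clf_feat = Ridge().fit(X, labels)

    # 4) Late fusion of the two decision scores for detection on test videos.
    def detect(A_test, X_test, alpha=0.5):
        return alpha * clf_attr.predict(A_test) + (1 - alpha) * clf_feat.predict(X_test)

In this toy version the fine-grained labels are fixed before training; the paper instead refines them jointly with both classifiers, and the fusion weight (alpha here) would be tuned on held-out data.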


Published in

MM '13: Proceedings of the 21st ACM International Conference on Multimedia
October 2013, 1166 pages
ISBN: 9781450324045
DOI: 10.1145/2502081
Copyright © 2013 ACM

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

MM '13 paper acceptance rate: 47 of 235 submissions (20%).
Overall acceptance rate: 995 of 4,171 submissions (24%).
