DOI: 10.1145/3323873.3325051
short-paper

V3C1 Dataset: An Evaluation of Content Characteristics

Published: 05 June 2019

ABSTRACT

In this work we analyze content statistics of the V3C1 dataset, which is the first partition of the Vimeo Creative Commons Collection (V3C). The dataset has been designed to represent true web videos in the wild, with good visual quality and diverse content characteristics, and will serve as the evaluation basis for the Video Browser Showdown (VBS) 2019-2021 and the TREC Video Retrieval (TRECVID) Ad-Hoc Video Search (AVS) tasks 2019-2021. The dataset comes with a shot segmentation (around 1 million shots), for which we analyze content specifics and statistics. Our research shows that the content of V3C1 is very diverse, has no predominant characteristics, and exhibits low self-similarity. It is therefore very well suited for video retrieval evaluations, as well as for participants of TRECVID AVS or the VBS.

Published in

ICMR '19: Proceedings of the 2019 on International Conference on Multimedia Retrieval
June 2019
427 pages
ISBN: 9781450367653
DOI: 10.1145/3323873

Copyright © 2019 ACM

© 2019 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 254 of 830 submissions, 31%
