ABSTRACT
In this work we analyze content statistics of the V3C1 dataset, which is the first partition of the Vimeo Creative Commons Collection (V3C). The dataset has been designed to represent true web videos in the wild, with good visual quality and diverse content characteristics, and will serve as the evaluation basis for the Video Browser Showdown (VBS) 2019-2021 and the TREC Video Retrieval Evaluation (TRECVID) Ad-Hoc Video Search (AVS) tasks 2019-2021. The dataset comes with a shot segmentation (around 1 million shots), for which we analyze content specifics and statistics. Our research shows that the content of V3C1 is very diverse, has no predominant characteristics, and exhibits low self-similarity. This makes it well suited for video retrieval evaluations and for participants in TRECVID AVS or VBS.
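The abstract does not state how self-similarity is measured. Purely as an illustration, one common way to quantify it is the mean cosine similarity between per-shot feature vectors over distinct pairs of shots: values near 1 suggest a homogeneous collection, lower values a more diverse one. The function names, the choice of cosine similarity, and the optional random pair sampling below are assumptions for this sketch, not the authors' method.

```python
import math
import random
from itertools import combinations
from typing import Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("feature vectors must be non-zero")
    return dot / (norm_a * norm_b)


def mean_self_similarity(
    features: Sequence[Sequence[float]],
    max_pairs: Optional[int] = None,
    seed: int = 0,
) -> float:
    """Mean cosine similarity over pairs of distinct shots (hypothetical metric).

    If max_pairs is None (or covers every pair), all n*(n-1)/2 pairs are used.
    Otherwise max_pairs random pairs of distinct shots are drawn, which gives
    an estimate of the same mean at a fraction of the cost.
    """
    n = len(features)
    if n < 2:
        raise ValueError("need at least two shots")
    if max_pairs is not None and max_pairs < 1:
        raise ValueError("max_pairs must be at least 1")
    total_pairs = n * (n - 1) // 2
    if max_pairs is None or max_pairs >= total_pairs:
        pairs = list(combinations(range(n), 2))  # exhaustive
    else:
        rng = random.Random(seed)  # seeded for reproducibility
        pairs = [tuple(rng.sample(range(n), 2)) for _ in range(max_pairs)]
    sims = [cosine_similarity(features[i], features[j]) for i, j in pairs]
    return sum(sims) / len(sims)


# Illustrative 3-dimensional descriptors for four shots (made-up values).
shots = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]
print(round(mean_self_similarity(shots), 4))  # prints 0.2357
```

With around 1 million shots, the exhaustive variant would need roughly 5 × 10^11 comparisons, which is why the sketch allows estimating the mean from a random sample of pairs instead.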
Index Terms
- V3C1 Dataset: An Evaluation of Content Characteristics
Recommendations
The relative effectiveness of concept-based versus content-based video retrieval
MULTIMEDIA '04: Proceedings of the 12th annual ACM international conference on Multimedia
Three video search systems were compared in the interactive search task at the TRECVID 2003 workshop: a text-only system, which searched video shots through transcripts; a features-only system, which searched video shots through 16 video ...
Aggregated feature retrieval for MPEG-7 via clustering
SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
In this paper, we describe an approach to combining text and visual features from MPEG-7 descriptions of video. A video retrieval process is aligned to a text retrieval process based on the TF*IDF vector space model via clustering of low-level visual ...
News video retrieval by learning multimodal semantic information
VISUAL'07: Proceedings of the 9th international conference on Advances in visual information systems
With the explosion of multimedia data especially that of video data, requirement of efficient video retrieval has becoming more and more important. Years of TREC Video Retrieval Evaluation (TRECVID) research gives benchmark for video search task. The ...