short-paper

V3C1 Dataset: An Evaluation of Content Characteristics

Authors:
Fabian Berns, University of Münster, Münster, Germany
Luca Rossetto, University of Basel, Basel, Switzerland
Klaus Schoeffmann, University of Klagenfurt, Klagenfurt, Austria
Christian Beecks, University of Münster, Münster, Germany
George Awad, NIST & Dakota Consulting, Inc, Gaithersburg, MD, USA

ICMR '19: Proceedings of the 2019 on International Conference on Multimedia Retrieval, June 2019, Pages 334–338
https://doi.org/10.1145/3323873.3325051

Published: 05 June 2019

ABSTRACT

In this work, we analyze content statistics of the V3C1 dataset, the first partition of the Vimeo Creative Commons Collection (V3C). The dataset was designed to represent true web videos in the wild, with good visual quality and diverse content characteristics, and will serve as the evaluation basis for the Video Browser Showdown (VBS) 2019-2021 and the TREC Video Retrieval Evaluation (TRECVID) Ad-Hoc Video Search (AVS) tasks 2019-2021. The dataset comes with a shot segmentation (around 1 million shots), for which we analyze content specifics and statistics. Our analysis shows that the content of V3C1 is very diverse, has no predominant characteristics, and exhibits low self-similarity. It is therefore very well suited for video retrieval evaluations and for participants of TRECVID AVS and the VBS.

Index Terms

1. General and reference
  1. Cross-computing tools and techniques
    1. Evaluation
    2. Metrics
2. Information systems
  1. Information retrieval
    1. Retrieval models and ranking
      1. Similarity measures
  2. Information systems applications
    1. Multimedia information systems

Published in
ICMR '19: Proceedings of the 2019 on International Conference on Multimedia Retrieval
June 2019
427 pages
ISBN: 9781450367653
DOI: 10.1145/3323873
General Chairs:
Abdulmotaleb El Saddik, University of Ottawa, Canada
Alberto Del Bimbo, University of Florence, Italy
Zhongfei Zhang, Binghamton University, State University of New York, USA
Program Chairs:
Alexander Hauptmann, Carnegie Mellon University, USA
K. Selcuk Candan, Arizona State University, USA
Marco Bertini, University of Florence, Italy
Lexing Xie, Australian National University, Australia
Xiao-Yong Wei, Sichuan University, China
Copyright © 2019 ACM
© 2019 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
content statistics
trecvid
v3c
video analytics
video collection
Acceptance Rates
Overall Acceptance Rate: 254 of 830 submissions, 31%