ABSTRACT
This paper introduces a web image dataset created by NUS's Lab for Media Search. The dataset includes: (1) 269,648 images from Flickr and their associated tags, with a total of 5,018 unique tags; (2) six types of low-level features extracted from these images: a 64-D color histogram, a 144-D color correlogram, a 73-D edge direction histogram, 128-D wavelet texture, 225-D block-wise color moments extracted over a fixed 5×5 grid partition, and a 500-D bag of words based on SIFT descriptors; and (3) ground truth for 81 concepts that can be used for evaluation. Based on this dataset, we highlight characteristics of web image collections and identify four research issues in web image annotation and retrieval. We also provide baseline results for web image annotation, obtained by learning from the tags with the traditional k-NN algorithm. The benchmark results indicate that it is possible to learn effective models from a sufficiently large image dataset to facilitate general image retrieval.
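The 225 dimensions of the block-wise color moments are consistent with 5 × 5 blocks × 3 color channels × 3 moments (mean, standard deviation, and skewness, in the style of Stricker and Orengo). Summed over all six feature types, the dimensions come to 64 + 144 + 73 + 128 + 225 + 500 = 1,134. The NumPy sketch below computes such a vector. It is a minimal illustration, not the dataset's extraction code: the helper name `blockwise_color_moments` is hypothetical, and the input color space, the moment ordering, and the cube-root form of the third moment are all assumptions.

```python
import numpy as np


def blockwise_color_moments(image, grid=5):
    """Hypothetical helper: color moments over a fixed grid x grid partition.

    `image` is an H x W x 3 array; which color space it is in is an assumption
    left to the caller. Returns grid * grid blocks * 3 channels * 3 moments
    values (225 for the default 5 x 5 grid).
    """
    image = np.asarray(image, dtype=np.float64)
    h, w, c = image.shape
    if h < grid or w < grid:
        raise ValueError("image must be at least grid x grid pixels")
    # Block boundaries; linspace also copes with sizes not divisible by grid.
    rows = np.linspace(0, h, grid + 1).astype(int)
    cols = np.linspace(0, w, grid + 1).astype(int)
    feats = []
    for i in range(grid):
        for j in range(grid):
            # Flatten the block to (num_pixels, channels).
            block = image[rows[i]:rows[i + 1], cols[j]:cols[j + 1]].reshape(-1, c)
            mean = block.mean(axis=0)                   # 1st moment per channel
            std = block.std(axis=0)                     # 2nd moment per channel
            third = ((block - mean) ** 3).mean(axis=0)  # 3rd central moment
            skew = np.cbrt(third)                       # cube root keeps the sign
            feats.extend([mean, std, skew])             # assumed ordering per block
    return np.concatenate(feats)
```

Splitting the image with `np.linspace` keeps every block non-empty whenever the image is at least `grid` pixels on each side, so images whose sizes are not multiples of five still yield exactly 225 values.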
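The k-NN baseline learns from the Flickr tags rather than from the 81-concept ground truth, which is reserved for evaluation. The sketch below shows one way such a baseline can work. It assumes brute-force Euclidean distance and a simple voting rule: a concept's score is the fraction of the k nearest training images whose tags contain the concept name. The function name `knn_concept_scores`, the value of k, the distance, the exact-string tag matching, and the scoring rule are illustrative assumptions, not the paper's exact protocol.

```python
import numpy as np


def knn_concept_scores(train_feats, train_tags, test_feats, concepts, k=10):
    """Hypothetical k-NN annotator that uses tags as training labels.

    train_feats: (n_train, d) low-level features of the tagged images.
    train_tags:  list of tag sets, one per training image.
    test_feats:  (n_test, d) features of the images to annotate.
    concepts:    concept names, matched against tags by exact string (assumption).
    Returns an (n_test, len(concepts)) array; entry [t, c] is the fraction of
    test image t's k nearest training images whose tags contain concept c.
    """
    train_feats = np.asarray(train_feats, dtype=np.float64)
    test_feats = np.asarray(test_feats, dtype=np.float64)
    # Binary label matrix derived from tags: labels[n, c] = 1 if tagged with c.
    labels = np.array([[c in tags for c in concepts] for tags in train_tags],
                      dtype=np.float64)
    # Brute-force squared Euclidean distances, shape (n_test, n_train).
    d2 = ((test_feats[:, None, :] - train_feats[None, :, :]) ** 2).sum(axis=2)
    k = min(k, len(train_feats))
    nearest = np.argsort(d2, axis=1)[:, :k]  # indices of the k nearest neighbours
    # labels[nearest] has shape (n_test, k, n_concepts); average over neighbours.
    return labels[nearest].mean(axis=1)


# Toy usage with made-up 2-D features and tags.
train = [[0, 0], [0, 1], [10, 10], [10, 11]]
tags = [{"sky"}, {"sky", "cloud"}, {"car"}, {"car", "road"}]
scores = knn_concept_scores(train, tags, [[0, 0.4], [10, 10.6]], ["sky", "car"], k=2)
# First test image scores only "sky", second only "car": [[1.0, 0.0], [0.0, 1.0]]
```

The brute-force distance matrix costs O(n_test × n_train × d) time and memory, so at the scale of 269,648 images an approximate nearest-neighbor index would likely be preferable to this direct computation.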
Index Terms
- NUS-WIDE: a real-world web image database from National University of Singapore
Recommendations
Label-specific training set construction from web resource for image annotation
Recently many research efforts have been devoted to image annotation by leveraging on the associated tags/keywords of web images as training labels. A key issue to resolve is the relatively low accuracy of the tags. In this paper, we propose a novel ...
Tagging and retrieving images with co-occurrence models: from corel to flickr
LS-MMRM '09: Proceedings of the First ACM workshop on Large-scale multimedia retrieval and mining
This paper presents two models for content-based automatic image annotation and retrieval in web image repositories, based on the co-occurrence of tags and visual features in the images. In particular, we show how additional measures can be taken to ...
Accuracy Of User-Contributed Image Tagging In Flickr: A Natural Disaster Case Study
SMSociety '16: Proceedings of the 7th 2016 International Conference on Social Media & Society
Social media platforms have become extremely popular during the past few years, presenting an alternate, and often preferred, avenue for information dissemination within massive global communities. Such user-generated multimedia content is emerging as a ...