Socializing the Semantic Gap: A Comparative Survey on Image Tag Assignment, Refinement, and Retrieval

Authors:
Xirong Li (Renmin University of China, Beijing, China)
Tiberio Uricchio (University of Florence, Firenze, Italy)
Lamberto Ballan (University of Florence, Stanford University, Firenze, Italy)
Marco Bertini (University of Florence, Firenze, Italy)
Cees G. M. Snoek (University of Amsterdam, Qualcomm Research Netherlands, Science Park, Netherlands)
Alberto Del Bimbo (University of Florence, Firenze, Italy)

ACM Computing Surveys, Volume 49, Issue 1, Article No. 14, pp. 1–39. https://doi.org/10.1145/2906152

Published: 06 June 2016

Abstract

Where previous reviews on content-based image retrieval emphasize what can be seen in an image to bridge the semantic gap, this survey considers what people tag about an image. A comprehensive treatise of three closely linked problems (i.e., image tag assignment, refinement, and tag-based image retrieval) is presented. While existing works vary in terms of their targeted tasks and methodology, they rely on the key functionality of tag relevance, that is, estimating the relevance of a specific tag with respect to the visual content of a given image and its social context. By analyzing what information a specific method exploits to construct its tag relevance function and how such information is exploited, this article introduces a two-dimensional taxonomy to structure the growing literature, understand the ingredients of the main works, clarify their connections and differences, and recognize their merits and limitations. For a head-to-head comparison with the state of the art, a new experimental protocol is presented, with training sets containing 10,000, 100,000, and 1 million images, and an evaluation on three test sets, contributed by various research groups. Eleven representative works are implemented and evaluated. Putting all this together, the survey aims to provide an overview of the past and foster progress for the near future.
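
To make the notion of a tag relevance function concrete, the sketch below scores a tag for an image by neighbor voting: it counts how many of the image's visual nearest neighbors carry the tag and subtracts the count expected from the tag's overall frequency. This is an illustration only, not the survey's own implementation or protocol. The function name `neighbor_vote_relevance`, the two-dimensional toy features, the Euclidean distance, and the value of `k` are all assumptions chosen for brevity.

```python
import math


def neighbor_vote_relevance(query_feature, tag, training_set, k=3):
    """Score `tag` for an image described by `query_feature`.

    `training_set` is a list of (feature_vector, set_of_tags) pairs.
    The score is the number of the k visually nearest training images
    that carry the tag, minus the number expected by chance given the
    tag's frequency in the whole training set.
    """
    # Rank training images by Euclidean distance to the query image.
    ranked = sorted(training_set, key=lambda item: math.dist(query_feature, item[0]))
    neighbors = ranked[:k]

    # Votes: neighbors that were tagged with `tag` by their owners.
    votes = sum(1 for _, tags in neighbors if tag in tags)

    # Prior: expected votes if the k neighbors were drawn at random.
    tag_count = sum(1 for _, tags in training_set if tag in tags)
    prior = k * tag_count / len(training_set)

    return votes - prior


# Hypothetical toy data: 2-D "visual features" with user-provided tags.
training = [
    ((0.0, 0.0), {"beach", "sea"}),
    ((0.1, 0.0), {"beach"}),
    ((0.0, 0.2), {"sea", "beach"}),
    ((5.0, 5.0), {"city"}),
    ((5.1, 4.9), {"city", "night"}),
    ((4.8, 5.2), {"city"}),
]
query = (0.05, 0.05)

beach_score = neighbor_vote_relevance(query, "beach", training, k=3)
city_score = neighbor_vote_relevance(query, "city", training, k=3)
print(beach_score, city_score)
```

On this toy data the query lies next to the three beach images, so "beach" receives a positive score and "city" a negative one. Sorting tags by such a score corresponds to tag assignment or refinement, while sorting images by the score for a fixed tag corresponds to tag-based retrieval.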

Index Terms

1. Information systems
  1. Information retrieval
    1. Document representation
    2. Search engine architectures and scalability
      1. Search engine indexing

Published in
ACM Computing Surveys, Volume 49, Issue 1
March 2017
705 pages
ISSN: 0360-0300
EISSN: 1557-7341
DOI: 10.1145/2911992
Editor: Sartaj Sahni, Department of Computer and Information Science and Engineering, University of Florida, Gainesville, FL
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 6 June 2016
- Accepted: 1 March 2016
- Revised: 1 December 2015
- Received: 1 March 2015

Author Tags
Social media
content-based image retrieval
social tagging
tag assignment
tag refinement
tag relevance
tag retrieval
Qualifiers
- survey
- Research
- Refereed

Article Metrics
- Total Citations: 124
- Total Downloads: 1,162
- Downloads (last 12 months): 45
- Downloads (last 6 weeks): 1