
Socializing the Semantic Gap: A Comparative Survey on Image Tag Assignment, Refinement, and Retrieval

Published: 6 June 2016


Where previous reviews on content-based image retrieval emphasize what can be seen in an image to bridge the semantic gap, this survey considers what people tag about an image. A comprehensive treatise on three closely linked problems (i.e., image tag assignment, refinement, and tag-based image retrieval) is presented. While existing works vary in terms of their targeted tasks and methodology, they rely on the key functionality of tag relevance, that is, estimating the relevance of a specific tag with respect to the visual content of a given image and its social context. By analyzing what information a specific method exploits to construct its tag relevance function and how such information is exploited, this article introduces a two-dimensional taxonomy to structure the growing literature, understand the ingredients of the main works, clarify their connections and differences, and recognize their merits and limitations. For a head-to-head comparison with the state of the art, a new experimental protocol is presented, with training sets containing 10,000, 100,000, and 1 million images, and an evaluation on three test sets contributed by various research groups. Eleven representative works are implemented and evaluated. Putting all this together, the survey aims to provide an overview of the past and foster progress for the near future.
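To make the notion of a tag relevance function concrete, here is a minimal Python sketch of one well-known instantiation, neighbor voting (Li, Snoek, and Worring, 2009): a tag is judged relevant to an image when visually similar images were tagged with it more often than chance would predict. This is an illustration under stated assumptions, not a summary of the survey's taxonomy or experimental protocol. The function name `neighbor_voting_relevance`, the Euclidean distance, and the toy data are hypothetical choices for this example; the published method additionally counts at most one neighbor per user to dampen batch tagging, which this sketch omits.

```python
import numpy as np


def neighbor_voting_relevance(query_feature, tag, train_features, train_tags, k=5):
    """Hypothetical helper: neighbor-voting estimate of how relevant `tag` is to an image.

    query_feature:  1-D visual feature vector of the image being scored.
    train_features: 2-D array, one feature row per user-tagged training image.
    train_tags:     list of tag sets, aligned with the rows of train_features.
    Simplification (assumption): no one-neighbor-per-user filtering.
    """
    # Euclidean distance from the query to every training image (assumed metric).
    dists = np.linalg.norm(train_features - query_feature, axis=1)
    # Indices of the k visually nearest training images.
    neighbors = np.argsort(dists)[:k]
    # Votes: how many of those neighbors were tagged with `tag` by their owners.
    votes = sum(tag in train_tags[i] for i in neighbors)
    # Prior: number of votes expected if k images were drawn at random.
    prior = k * sum(tag in tags for tags in train_tags) / len(train_tags)
    # Positive values: the tag occurs among visual neighbors more often than chance.
    return votes - prior


# Toy usage (hypothetical data): three near-identical images and two distant ones.
features = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0]])
tags = [{"dog"}, {"dog", "park"}, {"dog"}, {"car"}, {"car", "dog"}]
query = np.array([0.05, 0.05])
for t in ("dog", "park", "car"):
    print(t, round(neighbor_voting_relevance(query, t, features, tags, k=3), 2))
```

In this toy setting "dog" scores highest and "car" is negative: "car" is common in the collection, but none of the query's visual neighbors carry it. Subtracting the prior is the design choice that keeps frequent but visually unrelated tags from dominating the ranking.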







        Published in ACM Computing Surveys, Volume 49, Issue 1 (March 2017), 705 pages. Editor: Sartaj Sahni.

        Copyright © 2016 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


        Association for Computing Machinery, New York, NY, United States

        Publication History

        • Published: 6 June 2016
        • Accepted: 1 March 2016
        • Revised: 1 December 2015
        • Received: 1 March 2015




        Qualifiers: survey, research, refereed

