ABSTRACT
The web has the potential to serve as an excellent source of example imagery for visual concepts. Image search engines based on text keywords can fetch thousands of images for a given query; however, their results tend to be visually noisy. We present a technique that allows a user to refine noisy search results and characterize a more precise visual object class. With a small amount of user intervention we are able to re-rank search-engine results to obtain many more examples of the desired concept. Our approach is based on semi-supervised machine learning in a novel probabilistic graphical model composed of both generative and discriminative elements. Learning is achieved via a hybrid expectation-maximization / expected-gradient procedure initialized with a small example set defined by the user. We demonstrate our approach on images of musical instruments collected from Google Image Search. The rankings given by our model show significant improvement on the user-refined query. The results are suitable for improving user experience in image search applications and for collecting large labeled datasets for computer vision research.
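To make the recipe in the abstract concrete, here is a minimal sketch of seeded semi-supervised re-ranking. It is an illustrative simplification, not the paper's actual model: the hybrid generative/discriminative graphical model is replaced with a plain two-class diagonal-Gaussian mixture, while the abstract's key ingredients are kept (EM over the unlabeled search results, initialized and anchored by a small user-labeled seed set, followed by ranking on posterior relevance). The names `seeded_em_rerank` and `extract_features` are hypothetical.

```python
# Hypothetical sketch: re-rank noisy image-search results from a handful of
# user-confirmed examples. Simplified to a two-class diagonal-Gaussian
# mixture trained with seeded EM; feature extraction is assumed external.
import numpy as np

def seeded_em_rerank(X, seed_pos, seed_neg, n_iters=50, eps=1e-6):
    """X: (n, d) image feature vectors. seed_pos / seed_neg: index lists of
    user-confirmed relevant / irrelevant images. Returns P(relevant | x)."""
    n, d = X.shape
    resp = np.full(n, 0.5)                 # soft relevance for unlabeled images
    resp[seed_pos], resp[seed_neg] = 1.0, 0.0
    labeled = np.zeros(n, dtype=bool)
    labeled[seed_pos] = labeled[seed_neg] = True

    for _ in range(n_iters):
        # M-step: class prior and per-class diagonal Gaussians from
        # responsibility-weighted statistics.
        w1, w0 = resp, 1.0 - resp
        pi = w1.mean()
        mu1 = (w1 @ X) / (w1.sum() + eps)
        mu0 = (w0 @ X) / (w0.sum() + eps)
        var1 = (w1 @ (X - mu1) ** 2) / (w1.sum() + eps) + eps
        var0 = (w0 @ (X - mu0) ** 2) / (w0.sum() + eps) + eps

        # E-step: posterior relevance for unlabeled images only; the user's
        # seed labels stay clamped, which anchors the refinement.
        ll1 = -0.5 * (np.log(2 * np.pi * var1) + (X - mu1) ** 2 / var1).sum(1)
        ll0 = -0.5 * (np.log(2 * np.pi * var0) + (X - mu0) ** 2 / var0).sum(1)
        log_odds = np.log(pi + eps) - np.log(1 - pi + eps) + ll1 - ll0
        post = 1.0 / (1.0 + np.exp(-np.clip(log_odds, -50, 50)))
        resp = np.where(labeled, resp, post)
    return resp

# Usage: rank images by descending posterior relevance.
# X = extract_features(images)            # e.g., bag-of-SIFT histograms
# scores = seeded_em_rerank(X, seed_pos=[0, 3], seed_neg=[7])
# reranked = np.argsort(-scores)
```

Clamping the seed responsibilities during the E-step is what lets a handful of user labels steer the mixture toward the intended visual concept. The paper's hybrid EM / expected-gradient procedure additionally trains a discriminative component alongside the generative one; this sketch omits that part.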