Elsevier

Neurocomputing

Volume 119, 7 November 2013, Pages 26-32

Regularized Semi-Supervised Latent Dirichlet Allocation for visual concept learning

https://doi.org/10.1016/j.neucom.2012.04.043

Abstract

Topic models are a popular tool for visual concept learning. Most topic models are either unsupervised or fully supervised. In this paper, to take advantage of both limited labeled training images and abundant unlabeled images, we propose a novel regularized Semi-Supervised Latent Dirichlet Allocation (r-SSLDA) for learning visual concept classifiers. Instead of introducing a new complex topic model, we attempt to find an efficient way to learn topic models in a semi-supervised way. Our r-SSLDA considers both semi-supervised properties and a supervised topic model simultaneously in a regularization framework. Furthermore, to improve the performance of r-SSLDA, we introduce the low rank graph into the framework. Experiments on Caltech 101 and Caltech 256 show that r-SSLDA outperforms unsupervised LDA and achieves competitive performance against fully supervised LDA with far fewer labeled images.

Introduction

Visual concept detection is a key problem in image retrieval. It aims at automatically mapping images into predefined semantic concepts (such as indoor, sunset, airplane, and face), so as to bridge the so-called semantic gap between low-level visual features and the high-level semantic content of images. Although there have been many studies over the last decades [1], [2], [3], it is still a challenging problem within the multimedia and computer vision communities. Recently, topic models have been introduced to solve this problem, and have achieved impressive results [4], [5], [6], [7], [8], [9]. In these applications, each image is treated as a document and represented by a histogram of visual words. A visual word is analogous to a text word, and is often generated by clustering local descriptors such as SIFT. Topic models cluster co-occurring visual words into topics, which are then used for image classification.
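
The histogram-of-visual-words representation described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the codebook is assumed to be given already (for example, as centroids from clustering local descriptors), the descriptors are toy 2-D points instead of 128-dimensional SIFT vectors, and the names `nearest_word` and `visual_word_histogram` are hypothetical.

```python
import math
from collections import Counter


def nearest_word(descriptor, codebook):
    """Return the index of the codebook centroid closest to the descriptor."""
    return min(range(len(codebook)),
               key=lambda k: math.dist(descriptor, codebook[k]))


def visual_word_histogram(descriptors, codebook):
    """Count how often each visual word occurs among an image's descriptors."""
    counts = Counter(nearest_word(d, codebook) for d in descriptors)
    return [counts.get(k, 0) for k in range(len(codebook))]


# Assumed toy codebook with three visual words (2-D for readability).
codebook = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
# Assumed toy local descriptors extracted from one image.
image_descriptors = [(0.5, 0.2), (9.0, 9.5), (0.1, 0.3), (1.0, 9.0)]

histogram = visual_word_histogram(image_descriptors, codebook)
print(histogram)
```

The resulting count vector plays the role of a document's word counts, which is the input a topic model such as LDA expects.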

Among current topic models, Latent Dirichlet Allocation (LDA) [10] is one of the most popular. Classic LDA is an unsupervised model that does not use any prior label information. The lack of useful supervised information usually leads to slow convergence and unsatisfactory performance. Moreover, only the visual words in the training images are modeled in classic LDA. During classification, class labels are simply treated as features extracted from the topic distribution [5]. Since the class label is not part of the model, classic LDA is not well suited for classification problems, which results in less robust performance in visual concept detection.

To make LDA more effective for classification and prediction problems, Blei et al. introduced a supervised Latent Dirichlet Allocation (sLDA) model [11], [7]. In the sLDA model, the label parameter is a domain structure and topics are trained to best fit the corresponding variables or labels. Both visual words and class labels are modeled at the same time. Similarly, Wang et al. [6] proposed a Semi-Latent Dirichlet Allocation (Semi-LDA) for human action recognition. Different from sLDA, Semi-LDA introduces supervised information into its model by associating image class labels with visual words. That is, Semi-LDA assumes that the topic of a visual word is observable and equal to the image class label. Fig. 1 shows the difference between classic LDA, sLDA and Semi-LDA. By modeling the class label, both sLDA and Semi-LDA outperform classic LDA significantly on classification problems. Besides sLDA and Semi-LDA, Pang et al. [12] also proposed a supervised topic model called the Travelogue Model, which can extract both local and global topics, with each local topic corresponding to some semantics that characterize a few specific locations.

However, all these models (sLDA, Semi-LDA and the Travelogue Model) improve model performance in a fully supervised fashion, and therefore require all training images to be labeled. For a large dataset, obtaining label information is labor-intensive and expensive, which restricts fully supervised topic models to only a few concepts. On the other hand, huge amounts of unlabeled images are available on the Internet and are easy to obtain. These unlabeled images contain enough information to train visual concept classifiers, and can help avoid overfitting. Therefore, learning visual concept classifiers with a fully supervised topic model in a semi-supervised manner, which aims to utilize a large amount of unlabeled images, is a promising direction to explore.

Although many semi-supervised learning (SSL) algorithms have been developed, few works have considered combining semi-supervised properties with topic models to solve the visual concept learning problem. In [8], Zhuang et al. proposed a method called Semi-supervised pLSA (Ss-pLSA) for image classification. By introducing category label information into the EM algorithm during training, they can train classifiers with pLSA in a semi-supervised fashion. Although the supervised information effectively speeds up convergence to the desired results, Ss-pLSA does not encode class labels into its model, and appears to be a loosely coupled combination of simple label propagation and an unsupervised pLSA model. Different from [8], the works in [13], [14], [15] built semi-supervised topic models in a more consistent fashion by incorporating the manifold assumption into the topic model. They assumed that the probabilities of latent topics of images reside on or close to a manifold, and incorporated the manifold structure into the standard EM algorithm as a regularization term. Since the underlying manifold was unknown, they simply used a nearest neighbor graph to approximate it. However, a nearest neighbor graph is mainly based on pairwise Euclidean distances, and is thus very sensitive to data noise. Because it takes only local pairwise relationships into account, a nearest neighbor graph cannot capture the global geometric structure of the manifold well, which leads to poor performance. Moreover, all these methods use class label information only to help model learning, and do not model the class label in the models themselves. As analyzed above, this decreases the performance of visual concept classifiers.
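
To make the idea of a graph-based manifold regularizer concrete, here is a minimal sketch under stated assumptions. It builds a symmetric k-nearest-neighbor graph from pairwise Euclidean distances and evaluates a common smoothness penalty, 0.5 * sum_ij W_ij * ||p_i - p_j||^2, on per-image topic distributions. The 0/1 edge weights, this particular penalty form, and the function names are illustrative assumptions; the exact regularization terms used in [13], [14], [15] may differ.

```python
import math


def knn_graph(points, k):
    """Symmetric 0/1 weight matrix linking each point to its k nearest neighbors."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Rank all other points by Euclidean distance to point i.
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(points[i], points[j]))
        for j in order[:k]:
            W[i][j] = W[j][i] = 1.0
    return W


def smoothness_penalty(W, topic_dists):
    """0.5 * sum_ij W_ij * ||p_i - p_j||^2; small when neighbors have similar topics."""
    n = len(W)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if W[i][j] > 0:
                sq_diff = sum((a - b) ** 2
                              for a, b in zip(topic_dists[i], topic_dists[j]))
                total += W[i][j] * sq_diff
    return 0.5 * total


# Assumed toy features: two tight pairs of images, far apart from each other.
features = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
W = knn_graph(features, k=1)

# Topic distributions that agree with the graph versus ones that do not.
smooth = [(0.9, 0.1), (0.9, 0.1), (0.2, 0.8), (0.2, 0.8)]
rough = [(0.9, 0.1), (0.2, 0.8), (0.9, 0.1), (0.2, 0.8)]

penalty_smooth = smoothness_penalty(W, smooth)
penalty_rough = smoothness_penalty(W, rough)
print(penalty_smooth, penalty_rough)
```

Because every edge here depends on a single pairwise distance, perturbing one feature vector can rewire the graph, which illustrates the noise sensitivity discussed above.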

In this paper, we propose a novel semi-supervised topic model called regularized Semi-Supervised Latent Dirichlet Allocation (r-SSLDA) for visual concept learning. Inspired by Wang et al. [16], instead of attempting to introduce a new Bayesian statistical model, we try to find a simple and efficient semi-supervised way to learn visual concept classifiers with topic models. Unlike the loosely coupled solution in [8], we consider both semi-supervised properties and topic models simultaneously in a regularization framework. By minimizing the cost function of the regularization framework, we provide a direct solution to the semi-supervised topic model problem. Different from current semi-supervised topic models [8], [13], [14], [15], our r-SSLDA encodes class labels into its framework by adopting a supervised LDA model to learn the visual concept classifiers. Meanwhile, instead of using a nearest neighbor graph, r-SSLDA uses the low rank graph (LR-graph) [17] to approximate the manifold. Compared with existing popular graphs (kNN-graph [18], ℓ1-graph [19], [20], LLE-graph [21], [22]), the LR-graph uses both the global and local properties of the graph, and is thus better at capturing the global structure of all the data. Experimental results showed that r-SSLDA significantly outperformed classic unsupervised LDA and achieved competitive performance compared with fully supervised LDA while using fewer labeled images.

The rest of this paper is organized as follows. In Section 2, we give the details of low rank graph construction. Then, we introduce the regularized semi-supervised LDA framework in Section 3. Experiments and result analysis follow in Section 4. Section 5 concludes the paper.

Section snippets

Low rank graph construction

Let $X=[x_1,x_2,\ldots,x_n]\in\mathbb{R}^{d\times n}$ be a set of data points drawn from a manifold. Each column of $X$ is a data point in $\mathbb{R}^{d}$. Since the manifold is unknown, we construct a graph from these data points to approximate it. Let $G=(V,E)$ be a graph, where $V=\{v_1,\ldots,v_n\}$ is the set of graph vertices (node $v_i$ corresponds to data point $x_i$), and $E$ is the set of graph edges, associated with a weight matrix $W\in\mathbb{R}^{n\times n}$. For any two neighboring nodes $v_i$ and $v_j$, $W_{ij}>0$ if they are connected by an edge $E_{ij}\in E$; otherwise $W_{ij}=0$.

Framework of regularized semi-supervised LDA

Given an image set $X=\{x_1,\ldots,x_l,x_{l+1},\ldots,x_n\}\subset\mathbb{R}^{d}$ and a label set $C=\{1,\ldots,c\}\subset\mathbb{R}$, the first $l$ images $X_L=\{x_1,\ldots,x_l\}$ are labeled and the others $X_U=\{x_{l+1},\ldots,x_n\}$ are unlabeled. Let $y=(y_1,y_2,\ldots,y_n)^T$ be the label vector of all images. For labeled images $x_i\in X_L$, $y_i$ is set to one of the elements in $C$. For unlabeled images $x_i\in X_U$, $y_i$ can be any limited value beyond $C$. To simplify our discussion, this paper only considers binary classification with $C=\{1,-1\}$. In this case, $y_i$ is set to $1$ for positive labeled images, $-1$
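
The label vector for the binary case can be sketched as below. Two points are assumptions rather than statements from the text: that negative labeled images receive -1, and that unlabeled images receive the placeholder value 0 (the text only requires a value outside C). The helper name `build_label_vector` is hypothetical.

```python
def build_label_vector(n_images, positive_idx, negative_idx, unlabeled_value=0):
    """Return y with +1 / -1 for labeled images and a placeholder elsewhere.

    unlabeled_value=0 is an assumption; any value outside C would do.
    """
    y = [unlabeled_value] * n_images
    for i in positive_idx:
        y[i] = 1
    for i in negative_idx:
        y[i] = -1  # assumed encoding for negative labeled images
    return y


# Six images: the first three are labeled, the last three are unlabeled.
y = build_label_vector(6, positive_idx=[0, 1], negative_idx=[2])
print(y)
```

Keeping the unlabeled placeholder outside C makes it easy to tell labeled from unlabeled entries when only the labeled ones should contribute to a supervised loss term.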

Data preparation

The datasets used in this paper were Caltech 101 and Caltech 256, two popular image datasets in the image classification literature. Compared with Caltech 101, Caltech 256 is more challenging because it contains more complex clutter. In our experiments, only 10 categories were selected, and 200 images were randomly selected from each category: 100 images for training and 100 images for testing. Specifically, we chose five categories (leopard, motorbike, watch, airplane and face) from Caltech
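
The per-category sampling described above (200 randomly selected images, split evenly into training and test sets) can be sketched as follows. The file names, the fixed seed, and the function `split_category` are hypothetical and only serve the illustration; the source does not say how the random selection was implemented.

```python
import random


def split_category(image_ids, n_select=200, n_train=100, seed=0):
    """Randomly pick n_select images, then split them into train and test lists."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    chosen = rng.sample(image_ids, n_select)  # sampling without replacement
    return chosen[:n_train], chosen[n_train:]


# Hypothetical file names standing in for one category's images.
category_images = [f"airplane_{i:04d}.jpg" for i in range(800)]
train_ids, test_ids = split_category(category_images)
print(len(train_ids), len(test_ids))
```

Sampling without replacement guarantees that no image appears in both the training and the test set.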

Conclusion

In this work, we developed a novel regularized Semi-Supervised Latent Dirichlet Allocation (r-SSLDA) for visual concept learning. r-SSLDA considered both semi-supervised properties and topic models simultaneously in the regularization framework. Also, we introduced the low rank graph into the framework to improve the performance. Experiments on Caltech 101 and Caltech 256 showed that our r-SSLDA could effectively utilize both labeled images and unlabeled images and achieved competitive

Acknowledgments

We would like to thank Dr. Yi Ma (Microsoft Research Asia) for his helpful conversations about sparse representation and low rank representation. We also thank anonymous reviewers for their constructive comments. This work is partially supported by the National Science Foundation of China (No. 60933013, No. 61103134), the National Science and Technology Major Project (No. 2010ZX03004-003), the Fundamental Research Funds for the Central Universities (WK210023002, WK2101020003), and the Science

Liansheng Zhuang received the B.Sc. degree and Ph.D. degree from University of Science and Technology of China (USTC), in 2001 and 2006, respectively. He is now a Lecturer in the School of Information Science and Technology, USTC. His current research interests include computer vision, image & video retrieval, and machine learning. He is a member of the IEEE and ACM.

References (33)

  • Y. Pang et al., Summarizing tourist destinations by mining user-generated travelogues and photos, Comput. Vis. Image Understanding (2011)
  • J. Tang, S. Yan, R. Hong, G. Qi, T. Chua, Inferring semantic concepts from community-contributed images and noisy tags,...
  • J. Tang et al., Image annotation by graph-based inference with integrated multiple/single instance representations, IEEE Trans. Multimedia (2010)
  • J. Tang et al., Correlative linear neighborhood propagation for video annotation, IEEE Trans. Syst. Man Cybern. Part B (2009)
  • Y. Chen et al., Image categorization by learning and reasoning with regions, J. Mach. Learn. Res. (2004)
  • R. Fergus, F.-F. Li, P. Perona, A. Zisserman, Learning object categories from Google's image search, in: IEEE...
  • Y. Wang et al., Human action recognition by semi-latent topic models, IEEE Trans. Pattern Anal. Mach. Intell. (Special Issue on Probabilistic Graphical Models in Computer Vision) (2009)
  • C. Wang, D. Blei, F.-F. Li, Simultaneous image classification and annotation, in: Proceedings of IEEE Computer Society...
  • L. Zhuang, L. She, Y. Jiang, K. Tang, N. Yu, Image classification via semi-supervised pLSA, in: Proceedings of the...
  • Y. Pang et al., Travelogue enriching and scenic spot overview based on textual and visual topic model, Int. J. Pattern Recognition Artif. Intell. (2011)
  • D. Blei et al., Latent Dirichlet allocation, J. Mach. Learn. Res. (2003)
  • D. Blei et al., Supervised topic models, Adv. Neural Inf. Process. Syst. (2007)
  • Y. Shao, Y. Zhou, X. He, D. Cai, H. Bao, Semi-supervised topic modeling for image annotation, in: Proceedings of the...
  • Q. Mei, D. Cai, D. Zhang, C. Zhai, Topic modeling with network regularization, in: Proceedings of the 17th...
  • Y. Lu, C. Zhai, Opinion integration through semi-supervised topic modeling, in: Proceedings of the 17th International...
  • C. Wang, L. Zhang, H.-J. Zhang, Graph-based multiple-instance learning for object-based image retrieval, in:...

    Haoyuan Gao received the B.Sc. degree from University of Science and Technology of China (USTC) in 2009. He is currently working toward the Master's degree at USTC. His research interests include computer vision, image & video retrieval, and machine learning.

    Jiebo Luo received the Ph.D. degree from the University of Rochester in 1995. He has been a Professor in the CS Department, University of Rochester, since Fall 2011. Before that he was a Senior Principal Scientist leading research and advanced development at Kodak Research Laboratories, Rochester, New York. His research spans image processing, computer vision, machine learning, data mining, medical imaging, and ubiquitous computing. He has authored more than 150 technical papers and holds 50 US patents. He has been involved in numerous technical conferences, including serving as the program co-chair of ACM Multimedia 2010 and IEEE CVPR 2012. He is the Editor-in-Chief of the Journal of Multimedia, and has served on the editorial boards of the IEEE Transactions on Pattern Analysis and Machine Intelligence, IEEE Transactions on Multimedia, IEEE Transactions on Circuits and Systems for Video Technology, Pattern Recognition, Machine Vision and Applications, and Journal of Electronic Imaging. He is a Fellow of the SPIE, IEEE, and IAPR.

    Zhouchen Lin received the Ph.D. degree in Applied Mathematics from Peking University in 2000. He is currently a Full Professor at Peking University. He is also a Guest Professor at Beijing Jiaotong University, Southeast University and Shanghai Jiaotong University, and a Guest Researcher at the Institute of Computing Technology, Chinese Academy of Sciences. His research interests include computer vision, computer graphics, image processing, pattern recognition, and machine learning.