Regularized Semi-Supervised Latent Dirichlet Allocation for visual concept learning
Introduction
Visual concept detection is a key problem in image retrieval. It aims at automatically mapping images into predefined semantic concepts (such as indoor, sunset, airplane, and face), so as to bridge the so-called semantic gap between low-level visual features and high-level semantic content of images. Although there have been many studies over the last decades [1], [2], [3], it remains a challenging problem within the multimedia and computer vision communities. Recently, topic models have been introduced to solve this problem and have achieved impressive results [4], [5], [6], [7], [8], [9]. In these applications, each image is treated as a document and represented by a histogram of visual words. A visual word is the analogue of a text word, and is often generated by clustering local descriptors such as SIFT. Topic models cluster co-occurring visual words into topics, which are then used for image classification.
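The bag-of-visual-words representation described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes scikit-learn's KMeans for clustering and that SIFT descriptors have already been extracted; the function names are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(descriptor_sets, n_words=500, seed=0):
    """Cluster local descriptors (e.g. SIFT) from all images into a visual vocabulary."""
    all_desc = np.vstack(descriptor_sets)  # stack descriptors from every image
    km = KMeans(n_clusters=n_words, random_state=seed, n_init=10)
    km.fit(all_desc)
    return km

def image_histogram(km, descriptors):
    """Represent one image as a normalized histogram over the visual words."""
    words = km.predict(descriptors)  # assign each descriptor to its nearest word
    hist = np.bincount(words, minlength=km.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)
```

Each image's histogram then plays the role of a document's word-count vector in the topic model.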
Among current topic models, Latent Dirichlet Allocation (LDA) [10] is one of the most popular. Classic LDA is an unsupervised model that uses no prior label information. The lack of supervised information usually leads to slow convergence and unsatisfactory performance. Moreover, only the visual words in the training images are modeled in classic LDA. During classification, class labels are simply treated as features extracted from the topic distribution [5]. Since the class label is not part of the model, classic LDA is not well suited to classification problems, resulting in less robust performance in visual concept detection.
To make LDA more effective for classification and prediction problems, Blei et al. introduced the supervised Latent Dirichlet Allocation (sLDA) model [11], [7]. In sLDA, the class label is treated as a response variable, and topics are trained to best fit the corresponding responses; both visual words and class labels are modeled at the same time. Similarly, Wang et al. [6] proposed a semi-latent Dirichlet allocation (Semi-LDA) model for human action recognition. Different from sLDA, Semi-LDA introduces supervised information into its model by associating image class labels with visual words. That is, Semi-LDA assumes that the topic of a visual word is observable and equal to the image class label. Fig. 1 shows the differences between classic LDA, sLDA and Semi-LDA. By modeling the class label, both sLDA and Semi-LDA significantly outperform classic LDA on classification problems. Besides sLDA and Semi-LDA, Pang et al. [12] proposed a supervised topic model called the Travelogue Model, which can extract both local and global topics, with each local topic corresponding to semantics that characterize a few specific locations.
However, all these models (sLDA, Semi-LDA and the Travelogue Model) improve performance in a fully supervised fashion, and therefore require all training images to be labeled. For a large dataset, labeling is labor-intensive and expensive, which restricts fully supervised topic models to only a few concepts. On the other hand, huge amounts of unlabeled images are available on the Internet and are easy to obtain. These unlabeled images contain enough information to help train visual concept classifiers and to avoid overfitting. Therefore, learning visual concept classifiers with a supervised topic model in a semi-supervised manner, so as to exploit large amounts of unlabeled images, is a promising direction to explore.
Although many semi-supervised learning (SSL) algorithms have been developed, few combine semi-supervised properties with topic models to solve the visual concept learning problem. In [8], Zhuang et al. proposed a method called semi-supervised pLSA (Ss-pLSA) for image classification. By introducing category label information into the EM algorithm during training, they can train classifiers with pLSA in a semi-supervised fashion. Although the supervised information effectively speeds up convergence to the desired results, Ss-pLSA does not encode class labels into its model, and amounts to a loosely coupled combination of simple label propagation with an unsupervised pLSA model. Different from [8], the methods in [13], [14], [15] realized semi-supervised topic models in a more consistent fashion by incorporating the manifold assumption into the topic model. They assumed that the latent topic probabilities of images reside on or close to a manifold, and incorporated the manifold structure into the standard EM algorithm as a regularization term. Since the underlying manifold is unknown, they simply used a nearest neighbor graph to approximate it. However, a nearest neighbor graph is based mainly on pairwise Euclidean distances, and is thus very sensitive to data noise. Because it takes only local pairwise relationships into account, a nearest neighbor graph cannot capture the global geometric structure of the manifold well, which degrades performance. Moreover, all these methods use class label information only to guide model learning, without modeling the class label itself. As analyzed above, this decreases the performance of visual concept classifiers.
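The nearest neighbor graphs criticized above are built purely from pairwise Euclidean distances. As a point of reference, a minimal Gaussian-weighted kNN affinity matrix can be sketched as follows (k and sigma are illustrative parameters, not values used in the paper):

```python
import numpy as np

def knn_graph(X, k=5, sigma=1.0):
    """Symmetric kNN affinity matrix from pairwise Euclidean distances.

    X: (n, d) array, one data point per row. Because each point is connected
    only to its k closest neighbors, the graph encodes purely local pairwise
    structure -- the source of the noise sensitivity discussed in the text.
    """
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # squared distances
    W = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(d2[i])[1:k + 1]                 # skip self (distance 0)
        W[i, idx] = np.exp(-d2[i, idx] / (2 * sigma ** 2))
    return np.maximum(W, W.T)                            # symmetrize
```

A single noisy feature vector can change which points fall among the k nearest neighbors, rewiring the graph locally; this is the weakness the low rank graph of Section 2 is meant to address.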
In this paper, we propose a novel semi-supervised topic model called regularized Semi-Supervised Latent Dirichlet Allocation (r-SSLDA) for visual concept learning. Inspired by Wang et al. [16], instead of attempting to introduce a new Bayesian statistical model, we seek a simple and efficient semi-supervised way to learn visual concept classifiers with topic models. Unlike the loosely coupled solution in [8], we consider both semi-supervised properties and topic models simultaneously in a regularization framework. By minimizing the cost function of this framework, we provide a direct solution to the semi-supervised topic model problem. Different from current semi-supervised topic models [8], [13], [14], [15], r-SSLDA encodes class labels into its framework by adopting a supervised LDA model to learn the visual concept classifiers. Meanwhile, instead of using a nearest neighbor graph, r-SSLDA uses the low rank graph (LR-graph) [17] to approximate the manifold. Compared with existing popular graphs (the kNN-graph [18], [19], [20] and the LLE-graph [21], [22]), the LR-graph exploits both global and local properties of the data, and is thus better at capturing the global structure of all the data. Experimental results show that r-SSLDA significantly outperforms classic unsupervised LDA and achieves competitive performance compared with fully supervised LDA while using far fewer labeled images.
The rest of this paper is organized as follows. In Section 2, we detail the construction of the low rank graph. In Section 3, we introduce the regularized semi-supervised LDA framework. Experiments and result analysis follow in Section 4. Section 5 concludes the paper.
Low rank graph construction
Let X = [x_1, x_2, ..., x_n] ∈ R^(d×n) be a set of data points drawn from a manifold M. Each column of X is a data point in R^d. Since the manifold is unknown, we construct a graph from these data points to approximate it. Let G = (V, E) be a graph, where V = {v_1, ..., v_n} is the set of graph vertices (node v_i corresponds to data point x_i), and E is the set of graph edges, associated with a weight matrix W ∈ R^(n×n). For any two neighboring nodes v_i and v_j, W_ij ≠ 0 if they are connected with an edge, otherwise W_ij = 0.
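The snippet stops before the weight definition. In the low rank graph of [17], the weights are typically obtained from the low-rank representation (LRR) problem; the following is a sketch based on the LRR literature, with the trade-off parameter λ and the noise term E being the standard choices rather than values quoted from this paper:

```latex
\min_{Z,\,E} \; \|Z\|_{*} + \lambda \|E\|_{2,1}
\quad \text{s.t.} \quad X = XZ + E,
```

where ||Z||_* is the nuclear norm (the sum of singular values). Given the minimizer Z*, a symmetric weight matrix is formed as W_ij = (|Z*_ij| + |Z*_ji|)/2, so each weight reflects how strongly x_j participates in a globally consistent reconstruction of x_i, rather than depending only on a local pairwise distance as in a kNN graph.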
Framework of regularized semi-supervised LDA
Given an image set X = {x_1, ..., x_n} and a label set C, the first l images are labeled and the others are unlabeled. Let Y = [y_1, ..., y_n] be the label vector of all images. For a labeled image x_i (i ≤ l), y_i is set to one of the elements of C. For an unlabeled image x_i (i > l), y_i can be any value outside C. To simplify our discussion, this paper only considers binary classification with C = {1, −1}. In this case, y_i is set to 1 for positive labeled images and −1 for negative ones.
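Although the snippet is truncated, a graph-regularized topic-model objective of the kind described in Section 1 typically takes the following form (the notation here is illustrative, not quoted verbatim from the paper):

```latex
\max_{\Theta} \; \mathcal{L}(X;\Theta)
\;-\; \frac{\lambda}{2} \sum_{i,j=1}^{n} W_{ij}
\left\| \, p(z \mid x_i) - p(z \mid x_j) \, \right\|^{2},
```

where L(X; Θ) is the data log-likelihood of the (supervised) LDA model, p(z | x_i) is the latent topic distribution of image x_i, W is the low rank graph weight matrix from Section 2, and λ balances the two terms. Images that are strongly connected in the graph are thereby encouraged to have similar topic distributions, which is how label information propagates to the unlabeled images.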
Data preparation
The datasets used in this paper were Caltech 101 and Caltech 256, two popular image datasets in the image classification literature. Compared with Caltech 101, Caltech 256 is more challenging because it contains more complex clutter. In our experiments, only 10 categories were selected, and 200 images were randomly selected from each category: 100 images for training and 100 for testing. Specifically, we chose five categories (leopard, motorbike, watch, airplane and face) from Caltech 101.
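The per-category split described above is straightforward; a minimal sketch follows (the function name and seed handling are hypothetical, not from the paper):

```python
import random

def split_category(image_ids, n_train=100, n_test=100, seed=0):
    """Randomly draw a fixed-size, disjoint train/test split from one category."""
    rng = random.Random(seed)
    sample = rng.sample(list(image_ids), n_train + n_test)
    return sample[:n_train], sample[n_train:]
```

Fixing the seed makes the random selection reproducible across runs, which matters when comparing models on the same split.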
Conclusion
In this work, we developed a novel regularized Semi-Supervised Latent Dirichlet Allocation (r-SSLDA) model for visual concept learning. r-SSLDA considers both semi-supervised properties and topic models simultaneously in a regularization framework. We also introduced the low rank graph into the framework to improve performance. Experiments on Caltech 101 and Caltech 256 showed that r-SSLDA can effectively utilize both labeled and unlabeled images, achieving competitive performance compared with fully supervised LDA while requiring far fewer labeled images.
Acknowledgments
We would like to thank Dr. Yi Ma (Microsoft Research Asia) for his helpful conversations about sparse representation and low rank representation. We also thank anonymous reviewers for their constructive comments. This work is partially supported by the National Science Foundation of China (No. 60933013, No. 61103134), the National Science and Technology Major Project (No. 2010ZX03004-003), the Fundamental Research Funds for the Central Universities (WK210023002, WK2101020003), and the Science
References (33)
- et al., Summarizing tourist destinations by mining user-generated travelogues and photos, Comput. Vis. Image Understanding (2011)
- J. Tang, S. Yan, R. Hong, G. Qi, T. Chua, Inferring semantic concepts from community-contributed images and noisy tags, ...
- et al., Image annotation by graph-based inference with integrated multiple/single instance representations, IEEE Trans. Multimedia (2010)
- et al., Correlative linear neighborhood propagation for video annotation, IEEE Trans. Syst. Man Cybern. Part B (2009)
- et al., Image categorization by learning and reasoning with regions, J. Mach. Learn. Res. (2004)
- R. Fergus, F.-F. Li, P. Perona, A. Zisserman, Learning object categories from Google's image search, in: IEEE ...
- et al., Human action recognition by semi-latent topic models, IEEE Trans. Pattern Anal. Mach. Intell. (Special Issue on Probabilistic Graphical Models in Computer Vision) (2009)
- C. Wang, D. Blei, F.-F. Li, Simultaneous image classification and annotation, in: Proceedings of IEEE Computer Society ...
- L. Zhuang, L. She, Y. Jiang, K. Tang, N. Yu, Image classification via semi-supervised pLSA, in: Proceedings of the ...
- et al., Travelogue enriching and scenic spot overview based on textual and visual topic model, Int. J. Pattern Recognition Artif. Intell. (2011)
- D.M. Blei, A.Y. Ng, M.I. Jordan, Latent Dirichlet allocation, J. Mach. Learn. Res. (2003)
- D.M. Blei, J.D. McAuliffe, Supervised topic models, Adv. Neural Inf. Process. Syst. (2007)
Liansheng Zhuang received the B.Sc. degree and Ph.D. degree from the University of Science and Technology of China (USTC), in 2001 and 2006, respectively. He is now a Lecturer in the School of Information Science and Technology, USTC. His current research interests include computer vision, image & video retrieval, and machine learning. He is a member of the IEEE and ACM.
Haoyuan Gao received the B.Sc. degree from University of Science and Technology of China (USTC) in 2009. He is currently working toward the Master degree from USTC. His research interests include computer vision, image & video retrieval, and machine learning.
Jiebo Luo received the Ph.D. degree from the University of Rochester in 1995. He is a Professor in CS Department, University of Rochester since Fall 2011. Before that he was a Senior Principal Scientist leading research and advanced development at Kodak Research Laboratories, Rochester, New York. His research spans image processing, computer vision, machine learning, data mining, medical imaging, and ubiquitous computing. He has authored more than 150 technical papers and holds 50 US patents. He has been involved in numerous technical conferences, including serving as the program co-chair of ACM Multimedia 2010 and IEEE CVPR 2012. He is the Editor-in-Chief of the Journal of Multimedia, and has served on the editorial boards of the IEEE Transactions on Pattern Analysis and Machine Intelligence, IEEE Transactions on Multimedia, IEEE Transactions on Circuits and Systems for Video Technology, Pattern Recognition, Machine Vision and Applications, and Journal of Electronic Imaging. He is a Fellow of the SPIE, IEEE, and IAPR.
Zhouchen Lin received the Ph.D. degree in Applied Mathematics from Peking University in 2000. He is currently a Full Professor in Peking University. He is also now a Guest Professor to Beijing Jiaotong University, Southeast University and Shanghai Jiaotong University. He is also a Guest Researcher to Institute of Computing Technology, Chinese Academy of Sciences. His research interests include computer vision, computer graphics, image processing, pattern recognition, and machine learning.