Elsevier

Neurocomputing

Volume 208, 5 October 2016, Pages 136-142

Multi-view semi-supervised learning for image classification

https://doi.org/10.1016/j.neucom.2016.02.072

Abstract

With the massive growth of digital image data uploaded to the Internet, classifying each image into an appropriate semantic category according to its content, for image indexing and retrieval, has become an increasingly difficult and laborious task. To deal with this issue, we propose a novel multi-view semi-supervised learning framework that leverages the information contained in pseudo-labeled images and in multiple views of each image to improve the prediction performance of image classification. In the training process, labeled images are first used to train view-specific classifiers independently on uncorrelated and sufficient views, and each view-specific classifier is then iteratively re-trained, guided by a confidence measure, using the initial labeled samples and additional pseudo-labeled samples. In the classification process, the maximum entropy principle is used to assign appropriate category labels to unlabeled images via the optimally trained view-specific classifiers. Experimental results on a general-purpose image database demonstrate the effectiveness and efficiency of the proposed multi-view semi-supervised image classification scheme.

Introduction

With the explosive growth of digital image collections on the Internet, developing content-based analysis technologies to effectively organize, manage and utilize such a huge amount of information has become an important and challenging research topic in intelligent multimedia analysis. Among these technologies, image classification, which aims to build an exact correspondence between visual information at the perceptual level and linguistic descriptions at the semantic level, is a fundamental and promising step toward content-based image indexing, retrieval and other related multimedia applications. Consequently, classifying an image into a high-level semantic category has attracted increasing research attention in recent years.

In the past few years, many algorithmic techniques have been proposed for the problem of image classification, such as the hierarchical semantic similarity based method [1], compact binary based method [2], optimized pulse-coupled neural network based method [3], locality constrained low-rank coding based method [4], global structure and sparse feature based method [5], low-rank sparse coding based method [6], structured low-rank representations based method [7], discriminative multi-manifold based method [8], bilinear deep learning based method [9], local and global information based method [10], separable principal component analysis based method [11], cost-sensitive subspace based method [12], hierarchical Gaussianization based method [13], two-dimensional multi-label active learning based method [14], multiple partially observed views based method [15], wavelet feature based metric [16], and random sub-windows based method [17].

The task of these existing algorithms is to assign an appropriate category to a given image according to its semantic content. Two issues should be considered when designing an effective and efficient image classification algorithm: on the one hand, the number of labeled images is often very small while the number of unlabeled images is often very large; on the other hand, an image is generally represented by a combination of feature sets, such as color, shape and texture. Both issues seriously affect image classification performance. To address the first issue, semi-supervised methods leverage the information contained in unlabeled images to improve prediction performance [18], [19], [20], [21], [22], [23], [24], [25], [26], [27]. To address the second issue, multi-view learning algorithms are used to select informative and representative training images, reducing the number of labeled samples required for training [28], [29], [30], [31], [32]. In multi-view learning, multiple classifiers are first trained separately on several distinct views extracted from the labeled images; these trained classifiers then assign labels to unlabeled images, producing pseudo-labeled images; next, the disagreement among the classifiers is used to select additional pseudo-labeled images; finally, new view-specific classifiers are trained on the initial labeled images and the newly pseudo-labeled images to improve overall classification performance. The main idea of the proposed method is that multi-view learning and semi-supervised learning can be effectively integrated to improve image classification, as in the method proposed by Luo [33], which handles classification tasks by learning the weights between different views and the correlation between labels.
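The four-step loop above selects pseudo-labeled images based on how the view-specific classifiers agree or disagree. The minimal Python sketch below shows one simple way to quantify this. It assumes each view-specific classifier has already assigned a hard label to every unlabeled image; both function names are hypothetical, and keeping only images on which all views agree is a simple agreement-based rule chosen for illustration, not the selection criterion of any specific cited work.

```python
from itertools import combinations


def pairwise_disagreement(preds):
    """preds[v][i]: label that view-specific classifier v assigns to unlabeled image i.

    Returns {(v, w): fraction of unlabeled images on which views v and w disagree}.
    """
    n = len(preds[0])
    return {
        (v, w): sum(a != b for a, b in zip(preds[v], preds[w])) / n
        for v, w in combinations(range(len(preds)), 2)
    }


def select_by_agreement(preds):
    """Hypothetical rule: return (image index, consensus label) where every view agrees."""
    return [(i, labels[0]) for i, labels in enumerate(zip(*preds)) if len(set(labels)) == 1]


preds = [
    ["cat", "dog", "cat", "car"],  # view 1, e.g. color histogram
    ["cat", "dog", "dog", "car"],  # view 2, e.g. wavelet texture
    ["cat", "cat", "dog", "car"],  # view 3, e.g. edge direction histogram
]
print(pairwise_disagreement(preds))  # {(0, 1): 0.25, (0, 2): 0.5, (1, 2): 0.25}
print(select_by_agreement(preds))    # [(0, 'cat'), (3, 'car')]
```

In this toy example, images 0 and 3 receive the same label from all three views and would become pseudo-labeled candidates, while images 1 and 2 remain in the unlabeled pool.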

A new multi-view semi-supervised learning framework is proposed here for automatic image classification. Unlike the method presented in [33], the proposed framework aims to learn optimal multi-view classifiers from more representative labeled images and pseudo-labeled images. The basic idea of the proposed framework, shown in Fig. 1, is as follows. First, uncorrelated and sufficient views, such as color histogram, wavelet texture and edge direction histogram, are extracted to independently train V view-specific classifiers {h_1, h_2, …, h_v, …, h_V}, and the learned classifiers are used to obtain the label of each pseudo-labeled image under each view. Second, the initial labeled images and pairs of pseudo-labeled images with high confidence are used to iteratively re-train the view-specific classifiers to improve system performance. Finally, the maximum entropy principle is adopted to assign an appropriate category label to each unlabeled image via the optimally trained view-specific classifiers. The experimental results show that, by taking advantage of both multi-view learning and semi-supervised learning, the proposed approach significantly outperforms previous state-of-the-art methods.
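The three steps of the framework can be sketched end to end as follows. This is an illustrative stand-in, not the authors' implementation: each view-specific classifier h_v is replaced by a hypothetical nearest-centroid classifier with softmax posteriors, the hypothetical `per_class` parameter (how many high-confidence pseudo-labeled images are added per class in each round) stands in for the paper's pairs of pseudo-labeled images, and the final label is taken from the average of the view posteriors rather than from the paper's maximum entropy rule.

```python
import numpy as np


def fit_centroids(X, y, n_classes):
    # Hypothetical stand-in for a view-specific classifier h_v: one centroid per class.
    # Assumes every class has at least one (pseudo-)labeled sample.
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])


def posteriors(centroids, X):
    # Softmax over negative squared distances to the class centroids.
    logits = -((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


def predict(models, views):
    # Average the view-specific posteriors; return (labels, confidences).
    P = np.mean([posteriors(m, X) for m, X in zip(models, views)], axis=0)
    return P.argmax(axis=1), P.max(axis=1)


def co_train(views_lab, y_lab, views_unl, n_classes, rounds=5, per_class=1):
    """Hypothetical sketch. views_lab[v] / views_unl[v]: labeled / unlabeled features of view v."""
    lab = [X.copy() for X in views_lab]
    y = y_lab.copy()
    pool = np.arange(len(views_unl[0]))  # indices of images not yet pseudo-labeled
    for _ in range(rounds):
        if pool.size == 0:
            break
        # Step 1: (re-)train one classifier per view on labeled + pseudo-labeled data.
        models = [fit_centroids(X, y, n_classes) for X in lab]
        guess, conf = predict(models, [U[pool] for U in views_unl])
        # Step 2: per class, keep the `per_class` most confident pseudo-labels.
        picked = []
        for c in range(n_classes):
            idx = np.flatnonzero(guess == c)
            picked.extend(idx[np.argsort(-conf[idx])[:per_class]])
        picked = np.asarray(picked, dtype=int)
        if picked.size == 0:
            break
        for v, U in enumerate(views_unl):
            lab[v] = np.vstack([lab[v], U[pool[picked]]])
        y = np.concatenate([y, guess[picked]])
        pool = np.delete(pool, picked)  # picked holds positions within pool
    # Step 3: final view-specific classifiers; label new images with `predict`.
    return [fit_centroids(X, y, n_classes) for X in lab]
```

With two well-separated classes described by two views and only one labeled image per class, calling `co_train` and then `predict` on the unlabeled images should recover their classes after a few rounds. Averaging posteriors keeps the sketch short; any probabilistic per-view classifier can be swapped in by replacing `fit_centroids` and `posteriors`.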

The main contributions of this paper include:

  • A novel framework for image classification is proposed, which learns the optimal multi-view classifiers from more representative labeled images and pseudo-labeled images;

  • A vote entropy principle is used to measure the agreement among the classification results of the multi-view classifiers.
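Vote entropy, as commonly defined in query-by-committee learning, measures how strongly a committee of classifiers disagrees on one sample: with V views of which V(c) vote for class c, VE = -Σ_c (V(c)/V) log(V(c)/V). The Python sketch below computes this standard definition; how the paper turns it into a final labeling rule is not given in this excerpt, so only the measure itself and a hypothetical ranking of images are shown.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Vote entropy of the labels that V view-specific classifiers give one image.

    VE = -sum_c (V(c)/V) * log(V(c)/V), computed as sum_c (V(c)/V) * log(V/V(c)).
    It is 0 when all views agree and equals log(V) when every view votes differently.
    """
    V = len(votes)
    return sum((n / V) * math.log(V / n) for n in Counter(votes).values())


# Rank hypothetical unlabeled images from most to least contested across three views.
votes_per_image = {
    "img_a": ["sky", "sky", "sky"],
    "img_b": ["sky", "sea", "sky"],
    "img_c": ["sky", "sea", "sand"],
}
ranking = sorted(votes_per_image, key=lambda k: vote_entropy(votes_per_image[k]), reverse=True)
print(ranking)  # ['img_c', 'img_b', 'img_a']
```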

The rest of the paper is organized as follows. The proposed image classification framework is detailed in Section 2. The dataset and experimental setup are described in Section 3. Experimental results and a discussion of the proposed image classification framework are presented in Section 4. Finally, the paper is concluded in Section 5.

Section snippets

Multi-view semi-supervised classification framework

In this section, we detail the proposed multi-view semi-supervised learning framework, which consists of the following processes: (1) learning the multi-view classifiers from uncorrelated and sufficient visual features; (2) optimizing the multi-view classifiers with the initial labeled images and pairs of pseudo-labeled images with a high confidence level; (3) multi-view classification via the principle of maximum vote entropy.

Before detailing the process of multi-view

Dataset and experimental setup

In this section, we will first present the dataset used in our experiments, then describe the features selected to train multi-view classifiers, and finally discuss the evaluation measures for image classification.

Experimental results

In this section, we first describe the optimal training of each view-specific classifier, and then present the experimental results together with some analysis.

Conclusion

In this paper, a multi-view semi-supervised method is proposed for image classification in the general case where images are described by heterogeneous visual features and text representations, and the number of training samples per class is largely imbalanced. To deal with the lack of training samples, the proposed method takes advantage of the multi-view features of samples by reducing the disagreement between view-specific classifiers which are optimally

Acknowledgment

This work is supported by the Postdoctoral Foundation of China under No. 2014M550297, the Postdoctoral Foundation of Jiangsu Province under No. 1302087B, and the Graduate Education Reform Research and Practice Program of Jiangsu Province under Nos. JGZZ13_041 and JGLX15_055.

Songhao Zhu received his Ph.D. degree in image processing and pattern recognition from Shanghai Jiao Tong University. In September 2009, he joined Nanjing University of Posts and Telecommunications. His research interests include image processing, pattern recognition, multimedia communication and computer vision.

References (34)

  • [9] S. Zhong et al., Bilinear deep learning for image classification, Proc. ACM Conf. Multimed. (2011).

  • [10] D. Parikh, Recognizing jumbled images: the role of local and global information in image classification, Proc. IEEE Conf. Comput. Vis. (2011).

  • [11] Y. Xi et al., Separable PCA for image classification, Proc. IEEE Conf. Acoust. Speech Signal Process. (2009).

  • [12] J. Lu et al., Cost-sensitive subspace analysis and extensions for face recognition, IEEE Trans. Inf. Forensics Secur. (2013).

  • [13] X. Zhou et al., Hierarchical Gaussianization for image classification, Proc. IEEE Conf. Comput. Vis. (2009).

  • [14] G. Qi et al., Two-dimensional multilabel active learning with an efficient online adaptation model for image classification, IEEE Trans. Pattern Anal. Mach. Intell. (2009).

  • [15] M. Amini et al., Learning from multiple partially observed views: an application to multilingual text categorization, Proc. Adv. Neural Inf. Process. Syst. (2009).
Xian Sun is currently a Master's candidate at the Institute of Image Processing and Pattern Recognition, Nanjing University of Posts and Telecommunications. His research interests include image processing, pattern recognition, multimedia communication and computer vision.

Dongliang Jin is currently a Master's candidate at the Institute of Image Processing and Pattern Recognition, Nanjing University of Posts and Telecommunications. Her research interests include image processing, pattern recognition, multimedia communication and computer vision.
