Semantic preserving distance metric learning and applications
Introduction
Recently, many methods [7], [9], [39] have been successfully adopted for image clustering to better organize, represent and browse images, as well as to improve the performance of related applications, such as Content-Based Image Retrieval (CBIR) [54], [69], image annotation and image indexing. For example, to improve the performance of CBIR, images are grouped into clusters in [12] using the classical K-means clustering algorithm as a pre-processing step. Image clustering is also applied in an automatic image annotation system in [55], in which personal album images are first clustered, and a web-based annotation method then obtains conceptual labels for the image clusters; a graph-based semi-supervised learning method subsequently propagates these labels to the whole photo album. In [32], image clustering is used to extract a small subset of representative images that captures the diverse aspects of landmarks.
In recent years, low-level visual information, e.g., color [20], texture and shape, has been successfully utilized in image clustering. Representative color features are HSV color histograms [49] and color coherence vectors [43]. Texture is another important cue for image clustering, and existing research shows that texture information based on structure and orientation fits well with the model of human perception, as does shape information. Representative texture features include multiresolution simultaneous autoregressive models [38] and wavelet-based decompositions [37]. One of the most popular shape features is the edge directional histogram [25]. Because global visual features are sensitive to small geometric and photometric distortions, local feature descriptors such as the shape context [3] have been developed and widely used in image analysis. The shape context describes the relative spatial distribution (distance and orientation) of landmark points around feature points.
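As a toy illustration of this feature-plus-clustering pipeline, the sketch below builds quantized HSV color histograms and groups them with a plain K-means (Lloyd's) loop. The bin count, the synthetic two-color image sets and the farthest-point initialization are illustrative assumptions, not details taken from the cited systems.

```python
import numpy as np

def hsv_histogram(hsv_image, bins=8):
    """Quantize each HSV channel into `bins` bins and concatenate
    the three normalized histograms into one feature vector."""
    feats = []
    for c in range(3):
        h, _ = np.histogram(hsv_image[..., c], bins=bins, range=(0.0, 1.0))
        feats.append(h / h.sum())
    return np.concatenate(feats)

def kmeans(X, k, iters=50):
    """Plain Lloyd's algorithm with deterministic farthest-point init."""
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(-1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Two synthetic "image groups": mostly-red vs mostly-blue HSV pixels.
rng = np.random.default_rng(1)
reds = [np.clip(rng.normal([0.05, 0.8, 0.8], 0.05, (32, 32, 3)), 0, 1) for _ in range(5)]
blues = [np.clip(rng.normal([0.65, 0.8, 0.8], 0.05, (32, 32, 3)), 0, 1) for _ in range(5)]
X = np.array([hsv_histogram(im) for im in reds + blues])
labels = kmeans(X, k=2)
print(labels)  # the two color groups fall into two separate clusters
```

Real systems would of course extract histograms from decoded images and use a more robust K-means initialization; the point here is only how histogram features feed the clustering step.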
Recently, biologically inspired features [23] have been proposed to efficiently mimic the process of the human visual cortex in recognition tasks.
Reports from both neuroscience and computer vision demonstrate that these features are attractive in visual recognition; however, they still cannot effectively describe the semantics in images.
In addition to extracting proper features to represent images, an accurate distance metric learning method, which can preserve semantic similarity between images, is also critical for clustering. Yang et al. [66] provided a novel semi-supervised approach called ranking with Local Regression and Global Alignment (LRGA) to learn a robust Laplacian matrix for data ranking. In their approach, a local linear regression model is adopted for each data point to predict the ranking scores of its neighboring points, and an integrated objective function is then proposed to globally align the local models. The idea of LRGA has also been widely adopted in different approaches [14], [64], [65], [67]. Distance Metric Learning (DML) [10], [30], [34], [36], [63], [81] has been widely applied to find a new meaningful feature space that captures the manifold structure of the original data. Most DML techniques [19], [21], [22], [26], [31], [68], [74] can be classified into two categories: unsupervised Distance Metric Learning (U-DML) and supervised Distance Metric Learning (S-DML). The drawback of U-DML methods is that, when they are applied to image clustering, images in the same cluster may not be semantically similar. S-DML can preserve semantic similarity by making use of class label information; however, class labels are generally not available for image clustering tasks. Side information [62] offers a more natural and convenient way for the user to express semantic similarity, in the form of pairwise constraints [19], [27], [33], [46], [60], [61], [75], [76], i.e., must-link and cannot-link constraints.
Hence, the optimal distance metric can be learned to preserve semantic similarity by keeping the data points of must-link constraints close to each other while separating those of cannot-link constraints. Weinberger and Saul [57] proposed Large Margin Nearest Neighbor (LMNN), a distance metric learning method derived by penalizing large distances between examples in the same class that are desired as k-nearest neighbors, while also penalizing small distances between examples with non-matching labels. Davis et al. [8] presented Information-Theoretic Metric Learning (ITML), which learns a Mahalanobis distance function that can accommodate a range of constraints, including similarity or dissimilarity constraints and relations between pairs of distances. Chechik et al. [5] provided an Online Algorithm for Scalable Image Similarity learning (OASIS). In this algorithm, similarity information is extracted from pairs of images that share a common label or are retrieved in response to a common text query; hence, OASIS can learn a class-independent similarity measure with no need for class labels. Furthermore, the semantic similarity of pairwise constraints has been widely used in many different applications, including image annotation [59], image classification [73] and music annotation and retrieval [58]. However, these methods do not consider the manifold distribution of the data points in distance metric learning.
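A minimal sketch of how pairwise constraints can shape a metric: the toy diagonal Mahalanobis metric below upweights feature dimensions along which cannot-link pairs differ and downweights those along which must-link pairs differ. This is only an illustrative heuristic of the general idea, not LMNN, ITML, OASIS or the method proposed in this paper.

```python
import numpy as np

def learn_diagonal_metric(X, must_link, cannot_link, eps=1e-8):
    """Toy diagonal metric: weight each dimension by the ratio of its mean
    squared difference over cannot-link pairs to that over must-link pairs,
    so constrained-similar pairs contract and dissimilar pairs spread out."""
    ml = np.mean([(X[i] - X[j]) ** 2 for i, j in must_link], axis=0)
    cl = np.mean([(X[i] - X[j]) ** 2 for i, j in cannot_link], axis=0)
    return np.diag(cl / (ml + eps))  # a diagonal M with non-negative entries is PSD

def d_M(x, y, M):
    """Mahalanobis-style distance under metric matrix M."""
    diff = x - y
    return float(np.sqrt(diff @ M @ diff))

# Four 2-D points: semantically, 0~1 and 2~3 are similar despite feature noise
# in the second dimension (hypothetical data, for illustration only).
X = np.array([[0.0, 0.0], [0.1, 1.0], [5.0, 0.9], [5.1, 0.1]])
M = learn_diagonal_metric(X, must_link=[(0, 1), (2, 3)], cannot_link=[(0, 2), (1, 3)])
print(d_M(X[0], X[1], M) < d_M(X[0], X[2], M))  # True: must-link pair ends up closer
```

Under the learned M, the noisy dimension that separates must-link pairs is suppressed, which is the qualitative behavior that constraint-based DML methods optimize for more rigorously.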
In this paper, we propose a Semantic Preserving Distance Metric Learning (SP-DML) method to encode the pairwise constraints [70] with the manifold structure of the data points in the construction of a new feature space. In this space, the obtained distance metric is used to measure the similarity of data, and K-means can be directly used for image clustering. We consider the problem of semantic preserving distance metric learning based on the patch alignment scheme [80]. According to this scheme, the distance metric learning can be generalized into two stages: local patch construction and global coordinate alignment. The new method adopts the linear transformation model to construct local patches for each data point and aligns these patches to obtain the optimal distance metric to preserve the feature similarity. Studies on semi-supervised learning [6], [28], [42], [51], [52], [53], [56], [71], [72], [82] have shown that taking the labeled data into account can effectively improve the performance of distance metric construction. Therefore, pairwise constraints indicating the semantic similarity and dissimilarity among images are adopted to enhance the local patch construction.
The major contributions of this paper are summarized as follows:
- First, we propose the SP-DML method, which successfully encodes both visual features and semantic content in the construction of a new distance metric. Based on the SP-DML framework, K-means can be used directly for image clustering.
- Second, by integrating the manifold structure among data points (encoding feature similarity) into patch construction and incorporating pairwise constraints (encoding semantic similarity), a novel distance metric learning method is proposed that seamlessly combines the advantages of the patch alignment scheme and semi-supervised learning.
- Finally, the complementary characteristics of feature and semantic similarity can be explored semi-automatically with SP-DML. Experimental results for image clustering on real-world image datasets demonstrate the excellent performance of the proposed SP-DML in distance metric learning.
The remainder of the paper is organized as follows: Section 2 reviews the work related to distance metric learning. Section 3 describes the complementary characteristics of feature and semantic similarity. Section 4 describes the semantic preserving distance metric learning method. The results of experiments on image clustering are shown in Section 5, and conclusions are drawn in Section 6.
Related works
Suppose we have a dataset X consisting of n data points xi (1 ⩽ i ⩽ n) in the space Rm, i.e., X = [x1, … , xn] ∈ Rm×n. Suppose we are also given side information that certain pairs of data points are “similar”: (xi, xj) ∈ P, i ≠ j, if xi and xj are similar. The question then arises of how we can learn a distance metric dM(xi, xj) between points xi and xj so that “similar” points end up close to each other. Here, we consider learning a distance metric of the form [63]: dM(xi, xj) = √((xi − xj)ᵀ M (xi − xj)), where M ∈ Rm×m should be positive semi-definite.
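Because M is positive semi-definite, it can be factored as M = LᵀL, so the Mahalanobis distance under M equals the Euclidean distance after the linear map x ↦ Lx. The sketch below checks this equivalence for an arbitrary illustrative L (not a metric learned by any of the cited methods).

```python
import numpy as np

# Any M = LᵀL is positive semi-definite; L here is an arbitrary
# illustrative linear transform chosen for the demonstration.
L = np.array([[2.0, 0.0],
              [1.0, 1.0]])
M = L.T @ L

def d_M(xi, xj, M):
    """Mahalanobis distance dM(xi, xj) = sqrt((xi - xj)ᵀ M (xi - xj))."""
    diff = xi - xj
    return float(np.sqrt(diff @ M @ diff))

xi, xj = np.array([1.0, 2.0]), np.array([3.0, 1.0])
# Distance under M equals Euclidean distance in the transformed space.
print(np.isclose(d_M(xi, xj, M), np.linalg.norm(L @ xi - L @ xj)))  # True
```

This factorization is why learning a metric M is equivalent to learning a linear embedding, the view taken by the patch alignment formulation used later in the paper.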
Complementary characteristics of feature similarity and semantic similarity
In this section, we present the complementary characteristics of feature similarity and semantic similarity. To perform this task, Fig. 1 shows image examples from six datasets: COIL-20 [40], USPS [4], YALE-B [11], UMIST [13], MPEG7 [47] and MIRFLICKR [24]. The details of these datasets are provided in Section 5. In Fig. 1(a), 1024 dimensional gray-scale intensity (GSI) features are extracted to represent the images in the COIL-20 datasets. It is obvious that the objects in images C1 and C2 are
Semantic preserving distance metric learning
In this section, we present the Semantic Preserving Distance Metric Learning (SP-DML) method, which encodes the pairwise constraints with the manifold structure of the data points in a meaningful new feature space. Fig. 2 shows the workflow of SP-DML for image clustering. First, two groups of images from shape image dataset MPEG-7 [47] are imported as input. Second, the features of these images are described by the shape context descriptor [3]. Subsequently, feature patches of the images are
Experiments and discussions
To demonstrate the effectiveness of the proposed approach, we conduct experiments with six different image datasets, including COIL-20 [40], the handwritten digit image datasets USPS [4], two face image datasets (i.e., YALE-B [11], UMIST [13]), the shape image datasets MPEG-7 [47] and the image dataset MIRFLICKR [24]. Sample images from these six datasets are shown in Fig. 3. For image clustering, we compare the performance of the proposed Semantic Preserving Distance Metric Learning (SP-DML)
Conclusions
To efficiently and effectively learn the complementary characteristics of visual features and semantic content in measuring the distance between images, we have developed a novel distance metric learning method termed Semantic Preserving Distance Metric Learning (SP-DML), which encodes the feature similarity and semantic similarity in a new unified feature space. In this space, the learned distance metric can be directly used to measure the similarity/dissimilarity between two images. The new
Acknowledgements
This work was supported by the National Natural Science Foundation of China (No. 61100104), the Program for New Century Excellent Talents in University (No. NCET-12-0323), the Hong Kong Scholar Programme (No. XJ2013038) and the Natural Science Foundation of Fujian Province of China (No. 2012J01287).
References (82)
- et al., Error bounds of multi-graph regularized semi-supervised classification, Inform. Sci. (2009)
- et al., Distance metrics for high dimensional nearest neighborhood recovery: compression and normalization, Inform. Sci. (2012)
- et al., Image classification with manifold learning for out-of-sample data, Signal Process. (2013)
- et al., Image retrieval using color and shape, Pattern Recogn. (1996)
- et al., Multi-label ensemble based on variable pairwise constraint projection, Inform. Sci. (2013)
- et al., Texture classification and segmentation using multiresolution simultaneous autoregressive models, Pattern Recogn. (1992)
- et al., Generalization performance of magnitude-preserving semi-supervised ranking with graph-based regularization, Inform. Sci. (2013)
- et al., Exploiting pairwise recommendation and clustering strategies for image re-ranking, Inform. Sci. (2012)
- et al., A transductive multi-label learning approach for video concept detection, Pattern Recogn. (2011)
- et al., Exploring hypergraph-based semi-supervised ranking for query-oriented summarization, Inform. Sci. (2013)
- Learning a Mahalanobis distance metric for data clustering and classification, Pattern Recogn.
- Visual query processing for efficient image retrieval using a SOM-based filter-refinement scheme, Inform. Sci.
- Pairwise constraints based multiview features fusion for scene classification, Pattern Recogn.
- Eigenfaces vs. fisherfaces: recognition using class specific linear projection, IEEE Trans. Pattern Anal. Mach. Intell.
- Laplacian eigenmaps for dimensionality reduction and data representation, Neural Comput.
- Shape matching and object recognition using shape contexts, IEEE Trans. Pattern Anal. Mach. Intell.
- Large scale online learning of image similarity through ranking, J. Mach. Learn. Res.
- Image retrieval: ideas, influences, and trends of the new age, ACM Comput. Surv.
- From few to many: illumination cone models for face recognition under variable lighting and pose, IEEE Trans. Pattern Anal. Mach. Intell.
- Generalized sparse metric learning with relative comparisons, Knowl. Inform. Syst.
- Biologically inspired features for scene classification in video surveillance, IEEE Trans. Syst. Man Cybern. Part B
- Automatic image annotation by semi-supervised manifold kernel density estimation, Inform. Sci.
- Principal Component Analysis