
Information Sciences

Volumes 418–419, December 2017, Pages 242-257

Domain class consistency based transfer learning for image classification across domains

https://doi.org/10.1016/j.ins.2017.08.034

Abstract

Distribution mismatch between the modeling data and the query data is a known domain adaptation issue in machine learning. To this end, in this paper, we propose an ℓ2,1-norm based discriminative robust kernel transfer learning (DKTL) method for high-level recognition tasks. The key idea is to realize robust domain transfer by simultaneously integrating domain-class-consistency (DCC) metric based discriminative subspace learning, kernel learning in a reproducing kernel Hilbert space, and representation learning between the source and target domains. The DCC metric comprises two properties: domain consistency, which measures the between-domain distribution discrepancy, and class consistency, which measures the within-domain class separability. The essential objective of the proposed transfer learning method is to maximize the DCC metric, which is equivalent to minimizing the domain-class-inconsistency (DCIC), so that domain distribution mismatch and class inseparability are formulated and unified simultaneously. The merits of the proposed method are that (1) the robust sparse coding selects a few valuable source data points while noise (outliers) is removed during knowledge transfer, and (2) the proposed DCC metric pursues more discriminative subspaces for the different domains. As a result, maximum class separability is also well guaranteed. Extensive experiments on a number of visual datasets demonstrate the superiority of the proposed method over other state-of-the-art domain adaptation and transfer learning methods.

Introduction

One basic assumption of machine learning is that the training data and testing data follow a similar probability distribution, i.e. they are independent and identically distributed (i.i.d.) and share the same feature subspace. However, in many real applications, machine learning faces the dilemma of insufficient labeled data. To learn a robust classification model, researchers have to “borrow” more data from other domains for training. One problem with the borrowed data is that the distribution mismatch between the source domain and the target domain violates this basic assumption. Specifically, domain mismatch often results from a variety of visual cues or abrupt feature changes, such as camera viewpoint, resolution (e.g. image sensors from webcam to DSLR), illumination conditions, color correction, pose (e.g. faces at different angles), and background. Physically, such distribution mismatch or domain shift is common in vision problems. Under this violation, classification suffers significant performance degradation [2]. For example, in a typical object recognition scenario in computer vision, users often recognize a query object captured by a mobile phone via a model trained on labeled data from an existing object dataset, such as Caltech 256 [14], or on web images. However, these training data may be sampled under ambient visual cues different from those of the query image, so recognition may fail at test time. Some example images of objects from different domains are shown in Fig. 1, which explicitly illustrates the domain shift/bias.

To deal with such domain distribution mismatch, transfer learning and domain adaptation methods have emerged [4], [13], [16], [20], [32], [33], [40], [41], [42], which can generally be divided into two categories: classifier-based and feature-based. Specifically, the classifier-based methods advocate learning a transfer classifier on the source data while simultaneously leveraging a few labeled data from the target domain [1], [4], [5], [6], [40], [42]. The “borrowed” target data play the role of a regularizer that adjusts the decision boundary, so that the learned decision function (e.g. an SVM) gains transfer capability and can be used for classification across biased domains. The idea of classifier-based techniques is straightforward and easy to understand; however, determining the decision boundary requires a number of labeled data, which may increase the cost of data labeling. Essentially, the classifier-based methods attempt to learn a generalized decision function without mining the intrinsic visual drifting mechanism, and thus cannot fundamentally solve the distribution mismatch.

Further, the feature-based representation and transformation methods [9], [12], [13], [43], [44] aim at aligning the domain shift by adapting features from the source domain to the target domain without training classifiers. Although these methods have proven effective for domain adaptation, three issues still exist. First, for representation-based adaptation, noise and outliers from the source data may also be transferred to the target data due to overfitting of a naïve transformation, which leads to a significantly distorted and corrupted data structure. Second, the learned subspace is suboptimal, because the subspace and the representation (e.g. global low-rank, local sparse coding, etc.) are learned independently, which limits the transfer ability. Third, nonlinear shift often happens in real applications and cannot be effectively modeled by linear reconstruction. Therefore, subspace learning and kernel learning, which are most helpful for representation transfer and nonlinear transfer, should be conducted and integrated simultaneously.

Additionally, Long et al. [24], [25] proposed the class-wise adaptation regularization method (ARTL), which learns an adaptive classifier for transfer learning by jointly optimizing the structural risk and the matching of both marginal and conditional distributions. Considering the labeling cost of the target domain, unsupervised domain adaptation methods have been proposed [11], [26]. By leveraging the strong learning capability of deep learning, with convolutional neural networks (CNN) and the maximum mean discrepancy (MMD) criterion, deep transfer learning methods such as the residual transfer network (RTN) [27], the deep adaptation network (DAN) [28], [29], and joint CNN models [37], [38] have also been proposed. Deep transfer learning depends on a knowledge network pre-trained on a larger dataset (e.g. ImageNet), so the transfer performance is greatly improved. The proposed method is essentially a shallow transfer learning model; therefore, for comparison with deep transfer models, CNN-based deep features (e.g. DeCAF) are exploited in this paper.
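The MMD criterion mentioned above compares two distributions via the distance between their mean embeddings in an RKHS. A minimal sketch of the empirical estimate (an illustration with an RBF kernel and toy data, not the implementation used by the cited methods):

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    # Pairwise RBF kernel values between rows of A and rows of B.
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-d2 / (2 * sigma**2))

def mmd2(Xs, Xt, sigma=1.0):
    # Biased empirical estimate of squared MMD between source and target samples.
    Kss = rbf_kernel(Xs, Xs, sigma)
    Ktt = rbf_kernel(Xt, Xt, sigma)
    Kst = rbf_kernel(Xs, Xt, sigma)
    return Kss.mean() + Ktt.mean() - 2 * Kst.mean()

rng = np.random.default_rng(0)
Xs = rng.normal(0.0, 1.0, (100, 5))
Xt = rng.normal(2.0, 1.0, (100, 5))   # mean-shifted target domain
print(mmd2(Xs, Xs[::-1]))             # near zero: same distribution
print(mmd2(Xs, Xt))                   # clearly positive: domain shift
```

Minimizing such a quantity over a feature transformation is the mechanism by which MMD-based deep transfer methods align domains.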

As described in Fig. 2, in this paper we propose a novel model that aims to learn a discriminative subspace P by using a newly proposed domain-class-consistency metric, a reproducing kernel Hilbert space, and an ℓ2,1-norm constrained representation. This work extends our IJCNN conference paper [45] by adding more detailed algorithmic deduction and discussion throughout the paper, conducting new experiments on benchmark datasets, introducing parameter sensitivity analysis and an empirical comparison of computational time, and comparing with more deep transfer learning methods. The proposed method has three merits:

  • (1)

    It can learn a discriminative subspace for each domain and guarantee the maximum separability of different classes (e.g. c1, c2, c3) within the same domain. In the model, we maximize the inter-class distance within the same domain, such that the inter-class difference within a domain can cover the between-domain discrepancy. In this way, the inter-class difference is enhanced and the impact of distribution mismatch is reduced, so the proposed method is not sensitive to domain bias. This is motivated by the fact that in face recognition, the difference between two images of the same person captured under different illumination conditions may be larger than that between two persons captured under the same conditions.

  • (2)

    By imposing an ℓ2,1-norm constraint on the transfer representation coefficient Z between source and target data points, only a few valuable source data points are utilized, such that outliers in the source domain can be removed rather than incorrectly transferred to the target domain. Therefore, the proposed method is not sensitive to noise or implicit outliers during transfer. Additionally, with the ℓ2,1-norm constraint on Z, a closed-form solution can be obtained with higher computational efficiency than with an ℓ1-norm sparse constraint or a low-rank constraint.

  • (3)

    Because nonlinear domain shift is often encountered in complex vision applications, kernel learning, which uses an implicit nonlinear mapping function to approximate linear separability in a reproducing kernel Hilbert space (RKHS), is naturally motivated. With the above description, discriminative subspace learning, representation learning, and kernel learning are all formulated in the proposed method. For convenience, we call our method discriminative kernel transfer learning (DKTL).
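Regarding merit (2) above, the ℓ2,1-norm sums the ℓ2 norms of the rows of Z, which promotes row sparsity: entire rows shrink to zero, deselecting the corresponding source samples. A minimal sketch of the norm and of the standard row-shrinkage (proximal) step that induces this selection effect (illustrative only, not the paper's full solver):

```python
import numpy as np

def l21_norm(Z):
    # ||Z||_{2,1} = sum of the l2 norms of the rows of Z.
    return np.sum(np.linalg.norm(Z, axis=1))

def l21_prox(Z, t):
    # Proximal operator of t * ||.||_{2,1}: shrink each row's l2 norm by t,
    # zeroing rows whose norm falls below the threshold (sample selection).
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    scale = np.maximum(1 - t / np.maximum(norms, 1e-12), 0)
    return Z * scale

Z = np.array([[3.0, 4.0],    # strong row, norm 5: a valuable source sample
              [0.1, 0.1],    # weak row: outlier-like contribution
              [0.0, 2.0]])
Zs = l21_prox(Z, 0.5)
print(l21_norm(Z))           # 5 + sqrt(0.02) + 2
print(Zs[1])                 # weak row shrunk exactly to zero
```

Because the penalty acts on whole rows rather than individual entries (as an ℓ1 norm would), source samples are either kept or discarded as units, which is exactly the robustness-to-outliers behavior described above.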

The rest of this paper is organized as follows. Section 2 summarizes related work in transfer learning and domain adaptation. The proposed model and optimization algorithms are presented in Section 3. Experiments on a number of transfer learning datasets, together with discussions, are presented in Section 4. Parameter sensitivity and computational time analyses are provided in Section 5. Finally, concluding remarks are given.


Related work

In recent years, a number of transfer learning and domain adaptation methods have been proposed; they can be summarized in two categories: classifier adaptation based methods and feature adaptation based methods.

For the former, Yang et al. [40] proposed an adaptive support vector machine (ASVM), which aims at learning the perturbation term for adapting the source classifier to the target classifier. Collobert et al. [1] proposed a transductive SVM (T-SVM), which utilized the labeled and unlabeled

Notations

In this paper, the source and target domains are denoted by subscripts “S” and “T”. The training sets of the source and target domains are defined as XS ∈ ℜD × NS and XT ∈ ℜD × NT, where D denotes the dimension of the data, and NS and NT denote the number of samples in the source and target domains, respectively. Let P ∈ ℜD × d (d ≤ D) represent the discriminative basis transformation that maps the original data space of the source and target data into a d-dimensional subspace. The reconstruction coefficient matrix is
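As a sanity check on these notations, the following sketch sets up matrices with the stated shapes and projects both domains into the shared d-dimensional subspace (the sizes D, NS, NT, d and the random data are arbitrary example values, not from the paper):

```python
import numpy as np

D, NS, NT, d = 50, 200, 150, 10   # example sizes only (d <= D)
rng = np.random.default_rng(0)
XS = rng.normal(size=(D, NS))     # source data: columns are samples
XT = rng.normal(size=(D, NT))     # target data
P = np.linalg.qr(rng.normal(size=(D, d)))[0]   # orthonormal basis, D x d

YS = P.T @ XS                     # projected source: d x NS
YT = P.T @ XT                     # projected target: d x NT
Z = rng.normal(size=(NS, NT))     # reconstruction coefficients, NS x NT
R = YS @ Z                        # reconstruction of projected target: d x NT
print(YS.shape, YT.shape, R.shape)
```

The reconstruction coefficient matrix Z thus expresses each projected target sample as a combination of projected source samples.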

Experiments

In this section, experiments on several benchmark datasets, including the 3DA object data, 4DA object data, COIL-20 object data, Multi-PIE face data, USPS data, SEMEION data, and MNIST handwritten digits data, are conducted to evaluate the proposed DKTL method. For classification, the regularized least squares classifier and the support vector machine can be used.
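The regularized least squares classifier mentioned above can be sketched as a one-vs-all ridge regression to one-hot targets (a generic formulation on toy data, not necessarily the authors' exact setup):

```python
import numpy as np

def rls_fit(X, y, lam=1.0):
    # One-vs-all regularized least squares (ridge) classifier.
    # X: D x N feature matrix (columns are samples), y: length-N integer labels.
    Xb = np.vstack([X, np.ones(X.shape[1])])             # append a bias row
    classes = np.unique(y)
    Y = (y[:, None] == classes[None, :]).astype(float)   # N x C one-hot targets
    W = np.linalg.solve(Xb @ Xb.T + lam * np.eye(Xb.shape[0]), Xb @ Y)
    return W, classes

def rls_predict(W, classes, X):
    Xb = np.vstack([X, np.ones(X.shape[1])])
    return classes[np.argmax(W.T @ Xb, axis=0)]

# Toy example with two well-separated clusters.
rng = np.random.default_rng(0)
X = np.hstack([rng.normal(0, 0.3, (5, 30)), rng.normal(2, 0.3, (5, 30))])
y = np.array([0] * 30 + [1] * 30)
W, classes = rls_fit(X, y, lam=0.1)
acc = (rls_predict(W, classes, X) == y).mean()
print(acc)   # high training accuracy on separable data
```

Unlike an SVM, this classifier has a closed-form solution, which keeps the classification step cheap after the transfer representation has been learned.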

Parameter sensitivity analysis

In the proposed DKTL model, there are two hyper-parameters, λ and τ. Additionally, there are several internal model parameters, such as the dimensionality d, the kernel parameter σ, and the constrained coefficient αS. For more insight into their impact on the model, we provide a parameter sensitivity analysis in this section. Specifically, the hyper-parameters λ and τ are tuned in the range 10^−4 to 10^4, the kernel parameter σ is tuned in the range 2^−4 to 2^4, and the coefficient αS is tuned
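The tuning ranges above amount to a log-scale grid search over the hyper-parameters. A minimal sketch of such a sweep, where `evaluate` is a hypothetical placeholder for the validation accuracy obtained by training the model with the given values:

```python
import itertools
import numpy as np

# Log-scale grids matching the ranges stated in the text.
lambdas = 10.0 ** np.arange(-4, 5)   # 10^-4 ... 10^4
taus    = 10.0 ** np.arange(-4, 5)
sigmas  = 2.0  ** np.arange(-4, 5)   # 2^-4 ... 2^4

def evaluate(lam, tau, sigma):
    # Hypothetical placeholder: in practice, train the model with these
    # hyper-parameters and return validation accuracy. This toy objective
    # peaks at lam = 1.0, tau = 0.1, sigma = 2.0.
    return -((np.log10(lam) - 0) ** 2
             + (np.log10(tau) + 1) ** 2
             + (np.log2(sigma) - 1) ** 2)

best = max(itertools.product(lambdas, taus, sigmas),
           key=lambda p: evaluate(*p))
print(best)
```

Sweeping one parameter at a time while fixing the others at their best values, as sensitivity analyses typically do, reduces this full product to a handful of one-dimensional scans.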

Conclusion

In this paper, we propose discriminative kernel transfer learning (DKTL) via ℓ2,1-norm minimization. In the model, the domain class consistency (DCC), which simultaneously interprets domain consistency and class consistency (double consistency), is proposed. To this end, in subspace learning, a discriminative mechanism that strengthens between-domain intra-class consistency and within-domain inter-class inconsistency is integrated. For reducing the domain inconsistency,

Acknowledgments

The authors would like to thank the AE and anonymous reviewers for their insightful and constructive comments. This work was supported by the Fundamental Research Funds for the Central Universities (Project No. 106112017CDJQJ168819), the National Natural Science Foundation of China under Grants 61401048, 91420201 and 61472187, the 973 Program No. 2014CB349303, and Program for Changjiang Scholars.


References (49)

  • R. Gross et al., Multi-pie, Image Vision Comput. (2010)
  • L. Zhang et al., MetricFusion: generalized metric swarm learning for similarity measure, Inf. Fusion (2016)
  • R. Collobert et al., Large scale transductive SVMs, J. Mach. Learn. Res. (2006)
  • H. Daumé, Frustratingly easy domain adaptation, ACL (2007)
  • J. Donahue et al., DeCAF: a deep convolutional activation feature for generic visual recognition
  • L. Duan et al., Domain transfer multiple kernel learning, IEEE Trans. Pattern Anal. Mach. Intell. (2012)
  • L. Duan et al., Domain adaptation from multiple sources: a domain-dependent regularization approach, IEEE Trans. Neural Networks Learn. Syst. (2012)
  • L. Duan et al., Visual event recognition in videos by learning from web data, IEEE Trans. Pattern Anal. Mach. Intell. (2012)
  • E. Elhamifar et al., Sparse subspace clustering
  • E. Elhamifar et al., Sparse subspace clustering: algorithm, theory, and applications, IEEE Trans. Pattern Anal. Mach. Intell. (2013)
  • B. Fernando et al., Unsupervised visual domain adaptation using subspace alignment
  • A. Frank et al., UCI machine learning repository
  • Y. Ganin et al., Unsupervised domain adaptation by backpropagation
  • B. Gong et al., Geodesic flow kernel for unsupervised domain adaptation
  • R. Gopalan et al., Domain adaptation for object recognition: an unsupervised approach
  • G. Griffin, A. Holub, and P. Perona, Caltech-256 Object Category Dataset, Technical Report...
  • J. Hoffman et al., Asymmetric and category invariant feature transformations for domain adaptation, Int. J. Comput. Vis. (2014)
  • C. Hou et al., Joint embedding learning and sparse regression: a framework for unsupervised feature selection, IEEE Trans. Cybern. (2014)
  • I.H. Jhuo et al., Robust visual domain adaptation with low-rank reconstruction
  • A. Krizhevsky et al., ImageNet classification with deep convolutional neural networks
  • B. Kulis et al., What you saw is not what you get: domain adaptation using asymmetric kernel transforms
  • Y. LeCun et al., Gradient-based learning applied to document recognition
  • G. Liu et al., Robust subspace segmentation by low-rank representation
  • G. Liu et al., Robust recovery of subspace structures by low-rank representation, IEEE Trans. Pattern Anal. Mach. Intell. (2013)

    Lei Zhang received his Ph.D degree in Circuits and Systems from the College of Communication Engineering, Chongqing University, Chongqing, China, in 2013. He is currently a Professor/Distinguished Research Fellow with Chongqing University. He was selected as a Hong Kong Scholar in China in 2013, and worked as a Post-Doctoral Fellow with The Hong Kong Polytechnic University, Hong Kong, from 2013 to 2015. He has authored more than 60 scientific papers in top journals, including the IEEE Transactions, such as T-NNLS, T-IP, T-MM, T-SMCA, T-IM, IEEE Sensors Journal, Information Fusion, Sensors & Actuators B, Neurocomputing, and Analytica Chimica Acta, etc. His current research interests include machine learning, pattern recognition, computer vision and intelligent system. Dr. Zhang was a recipient of Outstanding Doctoral Dissertation Award of Chongqing, China, in 2015, Hong Kong Scholar Award in 2014, Academy Award for Youth Innovation of Chongqing University in 2013 and the New Academic Researcher Award for Doctoral Candidates from the Ministry of Education, China, in 2012.

    Jian Yang received the PhD degree from Nanjing University of Science and Technology (NUST), in the subject of pattern recognition and intelligence systems, in 2002. In 2003, he was a postdoctoral researcher at the University of Zaragoza. From 2004 to 2006, he was a Postdoctoral Fellow at the Biometrics Centre of Hong Kong Polytechnic University. From 2006 to 2007, he was a Postdoctoral Fellow at the Department of Computer Science of the New Jersey Institute of Technology. He is now a Chang-Jiang professor in the School of Computer Science and Technology of NUST. He is the author of more than 100 scientific papers in pattern recognition and computer vision. His journal papers have been cited more than 4000 times in the ISI Web of Science and 9000 times in Google Scholar. His research interests include pattern recognition, computer vision and machine learning. He is or was an associate editor of Pattern Recognition Letters, IEEE Trans. Neural Networks and Learning Systems, and Neurocomputing. He is a Fellow of IAPR.

    David Zhang graduated in Computer Science from Peking University. He received his MSc in 1982 and his PhD in 1985, both in Computer Science, from the Harbin Institute of Technology (HIT). From 1986 to 1988 he was a Postdoctoral Fellow at Tsinghua University and then an Associate Professor at the Academia Sinica, Beijing. In 1994 he received his second PhD, in Electrical and Computer Engineering, from the University of Waterloo, Ontario, Canada. He has been a Chair Professor at the Hong Kong Polytechnic University since 2005, where he is the Founding Director of the Biometrics Research Centre (UGC/CRC) supported by the Hong Kong SAR Government since 1998. He also serves as Visiting Chair Professor at Tsinghua University, and Adjunct Professor at Peking University, Shanghai Jiao Tong University, HIT, and the University of Waterloo. He is the Founder and Editor-in-Chief of the International Journal of Image and Graphics (IJIG); Book Editor of the Springer International Series on Biometrics (KISB); Organizer of the International Conference on Biometrics Authentication (ICBA); Associate Editor of more than ten international journals, including IEEE Transactions; and the author of more than 10 books, over 300 international journal papers and 30 patents from the USA/Japan/HK/China. Professor Zhang is a Croucher Senior Research Fellow, a Distinguished Speaker of the IEEE Computer Society, and a Fellow of both IEEE and IAPR.
